Re: [Sedna-discussion] poor performance for large collection

Олег Борисенко Mon, 14 Jan 2013 06:41:13 -0800

Yes, it would be great if you could give us the collection; it's much easier
to debug with real data :) We will try to find the cause as soon as
possible.


P.S.: Please don't forget to send a copy to
sedna-discussion@lists.sourceforge.net (it's public) or
se...@ispras.ru (it's our private mail); it's much easier when the whole
team sees what's going on. Thanks!

Best regards, Borisenko Oleg, Sedna team



On Mon, Jan 14, 2013 at 6:31 PM, Robby Pelssers <robby.pelss...@nxp.com> wrote:

> Hi Oleg,
>
> I ran the following profile statements:
>
> **************************************************************************************************
>
> profile
> for $i in index-scan("chemicalcontent_id", "", "GE")/@id return string($i)
>
> which results in
>
> <profile xmlns="http://www.modis.ispras.ru/sedna">
>   <total-time>14.941</total-time>
> </profile>
> <prolog xmlns="http://www.modis.ispras.ru/sedna"/>
> <query xmlns="http://www.modis.ispras.ru/sedna">
>   <operation name="PPQueryRoot" time="14.941" calls="1">
>     <operation name="PPReturn" position="2:5" time="14.910" calls="24250">
>       <produces>
>         <variable descriptor="0" name="i"/>
>       </produces>
>       <operation name="PPDDO" position="2:54" time="14.678" calls="24250">
>         <operation name="PPAxisStep" step="attribute::attribute(id)" position="2:54" time="14.486" calls="24250">
>           <operation name="PPSeqChecker" mode="node" position="2:54" time="1.043" calls="24250">
>             <operation name="PPIndexScan" index-scan-condition="GE" position="2:11" time="1.040" calls="24250">
>               <operation name="PPConst" type="xs:string" value="chemicalcontent_id" position="2:22" time="0.000" calls="2"/>
>               <operation name="PPConst" type="xs:string" value="" position="2:44" time="0.000" calls="2"/>
>               <operation name="PPConst" type="xs:integer" value="0" position="2:11" time="0.000" calls="0"/>
>             </operation>
>           </operation>
>         </operation>
>       </operation>
>       <operation name="PPFnString" position="2:65" time="0.215" calls="48498">
>         <operation name="PPVariable" descriptor="0" variable-name="i" position="2:72" time="0.008" calls="48498"/>
>       </operation>
>     </operation>
>   </operation>
> </query>
>
> **************************************************************************************************
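
A rough way to read the first profile above, assuming each operation's
time includes the time of its child operations (the nesting of the numbers
suggests this, but it is an assumption): subtract the children's times to
get each operation's own time. The sketch below does that in Python.
`self_times` and `SAMPLE` are illustrative names, not part of Sedna, and
SAMPLE is an abridged copy of the output above.

```python
import xml.etree.ElementTree as ET

# Namespace used by the profile output quoted in this thread.
NS = "{http://www.modis.ispras.ru/sedna}"

# Abridged copy of the first profile (PPConst/PPVariable leaves omitted).
SAMPLE = """
<query xmlns="http://www.modis.ispras.ru/sedna">
  <operation name="PPQueryRoot" time="14.941" calls="1">
    <operation name="PPReturn" position="2:5" time="14.910" calls="24250">
      <operation name="PPDDO" position="2:54" time="14.678" calls="24250">
        <operation name="PPAxisStep" step="attribute::attribute(id)" position="2:54" time="14.486" calls="24250">
          <operation name="PPSeqChecker" mode="node" position="2:54" time="1.043" calls="24250">
            <operation name="PPIndexScan" index-scan-condition="GE" position="2:11" time="1.040" calls="24250"/>
          </operation>
        </operation>
      </operation>
      <operation name="PPFnString" position="2:65" time="0.215" calls="48498"/>
    </operation>
  </operation>
</query>
"""


def self_times(profile_xml):
    """Return (operation name, own time) pairs, largest own time first.

    Assumption: an operation's "time" attribute is inclusive of its
    direct child operations, so own time = time - sum(children's time).
    """
    # The profile output consists of several sibling elements,
    # so wrap it in a dummy root before parsing.
    root = ET.fromstring("<root>" + profile_xml + "</root>")
    rows = []
    for op in root.iter(NS + "operation"):
        total = float(op.get("time", "0"))
        children = sum(float(c.get("time", "0"))
                       for c in op.findall(NS + "operation"))
        rows.append((op.get("name"), round(total - children, 3)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, seconds in self_times(SAMPLE):
        print(f"{name:14s} {seconds:.3f}")
```

Under that assumption, PPAxisStep comes out first with
14.486 - 1.043 = 13.443 seconds of its own, while PPIndexScan accounts for
about 1.04 seconds. That would suggest most of the executor time goes into
the /@id step rather than into the index scan itself.
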
>
> I also ran the following:
>
> profile
> for $doc in collection("chemicalContent/released") return document-uri($doc)
>
> resulting in
>
> <profile xmlns="http://www.modis.ispras.ru/sedna">
>   <total-time>1.594</total-time>
> </profile>
> <prolog xmlns="http://www.modis.ispras.ru/sedna"/>
> <query xmlns="http://www.modis.ispras.ru/sedna">
>   <operation name="PPQueryRoot" time="1.594" calls="1">
>     <operation name="PPReturn" position="2:5" time="1.568" calls="24250">
>       <produces>
>         <variable descriptor="0" name="doc"/>
>       </produces>
>       <operation name="PPAbsPath" root="collection(chemicalContent/released)" position="2:13" time="1.247" calls="24250"/>
>       <operation name="PPFnDocumentURI" position="2:59" time="0.302" calls="48498">
>         <operation name="PPVariable" descriptor="0" variable-name="doc" position="2:72" time="0.012" calls="48498"/>
>       </operation>
>     </operation>
>   </operation>
> </query>
>
> BUT!
>
> Serializing and sending that data over the wire takes forever (well over 1
> minute). I know it's about 24k strings in total, but it still smells fishy
> to me, to be honest.
>
> Thanks in advance. If you want, I can zip the collection and make it
> available via Dropbox so you can reproduce my issue.
>
> Robby
>
> *From:* Олег Борисенко [mailto:a...@somestuff.ru]
> *Sent:* Monday, January 14, 2013 2:16 PM
> *To:* Robby Pelssers
> *Subject:* Re: [Sedna-discussion] poor performance for large collection
>
> It's difficult to say anything specific in that case. But we have one
> more diagnostic statement named "profile"; see the documentation here:
> http://www.sedna.org/progguide/ProgGuidesu10.html#x16-650002.7.4
>
> Could you send us the output, please?
>
> Best regards, Borisenko Oleg, Sedna team
>
> On Fri, Jan 11, 2013 at 1:58 PM, Robby Pelssers <robby.pelss...@nxp.com> wrote:
>
> Someone on the list gave me the following tip:
> *********************************************************
> for $i in index-scan("chemicalcontent_id", "", "GE")/@id return string($i)
> - as a result, you will get the keyset of the index, possibly with
> duplicate keys (if they are present in the index), which can be removed
> with the distinct-values function.
>
> Here, the blank key to compare with ("") is assumed to be less than any
> other key in the index.
> *********************************************************
>
> But I tried that and it is still not responsive. I think Sedna is not
> using only the index but is still doing a full collection scan. Can someone
> shed some light on this?
>
> I was also looking into the documentation for how to run an explain
> plan. But to my surprise I don't see any indication of index usage in the
> output. It's something I would typically expect from an explain plan.
>
> http://www.sedna.org/progguide/ProgGuidesu10.html#x16-640002.7.3
>
>
>
> To give an example, I have the following index and XQuery module:
> *******************************************************************
> create index "package_id"
>   on fn:collection("packages/released")/Package
>   by @identifier
>   as xs:string
> *******************************************************************
> module namespace packages = "http://www.nxp.com/packages";
>
> declare function packages:getPackage($id as xs:string) as
> element(Package)? {
>     index-scan('package_id', $id, 'EQ')
> };
> *******************************************************************
>
> Now I tried to explain a method invocation that uses an index:
> *******************************************************************
> explain
> import module namespace packages = "http://www.nxp.com/packages";
> packages:getPackage("SOT669")
> *******************************************************************
>
> It shows me the following explanation, but no evidence of an index being used.
> *******************************************************************
> <prolog xmlns="http://www.modis.ispras.ru/sedna"/>
> <query xmlns="http://www.modis.ispras.ru/sedna">
>   <operation name="PPQueryRoot">
>     <operation name="PPFunCall" id="0" function-name="packages:getPackage" position="3:1">
>       <operation name="PPConst" type="xs:string" value="SOT669" position="3:21"/>
>     </operation>
>   </operation>
> </query>
> *******************************************************************
>
>
> So any tips on debugging my performance problem are very welcome!
>
> Thx in advance,
> Robby
>
>
> -----Original Message-----
> From: Robby Pelssers [mailto:robby.pelss...@nxp.com]
> Sent: Thursday, January 10, 2013 12:09 PM
> To: sedna-discussion@lists.sourceforge.net
> Subject: [Sedna-discussion] poor performance for large collection
>
> Hi all,
>
> I have a single collection of 24468 documents.
>
> <count>{count(collection("chemicalContent/released")/TypeName)}</count>
>  == 24468
>
> When I run the statement below, it takes very long to execute. The
> documents themselves are not even that big, varying between 2 KB and 12 KB.
>
> for $i in collection("chemicalContent/released") return document-uri($i)
>
> I also have an index on that collection:
>
> create index "chemicalcontent_id"
>   on fn:collection("chemicalContent/released")/TypeName
>   by @id
>   as xs:string
>
> Is it normal for that statement to take that long (> 1 minute)?
>
>
> Is there a way to perhaps speed up fetching a list of all @id values for
> that particular collection?
>
> Thx in advance,
> Robby
>
>
>
> _______________________________________________
> Sedna-discussion mailing list
> Sedna-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/sedna-discussion

