Re: [basex-talk] improving query performance

2020-08-22 Thread Christian Grün
Yes, I see now why my query returns many more hits than yours
(including the first). As Liam already pointed out, what you want to
achieve is not really a nested query.

One thing you can always try is to change the order of your for clauses
and see what happens. Maybe you have done that already? In any case, no
index will be applied to these kinds of query patterns at the moment. I
might get back to you later or tomorrow once I have another idea of what
could be done.


On Sat, Aug 22, 2020 at 6:42 PM Bill Osmond  wrote:
>
> This is vexing - it seems as though the mechanism that provides the necessary 
> "filtering" is the very thing that slows the execution down so much. This 
> wouldn't have been obvious from the single example document I sent earlier, 
> but each document stands alone: all of the searching and reference linking 
> done for each TrackRelease in a NewReleaseMessage should only refer to other 
> nodes in that same NewReleaseMessage.
>
> In my query, I started out with "for $r in /ernm:NewReleaseMessage" and I 
> used $r on the right-hand side of the subsequent for statements. It seems 
> like without that, the execution is quick, but all the results from every 
> document are getting matched to each other. With it, the results are correct, 
> but the execution time shoots way up. In case any of you still have any 
> patience for this question (and thanks again for everything so far!), I've 
> attached a small sample set of 6 documents. The desired number of results 
> from the query is 70 (which is the number of TrackReleases from all the 
> documents combined), and the query that I've adapted from Christian's 
> ddex2.xq which returns the right number of results is the following:
>
> declare namespace ernm = 'http://ddex.net/xml/ern/411';
> (: declare context item := db:open('ddex'); :)
>
> for $r in /ernm:NewReleaseMessage
>
> for $party in $r/PartyList/Party[
>   PartyReference/text() =
>   $r/ReleaseList/TrackRelease/ReleaseLabelReference
> ]
> for $track_release in $r/ReleaseList/TrackRelease[
>   ReleaseLabelReference/text() =
>   $r/PartyList/Party/PartyReference
> ]
> for $sound_recording in $r/ResourceList/SoundRecording[
>   ResourceReference/text() =
>   $track_release/ReleaseResourceReference
> ]
> for $release in $r/ReleaseList/Release[
>   ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference/text() =
>   $track_release/ReleaseResourceReference
> ]
> return (: <result> is a hypothetical wrapper; the original element names were not preserved :)
>   <result>
>   { $track_release/ReleaseId/ISRC/text() }
>   { fn:string-join($sound_recording/DisplayArtistName, '/') }
>   { $sound_recording/DisplayTitleText/text() }
>   { $release/DisplayTitleText/text() }
>   { $release/ReleaseId/ICPN/text() }
>   { $party/PartyName/FullName/text() }
>   </result>
> 
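A minimal sketch of the per-document scoping described above, using XQuery 3.1 maps as lookup tables. It assumes the element names from the quoted query; the two inline sample messages, the $messages variable, the $party-by-ref map and the <row> output element are hypothetical, and this is not the approach used in ddex.xq or ddex2.xq. Because the map is built inside the for $r clause, a reference can only resolve to a Party in the same NewReleaseMessage:

```xquery
declare namespace ernm = 'http://ddex.net/xml/ern/411';

(: hypothetical sample data: two messages that both use the party reference P1 :)
let $messages := (
  <ernm:NewReleaseMessage>
    <PartyList>
      <Party>
        <PartyReference>P1</PartyReference>
        <PartyName><FullName>Label A</FullName></PartyName>
      </Party>
    </PartyList>
    <ReleaseList>
      <TrackRelease>
        <ReleaseLabelReference>P1</ReleaseLabelReference>
        <ReleaseId><ISRC>AA0000000001</ISRC></ReleaseId>
      </TrackRelease>
      <TrackRelease>
        <ReleaseLabelReference>P1</ReleaseLabelReference>
        <ReleaseId><ISRC>AA0000000002</ISRC></ReleaseId>
      </TrackRelease>
    </ReleaseList>
  </ernm:NewReleaseMessage>,
  <ernm:NewReleaseMessage>
    <PartyList>
      <Party>
        <PartyReference>P1</PartyReference>
        <PartyName><FullName>Label B</FullName></PartyName>
      </Party>
    </PartyList>
    <ReleaseList>
      <TrackRelease>
        <ReleaseLabelReference>P1</ReleaseLabelReference>
        <ReleaseId><ISRC>BB0000000001</ISRC></ReleaseId>
      </TrackRelease>
    </ReleaseList>
  </ernm:NewReleaseMessage>
)
for $r in $messages
(: one lookup table per message, so P1 in one message never matches P1 in another :)
let $party-by-ref := map:merge(
  for $party in $r/PartyList/Party
  return map:entry(string($party/PartyReference), $party)
)
for $track_release in $r/ReleaseList/TrackRelease
(: resolve the reference by key instead of scanning all parties again :)
let $party := $party-by-ref(string($track_release/ReleaseLabelReference))
return <row isrc="{ $track_release/ReleaseId/ISRC }"
            label="{ $party/PartyName/FullName }"/>
```

Here the third row gets "Label B" rather than "Label A", even though both messages use the reference P1. Against the database, $messages would be replaced by /ernm:NewReleaseMessage; whether this is faster than the predicate-based version is untested.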


Re: [basex-talk] improving query performance

2020-08-22 Thread Christian Grün
That's good to hear. My rewritten query was based on the query in your
first post, and I had already suspected that the nested loops were not
really wanted or required.

Looking forward to learning about your next insights,
Christian


Re: [basex-talk] improving query performance

2020-08-22 Thread Bill Osmond
Great e-mail messages to wake up to! Thank you for the further explanation
Liam, and Christian the examples you provided were considerably faster:

- my fastest was 70k ms
- your ddex.xq was 35k ms
- your ddex2.xq was 10k ms!

There is only one issue: both ddex.xq and ddex2.xq seem to return many more
results than expected (a Cartesian product somewhere, perhaps).

When I run the queries against a smaller database (one with just 6 of the
DDEX documents), my query returns 70 results, which matches the number of
TrackReleases, but both ddex.xq and ddex2.xq return 303,134 results. It
looks like a separate "copy" of the output is being created for every Party
in the PartyList, when really there should be only one (specified by the
PartyReference). But this is very promising: if it takes 10 seconds to
return a massively expanded version of the data, then perhaps this will get
to <1000ms!
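One way such a multiplication can arise, shown on a hypothetical, simplified message (no namespace, three parties, two track releases): a for clause over Party that is not correlated with the current TrackRelease pairs every track with every party, while a predicate that compares against the current $track_release keeps only the referenced party. This is a sketch of the general effect, not a diagnosis of ddex.xq or ddex2.xq:

```xquery
(: hypothetical, simplified message: three parties, two track releases :)
let $r :=
  <NewReleaseMessage>
    <PartyList>
      <Party><PartyReference>P1</PartyReference></Party>
      <Party><PartyReference>P2</PartyReference></Party>
      <Party><PartyReference>P3</PartyReference></Party>
    </PartyList>
    <ReleaseList>
      <TrackRelease><ReleaseLabelReference>P1</ReleaseLabelReference></TrackRelease>
      <TrackRelease><ReleaseLabelReference>P2</ReleaseLabelReference></TrackRelease>
    </ReleaseList>
  </NewReleaseMessage>
(: unconstrained: every track release is paired with every party, 2 x 3 = 6 tuples :)
let $product :=
  for $track_release in $r/ReleaseList/TrackRelease
  for $party in $r/PartyList/Party
  return $track_release/ReleaseLabelReference || '/' || $party/PartyReference
(: correlated: each track release is paired only with the party it references :)
let $joined :=
  for $track_release in $r/ReleaseList/TrackRelease
  for $party in $r/PartyList/Party[PartyReference = $track_release/ReleaseLabelReference]
  return $track_release/ReleaseLabelReference || '/' || $party/PartyReference
return (count($product), count($joined), $joined)
```

It returns 6 and 2, followed by the two matched pairs "P1/P1" and "P2/P2".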



Re: [basex-talk] documentation flaw

2020-08-22 Thread Christian Grün
Hi Rob, you are right. I have replaced "or" with "and". Thanks for the
hint. – Best, Christian


Re: [basex-talk] improving query performance

2020-08-22 Thread Christian Grün
Hi Bill,

Feel free to run the attached queries; maybe they give you a faster result.

Your use case was interesting. It gave me some additional ideas on how
to speed up queries (by reordering consecutive 'for' clauses when doing so
does not change the result).

Cheers,
Christian
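A small illustration of why such a reordering needs care, assuming two independent for clauses with hypothetical values: swapping them yields the same tuples, but in a different order, so presumably the rewrite is only safe where the order of the results does not matter:

```xquery
(: the same two independent for clauses, in both orders :)
let $ab :=
  for $a in ('a', 'b', 'c')
  for $b in 1 to 2
  return $a || $b
let $ba :=
  for $b in 1 to 2
  for $a in ('a', 'b', 'c')
  return $a || $b
return (
  string-join($ab, ' '),            (: a1 a2 b1 b2 c1 c2 :)
  string-join($ba, ' '),            (: a1 b1 c1 a2 b2 c2 :)
  deep-equal(sort($ab), sort($ba))  (: true: same tuples, different order :)
)
```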


On Sat, Aug 22, 2020 at 6:10 AM Liam R. E. Quin  wrote:
>
> On Fri, 2020-08-21 at 17:28 -0700, Bill Osmond wrote:
> > I'm beginning to think that perhaps my performance hopes were a bit too
> > inflated, given the size and complexity of our database. After a fresh
> > optimization, and with -Xms2g -Xmx10g, the following query takes 1492ms:
>
> [...]
>
> First note: there are in fact no loops in your query. Although "for"
> is used to introduce a loop in many procedural languages, it does not
> do so in XQuery (nor does for-each in XSLT).
>
> In fact, it's closer to what SQL people know as a join.
>
> It's making a stream of n-tuples, and then evaluating the inner
> expression for each tuple, so that
>
> for $a in ('a', 'b', 'c')
>   for $b in (1 to 5)
> return $a || '-' || $b
>
> produces 15 lines of output (3 x 5 tuples):
> a-1, a-2, a-3, a-4, a-5, b-1, and so on.
>
> You can see that the BaseX query plan for your query already moves your
> where clauses as I did by hand, because BaseX is awesome.
>
> To make the query fast, you either need to reduce the number of tuples,
> and hence the number of times the expressions are evaluated, or you
> need to reduce the cost of creating the tuples.
>
> Moving the where clauses was my attempt to reduce the number of tuples.
> Adding an index might reduce the cost of making the tuples, so I'd
> certainly try that.
>
> If the input document is sorted, you might be able to construct
> something recursively (e.g. with fold-left) or use grouping or
> windowing to process $parties in groups, which may help considerably.
>
> Without seeing the data, that's only a guess.
>
> Liam
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
> Available for XML/Document/Information Architecture/XSLT/
> XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
>
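The join and grouping ideas from Liam's reply can be sketched on hypothetical sample data (attributes stand in for the DDEX child elements, and $labels and $tracks are illustrative names for the $parties in his message). The where clause filters the 3 x 2 candidate tuples down to the matching ones; the group by variant, which does not depend on the input order (unlike a windowing approach), evaluates the label lookup once per label reference instead of once per track:

```xquery
(: hypothetical sample data with attributes instead of DDEX child elements :)
let $labels := (
  <Party ref="P1" name="Label A"/>,
  <Party ref="P2" name="Label B"/>
)
let $tracks := (
  <TrackRelease label="P1" isrc="T1"/>,
  <TrackRelease label="P2" isrc="T2"/>,
  <TrackRelease label="P1" isrc="T3"/>
)
(: join: of the 3 x 2 = 6 candidate tuples, the where clause keeps the 3 that match :)
let $joined :=
  for $track in $tracks
  for $label in $labels
  where $track/@label = $label/@ref
  return $track/@isrc || ' -> ' || $label/@name
(: grouping: one group per label reference, so each label is looked up once per group :)
let $grouped :=
  for $track in $tracks
  group by $ref := string($track/@label)
  let $label := $labels[@ref = $ref]
  order by $ref
  return $label/@name || ': ' || string-join($track/@isrc, ', ')
return ($joined, $grouped)
```

The result is "T1 -> Label A", "T2 -> Label B", "T3 -> Label A", "Label A: T1, T3" and "Label B: T2".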


ddex2.xq
Description: Binary data


ddex.xq
Description: Binary data


[basex-talk] documentation flaw

2020-08-22 Thread RobStapper
Hi,

I have my doubts about the expressions used for the “tertium non
datur” principle in the documentation [1].
Either the expression should be “$a or not($a)”, or the rewritten expression
should be “false()”.

[1]: https://docs.basex.org/wiki/XQuery_Optimizations#Pure_Logic
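For reference, the two identities involved can be checked directly in XQuery by evaluating both expressions for each possible value of a hypothetical boolean $a: $a or not($a) is always true() (tertium non datur, the law of excluded middle), and $a and not($a) is always false() (the law of non-contradiction):

```xquery
(: evaluate both expressions for every possible boolean value of $a :)
for $a in (true(), false())
return string($a)
  || ': $a or not($a) = ' || string($a or not($a))
  || ', $a and not($a) = ' || string($a and not($a))
```

It returns "true: $a or not($a) = true, $a and not($a) = false" and "false: $a or not($a) = true, $a and not($a) = false".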

Best regards,

Rob Stapper