Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery

2020-05-11 Thread BIRKNER Michael
Hi Christian,


thank you very much for looking into this and also for the query. I can confirm 
that by using your rewritten query the performance problem is gone!


Also thank you for taking the time to explain the technical reasons!


Best regards,

Michael


Mag. Michael Birkner
AK Wien - Bibliothek
1040, Prinz Eugen Straße 20-22
T: +43 1 501 65 12455
F: +43 1 501 65 142455
M: +43 664 88957669

michael.birk...@akwien.at
wien.arbeiterkammer.at


Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery

2020-05-11 Thread Christian Grün
Hi Michael,

I checked your use case in greater depth, and I found the change in
our code that caused the slowdown [1].

A) The answer in a nutshell: just use the attached query!

B) The extensive technical answer:

• In previous versions of BaseX, most paths in FLWOR expressions were
»inlined« in the code to trigger further optimizations, such as index
rewritings.
• The enforced inlining led to cases in which the execution time was
worse than for unoptimized queries.
• As a user cannot prevent variables from being inlined, we have
switched to a more predictable pattern in our inlining heuristics:
Paths will now only be moved around if we can ensure that the
execution time will not suffer.

A little example:

  let $nodes := db:open('db')/to/this/only/once
  for $i in 1 to 1000
  return $nodes

If $nodes is inlined by the optimizer (i.e., if the variable reference
$nodes in the last line is replaced by the actual path), the path will
be evaluated 1000 times instead of once. The revised query optimizer
won’t inline such paths anymore.
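
Written out, the inlined form of the little example would look roughly like this. It is only a sketch of the rewritten expression, using the same placeholder database name 'db' and path as above:

```xquery
(: After inlining, the path sits inside the loop body and is
   evaluated once per iteration, i.e. 1000 times. :)
for $i in 1 to 1000
return db:open('db')/to/this/only/once
```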

Your particular query benefited from the aggressive rewriting, though.
In the first step, "db:open('gnd-sachbegriff')/collection/record" was
inlined by the optimizer:

  let $recFromExistingData :=
    db:open('gnd-sachbegriff')/collection/record[controlfield[@tag = '001'] = $id]

In the second step, the path was rewritten for index access:

  let $recFromExistingData :=
    db:text('gnd-sachbegriff', $id)/parent::controlfield[@tag = '001']/parent::record

The index rewriting (which you can spot in the Info View by looking
for "apply text index") led to a much faster evaluation of your query
because it reduced the execution time from quadratic to linear.

If you adopt one of the code lines above, your query will be evaluated
faster again.
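
To illustrate the difference between the quadratic and the linear variant without any database, here is a small self-contained sketch. It uses an XQuery 3.1 map as a stand-in for the text index; this is a swapped-in technique for illustration, not what the BaseX optimizer does internally, and the inline records are made-up sample data:

```xquery
(: Hypothetical inline sample data standing in for the two databases. :)
let $updates := (
  <record><controlfield tag="001">id-1</controlfield></record>,
  <record><controlfield tag="001">id-3</controlfield></record>
)
let $existing := (
  <record><controlfield tag="001">id-1</controlfield></record>,
  <record><controlfield tag="001">id-2</controlfield></record>,
  <record><controlfield tag="001">id-3</controlfield></record>
)
let $ids := distinct-values($updates/controlfield[@tag = '001'])

(: Scan variant: every ID triggers a pass over all existing records,
   so the work grows with (number of IDs) x (number of records). :)
let $scan :=
  for $id in $ids
  return $existing[controlfield[@tag = '001'] = $id]

(: Lookup variant: build an ID-to-record map once, then look each ID up.
   Records sharing an ID are kept together via 'combine'. :)
let $by-id := map:merge(
  (
    for $rec in $existing
    return map:entry(string($rec/controlfield[@tag = '001']), $rec)
  ),
  map { 'duplicates': 'combine' }
)
let $lookup :=
  for $id in $ids
  return $by-id(string($id))

(: Both variants should select the same records. :)
return deep-equal($scan, $lookup)
```

With a text index, the database plays the role of the map: the lookup structure already exists, so each ID only costs one index access instead of a full scan.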

In the attached query, db:open is still assigned to variables. Since
db:open is evaluated only once, and already at compile time, the
document nodes that will be bound to $sachbegriffe can always be
inlined.
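
A query following this pattern might look roughly like the sketch below. It is not the attached query; it only combines the pieces quoted in this message. The database name 'db1' and the variable $updates are placeholders, and the path collection/record/controlfield is assumed to apply to both databases:

```xquery
(: Sketch only. Only db:open(...) is bound to variables, so the
   document nodes can be inlined and the predicate in the return
   clause remains a candidate for the text index rewriting. :)
let $updates      := db:open('db1')
let $sachbegriffe := db:open('gnd-sachbegriff')
for $id in distinct-values(
  $updates/collection/record/controlfield[@tag = '001']
)
return $sachbegriffe/collection/record[controlfield[@tag = '001'] = $id]
```

Whether the index is actually used can be checked in the Info View by looking for "apply text index", as described above.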

Hope this helps,
Christian

[1] https://github.com/BaseXdb/basex/issues/1722


02-performance-loss-query-cg.xq
Description: Binary data


Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery

2020-05-09 Thread Christian Grün
Thanks in advance for the comprehensive test case.
I’ll probably look at it on Monday or Tuesday.


On Sat, May 9, 2020 at 8:18 AM BIRKNER Michael 
wrote:

> Hello again,
>
> I managed to test again today. Unfortunately I still observe the same
> performance problem in 9.3.2, also with the query that Christian supplied.
> I also tried in 9.3.3 snapshot - same performance loss as in 9.3.2. Still,
> everything is working fine in 9.2.4.
>
> For reproducing the problem I assembled a package with all original XML
> files, the xQueries I execute and a description of the steps I follow (see
> README file in the package). As the XML-data are licenced under CC0 there
> should be no problem in sharing them with the community. You can download
> the whole package here (.zip file with ~150MB):
>
> https://drive.google.com/open?id=1o09YZAqj5Y6ys3oE2tX8JRJ3GKoQ2xUr
>
> I hope that helps to track down the problem.
>
> Best regards,
> Michael

Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery

2020-05-08 Thread BIRKNER Michael
Ahh, good old MAB2 :-) ... but it was as complex as Marc21.


Thank you again for your tests and information. I will test that again with my 
setup and let you know if I see any differences between your query and my query 
and what could lead to the performance loss.


But probably I will not be able to do it before Monday next week. I will let 
you know as soon as possible.


Best regards and have a nice weekend!

Michael





Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery

2020-05-08 Thread Christian Grün
And I’m always delighted to be confronted with a library use case. BaseX grew
up with library data; at that time, mostly XML variants of MAB2.

I made another attempt to reproduce your setting by creating two databases
with MARCXML data (rather small, with 10,000 and 10 documents, respectively).
This is the query I tried:

let $recsFromDb1  := db:open('db1')//*:record
let $recsFromDb2 := db:open('db2')//*:record
let $idsFromRecsInDb1 := distinct-values(
  $recsFromDb1/*:controlfield[@tag = '001']
)
for $id in $idsFromRecsInDb1
let $recFromDb2WithSameId := $recsFromDb2
  [*:controlfield[@tag = '001'] = $id]
return $recFromDb2WithSameId

Both query plans and execution times are pretty much the same. Can you tell
me what I need to change in my query to simulate the slowdown?

As a preview, I already have an idea how you can boost the query
evaluation (provided your databases have up-to-date index structures)…
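
As a rough illustration of how index structures can be brought up to date: in BaseX 9, db:optimize rebuilds them when its second argument is true. A minimal sketch, assuming a database named 'db2' as in the query above:

```xquery
(: Rebuilds all data and index structures of the database.
   'db2' is the placeholder name from the query above.
   This is an updating expression and must be run on its own. :)
db:optimize('db2', true())
```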





Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery

2020-05-08 Thread Marco Lettere
And even more curiously, we are also working at this very same time on 
handling fetches from OAI-PMH sources! :-D

M.






Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery

2020-05-08 Thread Imsieke, Gerrit, le-tex
Just saying that I find it sooo interesting to learn at which places and 
for which purposes BaseX is being employed. Have a nice weekend!





Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery

2020-05-08 Thread BIRKNER Michael
Hi Christian,


thank you for your answers. As you can guess, the queries I sent in my original 
email are just simplified examples.


The real XML structure is like the following (it's library data in the 
"MarcXML" format; you can see an example here: 
https://www.loc.gov/standards/marcxml/Sandburg/sandburg.xml):


db1: each of the 7489 documents has this structure:

<collection>
  <record>
    <controlfield tag="001">ID-Number</controlfield>
    ... [more tags named "controlfield" or "datafield"]
  </record>
  ... [more records]
</collection>


So in db1 I have 7489 documents, each with a 
"<collection>...</collection>" structure, so I have 7489 
"collection" nodes.


db2: It's the same structure as above, but there is only 1 "collection" and all 
"records" are within that "collection".


Some background information:

In db1 I save updated versions of records that also (partly) exist in db2. 
They are downloaded from an OAI-PMH interface, which gives me only 50 records 
at a time, so I have to page through the results and end up with 7489 XML 
files that I import into db1. So there are multiple records with the same ID: 
normally only 2 (the original and the updated one), but there can be 3 or 
more, because the downloaded updates can themselves contain multiple records 
with the same ID (an updated one, an update of the updated one, and so on ... 
I know ... complicated).

One of the records with the same ID is the newest one. My goal is to find the 
newest one and delete the others, based on a timestamp that is also found in 
another field in the record. So all of this is about updating records 
in an existing database from downloaded update-files that I get via OAI.
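
The "keep the newest record per ID" step can be sketched in plain XQuery 3.1 as follows. The inline records are made-up sample data, and it is an assumption that the ID sits in controlfield 001 and the timestamp in controlfield 005; the tags would have to be adjusted to the real data:

```xquery
(: Made-up sample records; tags 001 (ID) and 005 (timestamp) are assumptions. :)
let $records := (
  <record>
    <controlfield tag="001">id-1</controlfield>
    <controlfield tag="005">20200101120000.0</controlfield>
  </record>,
  <record>
    <controlfield tag="001">id-1</controlfield>
    <controlfield tag="005">20200301120000.0</controlfield>
  </record>,
  <record>
    <controlfield tag="001">id-2</controlfield>
    <controlfield tag="005">20200201120000.0</controlfield>
  </record>
)
for $rec in $records
(: After grouping, $rec holds all records that share the same ID. :)
group by $id := string($rec/controlfield[@tag = '001'])
(: Sort the group by timestamp; equal-length digit strings sort correctly. :)
let $sorted := sort($rec, (), function($r) {
  string($r/controlfield[@tag = '005'])
})
return <group id="{ $id }">
  <newest>{ $sorted[last()] }</newest>
  <outdated>{ $sorted[position() < last()] }</outdated>
</group>
```

Against a real database, the records collected as outdated could then be handed to an XQuery Update "delete node" expression instead of being returned.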


I hope this information helps. And thank you for pointing out the new version 
9.3.3. I will try that one.


Best regards,

Michael




Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery

2020-05-08 Thread Christian Grün
I tried to reproduce your use case by creating some sample data (with a few
million entries), but both the query plan and the performance were
similar in 9.2.4 and the current 9.3.3 beta version.

And I am still trying to understand your example query. Is it correct that
the attributes of your exampletag elements have static ids, and the text
value of the exampletag element contains an id as well? If you can provide
me with some example documents of your database, that might help us to
track down the problem.

And feel free to check out the latest stable snapshot [1]. BaseX 9.3.3 is
close, and lots of new optimizations and rewritings have been added since
9.3.2, so maybe the problem you encountered is already fixed.

[1] http://files.basex.org/releases/latest/




On Fri, May 8, 2020 at 10:19 AM BIRKNER Michael 
wrote:

> Hi,
>
> I am observing a performance loss between BaseX versions 9.2.4 (which I
> was using so far) and 9.3.2 (to which I updated recently) when executing an
> xQuery like this:
>
> ---
> (: Open 2 databases and get all <record>s :)
> let $recsFromDb1  := db:open('db1')/record
> let $recsFromDb2 := db:open('db2')/record
>
> (: Get distinct IDs of all records in db1 :)
> let $idsFromRecsInDb1 :=
> distinct-values($recsFromDb1/exampletag[@exampleattr='id'])
>
> (: Iterate over the distinct IDs of db1 and return the records from db2
> with the same ID :)
> for $id in $idsFromRecsInDb1
>   let $recFromDb2WithSameId := $recsFromDb2[exampletag[@exampleattr='id']=$id]
>   return $recFromDb2WithSameId
> ---
>
> In BaseX version 9.2.4 the query executes very fast (2 - 3 seconds). In
> 9.3.2 I didn't wait until the end ... I aborted after several minutes because
> I suspected that something must be wrong.
>
> Both BaseX instances have allocated the same amount of memory (4096MB).
> The databases (db1 and db2) were created in the respective BaseX version
> from scratch and contain attribute and text indexes. They were optimized
> before executing the query mentioned above. All options and preferences are
> the same in both BaseX instances. I am using the GUI in Ubuntu 18.04.
>
> Here are some more details about the two databases:
>
> db1:
> - Size: 2255MB
> - Nodes: 97598775
> - Documents: 7489
> - Uptodate: true
>
> db2:
> - Size: 883MB
> - Nodes: 46317512
> - Documents: 1
> - Uptodate: true
>
> Does someone have an idea why there is such a difference in performance
> between the two BaseX versions?
>
> Thanks for any answers and hints!
>
> Best regards,
> Michael
>


Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery

2020-05-08 Thread Christian Grün
Thanks, Michael, for the valuable observation. It might be that another
newly integrated optimization proves to be detrimental to your existing
query. I’ll try to find the culprit.

Just a minor question: In db2, a single document seems to be stored. Does
this mean that only one record is assigned to $recsFromDb2, does the one
document have multiple roots, or should it rather be //record instead of
/record?

