Thanks for the comprehensive test case.
I’ll probably look at it on Monday or Tuesday.
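
In the meantime, one way to avoid the nested predicate lookup in the quoted query is a map-based join. This is only a minimal sketch, not a confirmed fix for the slowdown: the inline `$db1` and `$db2` elements are hypothetical stand-ins for `db:open('db1')` and `db:open('db2')`, and the variable name `$recsById` is made up for illustration.

```xquery
(: Sketch: join two record sets on controlfield 001 via a map.
   ASSUMPTION: $db1/$db2 are tiny inline stand-ins for the real databases. :)
let $db1 :=
  <collection>
    <record><controlfield tag="001">A1</controlfield></record>
    <record><controlfield tag="001">A2</controlfield></record>
    <record><controlfield tag="001">A1</controlfield></record>
  </collection>
let $db2 :=
  <collection>
    <record><controlfield tag="001">A1</controlfield></record>
    <record><controlfield tag="001">A3</controlfield></record>
  </collection>

(: Build the lookup table once: ID -> all records in db2 with that ID.
   'combine' keeps every record when several share the same ID. :)
let $recsById := map:merge(
  for $rec in $db2//*:record
  for $key in $rec/*:controlfield[@tag = '001']
  return map:entry(string($key), $rec),
  map { 'duplicates': 'combine' }
)

(: One map lookup per distinct ID instead of one scan of db2 per ID :)
for $id in distinct-values($db1//*:record/*:controlfield[@tag = '001'])
return $recsById(string($id))
```

The map is built in main memory, so with millions of records this trades memory for lookup speed; whether it helps in this particular case would have to be measured.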


On Sat, May 9, 2020 at 8:18 AM BIRKNER Michael <michael.birk...@akwien.at>
wrote:

> Hello again,
>
> I managed to test again today. Unfortunately I still observe the same
> performance problem in 9.3.2, also with the query that Christian supplied.
> I also tried in 9.3.3 snapshot - same performance loss as in 9.3.2. Still,
> everything is working fine in 9.2.4.
>
> To reproduce the problem I assembled a package with all original XML
> files, the XQuery files I execute and a description of the steps I follow (see
> README file in the package). As the XML data are licenced under CC0 there
> should be no problem in sharing them with the community. You can download
> the whole package here (.zip file with ~150MB):
>
> https://drive.google.com/open?id=1o09YZAqj5Y6ys3oE2tX8JRJ3GKoQ2xUr
>
> I hope that helps to track down the problem.
>
> Best regards,
> Michael
>
>
>
> Mag. Michael Birkner
> AK Wien - Bibliothek
> 1040, Prinz Eugen Straße 20-22
> T: +43 1 501 65 12455
> F: +43 1 501 65 142455
> M: +43 664 88957669
>
> michael.birk...@akwien.at <michael.birk...@akwien.at>
> wien.arbeiterkammer.at
>
> Visit us also on:
> facebook <http://www.facebook.com/arbeiterkammer/> | twitter
> <https://twitter.com/Arbeiterkammer> | youtube
> <https://www.youtube.com/user/AKoesterreich>
> --------------------------------------------------
>
> *The AK has been standing up for justice for 100 years. Then. Now.
> Forever.*
>
> *arbeiterkammer.at/100 <https://arbeiterkammer.at/100>**
> <https://arbeiterkammer.at/100>* <https://w.ak.at/zukunftsprogramm>
>
>
> ------------------------------
> *From:* Christian Grün <christian.gr...@gmail.com>
> *Sent:* Friday, 8 May 2020 14:24
> *To:* BIRKNER Michael
> *Cc:* basex-talk@mailman.uni-konstanz.de
> *Subject:* Re: [basex-talk] Performance loss between version 9.2.4 and
> 9.3.2 when executing specific xQuery
>
> And I’m always delighted to be confronted with library use cases. BaseX
> grew up with library data; at that time, mostly XML variants of MAB2.
>
> I made another attempt to reproduce your setting by creating two databases
> with MARCXML data (rather small, 10,000 and 10 documents, respectively).
> This is the query I tried:
>
> let $recsFromDb1 := db:open('db1')//*:record
> let $recsFromDb2 := db:open('db2')//*:record
> let $idsFromRecsInDb1 := distinct-values(
>   $recsFromDb1/*:controlfield[@tag = '001']
> )
> for $id in $idsFromRecsInDb1
> let $recFromDb2WithSameId := $recsFromDb2
>   [*:controlfield[@tag = '001'] = $id]
> return $recFromDb2WithSameId
>
> Both query plans and execution times are pretty much the same. Can you
> tell me what I need to change in my query to simulate the slowdown?
>
> As a preview, I already have an idea how you can boost the query
> evaluation (provided your databases have up-to-date index structures)…
>
>
>
>
> On Fri, May 8, 2020 at 1:31 PM BIRKNER Michael <michael.birk...@akwien.at>
> wrote:
>
>> Hi Christian,
>>
>>
>> thank you for your answers. As you can guess, the queries I sent in my
>> original email are just simplified examples.
>>
>>
>> The real XML structure is like the following (it's library data in
>> MARCXML format; an example is available here:
>> https://www.loc.gov/standards/marcxml/Sandburg/sandburg.xml)
>>
>>
>> *db1:* each of the 7489 documents has this structure
>>
>>
>> <collection>
>>  <record>
>>    <controlfield tag="001">ID-Number</controlfield>
>>    ... [more tags named "controlfield" or "datafield"]
>>  </record>
>>  ... [more records]
>> </collection>
>>
>>
>> So in db1 I have 7489 documents each with a
>> "<collection><record>...</record></collection>" structure, so I have 7489
>> "collection" nodes.
>>
>>
>> *db2:* It's the same structure as above, but there is only 1
>> "collection" and all "records" are within that "collection".
>>
>>
>> Some background information:
>>
>> In db1 I save updated versions of records that also (partly) exist in db2.
>> The updates are downloaded from an OAI-PMH interface, which returns only
>> 50 records at a time, so I have to page through the results and end up
>> with 7489 XML files that I import into db1. So there are multiple records
>> with the same ID: normally only 2 (the original and the updated one), but
>> there can be 3 or more, because the downloaded updates may themselves
>> contain multiple records with the same ID (an updated one, an update of
>> the updated one, and so on ... I know ... complicated).
>>
>> One of the records with the same ID is the newest one. My goal is to find
>> the newest one and delete the others (based on a timestamp that is also
>> found in another <controlfield> in the record). So all of this is about
>> updating records in an existing database from downloaded update-files that
>> I get via OAI.
>>
>>
>> I hope this information helps. And thank you for pointing out the new
>> version 9.3.3. I will try that one.
>>
>>
>> Best regards,
>>
>> Michael
>>
>>
>> ------------------------------
>> *From:* Christian Grün <christian.gr...@gmail.com>
>> *Sent:* Friday, 8 May 2020 12:37
>> *To:* BIRKNER Michael
>> *Cc:* basex-talk@mailman.uni-konstanz.de
>> *Subject:* Re: [basex-talk] Performance loss between version 9.2.4 and
>> 9.3.2 when executing specific xQuery
>>
>> I tried to reproduce your use case by creating some sample data (with a
>> few millions of entries), but both the query plan and the performance were
>> similar in 9.2.4 and the current 9.3.3 beta version.
>>
>> And I am still trying to understand your example query. Is it correct
>> that the attributes of your exampletag elements have static ids, and that
>> the text value of each exampletag element contains an id as well? If you
>> can provide me with some example documents from your database, that might
>> help us to track down the problem.
>>
>> And feel free to check out the latest stable snapshot [1]. BaseX 9.3.3 is
>> close, and lots of new optimizations and rewritings have been added since
>> 9.3.2, so maybe the problem you encountered is already fixed.
>>
>> [1] http://files.basex.org/releases/latest/
>>
>>
>>
>>
>> On Fri, May 8, 2020 at 10:19 AM BIRKNER Michael <
>> michael.birk...@akwien.at> wrote:
>>
>>> Hi,
>>>
>>> I am observing a performance loss between BaseX versions 9.2.4 (which I
>>> was using so far) and 9.3.2 (to which I updated recently) when executing an
>>> XQuery like this:
>>>
>>> ---
>>> (: Open 2 databases and get all <record>s :)
>>> let $recsFromDb1  := db:open('db1')/record
>>> let $recsFromDb2 := db:open('db2')/record
>>>
>>> (: Get distinct IDs of all records in db1 :)
>>> let $idsFromRecsInDb1 :=
>>> distinct-values($recsFromDb1/exampletag[@exampleattr='id'])
>>>
>>> (: Iterate over the distinct IDs of db1 and return the records from db2
>>> with the same ID :)
>>> for $id in $idsFromRecsInDb1
>>>   let $recFromDb2WithSameId :=
>>>     $recsFromDb2[exampletag[@exampleattr='id'] = $id]
>>>   return $recFromDb2WithSameId
>>> ---
>>>
>>> In BaseX version 9.2.4 the query executes very fast (2 - 3 seconds). In
>>> 9.3.2 I didn't wait for it to finish: I aborted after several minutes
>>> because I suspected that something must be wrong.
>>>
>>> Both BaseX instances have allocated the same amount of memory (4096MB).
>>> The databases (db1 and db2) were created in the respective BaseX version
>>> from scratch and contain attribute and text indexes. They were optimized
>>> before executing the query mentioned above. All options and preferences are
>>> the same in both BaseX instances. I am using the GUI in Ubuntu 18.04.
>>>
>>> Here are some more details about the two databases:
>>>
>>> db1:
>>> - Size: 2255MB
>>> - Nodes: 97598775
>>> - Documents: 7489
>>> - Uptodate: true
>>>
>>> db2:
>>> - Size: 883MB
>>> - Nodes: 46317512
>>> - Documents: 1
>>> - Uptodate: true
>>>
>>> Does someone have an idea why there is such a difference in performance
>>> between the two BaseX versions?
>>>
>>> Thanks for any answers and hints!
>>>
>>> Best regards,
>>> Michael
>>>
>>
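
The goal described in the thread (per ID, keep the newest record and delete the others) can be sketched in plain XQuery 3.1 with `group by`. This is a minimal illustration under stated assumptions, not the poster's actual script: the thread only says the timestamp lives in "another <controlfield>", so the use of tag 005 here is an assumption, as is the idea that its value sorts correctly as a string. The inline records and the `group`/`keep`/`delete` wrapper elements are hypothetical.

```xquery
(: Sketch: per ID, pick the newest record and list the rest as deletion candidates.
   ASSUMPTION: the timestamp is in controlfield 005 and is string-sortable. :)
let $records := (
  <record>
    <controlfield tag="001">A1</controlfield>
    <controlfield tag="005">20200101120000.0</controlfield>
  </record>,
  <record>
    <controlfield tag="001">A1</controlfield>
    <controlfield tag="005">20200301120000.0</controlfield>
  </record>,
  <record>
    <controlfield tag="001">A2</controlfield>
    <controlfield tag="005">20200201120000.0</controlfield>
  </record>
)
for $rec in $records
let $id := string($rec/*:controlfield[@tag = '001'])
group by $id
(: After grouping, $rec holds all records that share this ID :)
let $sorted :=
  for $r in $rec
  order by string($r/*:controlfield[@tag = '005']) descending
  return $r
return
  <group id="{ $id }">
    <keep>{ head($sorted) }</keep>
    <delete>{ tail($sorted) }</delete>
  </group>
```

Against a real database, the nodes in `tail($sorted)` would be the ones handed to an XQuery Update `delete node` expression. Records with identical timestamps are not ordered in any particular way by this sketch, so ties would need an explicit rule.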
