[basex-talk] Is there an API that provides XQuery compilation results?
When I use the GUI application I can see an error description (even if it is a little terse) when my queries are incorrect. For instance, if I submit a query with mismatched tags I might get:

Stopped at tableFields.xq, 27/50: [XQST0118] Different start and end tag.

However, I cannot seem to find an API that gives me that information. I tried xquery:parse("", map {'pass': true()}), but that did not produce the result I expected. Also, while we're at it, it would be nice to also get the optimized query.

Best Regards

Peter Villadsen
Principal Architect
Microsoft Business Applications Group
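One possible way to get at the error details programmatically, sketched under the assumption that xquery:parse with the 'pass' option reports the original error code and description, which can then be captured with try/catch (the mismatched-tag query string below is only an illustrative example):

```xquery
(: Sketch: capture the compilation error raised by xquery:parse.
   With 'pass': true(), the original error code (e.g. XQST0118) and
   description are passed on instead of a generic parse error. :)
try {
  xquery:parse("<a></b>", map { 'pass': true() })
} catch * {
  'Code: ' || $err:code || ', description: ' || $err:description
}
```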
Re: [basex-talk] I am looking for the fastest way to sort 2.4 Mio tags by two attributes in ascending and descending order
And I discovered a little bug just after I had dispatched the message: the reversed nodes will be rewritten to document order in the $id-string function. You can circumvent this by using the simple map operator:

Old: $nodes/(@ID, @xml:id)
New: $nodes!(@ID, @xml:id)

On Tue, Nov 12, 2019 at 6:48 PM Christian Grün wrote:
> Dear Omar,
>
> Some spontaneous ideas:
>
> • You could try to evaluate redundant expressions once and bind them to a
> variable instead (see the attached code).
> • You could save each document to a separate database via db:create
> (depending on your data, this may be faster than replacements in a single
> database), or save all new elements in a single document.
> • Instead of creating full index structures with each update operation,
> you may save a lot of time if you only update the parts of the data that
> have actually changed.
> • If that’s close to impossible (because the types of updates are too
> manifold), you could work with daily databases that only contain
> incremental changes, and merge them with the main database every night.
>
> 2.4 million tags are a lot, though; and the string length of the created
> attribute values seems to exceed 100,000 characters, which is a lot, too.
> What will you do with the resulting documents?
>
> Best,
> Christian
>
> declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";
>
> let $id-string := function($nodes) {
>   $nodes/(@ID, @xml:id)
>   => subsequence(1, 1)
>   => string-join(' ')
> }
>
> let $db := '_qdb-TEI-02__cache'
> let $nodes := db:open($db)/_:dryed[@order = 'none']/_:d
>
> let $vutlsk := sort($nodes, (), function($n) { $n/@vutlsk })
> let $archiv := sort($nodes, (), function($n) { $n/@vutlsk-archiv })
>
> return (
>   db:replace($db, 'ascending_cache.xml',
>     <_:dryed order="ascending" ids="{ $id-string($vutlsk) }"/>),
>   db:replace($db, 'descending_cache.xml',
>     <_:dryed order="descending" ids="{ $id-string(reverse($vutlsk)) }"/>),
>   db:replace($db, 'ascending-archiv_cache.xml',
>     <_:dryed order="ascending" ids="{ $id-string($archiv) }" label="archiv"/>),
>   db:replace($db, 'descending-archiv_cache.xml',
>     <_:dryed order="descending" ids="{ $id-string(reverse($archiv)) }" label="archiv"/>)
> )
>
> On Tue, Nov 12, 2019 at 6:00 PM Omar Siam wrote:
>> Hi,
>>
>> I have a custom index that looks like this (one db, different files):
>>
>> <_:dryed xmlns:_="https://www.oeaw.ac.at/acdh/tools/vle/util"
>>   db_name="z881_qdb-TEI-02n" order="none">
>>   <_:d pre="15627" db_name="z881_qdb-TEI-02n" xml:id="z881_qdbn-d16e2"
>>     vutlsk="tsįttr Ziter [Subst]" vutlsk-archiv="HK 881, z8810118.sch#1"/>
>>   <_:d pre="15673" db_name="z881_qdb-TEI-02n" xml:id="z881_qdbn-d16e21"
>>     vutlsk="tsįttr Ziter [Subst]" vutlsk-archiv="HK 881, z8810118.sch#1"/>
>>   ...
>>
>> <_:dryed xmlns:_="https://www.oeaw.ac.at/acdh/tools/vle/util"
>>   db_name="f227_qdb-TEI-02n" order="none">
>>   <_:d pre="467" db_name="f227_qdb-TEI-02n" xml:id="f237_qdb-d1e29398"
>>     vutlsk="(aus)faren [Verb]"
>>     vutlsk-archiv="HK 327, f227#944.1 = fare0126.eck#1.1"/>
>>   <_:d pre="591" db_name="f227_qdb-TEI-02n" xml:id="f237_qdb-d1e29438"
>>     vutlsk="(aus)faren [Verb]"
>>     vutlsk-archiv="HK 327, f227#945.1 = fare0126.eck#2.1"/>
>>   ...
>>
>> There are about 2.4 Mio _:d tags in this db.
>>
>> I need to sort them by the @vutlsk* attributes alphabetically in
>> ascending and descending order.
>>
>> With the code I have now:
>>
>> declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";
>>
>> let $sorted-ascending := subsequence(
>>   for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
>>   order by $d/@vutlsk ascending
>>   return $d/(@ID, @xml:id)/data(), 1, 1)
>> let $sorted-descending := subsequence(
>>   for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
>>   order by $d/@vutlsk descending
>>   return $d/(@ID, @xml:id)/data(), 1, 1)
>> let $sorted-ascending-archiv := subsequence(
>>   for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
>>   order by $d/@vutlsk-archiv ascending
>>   return $d/(@ID, @xml:id)/data(), 1, 1)
>> let $sorted-descending-archiv := subsequence(
>>   for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
>>   order by $d/@vutlsk-archiv descending
>>   return $d/(@ID, @xml:id)/data(), 1, 1)
>> return (
>>   db:replace("_qdb-TEI-02__cache", 'ascending_cache.xml',
>>     <_:dryed order="ascending" ids="{string-join($sorted-ascending, ' ')}"/>),
>>   db:replace("_qdb-TEI-02__cache", 'descending_cache.xml',
>>     <_:dryed order="descending" ids="{string-join($sorted-descending, ' ')}"/>),
>>   db:replace("_qdb-TEI-02__cache", 'ascending-archiv_cache.xml',
>>     <_:dryed order="ascending" label="archiv"
>>       ids="{string-join($sorted-ascending-archiv, ' ')}"/>),
>>   db:replace("_qdb-TEI-02__cache", 'descending-archiv_cache.xml',
>>     <_:dryed order="descending" label="archiv"
>>       ids="{string-join($sorted-descending-archiv, ' ')}"/>))
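The Old/New fix above comes down to path semantics: a path expression returns its resulting nodes in document order (undoing the reverse()), while the simple map operator preserves the order of the left-hand sequence. A minimal, self-contained sketch (not part of the original thread):

```xquery
(: Sketch: '/' re-sorts result nodes into document order,
   '!' keeps the order of the input sequence. :)
let $doc := document { <r><d id="a"/><d id="b"/></r> }
let $rev := reverse($doc//d)
return (
  string-join($rev/@id ! string(), ' '),  (: 'a b': document order restored :)
  string-join($rev ! @id ! string(), ' ') (: 'b a': reversed order kept :)
)
```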
Re: [basex-talk] I am looking for the fastest way to sort 2.4 Mio tags by two attributes in ascending and descending order
Dear Omar,

Some spontaneous ideas:

• You could try to evaluate redundant expressions once and bind them to a variable instead (see the attached code).
• You could save each document to a separate database via db:create (depending on your data, this may be faster than replacements in a single database), or save all new elements in a single document.
• Instead of creating full index structures with each update operation, you may save a lot of time if you only update the parts of the data that have actually changed.
• If that’s close to impossible (because the types of updates are too manifold), you could work with daily databases that only contain incremental changes, and merge them with the main database every night.

2.4 million tags are a lot, though; and the string length of the created attribute values seems to exceed 100,000 characters, which is a lot, too. What will you do with the resulting documents?

Best,
Christian

declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";

let $id-string := function($nodes) {
  $nodes/(@ID, @xml:id)
  => subsequence(1, 1)
  => string-join(' ')
}

let $db := '_qdb-TEI-02__cache'
let $nodes := db:open($db)/_:dryed[@order = 'none']/_:d

let $vutlsk := sort($nodes, (), function($n) { $n/@vutlsk })
let $archiv := sort($nodes, (), function($n) { $n/@vutlsk-archiv })

return (
  db:replace($db, 'ascending_cache.xml',
    <_:dryed order="ascending" ids="{ $id-string($vutlsk) }"/>),
  db:replace($db, 'descending_cache.xml',
    <_:dryed order="descending" ids="{ $id-string(reverse($vutlsk)) }"/>),
  db:replace($db, 'ascending-archiv_cache.xml',
    <_:dryed order="ascending" ids="{ $id-string($archiv) }" label="archiv"/>),
  db:replace($db, 'descending-archiv_cache.xml',
    <_:dryed order="descending" ids="{ $id-string(reverse($archiv)) }" label="archiv"/>)
)

On Tue, Nov 12, 2019 at 6:00 PM Omar Siam wrote:
> Hi,
>
> I have a custom index that looks like this (one db, different files):
>
> <_:dryed xmlns:_="https://www.oeaw.ac.at/acdh/tools/vle/util"
>   db_name="z881_qdb-TEI-02n" order="none">
>   <_:d pre="15627" db_name="z881_qdb-TEI-02n" xml:id="z881_qdbn-d16e2"
>     vutlsk="tsįttr Ziter [Subst]" vutlsk-archiv="HK 881, z8810118.sch#1"/>
>   <_:d pre="15673" db_name="z881_qdb-TEI-02n" xml:id="z881_qdbn-d16e21"
>     vutlsk="tsįttr Ziter [Subst]" vutlsk-archiv="HK 881, z8810118.sch#1"/>
>   ...
>
> <_:dryed xmlns:_="https://www.oeaw.ac.at/acdh/tools/vle/util"
>   db_name="f227_qdb-TEI-02n" order="none">
>   <_:d pre="467" db_name="f227_qdb-TEI-02n" xml:id="f237_qdb-d1e29398"
>     vutlsk="(aus)faren [Verb]"
>     vutlsk-archiv="HK 327, f227#944.1 = fare0126.eck#1.1"/>
>   <_:d pre="591" db_name="f227_qdb-TEI-02n" xml:id="f237_qdb-d1e29438"
>     vutlsk="(aus)faren [Verb]"
>     vutlsk-archiv="HK 327, f227#945.1 = fare0126.eck#2.1"/>
>   ...
>
> There are about 2.4 Mio _:d tags in this db.
>
> I need to sort them by the @vutlsk* attributes alphabetically in
> ascending and descending order.
>
> With the code I have now:
>
> declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";
>
> let $sorted-ascending := subsequence(
>   for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
>   order by $d/@vutlsk ascending
>   return $d/(@ID, @xml:id)/data(), 1, 1)
> let $sorted-descending := subsequence(
>   for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
>   order by $d/@vutlsk descending
>   return $d/(@ID, @xml:id)/data(), 1, 1)
> let $sorted-ascending-archiv := subsequence(
>   for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
>   order by $d/@vutlsk-archiv ascending
>   return $d/(@ID, @xml:id)/data(), 1, 1)
> let $sorted-descending-archiv := subsequence(
>   for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
>   order by $d/@vutlsk-archiv descending
>   return $d/(@ID, @xml:id)/data(), 1, 1)
> return (
>   db:replace("_qdb-TEI-02__cache", 'ascending_cache.xml',
>     <_:dryed order="ascending" ids="{string-join($sorted-ascending, ' ')}"/>),
>   db:replace("_qdb-TEI-02__cache", 'descending_cache.xml',
>     <_:dryed order="descending" ids="{string-join($sorted-descending, ' ')}"/>),
>   db:replace("_qdb-TEI-02__cache", 'ascending-archiv_cache.xml',
>     <_:dryed order="ascending" label="archiv"
>       ids="{string-join($sorted-ascending-archiv, ' ')}"/>),
>   db:replace("_qdb-TEI-02__cache", 'descending-archiv_cache.xml',
>     <_:dryed order="descending" label="archiv"
>       ids="{string-join($sorted-descending-archiv, ' ')}"/>))
>
> This takes 30 s to about a minute, depending on the subsequence I choose.
>
> I did experiments with multithreading and without. Multiple jobs or
> fork-join make it worse.
>
> Worst case, I need to do it every time I save a change to the original
> DBs for which I maintain that index.
>
> Any ideas how to speed this up?
>
> Best regards
>
> Omar Siam
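One reason the attached rewrite is faster: each key is sorted only once with fn:sort, and the descending variant is derived with fn:reverse instead of a second "order by ... descending" pass. A small self-contained sketch of that pattern (the example data is made up):

```xquery
(: Sketch: sort once per key, derive the descending order via reverse(). :)
let $items := (<d k="b"/>, <d k="c"/>, <d k="a"/>)
let $asc := sort($items, (), function($n) { string($n/@k) })
return (
  string-join($asc ! string(@k), ' '),          (: 'a b c' :)
  string-join(reverse($asc) ! string(@k), ' ')  (: 'c b a' :)
)
```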
[basex-talk] I am looking for the fastest way to sort 2.4 Mio tags by two attributes in ascending and descending order
Hi,

I have a custom index that looks like this (one db, different files):

<_:dryed xmlns:_="https://www.oeaw.ac.at/acdh/tools/vle/util"
  db_name="z881_qdb-TEI-02n" order="none">
  <_:d pre="15627" db_name="z881_qdb-TEI-02n" xml:id="z881_qdbn-d16e2"
    vutlsk="tsįttr Ziter [Subst]" vutlsk-archiv="HK 881, z8810118.sch#1"/>
  <_:d pre="15673" db_name="z881_qdb-TEI-02n" xml:id="z881_qdbn-d16e21"
    vutlsk="tsįttr Ziter [Subst]" vutlsk-archiv="HK 881, z8810118.sch#1"/>
  ...

<_:dryed xmlns:_="https://www.oeaw.ac.at/acdh/tools/vle/util"
  db_name="f227_qdb-TEI-02n" order="none">
  <_:d pre="467" db_name="f227_qdb-TEI-02n" xml:id="f237_qdb-d1e29398"
    vutlsk="(aus)faren [Verb]" vutlsk-archiv="HK 327, f227#944.1 = fare0126.eck#1.1"/>
  <_:d pre="591" db_name="f227_qdb-TEI-02n" xml:id="f237_qdb-d1e29438"
    vutlsk="(aus)faren [Verb]" vutlsk-archiv="HK 327, f227#945.1 = fare0126.eck#2.1"/>
  ...

There are about 2.4 Mio _:d tags in this db.

I need to sort them by the @vutlsk* attributes alphabetically in ascending and descending order.

With the code I have now:

declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";

let $sorted-ascending := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk ascending
  return $d/(@ID, @xml:id)/data(), 1, 1)
let $sorted-descending := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk descending
  return $d/(@ID, @xml:id)/data(), 1, 1)
let $sorted-ascending-archiv := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk-archiv ascending
  return $d/(@ID, @xml:id)/data(), 1, 1)
let $sorted-descending-archiv := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk-archiv descending
  return $d/(@ID, @xml:id)/data(), 1, 1)
return (
  db:replace("_qdb-TEI-02__cache", 'ascending_cache.xml',
    <_:dryed order="ascending" ids="{string-join($sorted-ascending, ' ')}"/>),
  db:replace("_qdb-TEI-02__cache", 'descending_cache.xml',
    <_:dryed order="descending" ids="{string-join($sorted-descending, ' ')}"/>),
  db:replace("_qdb-TEI-02__cache", 'ascending-archiv_cache.xml',
    <_:dryed order="ascending" label="archiv"
      ids="{string-join($sorted-ascending-archiv, ' ')}"/>),
  db:replace("_qdb-TEI-02__cache", 'descending-archiv_cache.xml',
    <_:dryed order="descending" label="archiv"
      ids="{string-join($sorted-descending-archiv, ' ')}"/>))

This takes 30 s to about a minute, depending on the subsequence I choose.

I did experiments with multithreading and without. Multiple jobs or fork-join make it worse.

Worst case, I need to do it every time I save a change to the original DBs for which I maintain that index.

Any ideas how to speed this up?

Best regards

Omar Siam
Re: [basex-talk] HTTPServer + gzip compression
Hi Stefan,

Jetty’s GZIP feature can now be enabled in BaseX [1,2]. Looking forward to your testing feedback,

Christian

[1] http://docs.basex.org/wiki/Options#GZIP
[2] http://files.basex.org/releases/latest/

On Mon, Oct 7, 2019 at 1:40 PM Stefan Koch wrote:
> Hi Christian,
>
> thx for your reply. Got it :)
> Solution 2 would be cool - maybe a feature request?
>
> But I can live with a workaround.
>
> kind regards,
>
> Stefan
>
> -----Original Message-----
> *From:* Christian Grün [mailto:christian.gr...@gmail.com]
> *Sent:* Thursday, October 3, 2019 11:51
> *To:* Stefan Koch
> *Cc:* BaseX
> *Subject:* Re: [basex-talk] HTTPServer + gzip compression
>
> Hi Stefan,
>
> There’s a StackOverflow entry that has previously been referenced on
> this list (by Michael Seiferle, see [1]). Based on this thread, in
> which Joakim Erdfelt (the magnificent Jetty core developer) explains
> why the existing approaches for enabling GZIP compression don’t work
> anymore, I see three choices:
>
> 1. to wrap all HTTP responses in a GZIP output stream;
> 2. to initialize GZipHandler in our basexhttp code; or
> 3. to enable GZIP compression outside BaseX.
>
> Alternative 1 would give us the most control, but it would raise new
> questions that would need to be solved. Alternative 2 may be the
> better approach: it only works if basexhttp is used, but we could
> benefit from existing optimizations and tweaks in the Jetty
> implementation [2]. Alternative 3 is already available: you can use
> another light-weight web server as a proxy (Caddy, nginx), or you can
> start Jetty as described by Joakim.
>
> > Good idea to ask the Jetty devs - I'll try that.
>
> Thanks; feel free to keep us updated,
> Christian
>
> [1] https://stackoverflow.com/questions/38635262/jetty-9-and-gziphandler
> [2] https://www.eclipse.org/jetty/documentation/current/gzip-filter.html
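For anyone trying this out: a minimal sketch of enabling the option from [1], assuming it can be set like other BaseX options via the `.basex` configuration file (see the linked documentation for the authoritative mechanism; a restart of basexhttp is needed either way):

```
# .basex (BaseX configuration file) -- sketch, assumes the GZIP option
# documented in [1]; takes effect at HTTP server startup.
GZIP = true
```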