date:20191112

[basex-talk] Is there an API that provides XQuery compilation results?

2019-11-12 Thread Peter Villadsen

When I use the GUI application, I can see an error description (even if it is
a little terse) when my queries are incorrect. For instance, I might get:

Stopped at tableFields.xq, 27/50:
[XQST0118] Different start and end tag: 

If I submit a query like 

However, I cannot seem to find an API that brings me that information? I tried:

xquery:parse("", map {'pass':true()})

but that did not give the result I expected. Also, while we're at it, it
would be nice to get the optimized query as well.


Best Regards

Peter Villadsen
Principal Architect
Microsoft Business Applications Group
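
A minimal sketch of one way such diagnostics might be captured from within
XQuery, assuming a BaseX version in which xquery:parse accepts the 'compile'
and 'pass' options. The faulty query string and the <error> wrapper element
are hypothetical; the $err:* variables are the standard XQuery 3.0 try/catch
variables. This is an illustration, not a confirmed answer from the thread.

```xquery
(: Hypothetical faulty query: start and end tags do not match. :)
let $query := '<a></b>'
return try {
  (: 'compile' asks for the compiled (optimized) query plan,
     'pass' asks for error positions relative to the parsed string. :)
  xquery:parse($query, map { 'compile': true(), 'pass': true() })
} catch * {
  (: Collect the error code, position and message in one element. :)
  <error code="{ $err:code }"
         line="{ $err:line-number }"
         column="{ $err:column-number }">{
    $err:description
  }</error>
}
```

For a valid query string, the same call should return the query plan instead
of the <error> element.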

Re: [basex-talk] I am looking for the fastest way to sort 2.4 Mio tags by two attribute ascending and descending

2019-11-12 Thread Christian Grün

I discovered a little bug right after sending the message: in the
$id-string function, the path step (/) rewrites the reversed nodes back to
document order. You can circumvent this by using the simple map operator:

Old: $nodes/(@ID, @xml:id)
New: $nodes!(@ID, @xml:id)
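
The difference can be seen in a small, self-contained sketch. The element
and attribute values below are made up for illustration; only the two
operators are taken from the message above.

```xquery
(: Three hypothetical entries in document order a, b, c. :)
let $doc := <r><d xml:id="a"/><d xml:id="b"/><d xml:id="c"/></r>
let $nodes := reverse($doc/d)
return (
  (: Path step: the result is returned in document order. :)
  string-join($nodes / @xml:id, ' '),
  (: Simple map: the order of the input sequence is preserved. :)
  string-join($nodes ! @xml:id, ' ')
)
(: returns "a b c", "c b a" :)
```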




Re: [basex-talk] I am looking for the fastest way to sort 2.4 Mio tags by two attribute ascending and descending

2019-11-12 Thread Christian Grün

Dear Omar,

Some spontaneous ideas:

• You could try to evaluate redundant expressions once and bind them to a
variable instead (see the attached code).
• You could save each document to a separate database via db:create
(depending on your data, this may be faster than replacements in a single
database), or save all new elements in a single document.
• Instead of creating full index structures with each update operation, you
may save a lot of time if you only update parts of the data that have
actually changed.
• If that’s close to impossible (because the types of updates are too
manifold), you could work with daily databases that only contain
incremental changes, and merge them with the main database every night.

2.4 million tags are a lot, though, and the string length of the created
attribute values seems to exceed 100,000 characters, which is a lot, too.
What will you do with the resulting documents?

Best,
Christian


declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";

let $id-string := function($nodes) {
  $nodes/(@ID, @xml:id)
  => subsequence(1, 1)
  => string-join(' ')
}

let $db := '_qdb-TEI-02__cache'
let $nodes := db:open($db)/_:dryed[@order = 'none']/_:d

let $vutlsk := sort($nodes, (), function($n) { $n/@vutlsk })
let $archiv := sort($nodes, (), function($n) { $n/@vutlsk-archiv })

return (
  db:replace($db, 'ascending_cache.xml',
    <_:dryed order="ascending" ids="{ $id-string($vutlsk) }"/>),
  db:replace($db, 'descending_cache.xml',
    <_:dryed order="descending" ids="{ $id-string(reverse($vutlsk)) }"/>),
  db:replace($db, 'ascending-archiv_cache.xml',
    <_:dryed order="ascending" ids="{ $id-string($archiv) }"
      label="archiv"/>),
  db:replace($db, 'descending-archiv_cache.xml',
    <_:dryed order="descending" ids="{ $id-string(reverse($archiv)) }"
      label="archiv"/>)
)
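
The second suggestion above (saving all new elements in a single document)
might look roughly like the following sketch. The document name
'sorted_cache.xml' and the wrapper element _:caches are hypothetical names
chosen for illustration, and the simple map operator is used so that the
reversed sequences keep their order. Whether a single replacement is
actually faster depends on the data and has not been measured here.

```xquery
declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";

(: Joins the first ID of the given nodes; '!' keeps the input order. :)
let $id-string := function($nodes) {
  $nodes ! (@ID, @xml:id)
  => subsequence(1, 1)
  => string-join(' ')
}

let $db := '_qdb-TEI-02__cache'
let $nodes := db:open($db)/_:dryed[@order = 'none']/_:d

(: Sort once per attribute; descending order is the reversed result. :)
let $vutlsk := sort($nodes, (), function($n) { $n/@vutlsk })
let $archiv := sort($nodes, (), function($n) { $n/@vutlsk-archiv })

(: One update operation instead of four; names are assumptions. :)
return db:replace($db, 'sorted_cache.xml',
  <_:caches>
    <_:dryed order="ascending" ids="{ $id-string($vutlsk) }"/>
    <_:dryed order="descending" ids="{ $id-string(reverse($vutlsk)) }"/>
    <_:dryed order="ascending" label="archiv"
      ids="{ $id-string($archiv) }"/>
    <_:dryed order="descending" label="archiv"
      ids="{ $id-string(reverse($archiv)) }"/>
  </_:caches>
)
```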



[basex-talk] I am looking for the fastest way to sort 2.4 Mio tags by two attribute ascending and descending

2019-11-12 Thread Omar Siam


Hi,

I have a custom index that looks like this (one db, different files):

<_:dryed xmlns:_="https://www.oeaw.ac.at/acdh/tools/vle/util"
         db_name="z881_qdb-TEI-02n" order="none">
  <_:d pre="15627" db_name="z881_qdb-TEI-02n" xml:id="z881_qdbn-d16e2"
       vutlsk="tsįttr Ziter [Subst]" vutlsk-archiv="HK 881, z8810118.sch#1"/>
  <_:d pre="15673" db_name="z881_qdb-TEI-02n" xml:id="z881_qdbn-d16e21"
       vutlsk="tsįttr Ziter [Subst]" vutlsk-archiv="HK 881, z8810118.sch#1"/>
  ...
</_:dryed>
<_:dryed xmlns:_="https://www.oeaw.ac.at/acdh/tools/vle/util"
         db_name="f227_qdb-TEI-02n" order="none">
  <_:d pre="467" db_name="f227_qdb-TEI-02n" xml:id="f237_qdb-d1e29398"
       vutlsk="(aus)faren [Verb]"
       vutlsk-archiv="HK 327, f227#944.1 = fare0126.eck#1.1"/>
  <_:d pre="591" db_name="f227_qdb-TEI-02n" xml:id="f237_qdb-d1e29438"
       vutlsk="(aus)faren [Verb]"
       vutlsk-archiv="HK 327, f227#945.1 = fare0126.eck#2.1"/>
  ...
</_:dryed>


There are about 2.4 million _:d tags in this db.

I need to sort them by the @vutlsk* attributes alphabetically in 
ascending and descending order.


With the code I have now:

declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";

let $sorted-ascending := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk ascending
  return $d/(@ID, @xml:id)/data(), 1, 1)
let $sorted-descending := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk descending
  return $d/(@ID, @xml:id)/data(), 1, 1)
let $sorted-ascending-archiv := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk-archiv ascending
  return $d/(@ID, @xml:id)/data(), 1, 1)
let $sorted-descending-archiv := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk-archiv descending
  return $d/(@ID, @xml:id)/data(), 1, 1)
return (
  db:replace("_qdb-TEI-02__cache", 'ascending_cache.xml',
    <_:dryed order="ascending"
      ids="{string-join($sorted-ascending, ' ')}"/>),
  db:replace("_qdb-TEI-02__cache", 'descending_cache.xml',
    <_:dryed order="descending"
      ids="{string-join($sorted-descending, ' ')}"/>),
  db:replace("_qdb-TEI-02__cache", 'ascending-archiv_cache.xml',
    <_:dryed order="ascending" label="archiv"
      ids="{string-join($sorted-ascending-archiv, ' ')}"/>),
  db:replace("_qdb-TEI-02__cache", 'descending-archiv_cache.xml',
    <_:dryed order="descending" label="archiv"
      ids="{string-join($sorted-descending-archiv, ' ')}"/>)
)


This takes 30 s to about a minute depending on the subsequence I choose.

I experimented with and without multithreading. Multiple jobs or
fork-join make it worse.


In the worst case, I need to do this every time I save a change to the
original DBs for which I maintain that index.


Any ideas how to speed this up?

Best regards

Omar Siam

Re: [basex-talk] HTTPServer + gzip compression

2019-11-12 Thread Christian Grün

Hi Stefan,

Jetty’s GZIP feature can now be enabled in BaseX [1,2].

Looking forward to your testing feedback,
Christian

[1] http://docs.basex.org/wiki/Options#GZIP
[2] http://files.basex.org/releases/latest/



On Mon, Oct 7, 2019 at 1:40 PM Stefan Koch  wrote:

> Hi Christian,
>
> thx for your reply. Got it :)
> Solution 2 would be cool - maybe feature request?
>
> But I can live with a workaround.
>
> kind regards,
>
> Stefan
>
>
> -Original Message-
> *From:* Christian Grün [mailto:christian.gr...@gmail.com]
> *Sent:* Thursday, 3 October 2019 11:51
> *To:* Stefan Koch 
> *Cc:* BaseX 
> *Subject:* Re: [basex-talk] HTTPServer + gzip compression
>
>
> Hi Stefan,
>
> There’s a StackOverflow entry that has previously been referenced on
> this list (by Michael Seiferle, see [1]). Based on this thread, in
> which Joakim Erdfelt (the magnificent Jetty core developer) explains
> why the existing approaches for enabling GZIP compression don’t work
> anymore, I see three choices:
>
> 1. to wrap all HTTP responses in a GZIP output stream;
> 2. to initialize GZipHandler in our basexhttp code; or
> 3. to enable GZIP compression outside BaseX.
>
> Alternative 1 would give us most control, but it would raise new
> questions that would need to be solved. Alternative 2 may be the
> better approach: It only works if basexhttp is used, but we could
> benefit from existing optimizations and tweaks from the Jetty
> implementation [2]. Alternative 3 is already available: You can use
> another light-weight web server as proxy (caddy, nginx), or you can
> start Jetty as described by Joakim.
>
> > Good idea to ask the Jetty devs - I'll try that.
>
> Thanks; feel free to keep us updated,
> Christian
>
> [1] https://stackoverflow.com/questions/38635262/jetty-9-and-gziphandler
> [2] https://www.eclipse.org/jetty/documentation/current/gzip-filter.html
>
