Re: [basex-talk] slow search performance across multiple databases

2020-01-15 Thread Christian Grün
Hi Björn, hi Omar,

All filter expressions that yield document nodes in distinct document
order will now be rewritten to path expressions. Feel free to check
out the latest stable snapshot [1].

Christian

[1] http://files.basex.org/releases/latest/



On Wed, Jan 15, 2020 at 9:22 AM Christian Grün
 wrote:
>
> Hi Björn,
>
> > [2] Does not use index
> > declare namespace lmerFile = "http://www.ddb.de/LMERfile;;
> > count(
> >   (# db:enforceindex #) {
> >   let $dbs := ("00", "01")
> >   for $db in $dbs
> >   for $doc in db:open($db)
> >   where $doc//lmerFile:format/text() = 'urn:123'
> >   return $doc
> > })
>
> The index rewriting should take effect if you e.g. rewrite
> db:open($db)/* instead of db:open($db). The background:
>
> * The where clause will be attached to the for expression as predicate
> * As thefor expression is a simple function call, the result will be a
> filter expression: db:open($db)[.//lmerFile:format/text() = 'urn:123']
> * If the for expression is a path expression, the where clause will be
> attached to the last predicate of this path. The result will be a path
> again.
> * Only path expressions are rewritten to index access, no filters.
>
> But thanks for your observation. We will check if we can implicitly
> rewrite your function call to a path expression if it’s embedded in an
> iteration like yours.
>
> Best,
> Christian
>
>
> > Von: BaseX-Talk  Im Auftrag von 
> > Omar Siam
> > Gesendet: Dienstag, 14. Januar 2020 17:06
> > An: basex-talk@mailman.uni-konstanz.de
> > Betreff: Re: [basex-talk] slow search performance across multiple databases
> >
> > Hi,
> >
> > I have trouble seeing the original second query.
> >
> > But if I got it correct then the problem is that to have BaseX 
> > automatically rewrite for indexing you have to supply the DB you want to 
> > search in as string literal. That is db:open("DB") and not let $db := "DB" 
> > db:open($db). Or for with multiple dbs for that matter.
> >
> > Maybe I am wrong and the enforceindex pragma solves that now. I have to 
> > admit my code predates the pragma and always uses string literals when 
> > accessing a database.
> >
> > Best regards
> >
> > Omar
> >
> > Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
> > > Dear Christian,
> > >
> > > Thank you for your quick response.
> > >
> > > Could you tell me why the following two queries have a difference in 
> > > behaviour so that the first one apparently uses the text index [1] and 
> > > the second one [2] does not?
> > >
> > > Your second suggestion with db:text is nice but I cannot figure out how 
> > > to efficiently extract other informations like something from the root 
> > > element without crawling the whole database and performing a full table 
> > > scan so to say.
> > >
> > > Regards,
> > > Björn
> > >
> > > [1]
> > > declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (#
> > > db:enforceindex #) {
> > >let $dbs := ("00")
> > >for $db in $dbs
> > >for $doc in db:open($db)
> > >where $doc//lmerFile:format/text() = 
> > > 'urn:diasid:fty:kopal:00'
> > >return $doc
> > > })
> > >
> > > Compiling:
> > > - merge steps: descendant::element(lmerFile:format)
> > > - rewrite for to let: for $db_1 in $dbs_0
> > > - inline $dbs_0
> > > - inline $db_1
> > > - pre-evaluate db:open(database[,path]) to document-node() sequence:
> > > db:open("00") -> (db:open-pre("00", 0), ...)
> > > - apply text index for "urn:123"
> > > - rewrite cached filter to cached path: ((db:open-pre("00", 0), 
> > > ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> 
> > > db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
> > > - rewrite where clause(s)
> > > - simplify FLWOR expression: db:text("00", 
> > > "urn:123")/parent::element(lmerFile:format)/ancest...
> > > Optimized Query:
> > > count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00",
> > > "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc
> > > ument-node()] })
> > > Query:
> > > declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count

Re: [basex-talk] slow search performance across multiple databases

2020-01-15 Thread Christian Grün
Hi Björn,

> [2] Does not use index
> declare namespace lmerFile = "http://www.ddb.de/LMERfile;;
> count(
>   (# db:enforceindex #) {
>   let $dbs := ("00", "01")
>   for $db in $dbs
>   for $doc in db:open($db)
>   where $doc//lmerFile:format/text() = 'urn:123'
>   return $doc
> })

The index rewriting should take effect if you e.g. rewrite
db:open($db)/* instead of db:open($db). The background:

* The where clause will be attached to the for expression as predicate
* As thefor expression is a simple function call, the result will be a
filter expression: db:open($db)[.//lmerFile:format/text() = 'urn:123']
* If the for expression is a path expression, the where clause will be
attached to the last predicate of this path. The result will be a path
again.
* Only path expressions are rewritten to index access, no filters.

But thanks for your observation. We will check if we can implicitly
rewrite your function call to a path expression if it’s embedded in an
iteration like yours.

Best,
Christian


> Von: BaseX-Talk  Im Auftrag von 
> Omar Siam
> Gesendet: Dienstag, 14. Januar 2020 17:06
> An: basex-talk@mailman.uni-konstanz.de
> Betreff: Re: [basex-talk] slow search performance across multiple databases
>
> Hi,
>
> I have trouble seeing the original second query.
>
> But if I got it correct then the problem is that to have BaseX automatically 
> rewrite for indexing you have to supply the DB you want to search in as 
> string literal. That is db:open("DB") and not let $db := "DB" db:open($db). 
> Or for with multiple dbs for that matter.
>
> Maybe I am wrong and the enforceindex pragma solves that now. I have to admit 
> my code predates the pragma and always uses string literals when accessing a 
> database.
>
> Best regards
>
> Omar
>
> Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
> > Dear Christian,
> >
> > Thank you for your quick response.
> >
> > Could you tell me why the following two queries have a difference in 
> > behaviour so that the first one apparently uses the text index [1] and the 
> > second one [2] does not?
> >
> > Your second suggestion with db:text is nice but I cannot figure out how to 
> > efficiently extract other informations like something from the root element 
> > without crawling the whole database and performing a full table scan so to 
> > say.
> >
> > Regards,
> > Björn
> >
> > [1]
> > declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (#
> > db:enforceindex #) {
> >let $dbs := ("00")
> >for $db in $dbs
> >for $doc in db:open($db)
> >where $doc//lmerFile:format/text() = 
> > 'urn:diasid:fty:kopal:00'
> >return $doc
> > })
> >
> > Compiling:
> > - merge steps: descendant::element(lmerFile:format)
> > - rewrite for to let: for $db_1 in $dbs_0
> > - inline $dbs_0
> > - inline $db_1
> > - pre-evaluate db:open(database[,path]) to document-node() sequence:
> > db:open("00") -> (db:open-pre("00", 0), ...)
> > - apply text index for "urn:123"
> > - rewrite cached filter to cached path: ((db:open-pre("00", 0), 
> > ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> 
> > db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
> > - rewrite where clause(s)
> > - simplify FLWOR expression: db:text("00", 
> > "urn:123")/parent::element(lmerFile:format)/ancest...
> > Optimized Query:
> > count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00",
> > "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc
> > ument-node()] })
> > Query:
> > declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (#
> > db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in
> > db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return
> > $doc })
> > Result:
> > - Hit(s): 1 Item
> > - Updated: 0 Items
> > - Printed: 4 b
> > - Read Locking: (global)
> > - Write Locking: (none)
> > Timing:
> > - Parsing: 0.26 ms
> > - Compiling: 55.69 ms
> > - Evaluating: 60.66 ms
> > - Printing: 0.37 ms
> > - Total Time: 116.98 ms
> > Query Plan:
> > 
> >
> >  
> >
> >  db:enforceindex
> >
> >
> >  
> >
> >urn:123
> >  
> >   > type="element()?&q

Re: [basex-talk] slow search performance across multiple databases

2020-01-14 Thread Braunschweig , Björn
Dear Omar,

Sorry I forgot to include the second query and just copied the info output.

Here are the two queries again:

[1] Uses index
declare namespace lmerFile = "http://www.ddb.de/LMERfile;;
count(
  (# db:enforceindex #) {
  for $doc in db:open("00")
  where $doc//lmerFile:format/text() = 'urn:123'
  return $doc
})

[2] Does not use index
declare namespace lmerFile = "http://www.ddb.de/LMERfile;;
count(
  (# db:enforceindex #) {
  let $dbs := ("00", "01")
  for $db in $dbs
  for $doc in db:open($db)
  where $doc//lmerFile:format/text() = 'urn:123'
  return $doc
})

I have either a limited amount of databases I would like to access or all 
databases. Using anything but db:open with a string literal results in no 
indices are being used. Is there a solution to circumvent that problem?

Regards,
Björn


-Ursprüngliche Nachricht-
Von: BaseX-Talk  Im Auftrag von 
Omar Siam
Gesendet: Dienstag, 14. Januar 2020 17:06
An: basex-talk@mailman.uni-konstanz.de
Betreff: Re: [basex-talk] slow search performance across multiple databases

Hi,

I have trouble seeing the original second query.

But if I got it correct then the problem is that to have BaseX automatically 
rewrite for indexing you have to supply the DB you want to search in as string 
literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for 
with multiple dbs for that matter.

Maybe I am wrong and the enforceindex pragma solves that now. I have to admit 
my code predates the pragma and always uses string literals when accessing a 
database.

Best regards

Omar

Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
> Dear Christian,
>
> Thank you for your quick response.
>
> Could you tell me why the following two queries have a difference in 
> behaviour so that the first one apparently uses the text index [1] and the 
> second one [2] does not?
>
> Your second suggestion with db:text is nice but I cannot figure out how to 
> efficiently extract other informations like something from the root element 
> without crawling the whole database and performing a full table scan so to 
> say.
>
> Regards,
> Björn
>
> [1]
> declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# 
> db:enforceindex #) {
>let $dbs := ("00")
>for $db in $dbs
>for $doc in db:open($db)
>where $doc//lmerFile:format/text() = 
> 'urn:diasid:fty:kopal:00'
>return $doc
> })
>
> Compiling:
> - merge steps: descendant::element(lmerFile:format)
> - rewrite for to let: for $db_1 in $dbs_0
> - inline $dbs_0
> - inline $db_1
> - pre-evaluate db:open(database[,path]) to document-node() sequence: 
> db:open("00") -> (db:open-pre("00", 0), ...)
> - apply text index for "urn:123"
> - rewrite cached filter to cached path: ((db:open-pre("00", 0), 
> ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> 
> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
> - rewrite where clause(s)
> - simplify FLWOR expression: db:text("00", 
> "urn:123")/parent::element(lmerFile:format)/ancest...
> Optimized Query:
> count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", 
> "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc
> ument-node()] })
> Query:
> declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# 
> db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in 
> db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return 
> $doc })
> Result:
> - Hit(s): 1 Item
> - Updated: 0 Items
> - Printed: 4 b
> - Read Locking: (global)
> - Write Locking: (none)
> Timing:
> - Parsing: 0.26 ms
> - Compiling: 55.69 ms
> - Evaluating: 60.66 ms
> - Printing: 0.37 ms
> - Total Time: 116.98 ms
> Query Plan:
> 
>
>  
>
>  db:enforceindex
>
>
>  
>
>urn:123
>  
>   type="element()?"/>
>  
>
>   type="document-node()?"/>
>
>  
>
>  
>
> 
>
> [2]
>
> Compiling:
> - pre-evaluate expression list to xs:string sequence: ("00", "01")
> - merge steps: descendant::element(lmerFile:format)
> - inline $dbs_0
> - rewrite where clause(s)
> Optimized Query:
> count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in 
> ("00", "01") return 
> (db:open($db_1))[(descendant::element(lmerFile:format)/text() = 
> "urn:12

Re: [basex-talk] slow search performance across multiple databases

2020-01-14 Thread Omar Siam

Hi,

I have trouble seeing the original second query.

But if I got it correct then the problem is that to have BaseX 
automatically rewrite for indexing you have to supply the DB you want to 
search in as string literal. That is db:open("DB") and not let $db := 
"DB" db:open($db). Or for with multiple dbs for that matter.


Maybe I am wrong and the enforceindex pragma solves that now. I have to 
admit my code predates the pragma and always uses string literals when 
accessing a database.


Best regards

Omar

Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:

Dear Christian,

Thank you for your quick response.

Could you tell me why the following two queries have a difference in behaviour 
so that the first one apparently uses the text index [1] and the second one [2] 
does not?

Your second suggestion with db:text is nice but I cannot figure out how to 
efficiently extract other informations like something from the root element 
without crawling the whole database and performing a full table scan so to say.

Regards,
Björn

[1]
declare namespace lmerFile = "http://www.ddb.de/LMERfile;;
count(
(# db:enforceindex #) {
   let $dbs := ("00")
   for $db in $dbs
   for $doc in db:open($db)
   where $doc//lmerFile:format/text() = 
'urn:diasid:fty:kopal:00'
   return $doc
})

Compiling:
- merge steps: descendant::element(lmerFile:format)
- rewrite for to let: for $db_1 in $dbs_0
- inline $dbs_0
- inline $db_1
- pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> 
(db:open-pre("00", 0), ...)
- apply text index for "urn:123"
- rewrite cached filter to cached path: ((db:open-pre("00", 0), 
...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", 
"urn:123")/parent::element(lmerFile:format)/ancest...
- rewrite where clause(s)
- simplify FLWOR expression: db:text("00", 
"urn:123")/parent::element(lmerFile:format)/ancest...
Optimized Query:
count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", 
"urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()] })
Query:
declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { 
let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where 
$doc//lmerFile:format/text() = 'urn:123' return $doc })
Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.26 ms
- Compiling: 55.69 ms
- Evaluating: 60.66 ms
- Printing: 0.37 ms
- Total Time: 116.98 ms
Query Plan:

   
 
   
 db:enforceindex
   
   
 
   
   urn:123
 
 
 
   
 
   
 
   
 
   


[2]

Compiling:
- pre-evaluate expression list to xs:string sequence: ("00", "01")
- merge steps: descendant::element(lmerFile:format)
- inline $dbs_0
- rewrite where clause(s)
Optimized Query:
count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", "01") return 
(db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] })
Query:
declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { let $dbs 
:= ("00", "01") for $db in $dbs for $doc in db:open($db) where 
$doc//lmerFile:format/text() = 'urn:123' return $doc })
Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.27 ms
- Compiling: 0.54 ms
- Evaluating: 1778.21 ms
- Printing: 0.31 ms
- Total Time: 1779.33 ms
Query Plan:

   
 
   
 db:enforceindex
   
   
 
   
 00
 01
   
 
 
   
 
   
   
 
   
   
 
     urn:123
       
     
   
     
   



-Ursprüngliche Nachricht-
Von: Christian Grün 
Gesendet: Montag, 13. Januar 2020 13:07
An: Braunschweig, Björn 
Cc: basex-talk@mailman.uni-konstanz.de
Betreff: Re: [basex-talk] slow search performance across multiple databases

Dear Björn,

Our Article on Indexes may give you some hints [1]:

• Via the 'enforceindex' pragma, you can help the optimizer by indicating that 
all of your database have up-to-date index structures.
• With db:text, you can directly access the database indexes and rewrite the 
remaining steps of your query in reverse order:

   db:text($db, 'urn:123')/parent::lmerFile:format/...

Hope this helps
Christian

[1] http://docs.basex.org/wiki/Indexes





On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn 
 wrote:

Hello everyone,

First of all thank you for building such a great product!

Unfortunately I encounter 

Re: [basex-talk] slow search performance across multiple databases

2020-01-14 Thread Braunschweig , Björn
Dear Christian,

Thank you for your quick response.

Could you tell me why the following two queries have a difference in behaviour 
so that the first one apparently uses the text index [1] and the second one [2] 
does not? 

Your second suggestion with db:text is nice but I cannot figure out how to 
efficiently extract other informations like something from the root element 
without crawling the whole database and performing a full table scan so to say.

Regards,
Björn

[1] 
declare namespace lmerFile = "http://www.ddb.de/LMERfile;;
count(
(# db:enforceindex #) {
  let $dbs := ("00")
  for $db in $dbs
  for $doc in db:open($db)
  where $doc//lmerFile:format/text() = 
'urn:diasid:fty:kopal:00'
  return $doc
})

Compiling:
- merge steps: descendant::element(lmerFile:format)
- rewrite for to let: for $db_1 in $dbs_0
- inline $dbs_0
- inline $db_1
- pre-evaluate db:open(database[,path]) to document-node() sequence: 
db:open("00") -> (db:open-pre("00", 0), ...)
- apply text index for "urn:123"
- rewrite cached filter to cached path: ((db:open-pre("00", 0), 
...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> 
db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
- rewrite where clause(s)
- simplify FLWOR expression: db:text("00", 
"urn:123")/parent::element(lmerFile:format)/ancest...
Optimized Query:
count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", 
"urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()]
 })
Query:
declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# 
db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in 
db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.26 ms
- Compiling: 55.69 ms
- Evaluating: 60.66 ms
- Printing: 0.37 ms
- Total Time: 116.98 ms
Query Plan:

  

  
db:enforceindex
  
  

  
  urn:123



  

  

  

  


[2]

Compiling:
- pre-evaluate expression list to xs:string sequence: ("00", "01")
- merge steps: descendant::element(lmerFile:format)
- inline $dbs_0
- rewrite where clause(s)
Optimized Query:
count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", 
"01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = 
"urn:123")] })
Query:
declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# 
db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in 
db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.27 ms
- Compiling: 0.54 ms
- Evaluating: 1778.21 ms
- Printing: 0.31 ms
- Total Time: 1779.33 ms
Query Plan:

  

  
db:enforceindex
  
  

  
00
01
  


  

  
  

  
  

urn:123
      
    
  

  



-Ursprüngliche Nachricht-
Von: Christian Grün  
Gesendet: Montag, 13. Januar 2020 13:07
An: Braunschweig, Björn 
Cc: basex-talk@mailman.uni-konstanz.de
Betreff: Re: [basex-talk] slow search performance across multiple databases

Dear Björn,

Our Article on Indexes may give you some hints [1]:

• Via the 'enforceindex' pragma, you can help the optimizer by indicating that 
all of your database have up-to-date index structures.
• With db:text, you can directly access the database indexes and rewrite the 
remaining steps of your query in reverse order:

  db:text($db, 'urn:123')/parent::lmerFile:format/...

Hope this helps
Christian

[1] http://docs.basex.org/wiki/Indexes





On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn 
 wrote:
>
> Hello everyone,
>
> First of all thank you for building such a great product!
>
> Unfortunately I encounter extremely slow queries using a basex 
> instance. My xml files are distributed over 256 databases (00-ff) with 
> roughly 250MB each. All share a structure similar to [3]. I executed 
> OPTIMIZE ALL for all databases, see [4]. [5] shows the info index 
> output. The linux server which is running the basex instance shows 
> great amounts of io. I replicated the data to my local desktop and see 
> the same amount of high io usage across all database files. For
> example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
>
> Querying a single database is fast and the info output window shows 
> the following [2], but a query across all dbs just stalls and does not 
> return a result in a reasonable time.
>
>
> Could somebody give me a hint how to speed things up?
>
> regards, Björn


smime.p7s
Description: S/MIME cryptographic signature


Re: [basex-talk] slow search performance across multiple databases

2020-01-13 Thread Christian Grün
Dear Björn,

Our Article on Indexes may give you some hints [1]:

• Via the 'enforceindex' pragma, you can help the optimizer by
indicating that all of your database have up-to-date index structures.
• With db:text, you can directly access the database indexes and
rewrite the remaining steps of your query in reverse order:

  db:text($db, 'urn:123')/parent::lmerFile:format/...

Hope this helps
Christian

[1] http://docs.basex.org/wiki/Indexes





On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn
 wrote:
>
> Hello everyone,
>
> First of all thank you for building such a great product!
>
> Unfortunately I encounter extremely slow queries using a basex instance. My
> xml files are distributed over 256 databases (00-ff)
> with roughly 250MB each. All share a structure similar to [3]. I executed
> OPTIMIZE ALL for all databases, see [4]. [5] shows the
> info index output. The linux server which is running the basex instance
> shows great amounts of io. I replicated the data to my local desktop
> and see the same amount of high io usage across all database files. For
> example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
>
> Querying a single database is fast and the info output window shows the
> following [2], but a query across all dbs just stalls and does not
> return a result in a reasonable time.
>
>
> Could somebody give me a hint how to speed things up?
>
> regards, Björn


[basex-talk] slow search performance across multiple databases

2020-01-13 Thread Braunschweig , Björn
Hello everyone,

First of all thank you for building such a great product!

Unfortunately I encounter extremely slow queries using a basex instance. My
xml files are distributed over 256 databases (00-ff) 
with roughly 250MB each. All share a structure similar to [3]. I executed
OPTIMIZE ALL for all databases, see [4]. [5] shows the 
info index output. The linux server which is running the basex instance
shows great amounts of io. I replicated the data to my local desktop 
and see the same amount of high io usage across all database files. For
example: [data\81\tbl.basex, data\81\txt.basex, ..., etc] 

Querying a single database is fast and the info output window shows the
following [2], but a query across all dbs just stalls and does not 
return a result in a reasonable time.


Could somebody give me a hint how to speed things up?

regards, Björn

[1]

Example query:

```
declare namespace mets = "http://www.loc.gov/METS/;;
declare namespace lmerFile = "http://www.ddb.de/LMERfile;;

count(
for $db in db:list()
for $doc in db:open($db)
where
$doc/koala/mets:mets/mets:amdSec/mets:techMD/mets:mdWrap/mets:xmlData/lmerFi
le:format = 'urn:123'
return $doc)
  
```

[2]
```
Compiling:
- inline $db_0
- pre-evaluate db:open(database[,path]) to document-node() sequence:
db:open("fe") -> (db:open-pre("fe", 0), ...)
- rewrite cached filter to iter path: ((db:open-pre("fe", 0),
...))[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets:t
... -> (db:open-pre("fe", 0),
...)/.[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets:
t...
- rewrite where clause(s)
- simplify FLWOR expression: (db:open-pre("fe", 0),
...)/.[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets:
t...
Optimized Query:
count((db:open-pre("fe", 0),
...)/.[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets:
techMD)/element(mets:mdWrap)/element(mets:xmlData)/element(lmerFile:format)
= "urn:123")])
Query:
declare namespace mets = "http://www.loc.gov/METS/;; declare namespace
lmerFile = "http://www.ddb.de/LMERfile;; count( let $db := "fe" for $doc in
db:open($db) where
$doc/koala/mets:mets/mets:amdSec/mets:techMD/mets:mdWrap/mets:xmlData/lmerFi
le:format = 'urn:123' return $doc)
Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.41 ms
- Compiling: 161.91 ms
- Evaluating: 2155.26 ms
- Printing: 0.86 ms
- Total Time: 2318.44 ms
Query Plan:

  

  





  
  

  







  
  urn:123

  

  


```


[3]





2016-05-04
13-57-17-943000
urn:nbn:de:xyz

4e1f-82-45c5-37-1678e81f
false
2016-05-04
13-57-17-943000
hdd
179810
1

http://www.loc.gov/METS/
http://www.loc.gov/standards/mets/mets.xsdhttp://www.ddb.de/LMERfile
http://files.dnb.de/standards/lmer/lmer-file.xsd
http://www.ddb.de/LMERobject
http://files.dnb.de/standards/lmer/lmer-object.xsd
http://www.ddb.de/LMERprocess
http://files.dnb.de/standards/lmer/lmer-process.xsd
http://purl.org/dc/elements/1.1/
http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd;
xmlns="http://www.loc.gov/METS/; xmlns:xlink="http://www.w3.org/1999/xlink;
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
xmlns:lmerFile="http://www.ddb.de/LMERfile;
xmlns:lmerObject="http://www.ddb.de/LMERobject;
xmlns:lmerProcess="http://www.ddb.de/LMERprocess;
xmlns:dc="http://purl.org/dc/elements/1.1/;>
  

  test
  test

  
  

  


urn:nbn:de:test
  ac38416b34a7b1b55875798819220c092a77f3b7

2015-03-23T11:03:19.080+01:00

2015-03-23T11:03:19.080+01:00

test

1

  


  

  urn:123
  
http://hul.harvard.edu/ois/xml/ns/fits/fits_output
http://files.dnb.de/standards/fits/fits_output_dnb.xsd; version="0.6.1"
timestamp="23.03.15 09:24"
xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output;>
  

  
  
  
  
  
  
  1.6