Re: [basex-talk] slow search performance across multiple databases

2020-01-14 Thread Braunschweig , Björn
Dear Omar,

Sorry I forgot to include the second query and just copied the info output.

Here are the two queries again:

[1] Uses index
declare namespace lmerFile = "http://www.ddb.de/LMERfile;;
count(
  (# db:enforceindex #) {
  for $doc in db:open("00")
  where $doc//lmerFile:format/text() = 'urn:123'
  return $doc
})

[2] Does not use index
declare namespace lmerFile = "http://www.ddb.de/LMERfile;;
count(
  (# db:enforceindex #) {
  let $dbs := ("00", "01")
  for $db in $dbs
  for $doc in db:open($db)
  where $doc//lmerFile:format/text() = 'urn:123'
  return $doc
})

I have either a limited amount of databases I would like to access or all 
databases. Using anything but db:open with a string literal results in no 
indices are being used. Is there a solution to circumvent that problem?

Regards,
Björn


-Ursprüngliche Nachricht-
Von: BaseX-Talk  Im Auftrag von 
Omar Siam
Gesendet: Dienstag, 14. Januar 2020 17:06
An: basex-talk@mailman.uni-konstanz.de
Betreff: Re: [basex-talk] slow search performance across multiple databases

Hi,

I have trouble seeing the original second query.

But if I got it correct then the problem is that to have BaseX automatically 
rewrite for indexing you have to supply the DB you want to search in as string 
literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for 
with multiple dbs for that matter.

Maybe I am wrong and the enforceindex pragma solves that now. I have to admit 
my code predates the pragma and always uses string literals when accessing a 
database.

Best regards

Omar

Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
> Dear Christian,
>
> Thank you for your quick response.
>
> Could you tell me why the following two queries have a difference in 
> behaviour so that the first one apparently uses the text index [1] and the 
> second one [2] does not?
>
> Your second suggestion with db:text is nice but I cannot figure out how to 
> efficiently extract other informations like something from the root element 
> without crawling the whole database and performing a full table scan so to 
> say.
>
> Regards,
> Björn
>
> [1]
> declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# 
> db:enforceindex #) {
>let $dbs := ("00")
>for $db in $dbs
>for $doc in db:open($db)
>where $doc//lmerFile:format/text() = 
> 'urn:diasid:fty:kopal:00'
>return $doc
> })
>
> Compiling:
> - merge steps: descendant::element(lmerFile:format)
> - rewrite for to let: for $db_1 in $dbs_0
> - inline $dbs_0
> - inline $db_1
> - pre-evaluate db:open(database[,path]) to document-node() sequence: 
> db:open("00") -> (db:open-pre("00", 0), ...)
> - apply text index for "urn:123"
> - rewrite cached filter to cached path: ((db:open-pre("00", 0), 
> ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> 
> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
> - rewrite where clause(s)
> - simplify FLWOR expression: db:text("00", 
> "urn:123")/parent::element(lmerFile:format)/ancest...
> Optimized Query:
> count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", 
> "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc
> ument-node()] })
> Query:
> declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# 
> db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in 
> db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return 
> $doc })
> Result:
> - Hit(s): 1 Item
> - Updated: 0 Items
> - Printed: 4 b
> - Read Locking: (global)
> - Write Locking: (none)
> Timing:
> - Parsing: 0.26 ms
> - Compiling: 55.69 ms
> - Evaluating: 60.66 ms
> - Printing: 0.37 ms
> - Total Time: 116.98 ms
> Query Plan:
> 
>
>  
>
>  db:enforceindex
>
>
>  
>
>urn:123
>  
>   type="element()?"/>
>  
>
>   type="document-node()?"/>
>
>  
>
>  
>
> 
>
> [2]
>
> Compiling:
> - pre-evaluate expression list to xs:string sequence: ("00", "01")
> - merge steps: descendant::element(lmerFile:format)
> - inline $dbs_0
> - rewrite where clause(s)
> Optimized Query:
> count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in 
> ("00", "01") return 
> (db:open($db_1))[(descendant::element(lmerFile:format)/text() = 
> "urn:123")] })
> Query:
> declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# 
> db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc 
> in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return 
> $doc })
> Result:
> - Hit(s): 1 Item
> - Updated: 0 Items
> - Printed: 4 b
> - Read Locking: (global)
> - Write Locking: (none)
> Timing:
> - Parsing: 0.27 ms
> - Compiling: 0.54 ms
> - Evaluating: 1778.21 ms
> - Printing: 0.31 ms
> - Total Time: 1779.33 ms
> Query Plan:
> 
>
>  
>
>  db:enforceindex
>
>
>  
>   

Re: [basex-talk] slow search performance across multiple databases

2020-01-14 Thread Omar Siam

Hi,

I have trouble seeing the original second query.

But if I got it correct then the problem is that to have BaseX 
automatically rewrite for indexing you have to supply the DB you want to 
search in as string literal. That is db:open("DB") and not let $db := 
"DB" db:open($db). Or for with multiple dbs for that matter.


Maybe I am wrong and the enforceindex pragma solves that now. I have to 
admit my code predates the pragma and always uses string literals when 
accessing a database.


Best regards

Omar

Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:

Dear Christian,

Thank you for your quick response.

Could you tell me why the following two queries have a difference in behaviour 
so that the first one apparently uses the text index [1] and the second one [2] 
does not?

Your second suggestion with db:text is nice but I cannot figure out how to 
efficiently extract other informations like something from the root element 
without crawling the whole database and performing a full table scan so to say.

Regards,
Björn

[1]
declare namespace lmerFile = "http://www.ddb.de/LMERfile;;
count(
(# db:enforceindex #) {
   let $dbs := ("00")
   for $db in $dbs
   for $doc in db:open($db)
   where $doc//lmerFile:format/text() = 
'urn:diasid:fty:kopal:00'
   return $doc
})

Compiling:
- merge steps: descendant::element(lmerFile:format)
- rewrite for to let: for $db_1 in $dbs_0
- inline $dbs_0
- inline $db_1
- pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> 
(db:open-pre("00", 0), ...)
- apply text index for "urn:123"
- rewrite cached filter to cached path: ((db:open-pre("00", 0), 
...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", 
"urn:123")/parent::element(lmerFile:format)/ancest...
- rewrite where clause(s)
- simplify FLWOR expression: db:text("00", 
"urn:123")/parent::element(lmerFile:format)/ancest...
Optimized Query:
count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", 
"urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()] })
Query:
declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { 
let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where 
$doc//lmerFile:format/text() = 'urn:123' return $doc })
Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.26 ms
- Compiling: 55.69 ms
- Evaluating: 60.66 ms
- Printing: 0.37 ms
- Total Time: 116.98 ms
Query Plan:

   
 
   
 db:enforceindex
   
   
 
   
   urn:123
 
 
 
   
 
   
 
   
 
   


[2]

Compiling:
- pre-evaluate expression list to xs:string sequence: ("00", "01")
- merge steps: descendant::element(lmerFile:format)
- inline $dbs_0
- rewrite where clause(s)
Optimized Query:
count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", "01") return 
(db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] })
Query:
declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { let $dbs 
:= ("00", "01") for $db in $dbs for $doc in db:open($db) where 
$doc//lmerFile:format/text() = 'urn:123' return $doc })
Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.27 ms
- Compiling: 0.54 ms
- Evaluating: 1778.21 ms
- Printing: 0.31 ms
- Total Time: 1779.33 ms
Query Plan:

   
 
   
 db:enforceindex
   
   
 
   
 00
 01
   
 
 
   
 
   
   
 
   
   
 
 urn:123
   
 
   
 
   



-Ursprüngliche Nachricht-
Von: Christian Grün 
Gesendet: Montag, 13. Januar 2020 13:07
An: Braunschweig, Björn 
Cc: basex-talk@mailman.uni-konstanz.de
Betreff: Re: [basex-talk] slow search performance across multiple databases

Dear Björn,

Our Article on Indexes may give you some hints [1]:

• Via the 'enforceindex' pragma, you can help the optimizer by indicating that 
all of your database have up-to-date index structures.
• With db:text, you can directly access the database indexes and rewrite the 
remaining steps of your query in reverse order:

   db:text($db, 'urn:123')/parent::lmerFile:format/...

Hope this helps
Christian

[1] http://docs.basex.org/wiki/Indexes





On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn 
 wrote:

Hello everyone,

First of all thank you for building such a great product!

Unfortunately I encounter extremely slow queries using a basex
instance. My xml files are distributed over 256 databases (00-ff) with
roughly 250MB each. All share a structure similar to [3]. I executed
OPTIMIZE ALL for all databases, see [4]. [5] shows the info 

Re: [basex-talk] slow search performance across multiple databases

2020-01-14 Thread Braunschweig , Björn
Dear Christian,

Thank you for your quick response.

Could you tell me why the following two queries have a difference in behaviour 
so that the first one apparently uses the text index [1] and the second one [2] 
does not? 

Your second suggestion with db:text is nice but I cannot figure out how to 
efficiently extract other informations like something from the root element 
without crawling the whole database and performing a full table scan so to say.

Regards,
Björn

[1] 
declare namespace lmerFile = "http://www.ddb.de/LMERfile;;
count(
(# db:enforceindex #) {
  let $dbs := ("00")
  for $db in $dbs
  for $doc in db:open($db)
  where $doc//lmerFile:format/text() = 
'urn:diasid:fty:kopal:00'
  return $doc
})

Compiling:
- merge steps: descendant::element(lmerFile:format)
- rewrite for to let: for $db_1 in $dbs_0
- inline $dbs_0
- inline $db_1
- pre-evaluate db:open(database[,path]) to document-node() sequence: 
db:open("00") -> (db:open-pre("00", 0), ...)
- apply text index for "urn:123"
- rewrite cached filter to cached path: ((db:open-pre("00", 0), 
...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> 
db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
- rewrite where clause(s)
- simplify FLWOR expression: db:text("00", 
"urn:123")/parent::element(lmerFile:format)/ancest...
Optimized Query:
count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", 
"urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()]
 })
Query:
declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# 
db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in 
db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.26 ms
- Compiling: 55.69 ms
- Evaluating: 60.66 ms
- Printing: 0.37 ms
- Total Time: 116.98 ms
Query Plan:

  

  
db:enforceindex
  
  

  
  urn:123



  

  

  

  


[2]

Compiling:
- pre-evaluate expression list to xs:string sequence: ("00", "01")
- merge steps: descendant::element(lmerFile:format)
- inline $dbs_0
- rewrite where clause(s)
Optimized Query:
count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", 
"01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = 
"urn:123")] })
Query:
declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# 
db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in 
db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.27 ms
- Compiling: 0.54 ms
- Evaluating: 1778.21 ms
- Printing: 0.31 ms
- Total Time: 1779.33 ms
Query Plan:

  

  
db:enforceindex
  
  

  
00
01
  


  

  
  

  
  

urn:123
  

  

  



-Ursprüngliche Nachricht-
Von: Christian Grün  
Gesendet: Montag, 13. Januar 2020 13:07
An: Braunschweig, Björn 
Cc: basex-talk@mailman.uni-konstanz.de
Betreff: Re: [basex-talk] slow search performance across multiple databases

Dear Björn,

Our Article on Indexes may give you some hints [1]:

• Via the 'enforceindex' pragma, you can help the optimizer by indicating that 
all of your database have up-to-date index structures.
• With db:text, you can directly access the database indexes and rewrite the 
remaining steps of your query in reverse order:

  db:text($db, 'urn:123')/parent::lmerFile:format/...

Hope this helps
Christian

[1] http://docs.basex.org/wiki/Indexes





On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn 
 wrote:
>
> Hello everyone,
>
> First of all thank you for building such a great product!
>
> Unfortunately I encounter extremely slow queries using a basex 
> instance. My xml files are distributed over 256 databases (00-ff) with 
> roughly 250MB each. All share a structure similar to [3]. I executed 
> OPTIMIZE ALL for all databases, see [4]. [5] shows the info index 
> output. The linux server which is running the basex instance shows 
> great amounts of io. I replicated the data to my local desktop and see 
> the same amount of high io usage across all database files. For
> example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
>
> Querying a single database is fast and the info output window shows 
> the following [2], but a query across all dbs just stalls and does not 
> return a result in a reasonable time.
>
>
> Could somebody give me a hint how to speed things up?
>
> regards, Björn


smime.p7s
Description: S/MIME cryptographic signature