Re: [basex-talk] slow search performance across multiple databases
Dear Omar, Sorry I forgot to include the second query and just copied the info output. Here are the two queries again: [1] Uses index declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { for $doc in db:open("00") where $doc//lmerFile:format/text() = 'urn:123' return $doc }) [2] Does not use index declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) I have either a limited amount of databases I would like to access or all databases. Using anything but db:open with a string literal results in no indices are being used. Is there a solution to circumvent that problem? Regards, Björn -Ursprüngliche Nachricht- Von: BaseX-Talk Im Auftrag von Omar Siam Gesendet: Dienstag, 14. Januar 2020 17:06 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases Hi, I have trouble seeing the original second query. But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter. Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database. Best regards Omar Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn: > Dear Christian, > > Thank you for your quick response. > > Could you tell me why the following two queries have a difference in > behaviour so that the first one apparently uses the text index [1] and the > second one [2] does not? > > Your second suggestion with db:text is nice but I cannot figure out how to > efficiently extract other informations like something from the root element > without crawling the whole database and performing a full table scan so to > say. > > Regards, > Björn > > [1] > declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# > db:enforceindex #) { >let $dbs := ("00") >for $db in $dbs >for $doc in db:open($db) >where $doc//lmerFile:format/text() = > 'urn:diasid:fty:kopal:00' >return $doc > }) > > Compiling: > - merge steps: descendant::element(lmerFile:format) > - rewrite for to let: for $db_1 in $dbs_0 > - inline $dbs_0 > - inline $db_1 > - pre-evaluate db:open(database[,path]) to document-node() sequence: > db:open("00") -> (db:open-pre("00", 0), ...) > - apply text index for "urn:123" > - rewrite cached filter to cached path: ((db:open-pre("00", 0), > ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> > db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... > - rewrite where clause(s) > - simplify FLWOR expression: db:text("00", > "urn:123")/parent::element(lmerFile:format)/ancest... > Optimized Query: > count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", > "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc > ument-node()] }) > Query: > declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# > db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in > db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return > $doc }) > Result: > - Hit(s): 1 Item > - Updated: 0 Items > - Printed: 4 b > - Read Locking: (global) > - Write Locking: (none) > Timing: > - Parsing: 0.26 ms > - Compiling: 55.69 ms > - Evaluating: 60.66 ms > - Printing: 0.37 ms > - Total Time: 116.98 ms > Query Plan: > > > > > db:enforceindex > > > > >urn:123 > > type="element()?"/> > > > type="document-node()?"/> > > > > > > > > [2] > > Compiling: > - pre-evaluate expression list to xs:string sequence: ("00", "01") > - merge steps: descendant::element(lmerFile:format) > - inline $dbs_0 > - rewrite where clause(s) > Optimized Query: > count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in > ("00", "01") return > (db:open($db_1))[(descendant::element(lmerFile:format)/text() = > "urn:123")] }) > Query: > declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# > db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc > in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return > $doc }) > Result: > - Hit(s): 1 Item > - Updated: 0 Items > - Printed: 4 b > - Read Locking: (global) > - Write Locking: (none) > Timing: > - Parsing: 0.27 ms > - Compiling: 0.54 ms > - Evaluating: 1778.21 ms > - Printing: 0.31 ms > - Total Time: 1779.33 ms > Query Plan: > > > > > db:enforceindex > > > >
Re: [basex-talk] slow search performance across multiple databases
Hi, I have trouble seeing the original second query. But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter. Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database. Best regards Omar Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn: Dear Christian, Thank you for your quick response. Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not? Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say. Regards, Björn [1] declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:00' return $doc }) Compiling: - merge steps: descendant::element(lmerFile:format) - rewrite for to let: for $db_1 in $dbs_0 - inline $dbs_0 - inline $db_1 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...) - apply text index for "urn:123" - rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... - rewrite where clause(s) - simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.26 ms - Compiling: 55.69 ms - Evaluating: 60.66 ms - Printing: 0.37 ms - Total Time: 116.98 ms Query Plan: db:enforceindex urn:123 [2] Compiling: - pre-evaluate expression list to xs:string sequence: ("00", "01") - merge steps: descendant::element(lmerFile:format) - inline $dbs_0 - rewrite where clause(s) Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.27 ms - Compiling: 0.54 ms - Evaluating: 1778.21 ms - Printing: 0.31 ms - Total Time: 1779.33 ms Query Plan: db:enforceindex 00 01 urn:123 -Ursprüngliche Nachricht- Von: Christian Grün Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases Dear Björn, Our Article on Indexes may give you some hints [1]: • Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order: db:text($db, 'urn:123')/parent::lmerFile:format/... Hope this helps Christian [1] http://docs.basex.org/wiki/Indexes On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn wrote: Hello everyone, First of all thank you for building such a great product! Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info
Re: [basex-talk] slow search performance across multiple databases
Dear Christian, Thank you for your quick response. Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not? Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say. Regards, Björn [1] declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:00' return $doc }) Compiling: - merge steps: descendant::element(lmerFile:format) - rewrite for to let: for $db_1 in $dbs_0 - inline $dbs_0 - inline $db_1 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...) - apply text index for "urn:123" - rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... - rewrite where clause(s) - simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.26 ms - Compiling: 55.69 ms - Evaluating: 60.66 ms - Printing: 0.37 ms - Total Time: 116.98 ms Query Plan: db:enforceindex urn:123 [2] Compiling: - pre-evaluate expression list to xs:string sequence: ("00", "01") - merge steps: descendant::element(lmerFile:format) - inline $dbs_0 - rewrite where clause(s) Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile;; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.27 ms - Compiling: 0.54 ms - Evaluating: 1778.21 ms - Printing: 0.31 ms - Total Time: 1779.33 ms Query Plan: db:enforceindex 00 01 urn:123 -Ursprüngliche Nachricht- Von: Christian Grün Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases Dear Björn, Our Article on Indexes may give you some hints [1]: • Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order: db:text($db, 'urn:123')/parent::lmerFile:format/... Hope this helps Christian [1] http://docs.basex.org/wiki/Indexes On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn wrote: > > Hello everyone, > > First of all thank you for building such a great product! > > Unfortunately I encounter extremely slow queries using a basex > instance. My xml files are distributed over 256 databases (00-ff) with > roughly 250MB each. All share a structure similar to [3]. I executed > OPTIMIZE ALL for all databases, see [4]. [5] shows the info index > output. The linux server which is running the basex instance shows > great amounts of io. I replicated the data to my local desktop and see > the same amount of high io usage across all database files. For > example: [data\81\tbl.basex, data\81\txt.basex, ..., etc] > > Querying a single database is fast and the info output window shows > the following [2], but a query across all dbs just stalls and does not > return a result in a reasonable time. > > > Could somebody give me a hint how to speed things up? > > regards, Björn smime.p7s Description: S/MIME cryptographic signature