Re: [basex-talk] re-sort database
Christian,

Thanks, but I was hoping I could lure you (or someone else) into suggesting what the syntax for a window clause might look like here! :-) They are something I haven't mastered yet.

(Any takers?)

Cheers, Wendell

On Wed, Mar 13, 2013 at 10:25 AM, Christian Grün christian.gr...@gmail.com wrote:
> Hi Wendell,
>
> good point. I agree that there are various ways to answer Cerstin’s question. Window clauses should be a good fit here, and should most probably provide better performance than requesting the following and preceding axes of a node.
>
> Christian
>
> On Wed, Mar 13, 2013 at 3:20 PM, Wendell Piez wap...@wendellpiez.com wrote:
>> Christian,
>>
>> Alternatively, would this be a place one could use a 3.0 window clause?
>>
>> This raises a related question. I have seen a big boost in performance when using 'group by' instead of the classic distinct-values-based grouping. I suppose this is not surprising.
>>
>> Cerstin's question, similarly, is a grouping question, although the grouping is based on proximity in document order, not on values. (In XSLT it would be addressed using xsl:for-each-group[@group-starting-with].)
>>
>> When doing this (or any) sort of grouping, are we generally better off using the new 3.0 power features than doing it the old-fashioned way by hand? (I imagine that given the size of Cerstin's documents it may not be an issue for her, but what if the sequences were long?)
>>
>> Cheers, Wendell
>>
>> On Tue, Mar 12, 2013 at 2:33 PM, Christian Grün christian.gr...@gmail.com wrote:
>>> Hi Cerstin,
>>>
>>> the following query may help:
>>>
>>>   for $entry in $doc//entry
>>>   let $next := $entry/following-sibling::entry[1]
>>>   let $sc := $entry/following-sibling::secondquery
>>>     [empty($next) or . << $next]
>>>   return (
>>>     insert nodes $sc into $entry,
>>>     delete nodes $sc
>>>   )
>>>
>>> Cheers, Christian
>>>
>>> On Tue, Mar 12, 2013 at 5:46 PM, Cerstin Elisabeth Mahlow cerstin.mah...@unibas.ch wrote:
>>>> Hi,
>>>>
>>>> after a lot of data has been gathered, I realized that my update function has a bug. Fixing it is not a big deal; however, I don't know how to re-sort the existing data.
>>>>
>>>> Essentially, I wanted to create this kind of data:
>>>>
>>>>   <collection>
>>>>     <entry>
>>>>       <node>123</node>
>>>>       <query>xyz</query>
>>>>       <secondquery>abc_1</secondquery>
>>>>       <secondquery>abc_2</secondquery>
>>>>     </entry>
>>>>     <entry>
>>>>       <node>456</node>
>>>>       <query>xyz</query>
>>>>       <secondquery>abc_1</secondquery>
>>>>       <secondquery>abc_3</secondquery>
>>>>       <secondquery>abc_4</secondquery>
>>>>     </entry>
>>>>   </collection>
>>>>
>>>> However, the data looks like this:
>>>>
>>>>   <collection>
>>>>     <entry>
>>>>       <node>123</node>
>>>>       <query>xyz</query>
>>>>     </entry>
>>>>     <secondquery>abc_1</secondquery>
>>>>     <secondquery>abc_2</secondquery>
>>>>     <entry>
>>>>       <node>456</node>
>>>>       <query>xyz</query>
>>>>     </entry>
>>>>     <secondquery>abc_1</secondquery>
>>>>     <secondquery>abc_3</secondquery>
>>>>     <secondquery>abc_4</secondquery>
>>>>   </collection>
>>>>
>>>> So, the secondqueries are stored just after the entry they belong to. How can I move these data from right after a particular node to just inside that node using XQuery Update?
>>>>
>>>> Thanks in advance and best regards
>>>> Cerstin
>>>>
>>>> --
>>>> Dr. phil. Cerstin Mahlow
>>>> Universität Basel
>>>> Departement Sprach- und Literaturwissenschaften
>>>> Fachbereich Deutsche Sprach- und Literaturwissenschaft
>>>> Nadelberg 4
>>>> 4051 Basel
>>>> Schweiz
>>>> Tel: +41 61 267 07 65
>>>> Fax: +41 61 267 34 40
>>>> Mail: cerstin.mah...@unibas.ch
>>>> Web: http://www.oldphras.net

--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_oo_o_o___oooo_^

___
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
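For readers wondering, as Wendell did, what a window clause could look like for this regrouping, here is a minimal sketch. It is not a query posted by anyone in the thread and has not been run against Cerstin's data. It assumes that $doc (a hypothetical external variable) is bound to the database document that holds the collection element, and that entry and secondquery are direct children of collection, as in the sample data.

```xquery
(: Sketch only. $doc is a hypothetical external variable, assumed to be
   bound to the document that contains the <collection> element. :)
declare variable $doc external;

(: Open a new window at every <entry>. With no "end" condition, a tumbling
   window runs up to the item just before the next window start, or to the
   end of the sequence. :)
for tumbling window $w in $doc/collection/*
  start $s when $s/self::entry
(: $s is the <entry> itself; the rest of the window holds its followers. :)
let $sc := tail($w)[self::secondquery]
return (
  insert nodes $sc into $s,
  delete nodes $sc
)
```

Each window is computed in one pass over the children of collection, so no following-sibling lookups are needed per entry. Whether this is actually faster on a large database would have to be measured.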
Re: [basex-talk] index usage while processing large xml file(s)
On 03/13/2013 05:04 PM, EV Narasimham wrote:
> Hi,
>
> I'm currently processing a 3GB xml file. Below is the query that is being executed and the corresponding output:
>
>   count(db:open("ASEPXML", "A_SEPXML")/descendant::*:PmtInf[15]/*:CdtTrfTxInf/*:CdtrAgt/*:FinInstnId/*:BIC/text())
>
> Output: 10

And without specifying the wildcard prefix? The query plan suggests that it's not using the text index.

Furthermore, I'd probably use:

  count(db:open("ASEPXML", "A_SEPXML")/root/PmtInf[15]/CdtTrfTxInf/CdtrAgt/FinInstnId/BIC/text())

without the descendant-axis step. Without the prefixes, however, the descendant-axis step should be rewritten to the explicit child steps (I guess).

kind regards
Johannes
Re: [basex-talk] re-sort database
Hi Cerstin,

> However, my real data has about 140 000 of such entries and about 30 000 of such secondqueries, it's all in one database. Which is probably too big.

true; it may well be that the total number of update operations is too large to be processed in a single step. I would advise running the updates in several steps and triggering several query executions, à la…

  declare variable $start external := 1;
  declare variable $end external := 1000;
  for $entry in db:open("collection-ws-new.xml")/descendant::entry[position() = $start to $end]
  let $next := $entry/following-sibling::entry[1]
  let $sc := $entry/following-sibling::secondquery[empty($next) or . << $next]
  return (insert node $sc into $entry, delete nodes $sc)

> After 3320855 ms of execution time (and 3355613 ms for a second attempt) I got the following error message. Any ideas?

Did you stop the update process, and do you still have the original data instance? The error message indicates that the updatable index structure could be corrupt. You could try to export your data and create a new database without updatable index structures; this could also speed up your updates. Maybe it even allows you to update all nodes in a single run.

Christian
___

> I already set VM=-Xmx1024m and I use BaseX 7.6.1 Beta from February 14 on a MacBook Air with a 2 GHz processor and 8 GB RAM.
>
> Error:
> Improper use? Potential bug? Your feedback is welcome:
> Contact: basex-talk@mailman.uni-konstanz.de
> Version: BaseX 7.6.1 beta
> Java: Apple Inc., 1.6.0_43
> OS: Mac OS X, x86_64
> Stack Trace:
> java.lang.ArrayIndexOutOfBoundsException: 2147483647
> org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:485)
> org.basex.io.random.TableDiskAccess.read5(TableDiskAccess.java:211)
> org.basex.data.Data.textOff(Data.java:422)
> org.basex.data.DiskData.text(DiskData.java:234)
> org.basex.index.value.DiskValues.readKeyAt(DiskValues.java:285)
> org.basex.index.value.DiskValues.get(DiskValues.java:441)
> org.basex.index.value.UpdatableDiskValues.index(UpdatableDiskValues.java:65)
> org.basex.data.DiskData.indexEnd(DiskData.java:355)
> org.basex.data.Data.insert(Data.java:841)
> org.basex.data.atomic.Insert.apply(Insert.java:31)
> org.basex.data.atomic.AtomicUpdateList.applyStructuralUpdates(AtomicUpdateList.java:297)
> org.basex.data.atomic.AtomicUpdateList.execute(AtomicUpdateList.java:285)
> org.basex.query.up.DatabaseUpdates.apply(DatabaseUpdates.java:183)
> org.basex.query.up.ContextModifier.apply(ContextModifier.java:90)
> org.basex.query.up.Updates.apply(Updates.java:120)
> org.basex.query.QueryContext.update(QueryContext.java:270)
> org.basex.query.QueryContext.value(QueryContext.java:255)
> org.basex.query.QueryContext.iter(QueryContext.java:240)
> org.basex.query.QueryContext.execute(QueryContext.java:498)
> org.basex.query.QueryProcessor.execute(QueryProcessor.java:96)
> org.basex.core.cmd.AQuery.query(AQuery.java:77)
> org.basex.core.cmd.XQuery.run(XQuery.java:22)
> org.basex.core.Command.run(Command.java:342)
> org.basex.core.Command.exec(Command.java:321)
> org.basex.core.Command.execute(Command.java:78)
> org.basex.gui.GUI.exec(GUI.java:397)
> org.basex.gui.GUI$7.run(GUI.java:349)
>
> Compiling:
> - simplifying descendant-or-self step(s)
>
> Optimized Query:
> for $entry in document-node { collection-ws-new.xml }/descendant::entry let $next := $entry/following-sibling::entry[1] let $sc := $entry/following-sibling::secondquery[(fn:empty($next) or (. << $next))] return (insert node $sc into $entry, delete nodes $sc)
>
> --
> Dr. phil. Cerstin Mahlow
> Universität Basel
> Departement Sprach- und Literaturwissenschaften
> Fachbereich Deutsche Sprach- und Literaturwissenschaft
> Nadelberg 4
> 4051 Basel
> Schweiz
> Tel: +41 61 267 07 65
> Fax: +41 61 267 34 40
> Mail: cerstin.mah...@unibas.ch
> Web: http://www.oldphras.net
Re: [basex-talk] re-sort database
On Wed, 2013-03-13 at 22:29 +0100, Christian Grün wrote:
> Hi Cerstin,
> [...]
> You could try to export your data and create a new database without updatable index structures; this could also speed up your updates. Maybe it even allows you to update all nodes in a single run.
>
>> I already set VM=-Xmx1024m and I use BaseX 7.6.1 Beta from February 14 on a MacBook Air with a 2 GHz processor and 8 GB RAM.

I'd try using VM=-Xmx6000m if you have 8 GB of RAM.

Liam

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml