Re: [basex-talk] re-sort database

2013-03-13 Thread Wendell Piez
Christian,

Thanks, but I was hoping I could lure you (or someone else) into
suggesting what the syntax for a window clause might look like here!
:-) Since they are something I haven't mastered yet.

(Any takers?)

Cheers, Wendell
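
A minimal sketch of what such a window clause might look like, offered as an illustration and not as anyone's posted answer. It assumes the flat layout from the original question quoted below (entry elements followed by their secondquery siblings under collection). The inline sample data and the copy/modify wrapper are hypothetical and only make the query self-contained; against a real database the window would iterate over the children of the stored collection element instead.

```xquery
xquery version "3.0";

(: Hypothetical sample data mirroring the flat layout from the question. :)
copy $c :=
  <collection>
    <entry><node>123</node><query>xyz</query></entry>
    <secondquery>abc_1</secondquery>
    <secondquery>abc_2</secondquery>
    <entry><node>456</node><query>xyz</query></entry>
    <secondquery>abc_1</secondquery>
    <secondquery>abc_3</secondquery>
    <secondquery>abc_4</secondquery>
  </collection>
modify (
  (: Each window starts at an entry and, having no end condition,
     runs up to the item just before the next entry. :)
  for tumbling window $w in $c/*
    start $entry when $entry instance of element(entry)
  (: Everything after the entry itself, restricted to secondquery. :)
  let $sc := tail($w)[self::secondquery]
  where exists($sc)
  return (
    insert nodes $sc into $entry,
    delete nodes $sc
  )
)
return $c
```

The sequence of children is walked once, so no following-sibling or preceding axis has to be evaluated per entry.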


On Wed, Mar 13, 2013 at 10:25 AM, Christian Grün
christian.gr...@gmail.com wrote:
 Hi Wendell,

 good point. I agree that there are various ways to answer Cerstin’s
 question. Window clauses should be a good fit here, and should most
 probably provide better performance than requesting the following and
 preceding axes of a node.

 Christian
 ___

 On Wed, Mar 13, 2013 at 3:20 PM, Wendell Piez wap...@wendellpiez.com wrote:
 Christian,

 Alternatively, would this be a place one could use a 3.0 window clause?

 This raises a related question. I have seen a big boost on performance
 when using 'group by' instead of the classic distinct-values-based
 grouping. I suppose this is not surprising. Cerstin's question,
 similarly, is a grouping question, although the grouping is based on
 proximity in document order, not on values. (In XSLT it would be
 addressed using xsl:for-each-group[@group-starting-with].)

 When doing this (or any) sort of grouping, are we generally better off
 using the new 3.0 power features than doing it the old-fashioned way
 by hand? (I imagine that given the size of Cerstin's documents it may
 not be an issue for her, but what if the sequences were long?)

 Cheers, Wendell
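
For comparison, a small sketch of the two value-based grouping styles mentioned above. The item elements and their k attribute are made up for illustration; no performance figures are implied.

```xquery
xquery version "3.0";

(: Hypothetical sample items, each carrying a grouping key in @k. :)
let $items := (
  <item k="a">1</item>,
  <item k="b">2</item>,
  <item k="a">3</item>
)
return (
  (: Classic style: one pass over the distinct keys, then a
     filter over all items for every key. :)
  for $k in distinct-values($items/@k)
  return <group key="{ $k }" style="distinct-values">{ $items[@k = $k] }</group>,

  (: XQuery 3.0 style: the processor forms the groups itself;
     after "group by", $item holds all members of the current group. :)
  for $item in $items
  group by $k := $item/@k
  return <group key="{ $k }" style="group-by">{ $item }</group>
)
```

The classic form rescans the whole sequence once per key, which is one plausible reason it falls behind on long sequences.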


 On Tue, Mar 12, 2013 at 2:33 PM, Christian Grün
 christian.gr...@gmail.com wrote:
 Hi Cerstin,

 the following query may help:

   for $entry in $doc//entry
   let $next := $entry/following-sibling::entry[1]
   let $sc := $entry/following-sibling::secondquery
     [empty($next) or . << $next]
   return (
     insert nodes $sc into $entry,
     delete nodes $sc
   )

 Cheers,
 Christian
 ___

 On Tue, Mar 12, 2013 at 5:46 PM, Cerstin Elisabeth Mahlow
 cerstin.mah...@unibas.ch wrote:
 Hi,

 after a lot of data had been gathered, I realized that my update function
 has a bug.  Fixing it is not a big deal; however, I don't know how to
 re-sort the existing data.

 Essentially, I wanted to create this kind of data:

 <collection>
   <entry>
     <node>123</node>
     <query>xyz</query>
     <secondquery>abc_1</secondquery>
     <secondquery>abc_2</secondquery>
   </entry>
   <entry>
     <node>456</node>
     <query>xyz</query>
     <secondquery>abc_1</secondquery>
     <secondquery>abc_3</secondquery>
     <secondquery>abc_4</secondquery>
   </entry>
 </collection>


 However, the data looks like this:

 <collection>
   <entry>
     <node>123</node>
     <query>xyz</query>
   </entry>
   <secondquery>abc_1</secondquery>
   <secondquery>abc_2</secondquery>
   <entry>
     <node>456</node>
     <query>xyz</query>
   </entry>
   <secondquery>abc_1</secondquery>
   <secondquery>abc_3</secondquery>
   <secondquery>abc_4</secondquery>
 </collection>

 So, the secondqueries are stored just after the entry they belong to.  How 
 would I be able to move these data from right after a particular node to 
 just inside this particular node using XQuery Update?

 Thanks in advance and best regards

 Cerstin
 --
 Dr. phil. Cerstin Mahlow

 Universität Basel
 Departement Sprach- und Literaturwissenschaften
 Fachbereich Deutsche Sprach- und Literaturwissenschaft
 Nadelberg 4
 4051 Basel
 Schweiz

 Tel:  +41 61 267 07 65
 Fax: +41 61 267 34 40
 Mail: cerstin.mah...@unibas.ch
 Web: http://www.oldphras.net

 ___
 BaseX-Talk mailing list
 BaseX-Talk@mailman.uni-konstanz.de
 https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk



 --
 Wendell Piez | http://www.wendellpiez.com
 XML | XSLT | electronic publishing
 Eat Your Vegetables
 _oo_o_o___oooo_^



--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_oo_o_o___oooo_^


Re: [basex-talk] index usage while processing large xml file(s)

2013-03-13 Thread Johannes.Lichtenberger

On 03/13/2013 05:04 PM, EV Narasimham wrote:

Hi,
 I'm currently processing a 3GB xml file. Below is the Query that is
being executed and the corresponding output

count(db:open("ASEPXML",
"A_SEPXML")/descendant::*:PmtInf[15]/*:CdtTrfTxInf/*:CdtrAgt/*:FinInstnId/*:BIC/text())
Output: 10


And without specifying the wildcard prefix? The query plan suggests that 
it's not using the text-index. Furthermore I'd probably use:


count(db:open("ASEPXML",
"A_SEPXML")/root/PmtInf[15]/CdtTrfTxInf/CdtrAgt/FinInstnId/BIC/text())


without the descendant-axis step. Without the prefixes however, the 
descendant-axis step should be rewritten to the explicit child steps (I 
guess).


kind regards
Johannes




Re: [basex-talk] re-sort database

2013-03-13 Thread Christian Grün
Hi Cerstin,

 However, my real data has about 140 000 of such entries and about 30 000 of 
 such secondqueries, it's all in one database.  Which is probably too big.

true; it may well be that the total number of update operations is too
large to be processed in a single step. I would advise running the
updates in several steps and triggering several query executions, à
la…

 declare variable $start external := 1;
 declare variable $end external := 1000;

 for $entry in db:open("collection-ws-new.xml")/descendant::entry[position() = $start to $end]
 let $next := $entry/following-sibling::entry[1]
 let $sc := $entry/following-sibling::secondquery[empty($next) or . << $next]
 return (insert node $sc into $entry, delete nodes $sc)
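
The ranges for the successive runs can be worked out up front. The following is only a sketch: the total of 140 000 entries is taken from the figure mentioned in this thread, the batch size of 1000 from the defaults above, and the batch element is a made-up output format.

```xquery
xquery version "3.0";

(: Assumed figures: about 140 000 entries, batches of 1000 as in the query above. :)
declare variable $total external := 140000;
declare variable $size  external := 1000;

(: One <batch/> per run; the last one is clipped to $total. :)
for $i in 0 to ($total + $size - 1) idiv $size - 1
let $start := $i * $size + 1
let $end   := min(($start + $size - 1, $total))
return <batch start="{ $start }" end="{ $end }"/>
```

Each start/end pair would then be bound to $start and $end for one execution of the update query. Since the update only moves secondquery elements and never removes an entry, the entry positions stay stable from one run to the next.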

 After 3320855 ms of execution time (and 3355613 ms for a second attempt) I 
 got the following error message.  Any ideas?

Did you stop the update process, and do you still have the original
data instance?

The error message indicates that the updatable index structure could
be corrupt. You could try to export your data and create a new
database without updatable index structures; this could also speed up
your updates. Maybe it even allows you to update all nodes in a single
run.

Christian
___



 I already set VM=-Xmx1024m and I use BaseX 7.6.1 Beta from February 14 on a 
 MacBook Air with a 2 GHz processor and 8 GB RAM.

 Error:
 Improper use? Potential bug? Your feedback is welcome:
 Contact: basex-talk@mailman.uni-konstanz.de
 Version: BaseX 7.6.1 beta
 Java: Apple Inc., 1.6.0_43
 OS: Mac OS X, x86_64

 Stack Trace:
 java.lang.ArrayIndexOutOfBoundsException: 2147483647
   org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:485)
   org.basex.io.random.TableDiskAccess.read5(TableDiskAccess.java:211)
   org.basex.data.Data.textOff(Data.java:422)
   org.basex.data.DiskData.text(DiskData.java:234)
   org.basex.index.value.DiskValues.readKeyAt(DiskValues.java:285)
   org.basex.index.value.DiskValues.get(DiskValues.java:441)
   org.basex.index.value.UpdatableDiskValues.index(UpdatableDiskValues.java:65)
   org.basex.data.DiskData.indexEnd(DiskData.java:355)
   org.basex.data.Data.insert(Data.java:841)
   org.basex.data.atomic.Insert.apply(Insert.java:31)
   org.basex.data.atomic.AtomicUpdateList.applyStructuralUpdates(AtomicUpdateList.java:297)
   org.basex.data.atomic.AtomicUpdateList.execute(AtomicUpdateList.java:285)
   org.basex.query.up.DatabaseUpdates.apply(DatabaseUpdates.java:183)
   org.basex.query.up.ContextModifier.apply(ContextModifier.java:90)
   org.basex.query.up.Updates.apply(Updates.java:120)
   org.basex.query.QueryContext.update(QueryContext.java:270)
   org.basex.query.QueryContext.value(QueryContext.java:255)
   org.basex.query.QueryContext.iter(QueryContext.java:240)
   org.basex.query.QueryContext.execute(QueryContext.java:498)
   org.basex.query.QueryProcessor.execute(QueryProcessor.java:96)
   org.basex.core.cmd.AQuery.query(AQuery.java:77)
   org.basex.core.cmd.XQuery.run(XQuery.java:22)
   org.basex.core.Command.run(Command.java:342)
   org.basex.core.Command.exec(Command.java:321)
   org.basex.core.Command.execute(Command.java:78)
   org.basex.gui.GUI.exec(GUI.java:397)
   org.basex.gui.GUI$7.run(GUI.java:349)

 Compiling:
 - simplifying descendant-or-self step(s)

 Optimized Query:
 for $entry in document-node { collection-ws-new.xml }/descendant::entry
 let $next := $entry/following-sibling::entry[1]
 let $sc := $entry/following-sibling::secondquery[(fn:empty($next) or (. << $next))]
 return (insert node $sc into $entry, delete nodes $sc)




Re: [basex-talk] re-sort database

2013-03-13 Thread Liam R E Quin
On Wed, 2013-03-13 at 22:29 +0100, Christian Grün wrote:
 Hi Cerstin,
 [...]

  You could try to export your data and create a new
 database without updatable index structures; this could also speed up
 your updates. Maybe it even allows you to update all nodes in a single
 run.

  I already set VM=-Xmx1024m and I use BaseX 7.6.1 Beta from February 14 on a 
  MacBook Air with a 2 GHz processor and 8 GB RAM.

I'd try using VM=-Xmx6000m if you have 8G of RAM.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
