Re: [basex-talk] Func def & performance: element()* vs item()*
> You may be interested in my
> https://github.com/rhdunn/xquery-intellij-plugin/blob/master/docs/XQuery%20IntelliJ%20Plugin%20Data%20Model.md
> document. It is the result of previous investigations in supporting static
> type analysis in my XQuery plugin.

Thanks, I’ll definitely have a look at your document.

In BaseX, we don’t have union types; apart from that, our static typing system should be close to complete (if some types in the query plan indicate otherwise, feedback is welcome). For example, we iteratively compute the type of built-in higher-order functions at compile time. The static type of the following expression is xs:decimal+:

  fold-left(1 to 123, 4.5, function($tmp, $curr) { ($curr, $tmp) })

And the static type of the next expression is xs:short+, not xs:integer+ as one might guess (the type is derived from the actual types of the input arguments, which will never yield an xs:integer result):

  fold-left((1 to 20) ! xs:byte(.), (), function($tmp, $curr) {
    $tmp, if($curr instance of xs:byte) then xs:short($curr) else xs:integer($curr)
  })

One missing link is that the static type is not always propagated to the result (i.e., to the internal result representation, which can be a plain untyped array of typed items, in particular if the result is generated by untyped iterators). I think I’ve found a good trade-off between performance and better runtime typing. In future, the intersection of the type of the original expression and the resulting type will be assigned to the resulting value instance.
Re: [basex-talk] Nooby/#CitizenScientist REQUEST for HELP: Python implementation of this XQuery
Hey Liam and BaseX Community,

With Liam's helpful pointers I was able to create a modification of the QueryBindExample.py sample script and use the String Template class to do the Python version of my machine learning dataset update XQuery expression for adding new training image specs to the #MAGAZINEgts ground-truth storage document in BaseX! :-)

Here is a link to this sandbox script:
https://1drv.ms/u/s!AtML1v0eUlpEiJYTjUDhkFmDyXoK1A

The one "hinky" thing in using the String Template approach was the "double-take" bit where a string of the name of the XQuery variable, $new_spec in this case, is replaced by the Template variable, $query_var, so the Template substitution could "put back" the query's original variable as a string, '$new_spec', within the constructed XQuery expression to be submitted to BaseX. All the other Template substitution values hardcoded in this sandbox exploration script will simply be pulled from the advertisement bound box being ground-truthed once I work this query into the FactMiners Toolkit.

Thank you, Liam, those pointers had all the info I needed when put together with the supplied BaseX Python client API examples to move forward. :D

Happy-Healthy Vibes from Colorado USA,
-: Jim :-

-Original Message-
From: BaseX-Talk On Behalf Of Jim Salmons
Sent: Wednesday, April 10, 2019 8:30 PM
To: 'Liam R. E. Quin'; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Nooby/#CitizenScientist REQUEST for HELP: Python implementation of this XQuery

Hi Liam,

Thank you! And not at all confusing in terms of babbleness. Those links will certainly help with the nitty-gritty of creating the Python side of preparing the parameterized string of a new entry in the dataset. I will take your thoughts into account when trying to make a next step in this "solution discovery" process that I am on.

Still remaining is the basic "harness" of a Python-originated update transaction. I'm still trying to bridge the gap between all the fine-grained info of the Server Protocol page (http://docs.basex.org/wiki/Server_Protocol) and proper transformation of those bits into Python-submitted API calls. (I hope this makes sense.)

BTW, in the meantime, I have downloaded Christian Grun et al's 2012 awesome "A framework for retrieval and annotation in digital humanities using XQuery full text and update in BaseX" (PDF: https://ids-pub.bsz-bw.de/frontdoor/index/index/year/2015/docId/3715). I am digesting this incredible inspirational resource as fast as I can. :-)

-: Jim :-

[[[snip]]]
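The "double-take" described in this message, where the XQuery variable name has to be put back by a Template substitution, can usually be avoided, because Python's string.Template treats `$$` as an escape for a literal `$`. The following is a minimal sketch of that idea only. The query text, the database name, and the element name are hypothetical placeholders chosen for illustration; they are not the actual #MAGAZINEgts query or the contents of the linked sandbox script.

```python
from string import Template

# Hypothetical query text (an assumption for illustration, not the real query).
# "$$new_spec" is emitted as the literal XQuery variable "$new_spec";
# "$db_name" and "$parent_element" are ordinary Template placeholders.
QUERY_TEMPLATE = Template(
    "declare variable $$new_spec external;\n"
    "insert node $$new_spec into collection('$db_name')//$parent_element"
)


def build_query(db_name: str, parent_element: str) -> str:
    """Fill in the Template placeholders and return the XQuery text."""
    return QUERY_TEMPLATE.substitute(
        db_name=db_name, parent_element=parent_element
    )


# Placeholder values; in practice these would come from the data being processed.
query_text = build_query("example_db", "training_specs")
print(query_text)
```

With this approach no extra Template variable is needed just to carry the name of the XQuery variable. The resulting string still contains `$new_spec`, so the value can be supplied separately through the client's variable binding, as the QueryBindExample.py sample mentioned above presumably does.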
Re: [basex-talk] Func def & performance: element()* vs item()*
Hi Christian,

On Thu, 11 Apr 2019 at 13:37, Christian Grün wrote:
> Hi Chuck,
>
> Martin already suggested that map construction via map:merge is
> preferable and faster (my personal experience is that there are just
> a few cases in which map:put is a better choice).
>
> Your query was an interesting one, though. In various cases, we drop
> type information at runtime, as it can be expensive to decorate all
> newly generated sequences with the correct type. As a result, the type
> of your function arguments is verified every time the function is
> called, and this takes additional time.
>
> But as it’s always recommendable to declare types, and as this is not
> the first time that this is chasing me, I had some more thoughts, and
> I have found a good answer on how to generally improve typing at
> runtime! You can already be sure that your query will benefit from the
> upcoming optimizations, i.e., with BaseX 9.2.

You may be interested in my
https://github.com/rhdunn/xquery-intellij-plugin/blob/master/docs/XQuery%20IntelliJ%20Plugin%20Data%20Model.md
document. It is the result of previous investigations in supporting static type analysis in my XQuery plugin. Specifically:

1. 3.2.1 Item Type Union -- computing the best matching union type of two item types.
2. 3.2.2 Sequence Type Union -- computing the union of two sequences for use in disjoint expressions such as the if and else branches of an IfExpr.
3. 3.2.3 Sequence Type Addition -- computing the resulting type that best matches an Expr.

The advantage of this is that the type information can be computed at compile time. I was able to get a basic prototype implementation working for some expressions, and have tested the logic for the rules in that document.

I haven't worked on this recently, as I have been adding other features to my plugin.

Kind regards,
Reece

> Due to this, and due to some other minor optimizations that are still
> in progress, we decided to delay the release until the beginning of next
> week.
>
> Cheers
> Christian
Re: [basex-talk] Func def & performance: element()* vs item()*
Hi Chuck,

Martin already suggested that map construction via map:merge is preferable and faster (my personal experience is that there are just a few cases in which map:put is a better choice).

Your query was an interesting one, though. In various cases, we drop type information at runtime, as it can be expensive to decorate all newly generated sequences with the correct type. As a result, the type of your function arguments is verified every time the function is called, and this takes additional time.

But as it’s always recommendable to declare types, and as this is not the first time that this is chasing me, I had some more thoughts, and I have found a good answer on how to generally improve typing at runtime! You can already be sure that your query will benefit from the upcoming optimizations, i.e., with BaseX 9.2.

Due to this, and due to some other minor optimizations that are still in progress, we decided to delay the release until the beginning of next week.

Cheers
Christian

On Thu, Apr 11, 2019 at 12:10 AM Chuck Bearden wrote:
>
> BaseX is a great tool for analyzing & characterizing large amounts of
> XML data. I have used it both at work and on personal projects. I hope
> the following observation is useful.
>
> When I define a function that recurs over a sequence of elements in
> order to build a map of element name counts, I find that when I
> specify the type of the element sequence as 'element()*', the function
> runs so slowly that I give up after 5 minutes or so. But when I
> specify the type as 'item()*', it finishes in 40 seconds or less.
> Here's an example:
>
> -begin code snippet-
> declare namespace local="w00fw00f";
> declare function local:count($elems as element()*, $elem_counts as map(*))
>     as map(*) {
>   let $elem := head($elems),
>       $elem_name := $elem/name(),
>       $elems_new := tail($elems),
>       $elem_name_count := if (map:contains($elem_counts, $elem_name))
>                           then map:get($elem_counts, $elem_name) + 1
>                           else 1,
>       $elem_counts_new := map:put($elem_counts, $elem_name, $elem_name_count)
>   return if (count($elems_new) = 0)
>          then $elem_counts_new
>          else local:count($elems_new, $elem_counts_new)
> };
>
> let $coll := collection('pure_20190402'),
>     $elems := $coll/result/items/*,
>     $elem_names_map := local:count($elems, map {})
> return json:serialize($elem_names_map, map {'format' : 'xquery'})
> -end code snippet-
>
> In the function declaration, changing "$elems as element()*" to
> "$elems as item()*" makes the difference in performance. Replacing the
> JSON serialization with a standard XML one does not change the
> performance. I am running BaseX 9.1.2 under Ubuntu 16.04.6.
>
> All the best,
> Chuck Bearden
Re: [basex-talk] Add & replace documents
OK, I will run some tests, see if it works, and get back to you.

Thanks
Johannes
Re: [basex-talk] Add & replace documents
> Will the iterative approach have a comparable performance? Or are there
> some optimizations for directory structures in db:replace()?

This always depends on various factors (on the input sizes, the directory structure, possibly even the file system). If it turns out that it would be much faster to add a specific option to db:add and db:replace, we could think about that as well.

Cheers,
Christian
Re: [basex-talk] Add & replace documents
Hello Christian,

thank you for the fast reply.

With your example I will actually make a db:replace() call for each file. I've chosen the approach of just passing the directory to db:replace() for performance reasons. My experience is that this call is extremely fast even for a large number of files (let's say > 30.000).

Will the iterative approach have a comparable performance? Or are there some optimizations for directory structures in db:replace()?

Thanks
Johannes
Re: [basex-talk] Add & replace documents
Hi Johannes,

We allow file paths for db:add and db:replace as a basic convenience feature. To have full control over your imports, it is often better to work with the functions provided by the File Module. Here is a simple example that might already cover your requirements (provided that the input directory contains only XML documents):

  let $root := '/path/to/your/files/'
  for $path in file:list($root, true())
  return db:replace('your-db', $path, $root || $path)

Hope this helps,
Christian
[basex-talk] Add & replace documents
Hi,

this may seem like a dumb question, but how can I add & replace a number of documents in a database?

I have a directory with some files that I want to import. Some of the documents may already exist in the database.

If I use the db:replace('db', '', '/path/to/my/files') function it adds the missing docs and overwrites the existing ones. But it also deletes all other documents from the database.

I cannot work with db:add() because it does not overwrite existing docs.

Any suggestions?

Best regards
Johannes