Re: [basex-talk] Func def & performance: element()* vs item()*

2019-04-11 Thread Christian Grün
> You may be interested in my 
> https://github.com/rhdunn/xquery-intellij-plugin/blob/master/docs/XQuery%20IntelliJ%20Plugin%20Data%20Model.md
>  document. It is the result of previous investigations in supporting static 
> type analysis in my XQuery plugin.

Thanks, I’ll definitely have a look at your document.

In BaseX, we don’t have union types; apart from that, our static
typing system should be close to complete (if some types in the query
plan should indicate otherwise, feedback is welcome). For example, we
are iteratively computing the type of built-in higher-order functions
at compile time. The static type of the following function is
xs:decimal+:

  fold-left(1 to 123, 4.5, function($tmp, $curr) {
    ($curr, $tmp)
  })

And the static type of the next expression is xs:short+, not
xs:integer+ as one might guess: the type is derived from the actual
types of the input arguments, which will never yield an xs:integer
result:

  fold-left((1 to 20) ! xs:byte(.), (), function($tmp, $curr) {
    $tmp,
    if($curr instance of xs:byte) then xs:short($curr)
    else xs:integer($curr)
  })

One missing link is that the static type is not always propagated to
the result (i.e., to the internal result representation, which can be
a plain untyped array of typed items, in particular if the result is
generated by untyped iterators). I think I’ve found a good trade-off
between performance and better runtime typing. In future, the
intersection of the type of the original expression and the resulting
type will be assigned to the resulting value instance.


Re: [basex-talk] Nooby/#CitizenScientist REQUEST for HELP: Python implementation of this XQuery

2019-04-11 Thread Jim Salmons
Hey Liam and BaseX Community,

With Liam's helpful pointers I was able to create a modification of the 
QueryBindExample.py sample script and use the String Template class to do the 
Python version of my machine learning dataset update XQuery expression for 
adding new training image specs to the #MAGAZINEgts ground-truth storage 
document in BaseX! :-)

Here is a link to this sandbox script: 
https://1drv.ms/u/s!AtML1v0eUlpEiJYTjUDhkFmDyXoK1A

The one "hinky" thing in using the String Template approach was the 
"double-take" bit where a string of the name of the XQuery variable, $new_spec 
in this case, is replaced by the Template variable, $query_var, so the Template 
substitution could "put back" the query's original variable as a string, 
'$new_spec', within the constructed XQuery expression to be submitted to BaseX. 
All the other Template substitution values hardcoded in this sandbox 
exploration script will simply be pulled from the advertisement bound box being 
ground-truthed once I work this query into the FactMiners Toolkit.
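
A possible way around the "double-take", offered as a minimal sketch and not as a description of the linked script: Python's string.Template treats "$$" as an escaped dollar sign, so an XQuery variable can be written as $$new_spec in the template and comes out as $new_spec after substitution. The query text, the collection name, and the ad_id placeholder below are hypothetical.

```python
from string import Template

# Hypothetical query text; the element and attribute names are illustrative only.
# "$$new_spec" is an escaped dollar sign, so it survives substitution as the
# literal XQuery variable "$new_spec". "${db}" and "${ad_id}" are ordinary
# Template placeholders that are filled in from Python.
QUERY_TEMPLATE = Template(
    "declare variable $$new_spec external;\n"
    "insert node $$new_spec into collection('${db}')//ad[@id = '${ad_id}']"
)


def build_query(db, ad_id):
    """Return the XQuery text with the Python-side placeholders filled in."""
    return QUERY_TEMPLATE.substitute(db=db, ad_id=ad_id)


query = build_query("magazine_gts", "ad-001")
print(query)
```

The value of $new_spec itself would presumably still be supplied through the variable binding that QueryBindExample.py demonstrates, so only the fixed parts of the query go through Template substitution.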

Thank you, Liam, those pointers had all the info I needed when put together 
with the supplied BaseX Python client API examples to move forward. :D

Happy-Healthy Vibes from Colorado USA,
  -: Jim :-

-Original Message-
From: BaseX-Talk On Behalf Of Jim Salmons
Sent: Wednesday, April 10, 2019 8:30 PM
To: 'Liam R. E. Quin'; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Nooby/#CitizenScientist REQUEST for HELP: Python 
implementation of this XQuery

Hi Liam,

Thank you! And not at all confusing in terms of babbleness.

Those links will certainly help with the nitty-gritty of creating the Python 
side of preparing the parameterized string of a new entry in the dataset. I 
will take your thoughts into account when trying to make a next step in this 
"solution discovery" process that I am on.

Still remaining is the basic "harness" of a Python-originated update 
transaction. I'm still trying to bridge the gap between all the fine-grained 
info of the Server Protocol page (http://docs.basex.org/wiki/Server_Protocol) 
and proper transformation of those bits into Python-submitted API calls. (I 
hope this makes sense.)

BTW, in the meantime, I have downloaded Christian Grün et al.'s awesome 2012 
"A framework for retrieval and annotation in digital humanities using XQuery 
full text and update in BaseX" (PDF: 
https://ids-pub.bsz-bw.de/frontdoor/index/index/year/2015/docId/3715). I am 
digesting this incredibly inspirational resource as fast as I can. :-)

-: Jim :-

[[[snip]]]




Re: [basex-talk] Func def & performance: element()* vs item()*

2019-04-11 Thread Reece Dunn
Hi Christian,

On Thu, 11 Apr 2019 at 13:37, Christian Grün wrote:

> Hi Chuck,
>
> Martin already suggested that map construction via map:merge is
> preferable and faster (my personal experience is that there are just
> few cases in which map:put is a better choice).
>
> Your query was an interesting one, though. In various cases, we drop
> type information at runtime, as it can be expensive to decorate all
> newly generated sequences with the correct type. As a result, the type
> of your function arguments is verified every time the function is
> called, and this takes additional time.
>
> But as it’s always recommendable to declare types, and as this is not
> the first time that this is chasing me, I had some more thoughts, and
> I have found a good answer on how to improve generally typing at
> runtime! You can already be sure that your query will benefit from the
> upcoming optimizations, i.e., with BaseX 9.2.
>

You may be interested in my
https://github.com/rhdunn/xquery-intellij-plugin/blob/master/docs/XQuery%20IntelliJ%20Plugin%20Data%20Model.md
document. It is the result of previous investigations in supporting static
type analysis in my XQuery plugin. Specifically:
1.  3.2.1 Item Type Union -- computing the best matching union type of two
item types.
2.  3.2.2 Sequence Type Union -- computing the union of two sequences for
use in disjoint expressions such as the if and else branches of an IfExpr.
3.  3.2.3 Sequence Type Addition -- computing the resulting type that best
matches an Expr.

The advantage of this is that the type information can be computed at
compile time.

I was able to get a basic prototype implementation working for some
expressions, and have tested the logic for the rules in that document. I
haven't worked on this recently, as I have been adding other features to my
plugin.

Kind regards,
Reece

> Due to this, and due to some other minor optimizations that are still
> in progress, we decided to delay the release until beginning of next
> week.
>
> Cheers
> Christian


Re: [basex-talk] Func def & performance: element()* vs item()*

2019-04-11 Thread Christian Grün
Hi Chuck,

Martin already suggested that map construction via map:merge is
preferable and faster (in my personal experience, there are only a few
cases in which map:put is a better choice).

Your query was an interesting one, though. In various cases, we drop
type information at runtime, as it can be expensive to decorate all
newly generated sequences with the correct type. As a result, the type
of your function arguments is verified every time the function is
called, and this takes additional time.
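
To see why a per-call check can add up in a head/tail recursion, here is a toy Python model. It assumes that checking element()* inspects every item of the sequence passed in, while item()* needs no per-item inspection; that assumption is for illustration only and is not a description of the BaseX internals.

```python
def count_checks(n, check_each_item):
    """Count per-item type checks made by a head/tail recursion over n items.

    The recursion is modelled as a loop: the function is entered once with
    n items, then with n - 1 items (the tail), and so on down to 1 item.
    """
    checks = 0
    remaining = n
    while remaining > 0:
        if check_each_item:
            # Assumption: a typed argument inspects every remaining item.
            checks += remaining
        remaining -= 1  # the next call receives the tail of the sequence
    return checks


for size in (1000, 10000):
    print(size, count_checks(size, True), count_checks(size, False))
```

Under this assumption the number of checks grows quadratically with the input size, n * (n + 1) / 2 in total, which would be in line with the large gap between the two declarations.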

But as it is always advisable to declare types, and as this is not
the first time this issue has come back to haunt me, I gave it some
more thought, and I have found a good way to generally improve typing
at runtime! You can already be sure that your query will benefit from
the upcoming optimizations, i.e., with BaseX 9.2.

Due to this, and due to some other minor optimizations that are still
in progress, we decided to delay the release until the beginning of
next week.

Cheers
Christian



On Thu, Apr 11, 2019 at 12:10 AM Chuck Bearden wrote:
>
> BaseX is a great tool for analyzing & characterizing large amounts of
> XML data. I have used it both at work and on personal projects. I hope
> the following observation is useful.
>
> When I define a function that recurs over a sequence of elements in
> order to build a map of element name counts, I find that when I
> specify the type of the element sequence as 'element()*', the function
> runs so slowly that I give up after 5 minutes or so. But when I
> specify the type as 'item()*', it finishes in 40 seconds or less.
> Here's an example:
>
> -begin code snippet-
> declare namespace local="w00fw00f";
> declare function local:count($elems as element()*, $elem_counts as map(*))
>   as map(*) {
>   let $elem := head($elems),
>       $elem_name := $elem/name(),
>       $elems_new := tail($elems),
>       $elem_name_count := if (map:contains($elem_counts, $elem_name))
>                           then map:get($elem_counts, $elem_name) + 1
>                           else 1,
>       $elem_counts_new := map:put($elem_counts, $elem_name, $elem_name_count)
>   return if (count($elems_new) = 0)
>          then $elem_counts_new
>          else local:count($elems_new, $elem_counts_new)
> };
>
> let $coll := collection('pure_20190402'),
>     $elems := $coll/result/items/*,
>     $elem_names_map := local:count($elems, map {})
> return json:serialize($elem_names_map, map {'format' : 'xquery'})
> -end code snippet-
>
> In the function declaration, changing "$elems as element()*" to
> "$elems as item()*" makes the difference in performance. Replacing the
> JSON serialization with a standard XML one does not change the
> performance. I am running BaseX 9.1.2 under Ubuntu 16.04.6.
>
> All the best,
> Chuck Bearden


Re: [basex-talk] Add & replace documents

2019-04-11 Thread Johannes Bauer

ok, I will run some tests and see if it works and get back to you.

Thanks
Johannes


Re: [basex-talk] Add & replace documents

2019-04-11 Thread Christian Grün
> Will the iterative approach have a comparable performance? Or are there
> some optimizations for directory structures in db:replace().

This always depends on various factors (on the input sizes, the
directory structure, possibly even the file system). If it turns out
that it would be much faster to add a specific option to db:add and
db:replace, we could think about that as well.

Cheers,
Christian





Re: [basex-talk] Add & replace documents

2019-04-11 Thread Johannes Bauer

Hello Christian,

thank you for the fast reply.

With your example I will actually make a db:replace() call for each file.

I've chosen the approach of just passing the directory to
db:replace() for performance reasons.
My experience is that this call is extremely fast even for a large
number of files (let's say > 30,000).


Will the iterative approach have comparable performance? Or are there
some optimizations for directory structures in db:replace()?


Thanks
Johannes



Re: [basex-talk] Add & replace documents

2019-04-11 Thread Christian Grün
Hi Johannes,

We allow file paths for db:add and db:replace as a basic convenience
feature. To have full control over your imports, it is often better to
work with the functions provided by the File Module. Here is a simple
example that might already cover your requirements (provided that the
input directory contains only XML documents):

  let $root := '/path/to/your/files/'
  for $path in file:list($root, true())
  return db:replace('your-db', $path, $root || $path)

Hope this helps,
Christian
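
As a rough Python analogy of the loop above (not BaseX code): the recursive listing yields paths relative to the root directory, each relative path serves as the target path in the database, and root plus relative path is the file to read. The helper names are made up for this sketch, and the .xml filter is an added assumption that mirrors the "only XML documents" proviso.

```python
import os
import tempfile


def list_relative_files(root):
    """Return all file paths below root, relative to root, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Use forward slashes so the result can double as a database path.
            paths.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(paths)


def plan_replacements(root):
    """Pair each target path with the input file that would be imported."""
    return [
        (rel, os.path.join(root, rel))
        for rel in list_relative_files(root)
        if rel.lower().endswith(".xml")  # assumption: skip non-XML files
    ]


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "sub"))
    for rel in ("a.xml", "sub/b.xml", "notes.txt"):
        with open(os.path.join(demo_root, rel), "w") as handle:
            handle.write("<doc/>")
    for target, source in plan_replacements(demo_root):
        print(target, "<-", source)
```

Documents that are already in the database but have no counterpart in the directory never appear in this plan, which is why a per-file loop leaves them untouched.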




[basex-talk] Add & replace documents

2019-04-11 Thread Johannes Bauer

Hi,

this may seem like a dumb question, but how can I add & replace a number 
of documents in a database?


I have a directory with some files that I want to import. Some of the 
documents may already exist in the database.


If I use the db:replace('db', '', '/path/to/my/files') function it adds 
the missing docs and overwrites the existing ones.

But it also deletes all other documents from the database.

I cannot work with db:add() because it does not overwrite existing docs.

Any suggestions?

Best regards
Johannes








Re: [basex-talk] Func def & performance: element()* vs item()*

2019-04-11 Thread Martin Honnen

On 11.04.2019 at 00:09, Chuck Bearden wrote:

> BaseX is a great tool for analyzing & characterizing large amounts of
> XML data. I have used it both at work and on personal projects. I hope
> the following observation is useful.
>
> When I define a function that recurs over a sequence of elements in
> order to build a map of element name counts, I find that when I
> specify the type of the element sequence as 'element()*', the function
> runs so slowly that I give up after 5 minutes or so. But when I
> specify the type as 'item()*', it finishes in 40 seconds or less.
> Here's an example:
>
> -begin code snippet-
> declare namespace local="w00fw00f";
> declare function local:count($elems as element()*, $elem_counts as map(*))
>   as map(*) {
>   let $elem := head($elems),
>       $elem_name := $elem/name(),
>       $elems_new := tail($elems),
>       $elem_name_count := if (map:contains($elem_counts, $elem_name))
>                           then map:get($elem_counts, $elem_name) + 1
>                           else 1,
>       $elem_counts_new := map:put($elem_counts, $elem_name, $elem_name_count)
>   return if (count($elems_new) = 0)
>          then $elem_counts_new
>          else local:count($elems_new, $elem_counts_new)
> };
>
> let $coll := collection('pure_20190402'),
>     $elems := $coll/result/items/*,
>     $elem_names_map := local:count($elems, map {})



It seems that the task of building the map can also be solved with grouping:

let $elem_names_map := map:merge(
  for $item in $coll/result/items/*
  group by $name := name($item)
  return map { $name : count($item) }
)


Not sure whether that improves performance.
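
For readers who think more easily in Python, the two strategies from this thread can be compared with a small analogy. This is only a sketch of the logic: the function names are made up, and the dict copy merely models that each map:put step yields a new map value; it says nothing about the actual cost in BaseX.

```python
from collections import Counter
from functools import reduce


def count_by_put(names):
    """Fold over the names, producing a new dict at every step."""
    def step(counts, name):
        updated = dict(counts)  # models an immutable map: never mutate the input
        updated[name] = counts.get(name, 0) + 1
        return updated
    return reduce(step, names, {})


def count_by_grouping(names):
    """Group equal names and count each group once, then build the result."""
    return dict(Counter(names))


sample = ["title", "author", "title", "abstract", "author", "title"]
print(count_by_put(sample) == count_by_grouping(sample))
```

Both functions yield the same counts; the grouping variant simply expresses the intent directly and leaves the iteration strategy to the library, much as group by leaves it to the query processor.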