Re: [basex-talk] Atomization

2018-05-24 Thread Christian Grün
Hi Giuseppe,

Your summary is 100% correct, and surely helpful to other users.

An even better option than using data() is the ENFORCEINDEX option,
which was added in BaseX 9. It allows you to enforce index
rewritings for the whole query, or for specific comparisons or
predicates.

I noticed that it was restricted to comparisons with static values,
and I have just removed that restriction. The following query should
be faster than the original version:

  for $s in db:open("ru_syntagrus-ud-dev")//s//t
  for $d in db:open("UD_Russian-SynTagRus")//case
  where (# db:enforceindex #) { $d/verb_lemma = $s/@l }
    and $d//verb_form/@value = $s/@f
    and $d/aspect-values/@sign = "yes"
  return $s
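ENFORCEINDEX can also be enabled for the whole query rather than for a single comparison. A sketch of that variant follows, assuming the option can be set in the query prolog via `declare option db:enforceindex "true";` (BaseX generally accepts query options declared this way, but please verify this against the documentation [1] for your version). Unlike the pragma above, this form does not single out one comparison, so by itself it does not determine which of the three conditions gets rewritten.

```xquery
(: Assumption: ENFORCEINDEX is enabled for the whole query via the prolog.
   The database names are the ones used in the query above. :)
declare option db:enforceindex "true";

for $s in db:open("ru_syntagrus-ud-dev")//s//t
for $d in db:open("UD_Russian-SynTagRus")//case
where $d/verb_lemma = $s/@l
  and $d//verb_form/@value = $s/@f
  and $d/aspect-values/@sign = "yes"
return $s
```

When a particular comparison should be evaluated via the index, the pragma form shown above is the more targeted choice.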

I have added a few words about this enhancement to the documentation [1].
Feel free to check out the new snapshot [2],
Christian

[1] http://docs.basex.org/wiki/Indexes#Enforce_Rewritings
[2] http://files.basex.org/releases/latest/





Re: [basex-talk] Atomization

2018-05-24 Thread Giuseppe Celano
Hi Christian,

Thank you for your help! To summarize (also for the benefit of other users):
while data($d/aspect-values/@sign) = "yes" and $d/aspect-values/@sign = "yes"
are equivalent in XQuery (because of atomization), using data() lets the user
prevent BaseX from using a certain index (so this is a BaseX-specific
feature). Paying attention to how BaseX uses indexes (which can be seen in the
GUI Info panel) seems to be particularly important when joining documents: as
far as I understand, which indexes BaseX will use automatically, and how,
cannot be predicted in advance, so one can experiment with data() to find out
which index usage works best (especially when the query evaluates slowly).
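A minimal, self-contained sketch of the equivalence summarized above, which should run in any XQuery 3.1 processor. The `<case>` element is a hypothetical stand-in that only mirrors the structure of doc2; it is not taken from the actual sample documents. Both comparisons should return true, because the general comparison `=` atomizes the attribute node on its own.

```xquery
(: Hypothetical sample element; only the structure mirrors doc2 :)
let $d :=
  <case>
    <aspect-values sign="yes"/>
  </case>
return (
  (: implicit atomization: = atomizes the attribute node before comparing :)
  $d/aspect-values/@sign = "yes",
  (: explicit atomization: data() yields xs:untypedAtomic("yes") :)
  data($d/aspect-values/@sign) = "yes"
)
```

The results are identical; the only difference discussed in this thread is how BaseX's optimizer treats the two forms when choosing an index.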

Is this correct?

Thank you again!
Giuseppe 


Universität Leipzig
Institute of Computer Science, NLP
Augustusplatz 10
04109 Leipzig
Deutschland
E-mail: cel...@informatik.uni-leipzig.de
E-mail: giuseppegacel...@gmail.com
Web site 1: http://www.dh.uni-leipzig.de/wo/team/
Web site 2: https://sites.google.com/site/giuseppegacelano/




Re: [basex-talk] Atomization

2018-05-23 Thread Christian Grün
Hi Giuseppe,

I think your observation was related to another issue that was fixed
recently. Did you try the latest snapshot [1]?

Btw, in your specific query, data() may indeed help to suppress the
index rewriting for the last condition. As it's the only comparison
with a static string, it is the one that will be chosen for index
access; for your data, however, it would actually be better if one of
the other two conditions were evaluated via the index.
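For reference, a sketch of the workaround discussed here: the original query with only the last condition wrapped in data(). It assumes the same doc1/doc2 documents from the original post, so it is not runnable on its own. As noted above, data() "may" help; whether another condition is then evaluated via the index can be checked in the GUI Info panel.

```xquery
(: Same documents as in the original post (doc1, doc2) :)
for $s in doc("doc1")//s//t
for $d in doc("doc2")//case
where $d/verb_lemma = $s/@l
  and $d//verb_form/@value = $s/@f
  (: explicit data() is meant to keep this static-string comparison
     from being picked for index access :)
  and data($d/aspect-values/@sign) = "yes"
return $s
```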

Thanks for the sample documents,
Christian

PS: 9.0.2 will be available by the end of May.

[1] http://files.basex.org/releases/latest/





[basex-talk] Atomization

2018-05-22 Thread Giuseppe Celano
I think I have identified a problem with atomization of attribute content (no 
database involved). I have a simple query:

  for $s in doc("doc1")//s//t
  for $d in doc("doc2")//case
  where $d/verb_lemma = $s/@l
    and $d//verb_form/@value = $s/@f
    and $d/aspect-values/@sign = "yes"
  return $s

In order to get a result, I (necessarily) need to use the data() function, as in
data($d/aspect-values/@sign) = "yes"; otherwise the query never returns a
result. Is this a bug?
I would expect the value of @sign to be atomized automatically and compared
to "yes", but this does not seem to be the case.
Thanks.

Ciao,
Giuseppe
