Re: [basex-talk] Binding external variable in GUI

2016-02-15 Thread Christian Grün
> At the risk of a) sounding like a real newbie  b) being accused of not
> Googling for an answer (which I did - but apparently overlooked)
> I am wondering how I can bind an external variable using just XQuery syntax
> in the BaseX GUI.

Just press the icon showing "$x".
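
In case it helps, here is a minimal sketch of a query that such a binding applies to. The variable name $x and the bound value (say, 5) are only assumptions for illustration; any variable declared as external in the editor tab should be bindable the same way.

```xquery
(: $x is declared as external, so its value has to be supplied from outside the query :)
declare variable $x external;

(: adds one to the bound value; assumes something numeric such as 5 was bound :)
xs:integer($x) + 1
```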


>
> I have found how to set the variable via command line options, and how to do
> it in Java, but I merely
> wanted to run a quick interactive test in the GUI  to test an idea of mine
> (before writing a bunch of Java code).
> What I have done is in a tab in the GUI, I have included the module, and
> before running a function, want to set that external variable.
> Surprisingly I found myself at a loss on how to do what appears to be a
> simple task.
>
> So, thanks in advance for any help, and to the rest of this community for
> not laughing at me too loudly.
>
> Buddy
>


Re: [basex-talk] Binding external variable in GUI

2016-02-15 Thread Etanchaud Fabrice
Hi Buddy,

Did you try the $x button between the "run query" and "run tests" buttons?
Best regards,
Fabrice



From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of 
buddyonweb-softw...@yahoo.com
Sent: Monday, 15 February 2016 16:53
To: BaseX 
Subject: [basex-talk] Binding external variable in GUI

At the risk of a) sounding like a real newbie  b) being accused of not Googling 
for an answer (which I did - but apparently overlooked)
I am wondering how I can bind an external variable using just XQuery syntax in 
the BaseX GUI.

I have found how to set the variable via command line options, and how to do it 
in Java, but I merely
wanted to run a quick interactive test in the GUI  to test an idea of mine 
(before writing a bunch of Java code).
What I have done is in a tab in the GUI, I have included the module, and before 
running a function, want to set that external variable.
Surprisingly I found myself at a loss on how to do what appears to be a simple 
task.

So, thanks in advance for any help, and to the rest of this community for 
not laughing at me too loudly.

Buddy



[basex-talk] Binding external variable in GUI

2016-02-15 Thread buddyonweb-software
At the risk of a) sounding like a real newbie  b) being accused of not Googling 
for an answer (which I did - but apparently overlooked)
I am wondering how I can bind an external variable using just XQuery syntax in 
the BaseX GUI.
I have found how to set the variable via command line options, and how to do it 
in Java, but I merely wanted to run a quick interactive test in the GUI to test 
an idea of mine (before writing a bunch of Java code). What I have done is in a 
tab in the GUI, I have included the module, and before running a function, want 
to set that external variable.
Surprisingly I found myself at a loss on how to do what appears to be a simple 
task.
So, thanks in advance for any help, and to the rest of this community for 
not laughing at me too loudly.
Buddy



Re: [basex-talk] Spectactularly slow performance with db:open vs. db:text

2016-02-15 Thread Christian Grün
> Your second query executes in a fraction of a second - but only because the 
> optimizer successfully rewrites it to use the TEXT index.

Exactly. A single context item reference (".") can be rewritten for
index access if the database statistics indicate that the addressed
element – in your case order-id – has no descendant elements. If it
did, you would get different results; see e.g.:

  <a>X<b>Y</b></a> = 'XY' → true

In your case, I assume that your database statistics have got outdated
(e.g. due to updates), or there may be order-id elements with elements
as children.
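
A small sketch of why the two comparisons can differ, assuming a made-up order-id element that contains a child element (the element b and the values are invented for illustration):

```xquery
(: hypothetical element: one direct text node 'X' and a child element with text 'Y' :)
let $e := <order-id>X<b>Y</b></order-id>
return (
  $e = 'XY',         (: true: the string value concatenates all descendant text :)
  $e/text() = 'XY',  (: false: the only direct text node is 'X' :)
  $e/text() = 'X'    (: true :)
)
```

If the statistics are indeed outdated, running db:optimize('DB2') (or the OPTIMIZE command) should refresh them, after which it is worth looking at the optimized query again.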


Re: [basex-talk] Spectactularly slow performance with db:open vs. db:text

2016-02-15 Thread Hondros, Constantine (ELS-AMS)
Hello Christian,

Your second query executes in a fraction of a second - but only because the 
optimizer successfully rewrites it to use the TEXT index. When I remove the 
TEXT index so that the optimizer uses db:open-pre(), then the query takes some 
88 minutes to run. (I did improve it to use a direct path).

On the other hand, I did some benchmarking against different sized lookups. 
I've discovered that I am averaging just over 10 lookups per second against 
DB2 using db:open-pre, which is independent of the number of lookups performed 
per query. Presumably this is just an expensive sort of query to repeat 
thousands of times, so the main lesson for me is to double-check that 
queries are rewritten against the TEXT index where possible.
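
For cases where no TEXT index is available, one index-independent way to phrase the same join is to collect the DB2 values once and then probe them for each DB1 item. This is only a sketch: it assumes the order-id values are plain strings, and it has not been measured against the databases discussed here.

```xquery
(: collect every distinct order-id string from DB2 once, as keys of a map :)
let $ids := map:merge(
  for $id in distinct-values(db:open('DB2')//order-id ! string())
  return map { $id: true() }
)
(: keep the DB1 order-id elements whose string value occurs in DB2 :)
for $a in db:open('DB1')/item/order-id
where map:contains($ids, string($a))
return $a
```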

Thanks for helping me think this through.

C.

-----Original Message-----
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 15 February 2016 10:44
To: Hondros, Constantine (ELS-AMS)
Cc: BaseX
Subject: Re: [basex-talk] Spectactularly slow performance with db:open vs. 
db:text

Hi Constantine,

> for $a in (db:open('DB1')/item/order-id) return
>   if (db:open('DB2')//order-id[. = $a]) then
>     $a
>   else
>     ()

Do some of the order-id elements contain descendant elements?

  db:open('DB1')/item/order-id[*]

If yes, the following query might be faster:

  for $a in (db:open('DB1')/item/order-id)
  return
    if (db:open('DB2')//order-id[text() = $a]) then
      $a
    else
      ()

Here is another way to rewrite the query:

  for $a in (db:open('DB1')/item/order-id)
  where db:open('DB2')//order-id[text() = $a]
  return $a




> Note that the optimized query uses db:open-pre to access DB2. When I
> re-write the query myself to use the TEXT index then performance is
> excellent. But why such a difference?
>
> Query 2 [returns in 0.3 second]
>
> for $a in (db:open('DB1')/item/order-id)
> return
>   if (db:text('DB2', $a)/parent::order-id) then
>     $a
>   else
>     ()
>
> Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The
> Netherlands, Registration No. 33156677, Registered in The Netherlands.




Re: [basex-talk] Benchmarking and caching in BaseX

2016-02-15 Thread Christian Grün
Hi Bram,

Thanks for the summary on your work on Treebank and BaseX!

> The problem that I have encountered is that BaseX seems to
> cache very efficiently. Obviously this is not a problem on production
> websites but for benchmarking it may not be ideal. My first question to you,
> then, is: is it possible to disable caching when testing queries locally?
> And how exactly does BaseX handle the caching? Or more specifically, if I
> enter a query: what is cached, and for how long? This information may be
> useful to analyse our logs with.

You may be surprised to hear that BaseX does not have any particular
caching strategies for queries and query results. Various
optimizations exist for caching IO data on a lower level, though. As
these strategies reach down to the OS and hardware disk access level,
it’s hardly possible to disable all of them. Usually, it’s simply your
main memory that distorts your performance measurements, because the
relevant disk data will only be pulled once from disk as long as
enough main memory is available. Besides that, Java programs are
generally getting faster and faster the longer they are running (due
to Just-in-Time Compilation – JIT)… and so on.

In practice, if you do benchmarking, it’s usually good to “warm up”
your BaseX instance by running various initial queries. It also helps
to use the client/server architecture and to look at the execution
time output by the -v or -V command-line flag. In order to simulate
real-life query patterns, you should run your test queries in random
order, and run a great number of different queries. Moreover, it’s
advisable to run your queries multiple times and eventually take
the mean or minimum value as result. If this value differs by more than
5% when repeating the test, then you should possibly increase the
number of runs.
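
The last step could be sketched roughly as follows. The timings are made-up placeholders in milliseconds, not measurements; in practice they would come from the execution times reported for each run.

```xquery
(: made-up timings for two repetitions of the same test; replace with measured values :)
declare variable $run1 := (12.4, 11.9, 12.1, 12.0);
declare variable $run2 := (12.6, 12.2, 12.3, 12.1);

let $mean1 := avg($run1)
let $mean2 := avg($run2)
(: relative difference between the two means :)
let $diff := abs($mean1 - $mean2) div $mean1
return map {
  'mean1': $mean1,
  'min1': min($run1),
  'mean2': $mean2,
  'more-runs-needed': $diff > 0.05
}
```

Whether the mean or the minimum is the better summary depends on what is being compared; the sketch reports both for the first run.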

I hope this helps a bit; I invite you to report back on your experiences,
Christian


Re: [basex-talk] Serialization to jsonml / Whitespace-Handling

2016-02-15 Thread Günter Dunz-Wolff
In my context that code helped a lot:

for $text in .//text()
return replace value of node $text with normalize-space($text)

Thx, Günter

> Am 15.02.2016 um 10:22 schrieb Christian Grün :
> 
>> json:serialize(fn:normalize-space(doc('bettelweib.xml')//*:body), map { 
>> 'format': 'jsonml' })
> 
> Simply calling fn:normalize-space(...) won’t be enough, as it will
> always return a string.
> 
> To give you better help, I’d be pleased if you could provide us a
> little self-contained example. Maybe it’s not necessary if you have a
> closer look at the following example:
> ___
> 
> let $body := <body>Das Bettelweib von
>    Locarno.</body>
> let $updated-body := $body update (
>  for $text in .//text()
>  return replace value of node $text with normalize-space($text)
> )
> return json:serialize($updated-body, map { 'format': 'jsonml' })
> ___
> 
> Cheers,
> Christian
> 
> 
>> 
>> I get
>> JSON serializer: Atomic values cannot be serialized
>> 
>> Greetings from Hamburg,
>> Günter
>> 
>>> Am 11.02.2016 um 13:20 schrieb Christian Grün :
>>> 
>>> Hi Günter,
>>> 
>>> Did you try fn:normalize-space?
>>> 
>>> Greetings from Prague,
>>> Christian
>>> 
>>> 
>>> 
>>> On Thu, Feb 11, 2016 at 1:19 PM, Günter Dunz-Wolff
>>>  wrote:
>>>> Hi all, hi Christian
>>>>
>>>> I want to serialize my xml-documents to jsonml like so:
>>>>
>>>> json:serialize(doc('bettelweib.xml')//*:body, map { 'format': 'jsonml' })
>>>>
>>>> Inside the jsonml are lots of strings with unnecessary whitespace like:
>>>>
>>>> "Das Bettelweib von\nLocarno."
>>>>
>>>> How could I remove that whitespace and also \n during or after the
>>>> serialization?
>>>>
>>>> Thanks for any help.
>>>>
>>>> Regards,
>>>>
>>>> Günter
>>> 
>> 
>> 
> 




[basex-talk] Benchmarking and caching in BaseX

2016-02-15 Thread Bram Vanroy | KU Leuven
Dear all

My name is Bram Vanroy, and I am an intern at the Centre for Computational
Linguistics (CCL; http://www.arts.kuleuven.be/ling/ccl [Dutch]) at the
University of Leuven. My supervisor, Vincent Vandeghinste, has had contact
with this mailing list some time ago, more specifically with Dirk Kirsten.
My internship is titled "Fine-tuning the GrETEL Treebank Query Engine".
GrETEL stands for Greedy Extraction of Trees for Empirical Linguistics;
available at http://gretel.ccl.kuleuven.be/gretel-2.0/. Its goal is to
provide users with a fast, user-friendly on-line tool to search through text
corpora backed by treebanks. Accessibility is an important point for us:
users do not need to be proficient with any programming languages, strict
formalisms, or treebank specific annotations; every query can be executed by
using an intuitive graphical interface. More advanced users can use XPath to
write the representation of the syntactic structure that they are looking
for. BaseX is our tool of choice as a database for our corpora in XML
format.

Initially, GrETEL provided access to smaller corpora such as CGN (9 million
words) and Lassy Small (1 million words). We would like to expand the
searchable corpora by also making the full Sonar corpus available (500
million words). This is already partially possible in GrETEL 2.0 but due to
efficiency reasons, capabilities are restricted: users can only search in
one component at a time, and the largest component in the corpus is not
available due to its size (15 million sentences). We have applied these
restrictions because the search time for the whole corpus was too long,
which in turn would decrease the user-friendliness of the tool drastically.

Steps have already been taken to improve search times in larger corpora.
(See "Making a Large Treebank Searchable Online. The SoNaR Case." by Vincent
Vandeghinste, and Liesbeth Augustinus;
http://nederbooms.ccl.kuleuven.be/documentation/LREC2014-GrETELSoNaR.pdf.)
To spare you the effort of going through the whole article, I quote the
most relevant passage from that article for this email:

 

The general idea behind our approach is to restrict the search space by
splitting up the data in many small databases, allowing for faster retrieval
of syntactic structures. We organise the data in databases that contain all
bottom-up subtrees for which the two top levels (i.e. the root and its
children) adhere to the same syntactic pattern. When querying the database
for certain syntactic constructions, we know on which databases we have to
apply the XPath query which would otherwise have to be applied on the whole
data set. We have called this method GrETEL Indexing (GrInd). (p. 17)

 

So to optimise searching, the data has been pulled apart - in a sense -
which would make the search space smaller and subsequently the search time
shorter. In the future we would like to apply this technique on parallel
corpora as well. We have not yet tested what influence this change has had
on query time, which is what I am going to find out during my internship. I
have already analysed the XPath queries that users have made since GrETEL
saw its first user and found that the queries are ten embedded levels deep
at the most, but most are between one and five. The number of nodes per
query varies between one and 24, but most searches are for structures that
contain between one and eight nodes. Based on this information, I am writing
example XPaths that I am going to pull through BaseX as a sort of benchmark.
I can then compare the query speeds between the split-up corpus, and the
regular one. The problem that I have encountered is that BaseX seems to
cache very efficiently. Obviously this is not a problem on production
websites but for benchmarking it may not be ideal. My first question to you,
then, is: is it possible to disable caching when testing queries locally?
And how exactly does BaseX handle the caching? Or more specifically, if I
enter a query: what is cached, and for how long? This information may be
useful to analyse our logs with.

 

If you have any feedback on GrETEL, or the new approach of GrInding, or if
you have any ideas to improve search time for large corpora, I would love
to hear from you. You can contact me via this email address or on LinkedIn.
I reply to each email as extensively as possible.

 

 

Thank you in advance,

Kind regards

 

Bram Vanroy
https://be.linkedin.com/in/bramvanroy



Re: [basex-talk] Invalid certs

2016-02-15 Thread Christian Grün
Hi Marco,

> My case is rather the other way around. I'd like to declare the IGNORECERT
> option online for a given call while keeping it disabled by default.

I see. In Java, the decision how certificates are handled is a static
one [1]; this makes it a bit difficult to restrict it to specific
calls (in particular if parallel requests are to be handled). If you
have some idea how this could be done differently in Java, your
feedback is welcome.

Cheers,
Christian

[1] 
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/io/IOUrl.java#L172


> This sounds to me like a reasonable use case, since the usual approach is
> to reject invalid certificates.
> In my opinion (and I know it's not up to you) it should even be specified in
> the http:request structure in order to achieve a declarative way of
> expressing it at the finest grain possible.
>
> BTW, my issue was that, since the request was executed from inside an .xq
> that had to be run from embedded Java code linked with Basex.jar, the global
> option was just not usable. At least to my knowledge.
> I circumvented this by sending the script through client:query() to a "nearby"
> BaseX server instance which I conveniently restarted with IGNORECERT set in
> the .basex file.
>
> Regards,
> Marco.
>
>
> On 12/02/2016 21:59, Christian Grün wrote:
>>
>> Hi Marco,
>>
>>> Is it possible to declare the IGNORECERT option inside a XQuery script
>>> that
>>> will be executed from inside a Java?
>>
>> If I get it right, you enabled this option, but would like to only
>> disable it for your XQuery script? The answer, I guess, is no. As
>> IGNORECERT is a global option, it will only be parsed when BaseX is
>> initialized.
>>
>> Cheers from Prague,
>> Christian
>>
>>
>> On Thu, Feb 11, 2016 at 2:09 PM, Marco Lettere 
>> wrote:
>>>
>>> Hi all,
>>> sorry for this maybe dumb question.
>>>
>>> Thanks,
>>> Marco.
>
>


Re: [basex-talk] Spectactularly slow performance with db:open vs. db:text

2016-02-15 Thread Christian Grün
Hi Constantine,

> for $a in (db:open('DB1')/item/order-id)
> return
>   if (db:open('DB2')//order-id[. = $a]) then
>     $a
>   else
>     ()

Do some of the order-id elements contain descendant elements?

  db:open('DB1')/item/order-id[*]

If yes, the following query might be faster:

  for $a in (db:open('DB1')/item/order-id)
  return
    if (db:open('DB2')//order-id[text() = $a]) then
      $a
    else
      ()

Here is another way to rewrite the query:

  for $a in (db:open('DB1')/item/order-id)
  where db:open('DB2')//order-id[text() = $a]
  return $a




> Note that the optimized query uses db:open-pre to access DB2. When I
> re-write the query myself to use the TEXT index then performance is
> excellent. But why such a difference?
>
> Query 2 [returns in 0.3 second]
>
> for $a in (db:open('DB1')/item/order-id)
> return
>   if (db:text('DB2', $a)/parent::order-id) then
>     $a
>   else
>     ()


Re: [basex-talk] Invalid certs

2016-02-15 Thread Marco Lettere

Hi Christian,
hope you had fun at XMLPrague!
My case is rather the other way around. I'd like to declare the 
IGNORECERT option online for a given call while keeping it disabled by default.
This sounds to me like a reasonable use case, since the usual approach is 
to reject invalid certificates.
In my opinion (and I know it's not up to you) it should even be 
specified in the http:request structure in order to achieve a 
declarative way of expressing it at the finest grain possible.


BTW, my issue was that, since the request was executed from inside an .xq 
that had to be run from embedded Java code linked with Basex.jar, the 
global option was just not usable. At least to my knowledge.
I circumvented this by sending the script through client:query() to a 
"nearby" BaseX server instance which I conveniently restarted with 
IGNORECERT set in the .basex file.
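
A rough sketch of this workaround, assuming a BaseX server on localhost:1984 that was started with IGNORECERT enabled in its .basex file. The host, port, credentials and URL below are placeholders, not the actual setup.

```xquery
(: placeholder connection data for the "nearby" server instance :)
let $session := client:connect('localhost', 1984, 'admin', 'admin')
(: the query string is evaluated by that server, so its IGNORECERT setting applies :)
return client:query($session, "
  http:send-request(
    <http:request method='get' href='https://self-signed.example.org/'/>
  )
")
```

The embedding Java process keeps rejecting invalid certificates in this setup; only requests routed through the second instance are affected.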


Regards,
Marco.

On 12/02/2016 21:59, Christian Grün wrote:
> Hi Marco,
>
>> Is it possible to declare the IGNORECERT option inside a XQuery script that
>> will be executed from inside a Java?
>
> If I get it right, you enabled this option, but would like to only
> disable it for your XQuery script? The answer, I guess, is no. As
> IGNORECERT is a global option, it will only be parsed when BaseX is
> initialized.
>
> Cheers from Prague,
> Christian
>
> On Thu, Feb 11, 2016 at 2:09 PM, Marco Lettere  wrote:
>> Hi all,
>> sorry for this maybe dumb question.
>>
>> Thanks,
>> Marco.




Re: [basex-talk] JavaDoc link?

2016-02-15 Thread Christian Grün
Hi Buddy,

We stopped maintaining JavaDoc, because most people seem to use GitHub
or Eclipse to access the documentation of the source files.

I would be interested in hearing who else is still reading the BaseX
JavaDoc files.

Christian


On Sat, Feb 13, 2016 at 10:49 AM,   wrote:
> I may have overlooked it (and if so, sorry), but I can't seem to find the
> link to the JavaDocs on any of the online documentation pages.
> Can you direct me to the online documentation page that has that link?
>
> Again sorry if it was right in front of me and I missed it.
>
> Thanks,
> Buddy


Re: [basex-talk] metadata

2016-02-15 Thread Christian Grün
Hi Kendall,

One popular approach is to maintain an additional metadata.xml
document, which contains all metadata for the documents stored in a
database. Here is a simple example for maintaining a timestamp:


* Creation:

  db:create('db', <meta/>, 'metadata.xml')

* Insertion:

  let $path := 'new-doc.xml'
  let $doc := <doc/>  (: placeholder document :)
  let $meta := element doc {
    attribute path { $path },
    element timestamp { current-dateTime() }
  }
  return (
    db:add('db', $doc, $path),
    insert node $meta into db:open('db', 'metadata.xml')/meta
  )

* Retrieval:

  let $path := 'new-doc.xml'
  return db:open('db', 'metadata.xml')/meta/doc[@path = $path]

* Deletion:

  let $path := 'new-doc.xml'
  return (
    db:delete('db', $path),
    delete node db:open('db', 'metadata.xml')/meta/doc[@path = $path]
  )
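
* Querying by timestamp (a sketch building on the structure above; the cutoff date is an arbitrary placeholder):

```xquery
(: lists the paths of all documents whose metadata entry predates the cutoff :)
let $cutoff := xs:dateTime('2016-02-01T00:00:00Z')
for $doc in db:open('db', 'metadata.xml')/meta/doc
where xs:dateTime($doc/timestamp) lt $cutoff
return string($doc/@path)
```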


Cheers,
Christian



On Sat, Feb 13, 2016 at 8:28 PM, Kendall Shaw  wrote:
> Unless there is now a standard way to associate metadata with documents and
> collections, is there a preferred method?
>
> I could try to ensure that there is always a unique id that can be derived
> from combination of an id attribute or element in the document + the
> document’s document uri, then have separate metadata documents that are
> associated with the unique id. Is there an obviously better approach?
>
> Kendall


Re: [basex-talk] JavaDoc link?

2016-02-15 Thread Hondros, Constantine (ELS-AMS)
This is where I tend to get the Javadoc from – though I note it’s a few 
releases out of date.

http://docs.basex.org/javadoc/

C.

From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of 
buddyonweb-softw...@yahoo.com
Sent: 13 February 2016 10:50
To: BaseX
Subject: [basex-talk] JavaDoc link?

I may have overlooked it (and if so, sorry), but I can't seem to find the link 
to the JavaDocs on any of the online documentation pages.
Can you direct me to the online documentation page that has that link?

Again sorry if it was right in front of me and I missed it.

Thanks,
Buddy





[basex-talk] Spectactularly slow performance with db:open vs. db:text

2016-02-15 Thread Hondros, Constantine (ELS-AMS)
Hello Basexers,

I'm getting such poor performance on a relatively simple join between two 
databases that I feel there must be something going wrong here. I can provide 
the sources if necessary, but basically DB1 is 26 MB, about 80,000 small 
documents; DB2 is 47 MB, about 18,500 small documents. I'm using 8.4, by the 
way; I haven't tested on other releases.

Query 1 [returns in 144 minutes]


for $a in (db:open('DB1')/item/order-id)
return
  if (db:open('DB2')//order-id[. = $a]) then
    $a
  else
    ()

Note that the optimized query uses db:open-pre to access DB2. When I re-write 
the query myself to use the TEXT index then performance is excellent. But why 
such a difference?


Query 2 [returns in 0.3 second]


for $a in (db:open('DB1')/item/order-id)
return
  if (db:text('DB2', $a)/parent::order-id) then
    $a
  else
    ()


