Re: [basex-talk] java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder

2017-09-18 Thread Liam R. E. Quin
On Mon, 2017-09-18 at 23:31 +, Kendall Shaw wrote:
> [...] maybe using the oracle JDK would work.

Thank you for such a quick answer - it wasn't what i did, but it helped
me find the problem.

It seems an OS upgrade had upgraded the Java runtime, and that class is
now neither provided nor called... but (and i should have mentioned
this) my BaseX server was still running... i was worried about
restarting the server in case it wouldn't work, until i resolved the
problem, but in fact, restarting the server resolved it.
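This symptom fits a general JVM rule. If a class's static initializer throws, the first access reports ExceptionInInitializerError. Every later access in the same JVM reports NoClassDefFoundError "Could not initialize class ...", because the JVM never retries a failed initialization. That would explain why only a restart helped. The sketch below reproduces the mechanism with a hypothetical Holder class. It does not show what actually failed inside FileSystems$DefaultFileSystemHolder; the link to the OS upgrade is only inferred.

```java
class InitFailureDemo {
    // Read by the static initializer below. Flipping it later does not help,
    // because the JVM never retries a failed class initialization.
    static boolean fail = true;

    // Hypothetical stand-in for a lazily initialized holder class
    // such as FileSystems$DefaultFileSystemHolder.
    static class Holder {
        static final String VALUE = init();

        private static String init() {
            if (fail) {
                throw new IllegalStateException("simulated failure during class initialization");
            }
            return "initialized";
        }
    }

    // Returns the value, or the simple name of the error thrown while accessing it.
    static String tryAccess() {
        try {
            return Holder.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAccess()); // ExceptionInInitializerError
        fail = false;                    // the "fix" arrives too late for this JVM
        System.out.println(tryAccess()); // NoClassDefFoundError
    }
}
```

In the server, the first failure would show up as ExceptionInInitializerError, and every later call would produce the NoClassDefFoundError seen in the crash report. A fresh JVM starts with a clean class state.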


-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, SVG WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


Re: [basex-talk] Issue with Full Text Retrieval

2017-09-18 Thread Ron Katriel
Hi Christian,

Yes, this helps. By index rewritings, are you referring to the indices
created when FTINDEX is set to true?

Thanks,
Ron

On September 18, 2017 at 11:12:54 AM, Christian Grün (
christian.gr...@gmail.com) wrote:

Hi Ron,

With mixed-content, it can be beneficial if element boundaries are
ignored. An example:

Hello world!
contains text 'hello'

If you set the CHOP option to false before creating a database,
whitespaces will be included in your database. As Fabrice has pointed
out, however, it is usually better to directly address the text nodes
of your database; otherwise, you won’t be able to benefit from the
index rewritings.

Hope this helps,
Christian




Re: [basex-talk] java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder

2017-09-18 Thread Kendall Shaw
Or build basex using openjdk.

On 9/18/17, 4:30 PM, "basex-talk-boun...@mailman.uni-konstanz.de on behalf of 
Kendall Shaw"  wrote:

I don’t know if this helps. It could be because of openjdk vs oracle. In 
FileSystems.java in the oracle JDK getDefault loads:

sun.nio.fs.DefaultFileSystemProvider.create()

I imagine that openjdk wouldn’t use that, perhaps. So, maybe using the 
oracle JDK would work.

Kendall


Re: [basex-talk] java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder

2017-09-18 Thread Kendall Shaw
I don’t know if this helps. It could be because of openjdk vs oracle. In 
FileSystems.java in the oracle JDK getDefault loads:

sun.nio.fs.DefaultFileSystemProvider.create()

I imagine that openjdk wouldn’t use that, perhaps. So, maybe using the oracle 
JDK would work.

Kendall
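A small Java check can show which runtime is actually in play. It prints the JVM's vendor, version and home directory, plus the class of the default file-system provider. That provider is what FileSystems.getDefault() resolves, and the original stack trace fails at that call. This is a minimal sketch, and the class name JvmInfo is made up. Run with the currently installed `java`, it reports the upgraded runtime. You can compare that with the "Java:" line of BaseX's crash report. In the original post the report says 1.8.0_131, while the upgrade installed the 1.8.0.144 package. Many OpenJDK builds also report "Oracle Corporation" as java.vendor, so the vendor string alone may not tell the two JDKs apart.

```java
import java.nio.file.FileSystems;

class JvmInfo {
    // Summarizes the running JVM and the default file-system provider it resolved.
    static String describe() {
        return "vendor=" + System.getProperty("java.vendor")
            + ", version=" + System.getProperty("java.version")
            + ", home=" + System.getProperty("java.home")
            + ", fsProvider=" + FileSystems.getDefault().provider().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```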


[basex-talk] java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder

2017-09-18 Thread Liam R. E. Quin
It seems with the latest Java 1.8 -
java-1.8.0-openjdk-headless-1.8.0.144-0.b01.el7_4.x86_64
on Centos 7, I can no longer drop a database, any ideas?

This is with both 8.5.3 and 8.6.6, and also with
the latest snapshot, BaseX867-20170824.195627.zip

[[
$ bin/basexclient -p 1994
Username: admin
Password: 
BaseX 8.6.6 [Client]
Try 'help' to get more information.
> open rdf
Database 'rdf' was opened in 1.61 ms.
> close
Database 'rdf' was closed.
> drop db rdf
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.5.3
Java: Oracle Corporation, 1.8.0_131
OS: Linux, amd64
Stack Trace: 
java.lang.NoClassDefFoundError: Could not initialize class 
java.nio.file.FileSystems$DefaultFileSystemHolder
at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
at java.nio.file.Paths.get(Paths.java:84)
at org.basex.io.IOFile.toPath(IOFile.java:335)
at org.basex.io.IOFile.delete(IOFile.java:243)
at org.basex.io.IOFile.delete(IOFile.java:240)
at org.basex.core.cmd.DropDB.drop(DropDB.java:77)
at org.basex.core.cmd.DropDB.run(DropDB.java:46)
at org.basex.core.Command.run(Command.java:253)
at org.basex.core.Command.execute(Command.java:99)
at org.basex.server.ClientListener.run(ClientListener.java:136)

> open rdf
Database 'rdf' was opened in 1.56 ms.
> xquery count(//image)
7370
Query executed in 68.2 ms.
]]



-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


Re: [basex-talk] A few general questions about BaseX

2017-09-18 Thread Christian Grün
Hi Anastasiou (and thanks Bridger),

>> BaseX fails with a message along the lines “This is too big for one
>> database”.
>>
>> 1)  Are there any logs, beyond the DB logs? If yes, where can I find
>> them?
>>
> I'm not sure how to enable more verbose logging with the GUI -- hopefully
> one of the devs or power users can weigh in on this.

You can enable the debugging mode, e.g. by entering "set debug true"
in the GUI command input panel on top. If the returned feedback does
not help, I would be interested in the exact error message you get
(because there may be several reasons why the input is too large for a
single database).


>> 2)  The parser options include reading XML files from archives, which
>> is very convenient, but once the file has been
>> parsed, does BaseX require the “originals” for queries / returning
>> results?
>>
> AFAIK, no it does not. BaseX will query and return results from the internal
> database(s).

Exactly!


>> 3)  Is it possible to do federation with BaseX? In other words, let’s
>> say I split a database in two large parts (as per #1),
>> is it possible to launch two baseX servers and then have them talk to each
>> other so that ultimately I just query one of
>> them and get back unified results?
>
> AFAIK, the preferred method is to split your files across many databases,
> then query multiple databases from a single expression[1]. Others will be
> able to speak to this better, but I don't think there's a straightforward
> way to run multiple BaseX servers in a single JVM.

Exactly!


Re: [basex-talk] Issue with Full Text Retrieval

2017-09-18 Thread Christian Grün
Hi Ron,

With mixed-content, it can be beneficial if element boundaries are
ignored. An example:

   Hello world!
 contains text 'hello'

If you set the CHOP option to false before creating a database,
whitespaces will be included in your database. As Fabrice has pointed
out, however, it is usually better to directly address the text nodes
of your database; otherwise, you won’t be able to benefit from the
index rewritings.
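The following Java/DOM sketch (not BaseX code) illustrates why whitespace between elements matters and why addressing text nodes avoids the problem. It approximates chopping as dropping whitespace-only text nodes and trimming the rest. It uses a naive whitespace tokenizer instead of BaseX's full-text tokenizer, and the sample document and all names are hypothetical. Once the document is chopped, the root element's string value glues "XPath" and "are" together. Tokenizing each text node separately keeps both words.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class ChopDemo {
    // Hypothetical sample document; element names are invented for illustration.
    static final String XML =
        "<doc>\n  <p>XQuery</p>\n  <p>and XPath</p>\n  <p>are awesome</p>\n</doc>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse sample XML", e);
        }
    }

    // Assumed approximation of chopping: drop whitespace-only text nodes, trim the rest.
    static void chop(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (text.isEmpty()) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(text);
                }
            } else {
                chop(child);
            }
            child = next;
        }
    }

    // Naive whitespace tokenizer (a real full-text tokenizer does much more).
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // String value of the root element: all descendant text nodes joined with no separator.
    static List<String> stringValueTokens(Document doc) {
        return tokens(doc.getDocumentElement().getTextContent());
    }

    // Each text node tokenized on its own, so node boundaries always separate words.
    static List<String> textNodeTokens(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) out.addAll(tokens(node.getNodeValue()));
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            textNodeTokens(c, out);
        }
        return out;
    }

    public static void main(String[] args) {
        Document kept = parse(XML);
        Document chopped = parse(XML);
        chop(chopped);
        System.out.println(stringValueTokens(kept));    // [XQuery, and, XPath, are, awesome]
        System.out.println(stringValueTokens(chopped)); // [XQueryand, XPathare, awesome]
        System.out.println(textNodeTokens(chopped, new ArrayList<>())); // [XQuery, and, XPath, are, awesome]
    }
}
```

This matches Ron's observation that only words at the beginning or end of a text failed to match. Those are exactly the words that get glued to a neighbouring text node's first or last word.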

Hope this helps,
Christian



On Mon, Sep 11, 2017 at 4:59 PM, Ron Katriel  wrote:
> Thanks Fabrice and Michael. Solution (1) works great!
>
> A parting question: why not make the default behavior when querying the
> textual representation of a document to not “chop” away critical word
> boundary delimiters? So, in the example below it would return
>
>   XQuery
>   and XPAth are   awesome
>
> The munging together of "XPAth" and “are” seems counterintuitive to me.
>
> Best,
> Ron
>
> On September 11, 2017 at 4:13:54 AM, Michael Seiferle (m...@basex.org) wrote:
>
> Hi Ron,
> Hi Fabrice,
>
> Your observation w.r.t. element boundaries is right: the document is
> converted to a textual representation, which by default returns all nodes
> in their string representation:
>
> $doc :=
>
> 
>   XQuery
>   <_>and XPAth
>   <_>are   awesome
> /data()
>
> Will turn to:
>
>
>   XQuery
>   and XPAthare   awesome
>
>
> So:
>
> $doc contains text { 'XPath' }
>
>
> will return false.
>
> You have 3.5 options:
>
> 1) => as Fabrice showed, query the individual text nodes
>
> 2) use the ft:search() function to query the index directly,
> http://docs.basex.org/wiki/Full-Text_Module#ft:search
>
> ft:search(
>   'CTGovDebug',
>   'neoplasms'
> )/.. (: get the parent element of the matching text() node :)
>
>
> 3) disable chopping when creating the database,
> http://docs.basex.org/wiki/Options#XML_Parsing
>
> db:create(
>   'CTGovDebug',
>   "Path/to/NCT00473512.xml",
>   "NCT00473512.xml",
>
>   map {
>'ftindex': true(),
>'chop': false()
>   })
>
>
> 3.5) use the xml:space="preserve" attribute to tell the parser not to chop
> child nodes of  when creating a database:
>
> 
>   
>   
> ClinicalTrials.gov processed this data on August 31,
> 2017
> Link to the current ClinicalTrials.gov record.
>
>
>
> Hope this helped shed some light :-)
>
> Best from Konstanz
> Michael
> --
> Michael Seiferle, BaseX GmbH, http://www.basexgmbh.de
> |-- Firmensitz: Obere Laube 73, 78462 Konstanz
> |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
> |   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
> `-- Tel: +49 7531 916 82 77
>
> On 11.09.2017 at 09:35, Fabrice ETANCHAUD
> wrote:
>
> Hello Ron,
>
> I don’t know how ft operators behave on document nodes.
> Supposing documents are converted to their data() representation, your query
> would yield the same negative answer.
> You should consider applying ft operators on text nodes like this:
>
> for $trial in db:open('NCT00473512')//text() (:
> [clinical_study/id_info/nct_id='NCT00473512'] :)
> return $trial[. contains text { 'neoplasms' }]
>
> Best regards,
> Fabrice Etanchaud
>
>
> From: basex-talk-boun...@mailman.uni-konstanz.de
> [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Ron
> Katriel
> Sent: Monday, 11 September 2017 00:42
> To: BaseX
> Subject: [basex-talk] Issue with Full Text Retrieval
>
> Hi,
>
> I am seeing strange behavior with Full Text retrieval. The following query
> fails for a number of words that are in the XML document (see attached):
>
> for $trial in db:open('CTGovDebug') (:
> [clinical_study/id_info/nct_id='NCT00473512'] :)
> return $trial contains text { 'neoplasms' }
>
> It fails on a good number of words including neoplasms, cougar, industry,
> yes, completed, november, 2005, interventional, single, male, female,
> assignment, none, research, principal, primary, secondary, age, years,
> gender, etc. But it matches most of the words in the file.
>
> Observation: The words that fail are located at the beginning and/or end of
> the text and do not occur anywhere else in the middle of any text.
>
> The document is the only one in the database. It does not make a difference
> whether full text indexing is on or off. My BaseX version is 8.6.4.
>
> Thanks,
> Ron
>
>
> Ron Katriel, Ph.D. | Principal Data Scientist | Medidata Solutions
> 350 Hudson Street, 7th Floor, New York, NY 10014
> rkatr...@mdsol.com | direct: +1 201 337 3622 | mobile: +1 201 675 5598 |
> main: +1 212 918 1800
>
>


Re: [basex-talk] retrieving the name of the archive?

2017-09-18 Thread Graydon Saunders
Thank you!

Someday I will get it through my head that it's not really a file system
down there. :)

On Mon, Sep 18, 2017 at 11:00 AM, Christian Grün 
wrote:

> Hi Graydon,
>
> > the config switch is "ARCHIVENAME = true"
>
> Exactly, that’s the option you’ll need to enable to get the archive
> names included in your database paths. It can also be passed on to
> XQuery functions (db:create, db:add, etc.).
>
> > This gets me the behaviour I was expecting would happen, but I'm still
> > curious if there's a way to get the archive name back in the default
> case,
> > because it does look like BaseX is in no way confused about which of
> those
> > identically named files belong together.
>
> By default, the archive name will be ignored. In BaseX, it’s possible
> to have several documents with the same name (this provides better
> performance if document paths are irrelevant), and db:list-details
> simply returns all document names in the order in which the documents
> were added.
>
> Hope this helps,
> Christian
>


Re: [basex-talk] Basex Inner Workings

2017-09-18 Thread Anastasiou A .
Hello Fabrice

Given:


[a number][a string]
…
   [a number][a string]


The size is ~5m `item` elements (depending on the query, we are talking about a 
few million items).

If I don’t add any additional structure, which here is defined by the `item` 
and `items` elements, then the “unformatted” output is generated in under 2 sec.

[a number][a string]
…
[a number][a string]

Again, that would be a few million items. Queries are exactly the same apart 
from the addition of element items{ for… for… return element item {…}}

The problem with this second representation is that you don’t really know where 
tags from one item of the original database begin and end, this is why I want 
to enclose them further.

All the best







From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Fabrice 
ETANCHAUD
Sent: 18 September 2017 15:56
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

Hi Athanasios,

Could you please give us an idea of your resulting document size after 1.5 
minutes of BaseX time?

Best regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Anastasiou A.
Sent: Monday, 18 September 2017 14:47
To: 'Graydon Saunders'; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

Hello

Many thanks, Dirk, Fabrice and Graydon.

I was going to look up ways of enabling the server to run as fast as possible 
anyway later on, so it is always good to know how BaseX is “thinking”.

I can see what you mean Graydon. This is a simple nested `for` to denormalise 
some of the structures of the XML file, where “some” is defined by
an XPath expression.

As far as I can tell, there is nothing being re-evaluated repeatedly within the 
inner loop that could be brought outside.

I have gone through the dot plans of the quickest and slowest versions of the 
query, and the only way they differ is the addition of the CElems.

The “scaling” of the timings, in case it helps, is as follows:

Simple query, returning elements: 1100-1500 ms

Adding an `element` to what is returned just by the innermost `for`: 7500-9311 ms
This means:
  for …
  for …
  return element item { someElement | someOtherElement }

Adding an `element` around the whole block (no `element` in the innermost `for`): 49000-67000 ms
This means:
  element items {
    for …
    for …
    return someElement | someOtherElement
  }

Adding an `element` in both places: 5-8ms
This means:
  element items {
    for …
    for …
    return element item { someElement | someOtherElement }
  }


I don’t mind the ~8sec time but when we get to 1.5min, then yes…that’s going to 
be a bit annoying.

All the best







From: Graydon Saunders [mailto:graydon...@gmail.com]
Sent: 15 September 2017 17:04
To: Anastasiou A.; 
basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

As a follow-on to Dirk, it's amazing how much of a performance difference it 
can make to use typed variables when you're constructing something for output.  
(So far as I can tell, variable declarations function as an "optimize this!" 
flag for BaseX.)

If you get good performance when you're just returning the resulting nodes and 
lose it massively by adding structure, as you relate up there somewhere:
The change was to go from simply returning the nodes themselves with a `return 
thisnode | thatnode |theothernode` to a "formatted" document that has an outer 
 with a number of `return 
{thisNode|thatNode|theOtherNode}` inside it.

my immediate thought was "it's querying the same thing multiple times".

In most programming languages, it's good practice not to create variables when 
you can inline. XQuery does not appear to be one of those languages. :) I try to 
think of this as "how can I make things easy for the optimizer?"

-- Graydon

On Fri, Sep 15, 2017 at 11:55 AM, Kirsten, Dirk wrote:
Hello Athanasios,

I think you should really check the actual query plan which is executed. If you 
have such a huge spike in performance, surely the processor will be executing 
it differently. I don't think looking into the file access patterns BaseX uses 
internally is very useful for an end user. You should let BaseX handle that 
(but of course, if you find better/more efficient ways, I am sure Christian 
gladly accepts pull requests). But the pattern you describe sounds very much 
expected: reads when you open databases seem logical, and short write 
operations are also expected when just reading a database, because e.g. BaseX 
has to lock the databases.

So I think it would be more useful to look into the query plan. Of course you 
are more than welcome to ask about what is going 

Re: [basex-talk] Server Variables, cached vars, etc

2017-09-18 Thread Fabrice ETANCHAUD
Hello Christian!

Yes, a -c option for basexhttp would help, as mentioned earlier, for 
example to create a shared main-memory collection.

Best regards,
Fabrice

-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Monday, 18 September 2017 16:20
To: Kendall Shaw; coach3pete
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Server Variables, cached vars, etc

Hi Erik,

I think that Xavier-Laurent, Marco, Fabrice and Kendall have already given 
excellent feedback.

In our own projects, we store all global data in databases, or in local 
configuration files. One advantage is that this data requires no initialization 
and will automatically be available after a restart.

I don’t know anything about »server variables« in MarkLogic so far, so
@Erik: feel free to send me a link to the documentation, and I can check if 
a similar solution could make sense for BaseX.

Talking about the start script server option: The basexserver command comes 
with a -c flag, which allows you to run initial commands [1]. We could add such a 
flag for basexhttp, or even allow an initial input for both startup commands 
(similar to basex/basexclient). Would this be helpful for some of you reading 
this? Quite obviously, this requires BaseX to be run via these scripts (it 
wouldn’t have any effect if BaseX is deployed as servlet).

Cheers,
Christian

[1] http://docs.basex.org/wiki/Command-Line_Options#Server



On Sun, Sep 10, 2017 at 1:56 AM, Kendall Shaw  wrote:
> The servlet could populate your singleton just once upon startup, or 
> run xquery etc. The load-on-startup configuration means that the 
> servlet is initialized after basex has been loaded. So, if you restart 
> jetty or whatever web server/web container you are using basex 
> restarts and then your servlet’s init method is invoked.
>
>
>
> Kendall
>
>
>
> From: Erik Peterson 
> Date: Saturday, September 9, 2017 at 4:16 AM
> To: Kendall Shaw 
> Cc: "basex-talk@mailman.uni-konstanz.de"
> 
>
>
> Subject: Re: [basex-talk] Server Variables, cached vars, etc
>
>
>
> Thanks, Kendall, for your reply. What would be the advantage of creating 
> a servlet over a singleton class to do the same thing?
>
>
>
> On Fri, Sep 8, 2017 at 11:12 AM, Kendall Shaw 
> 
> wrote:
>
> I thought it might be useful to mention advice I was given about 
> startup
> hooks:
>
>
>
>> From: "Kirsten, Dirk" dirk.kirs...@senacor.com
>
> ,,,
>
>> there is currently no way to do this using BaseX itself. But I also 
>> don’t think that should be the job of BaseX. Instead you can write a 
>> servlet and deploy it using Tomcat, which runs some Java application, 
>> e.g. one which could trigger some BaseX command. See 
>> http://crunchify.com/how-to-run-java-program-automatically-on-tomcat-startup/
>> for an example of how to do this.
>
>
>
> I switched from using a cron job to doing this in order to schedule jobs.
> I have very simple servlet that is configured with 
> 2 (basex has load-on-startup 2). It 
> runs a shell script which schedules the jobs, soon after basex is loaded.
>
>
>
> Kendall
>
>
>
> From:  on behalf of Erik 
> Peterson 
> Date: Tuesday, September 5, 2017 at 7:02 AM
> To: Fabrice ETANCHAUD , 
> "basex-talk@mailman.uni-konstanz.de" 
> 
> Subject: Re: [basex-talk] Server Variables, cached vars, etc
>
>
>
> Thank you all for your replies. It looks like a main-memory database
> is the best "built in" option. However, I have created a JAR file to
> drop in lib/ with a Java singleton object... holding a map. That should be
> accessible across requests and sessions. The question is how to populate
> this just once upon startup. Perhaps I could do a job that would do that?
> Also I could memoize the variables in a global script.  That way the 
> expensive operation is only run the first time it is needed.
>
>
>
> Any other suggestions welcome.  Recommend that a standard built-in 
> feature be added to handle these scenarios.
>
>
>
> On Tue, Sep 5, 2017 at 1:33 AM Fabrice ETANCHAUD 
>  wrote:
>
> To be confirmed : there is no 'start script' server option.
> I do manually create and populate the mainmem db in the dba query interface.
>
> Best regards,
> Fabrice
>
> -Original Message-
> From: Fabrice ETANCHAUD
> Sent: Tuesday, 5 September 2017 09:29
> To: 'Marco Lettere'; basex-talk@mailman.uni-konstanz.de
> Subject: RE: [basex-talk] Server Variables, cached vars, etc
>
> Hi all,
>
> Another solution is to share a main memory database, that behaves like 
> a memory cache.
> In Client/Server mode, any main-memory database created by one client is 
> available to all the other ones.
>
> Best 
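Erik's plan is a singleton holding a map, with expensive values memoized the first time they are needed. It might look roughly like the Java sketch below. GlobalCache and memoize are hypothetical names. ConcurrentHashMap.computeIfAbsent applies the supplier at most once per absent key, even under concurrent requests. The sketch assumes the class is loaded by a single class loader. It does not show how the class is exposed to XQuery. Unlike data stored in databases, which Christian notes survives a restart, this cache lives only as long as the JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical JVM-wide cache along the lines of Erik's "singleton holding a map".
final class GlobalCache {
    private static final GlobalCache INSTANCE = new GlobalCache();

    private final Map<String, Object> values = new ConcurrentHashMap<>();

    private GlobalCache() {}

    static GlobalCache getInstance() {
        return INSTANCE;
    }

    // Computes the value for a key at most once and returns the cached value afterwards.
    // A null result is not stored, and the supplier must not modify this cache itself.
    // Reusing a key with a different value type fails with a ClassCastException at the caller.
    @SuppressWarnings("unchecked")
    <T> T memoize(String key, Supplier<T> compute) {
        return (T) values.computeIfAbsent(key, k -> compute.get());
    }

    public static void main(String[] args) {
        GlobalCache cache = GlobalCache.getInstance();
        String first = cache.memoize("lookup", () -> {
            System.out.println("computing...");
            return "expensive result";
        });
        String second = cache.memoize("lookup", () -> "not evaluated");
        System.out.println(first + " / " + second); // expensive result / expensive result
    }
}
```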

Re: [basex-talk] retrieving the name of the archive?

2017-09-18 Thread Christian Grün
Hi Graydon,

> the config switch is "ARCHIVENAME = true"

Exactly, that’s the option you’ll need to enable to get the archive
names included in your database paths. It can also be passed on to
XQuery functions (db:create, db:add, etc.).

> This gets me the behaviour I was expecting would happen, but I'm still
> curious if there's a way to get the archive name back in the default case,
> because it does look like BaseX is in no way confused about which of those
> identically named files belong together.

By default, the archive name will be ignored. In BaseX, it’s possible
to have several documents with the same name (this provides better
performance if document paths are irrelevant), and db:list-details
simply returns all document names in the order in which the documents
were added.
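To illustrate the point about identical names, here is a small Java sketch. It uses plain java.util.zip and is not BaseX code. It lists the entries of two in-memory .docx-like archives. Without the archive name, both archives contribute the same `word/document.xml` path, so nothing tells them apart. Prefixing each entry with its archive name keeps the paths distinct, which is roughly the effect ARCHIVENAME is meant to have. The archive names `a.docx`/`b.docx` and the `archive/entry` path layout are assumptions for illustration only; check the actual paths BaseX produces with the option enabled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ArchivePaths {
    // Builds a tiny in-memory zip archive with one small XML file per entry name.
    static byte[] zip(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                out.putNextEntry(new ZipEntry(name));
                out.write("<x/>".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Lists all entry paths; if withArchiveName is true, each path is prefixed
    // with the name of its archive (hypothetical "archive/entry" layout).
    static List<String> paths(Map<String, byte[]> archives, boolean withArchiveName) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, byte[]> archive : archives.entrySet()) {
            try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive.getValue()))) {
                ZipEntry entry;
                while ((entry = in.getNextEntry()) != null) {
                    String path = entry.getName();
                    result.add(withArchiveName ? archive.getKey() + "/" + path : path);
                }
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, byte[]> archives = new LinkedHashMap<>();
        archives.put("a.docx", zip("word/document.xml", "word/styles.xml"));
        archives.put("b.docx", zip("word/document.xml", "word/styles.xml"));
        // Duplicate paths: nothing tells the two document.xml entries apart.
        System.out.println(paths(archives, false));
        // Prefixed paths: each entry can be traced back to its archive.
        System.out.println(paths(archives, true));
    }
}
```

The entries themselves are never confused (each is a separate document); only the path string loses the information about which archive it came from.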

Hope this helps,
Christian


Re: [basex-talk] Basex Inner Workings

2017-09-18 Thread Fabrice ETANCHAUD
Hi Athanasios,

Could you please give us an idea of the size of your resulting document after 1.5
minutes of BaseX processing time?

Best regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Anastasiou A.
Sent: Monday, 18 September 2017 14:47
To: 'Graydon Saunders'; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

Hello

Many thanks, Dirk, Fabrice and Graydon.

I was going to look up ways of making the server run as fast as possible
later on anyway, so it is always good to know how BaseX is “thinking”.

I can see what you mean, Graydon. This is a simple nested `for` that denormalises
some of the structures of the XML file, where “some” is defined by
an XPath expression.

As far as I can tell, there is nothing being re-evaluated repeatedly within the 
inner loop that could be brought outside.

I have gone through the dot plans of the quickest and slowest versions of the
query, and the only difference between them is the addition of the CElems.

The “scaling” of the timings, in case it helps, is as follows:

Simple query, returning elements: 1100-1500 ms

Adding an `element` to what is returned by the innermost `for` only: 7500-9311 ms
This means:
  for …
    for …
    return element item { someElement | someOtherElement }

Adding an `element` around the whole block (no `element` in the innermost
`for`): 49000-67000 ms
This means:
  element items {
    for …
    for …
    return someElement | someOtherElement
  }

Adding an `element` in both places: 5-8 ms
This means:
  element items {
    for …
    for …
    return element item { someElement | someOtherElement }
  }


I don’t mind the ~8 s, but when we get to 1.5 min, then yes… that’s going to
be a bit annoying.

All the best







From: Graydon Saunders [mailto:graydon...@gmail.com]
Sent: 15 September 2017 17:04
To: Anastasiou A.; 
basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

As a follow-on to Dirk, it's amazing how much of a performance difference it 
can make to use typed variables when you're constructing something for output.  
(So far as I can tell, variable declarations function as an "optimize this!"
flag for BaseX.)

If you get good performance when you're just returning the resulting nodes, and
lose it massively by adding structure, as you describe somewhere up there:
The change was to go from simply returning the nodes themselves with a `return 
thisnode | thatnode |theothernode` to a "formatted" document that has an outer 
 with a number of `return 
{thisNode|thatNode|theOtherNode}` inside it.

my immediate thought was "it's querying the same thing multiple times".

In most programming languages, it's good practice not to create variables when
you can inline. XQuery does not appear to be one of those languages. :) I try to
think of this as "how can I make things easy for the optimizer?"

-- Graydon

On Fri, Sep 15, 2017 at 11:55 AM, Kirsten, Dirk 
> wrote:
Hello Athanasios,

I think you should really check the actual query plan that is executed. If you
see such a huge spike in performance, the processor is surely executing the
queries differently. I don't think looking into the file access patterns BaseX
uses internally is very useful for an end user. You should let BaseX handle
that (but of course, if you find better/more efficient ways, I am sure
Christian will gladly accept pull requests). The pattern you describe sounds
very much as expected: reads when you open databases seem logical, and short
write operations are also expected when just reading a database because, e.g.,
BaseX has to lock the databases.

So I think it would be more useful to look into the query plan. Of course, you
are more than welcome to ask about what is going on there on this list. I would
expect that, because of your rewrite, some indexes are no longer applied
(or, if your rewrite simply produces a very big result, that most of the time is
spent serializing the data).

Cheers
Dirk

Senacor Technologies Aktiengesellschaft - Registered office: Eschborn - Register court:
Frankfurt am Main - Reg. no.: HRB 105546
Management board: Matthias Tomann, Marcus Purzer - Chairman of the supervisory board: Daniel
Grözinger

-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de
 [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Fabrice
ETANCHAUD
Sent: Friday, 15 September 2017 17:35
To: 'Anastasiou A.'; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings


You can find the time spent in each step in the query info bar graph.

If you are looking for the schema and the facets of your dataset, you should 
have a look at the index module, and for sure at 

Re: [basex-talk] retrieving the name of the archive?

2017-09-18 Thread Graydon Saunders
Ok, so it looks like:

   1. find where BaseX is really getting its config files ($HOME in my
   case); http://docs.basex.org/wiki/Configuration#Configuration_Files
   says "Q{org.basex.util.Prop}USERHOME()", which is exceedingly helpful!
   2. add the necessary config switch to .basex (NOT .basexgui), AFTER
   # Local Options
   3. the config switch is "ARCHIVENAME = true"
   4. restart BaseX, go to recreate the DB, and you'll see a tick-box
   option for "add the archive name to the path"

This gets me the behaviour I was expecting would happen, but I'm still
curious if there's a way to get the archive name back in the default case,
because it does look like BaseX is in no way confused about which of those
identically named files belong together.

Thanks!
Graydon

On Mon, Sep 18, 2017 at 8:55 AM, Graydon Saunders 
wrote:

> Hello --
>
> BaseX will happily consume zip archives; this is just splendid for loading
> up a bunch of docx files.
>
> Now I find myself wanting the name of the docx file -- the original name
> of the archive -- and I don't know how to retrieve that.  (or if it can
> be!)  But I think it must be there somewhere because db:path repeats the
> standard OOXML file paths:
>
> [Content_Types].xml
> word/document.xml
> word/footnotes.xml
> word/footer1.xml
> word/endnotes.xml
> word/theme/theme1.xml
> word/settings.xml
> docProps/custom.xml
> customXml/itemProps2.xml
> docProps/app.xml
> customXml/item2.xml
> customXml/itemProps1.xml
> word/fontTable.xml
> customXml/item1.xml
> customXml/item3.xml
> customXml/itemProps3.xml
> customXml/item4.xml
> customXml/itemProps4.xml
> word/numbering.xml
> word/styles.xml
> word/webSettings.xml
> docProps/core.xml
> word/people.xml
>
> over and over; if they all mapped to exactly the same path there'd be only one
> copy, yet all several hundred docx files are there, judging by content.
> (db:list-details tells me about more than 4000 individual XML files.)
>
> If I can get the name of the original archive, how do I do that?
>
> Thanks!
> Graydon
>


Re: [basex-talk] Is there any documentation on the narrow limits of XQuery index optimization in BaseX?

2017-09-18 Thread Christian Grün
> The thing I miss most is a function
> like xquery:eval that accepts a function as an argument but also takes a
> context and does that runtime optimization.

I assume you are looking for something like the following query?

  xquery:eval-func(
    function($db, $name) { db:open($db)//person[name = $name] },
    [ 'persons', 'john' ]
  )

This sounds like an enticing idea. It is hard to realize, though, as
we would need to recompile code that has already been rewritten to an
internal representation that can be evaluated by our XQuery processor.

> Or a way to convert a function
> to a string.

Same here: It would require a lot of work to create a bug-free XQuery
string representation of the internal code.




> On 18.09.2017 at 15:59, Christian Grün wrote:
>>
>> Hi Omar,
>>
>> Our current XQuery optimizer opens the addressed database in order to
>> find out if it has the required index structures, and if these are
>> up-to-date. Moreover, the cheapest index lookup will be selected if
>> there are several index candidates. For example, in the following
>> query, it is likely that the second predicate will be rewritten
>> for index access:
>>
>>db:open('persons')//person[country = 'Italy'][@id = 'id124']
>>
>> If the addressed database is not statically known, these checks cannot
>> be performed that easily. Further implications and in-depth
>> information can be found in »Storing and Querying Large XML Instances«
>> [1].
>>
>> Here are two ideas how this could be tackled:
>>
>> • We could add an XQuery pragma to enforce specific index rewritings.
>> Examples:
>>
>>    for $n in 1 to 10
>>    for $db in ('persons' || $n)
>>    for $person in db:open($db)//person
>>    where (# basex:index #) { $person/country = 'Italy' }
>>    where $person/@id = 'id124'
>>    return $person
>>
>>    (1 to 10) ! db:open('persons' || .)//person
>>      [(# basex:index #) { country = 'Italy' }]
>>      [@id = 'id124']
>>
>> • We could create multiple query plans at compile time (with and
>> without index, one rewriting for each index candidate) and choose the
>> one that is expected to be the cheapest at evaluation time. This would
>> definitely be the most flexible option (but the number of query plans
>> increases exponentially if you have nested FLWOR expressions and
>> queries with numerous predicates or where clauses).
>>
>> Cheers,
>> Christian
>>
>> [1] http://basex.org/about-us/publications/
>>
>


Re: [basex-talk] Server Variables, cached vars, etc

2017-09-18 Thread Marco Lettere

Hi Christian,
welcome back to the list!
Having the possibility to run a script or call any sort of functionality
at startup is quite a useful thing.
Doing it in a declarative way would be great, maybe by exploiting the .basex
file or the web.xml?


By the way, picking up on server variables, a colleague of mine has just come
up with another (somewhat exotic) way of passing content from one RestXQ
call to another: using proc:system to set and get environment
variables. ;-)

Regards,
Marco.

On 18/09/2017 16:19, Christian Grün wrote:

Hi Erik,

I think that Xavier-Laurent, Marco, Fabrice and Kendall have already
given excellent feedback.

In our own projects, we store all global data in databases, or in
local configuration files. One advantage is that this data requires no
initialization and will automatically be available after a restart.

I don’t know anything about »server variables« in MarkLogic so far, so
@Erik: feel free to send me a link to the documentation, and I can
check if a similar solution could make sense for BaseX.

Talking about the start script server option: The basexserver command
comes with a -c flag, which allows you to run initial commands [1]. We
could add such a flag for basexhttp, or even allow an initial input
for both startup commands (similar to basex/basexclient). Would this
be helpful for some of you reading this? Quite obviously, this
requires BaseX to be run via these scripts (it wouldn’t have any
effect if BaseX is deployed as servlet).

Cheers,
Christian

[1] http://docs.basex.org/wiki/Command-Line_Options#Server



On Sun, Sep 10, 2017 at 1:56 AM, Kendall Shaw  wrote:

The servlet could populate your singleton just once upon startup, or run
XQuery, etc. The load-on-startup configuration means that the servlet is
initialized after BaseX has been loaded. So, if you restart Jetty (or
whatever web server/web container you are using), BaseX restarts and then your
servlet’s init method is invoked.



Kendall



From: Erik Peterson 
Date: Saturday, September 9, 2017 at 4:16 AM
To: Kendall Shaw 
Cc: "basex-talk@mailman.uni-konstanz.de"



Subject: Re: [basex-talk] Server Variables, cached vars, etc



Thanks, Kendall, for your reply. What would be the advantage of creating a
servlet over a singleton class to do the same thing?



On Fri, Sep 8, 2017 at 11:12 AM, Kendall Shaw 
wrote:

I thought it might be useful to mention advice I was given about startup
hooks:




From: "Kirsten, Dirk" dirk.kirs...@senacor.com

,,,


there is currently no way to do this using BaseX itself. But I also don’t
think that should be the job of BaseX. Instead you can write a servlet and
deploy it using Tomcat which runs some Java application, e.g. one that could
trigger some BaseX command. See
http://crunchify.com/how-to-run-java-program-automatically-on-tomcat-startup/
for an example of how to do this.



I switched from using a cron job to doing this in order to schedule jobs.
I have a very simple servlet that is configured with
2 (BaseX has load-on-startup 2). It runs
a shell script which schedules the jobs soon after BaseX is loaded.



Kendall



From:  on behalf of Erik
Peterson 
Date: Tuesday, September 5, 2017 at 7:02 AM
To: Fabrice ETANCHAUD ,
"basex-talk@mailman.uni-konstanz.de" 
Subject: Re: [basex-talk] Server Variables, cached vars, etc



Thank you all for your replies. It looks like a main-memory database is the
best "built-in" option. However, I have created a JAR file to drop in lib/
with a Java singleton object holding a map. That should be accessible
across requests and sessions. The question is how to populate it just
once at startup. Perhaps I could create a job that would do that? I could
also memoize the variables in a global script. That way, the expensive
operation is only run the first time it is needed.



Any other suggestions are welcome. I'd recommend that a standard built-in
feature be added to handle these scenarios.
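Here is a minimal Java sketch of the "singleton holding a map" idea. It assumes the JAR is loaded once, by a single class loader, because static state is per class loader. It uses the initialization-on-demand holder idiom, so the expensive load runs exactly once, on first access, and is thread-safe without explicit locking. A startup hook, such as the servlet init method discussed in this thread, could call `get` once to trigger the load eagerly. The class name `ServerVariables`, the key, and the body of `load()` are hypothetical placeholders.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public final class ServerVariables {
    // Counts how often the expensive load ran (for demonstration only).
    static final AtomicInteger LOADS = new AtomicInteger();

    private ServerVariables() {}

    // Initialization-on-demand holder: the JVM initializes Holder (and thus
    // calls load()) once, on the first access to Holder.VALUES, and publishes
    // the result safely to all threads.
    private static final class Holder {
        static final Map<String, String> VALUES = load();
    }

    // Hypothetical stand-in for the expensive operation (e.g. a slow query).
    private static Map<String, String> load() {
        LOADS.incrementAndGet();
        Map<String, String> values = new HashMap<>();
        values.put("greeting", "hello");
        return Collections.unmodifiableMap(values);
    }

    // Read access for any request or session.
    public static String get(String key) {
        return Holder.VALUES.get(key);
    }

    public static void main(String[] args) {
        System.out.println(get("greeting")); // first call triggers load()
        System.out.println(get("greeting")); // served from the cached map
        System.out.println(LOADS.get());     // 1
    }
}
```

Unlike data kept in a database, this map lives only inside one JVM and is rebuilt after every restart.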



On Tue, Sep 5, 2017 at 1:33 AM Fabrice ETANCHAUD
 wrote:

To be confirmed: there is no 'start script' server option.
I create and populate the main-memory database manually in the DBA query interface.

Best regards,
Fabrice

-Original Message-
From: Fabrice ETANCHAUD
Sent: Tuesday, 5 September 2017 09:29
To: 'Marco Lettere'; basex-talk@mailman.uni-konstanz.de
Subject: RE: [basex-talk] Server Variables, cached vars, etc

Hi all,

Another solution is to share a main-memory database, which behaves like a
memory cache.
In client/server mode, any main-memory database created by one client is available to
all the other ones.

Best regards,
Fabrice


-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Marco

Re: [basex-talk] Server Variables, cached vars, etc

2017-09-18 Thread Christian Grün
Hi Erik,

I think that Xavier-Laurent, Marco, Fabrice and Kendall have already
given excellent feedback.

In our own projects, we store all global data in databases, or in
local configuration files. One advantage is that this data requires no
initialization and will automatically be available after a restart.

I don’t know anything about »server variables« in MarkLogic so far, so
@Erik: feel free to send me a link to the documentation, and I can
check if a similar solution could make sense for BaseX.

Talking about the start script server option: The basexserver command
comes with a -c flag, which allows you to run initial commands [1]. We
could add such a flag for basexhttp, or even allow an initial input
for both startup commands (similar to basex/basexclient). Would this
be helpful for some of you reading this? Quite obviously, this
requires BaseX to be run via these scripts (it wouldn’t have any
effect if BaseX is deployed as servlet).

Cheers,
Christian

[1] http://docs.basex.org/wiki/Command-Line_Options#Server



On Sun, Sep 10, 2017 at 1:56 AM, Kendall Shaw  wrote:
> The servlet could populate your singleton just once upon startup, or run
> XQuery, etc. The load-on-startup configuration means that the servlet is
> initialized after BaseX has been loaded. So, if you restart Jetty (or
> whatever web server/web container you are using), BaseX restarts and then your
> servlet’s init method is invoked.
>
>
>
> Kendall
>
>
>
> From: Erik Peterson 
> Date: Saturday, September 9, 2017 at 4:16 AM
> To: Kendall Shaw 
> Cc: "basex-talk@mailman.uni-konstanz.de"
> 
>
>
> Subject: Re: [basex-talk] Server Variables, cached vars, etc
>
>
>
> Thanks, Kendall, for your reply. What would be the advantage of creating a
> servlet over a singleton class to do the same thing?
>
>
>
> On Fri, Sep 8, 2017 at 11:12 AM, Kendall Shaw 
> wrote:
>
> I thought it might be useful to mention advice I was given about startup
> hooks:
>
>
>
>> From: "Kirsten, Dirk" dirk.kirs...@senacor.com
>
> ,,,
>
>> there is currently no way to do this using BaseX itself. But I also don’t
>> think that should be the job of BaseX. Instead you can write a servlet and
>> deploy it using Tomcat which runs some Java application, e.g. one that could
>> trigger some BaseX command. See
>> http://crunchify.com/how-to-run-java-program-automatically-on-tomcat-startup/
>> for an example of how to do this.
>
>
>
> I switched from using a cron job to doing this in order to schedule jobs.
> I have a very simple servlet that is configured with
> 2 (BaseX has load-on-startup 2). It runs
> a shell script which schedules the jobs soon after BaseX is loaded.
>
>
>
> Kendall
>
>
>
> From:  on behalf of Erik
> Peterson 
> Date: Tuesday, September 5, 2017 at 7:02 AM
> To: Fabrice ETANCHAUD ,
> "basex-talk@mailman.uni-konstanz.de" 
> Subject: Re: [basex-talk] Server Variables, cached vars, etc
>
>
>
> Thank you all for your replies. It looks like a main-memory database is the
> best "built-in" option. However, I have created a JAR file to drop in lib/
> with a Java singleton object holding a map. That should be accessible
> across requests and sessions. The question is how to populate it just
> once at startup. Perhaps I could create a job that would do that? I could
> also memoize the variables in a global script. That way, the expensive
> operation is only run the first time it is needed.
>
>
>
> Any other suggestions are welcome. I'd recommend that a standard built-in
> feature be added to handle these scenarios.
>
>
>
> On Tue, Sep 5, 2017 at 1:33 AM Fabrice ETANCHAUD
>  wrote:
>
> To be confirmed: there is no 'start script' server option.
> I create and populate the main-memory database manually in the DBA query interface.
>
> Best regards,
> Fabrice
>
> -Original Message-
> From: Fabrice ETANCHAUD
> Sent: Tuesday, 5 September 2017 09:29
> To: 'Marco Lettere'; basex-talk@mailman.uni-konstanz.de
> Subject: RE: [basex-talk] Server Variables, cached vars, etc
>
> Hi all,
>
> Another solution is to share a main-memory database, which behaves like a
> memory cache.
> In client/server mode, any main-memory database created by one client is available to
> all the other ones.
>
> Best regards,
> Fabrice
>
>
> -Original Message-
> From: basex-talk-boun...@mailman.uni-konstanz.de
> [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Marco
> Lettere
> Sent: Tuesday, 5 September 2017 09:14
> To: basex-talk@mailman.uni-konstanz.de
> Subject: Re: [basex-talk] Server Variables, cached vars, etc
>
> On 05/09/2017 01:37, Erik Peterson wrote:
>> How can I create a variable that is evaluated only once but accessed
>> across many RestXQ requests and sessions? I'm trying to cache data
>> that comes from an 

Re: [basex-talk] Is there any documentation on the narrow limits of XQuery index optimization in BaseX?

2017-09-18 Thread Omar Siam

Hi!

Interesting ideas. I don't like the pragma idea that much because there
is already something like that with xquery:eval. The thing I miss most is a
function like xquery:eval that accepts a function as an argument but
also takes a context and does that runtime optimization. Or a way to
convert a function to a string. Is there already something like this? I thought
it might be xquery:invoke, but that seems to do something else.


Best regards

Omar


On 18.09.2017 at 15:59, Christian Grün wrote:

Hi Omar,

Our current XQuery optimizer opens the addressed database in order to
find out if it has the required index structures, and if these are
up-to-date. Moreover, the cheapest index lookup will be selected if
there are several index candidates. For example, in the following
query, it is likely that the second predicate will be rewritten
for index access:

   db:open('persons')//person[country = 'Italy'][@id = 'id124']

If the addressed database is not statically known, these checks cannot
be performed that easily. Further implications and in-depth
information can be found in »Storing and Querying Large XML Instances«
[1].

Here are two ideas how this could be tackled:

• We could add an XQuery pragma to enforce specific index rewritings. Examples:

   for $n in 1 to 10
   for $db in ('persons' || $n)
   for $person in db:open($db)//person
   where (# basex:index #) { $person/country = 'Italy' }
   where $person/@id = 'id124'
   return $person

   (1 to 10) ! db:open('persons' || .)//person
     [(# basex:index #) { country = 'Italy' }]
     [@id = 'id124']

• We could create multiple query plans at compile time (with and
without index, one rewriting for each index candidate) and choose the
one that is expected to be the cheapest at evaluation time. This would
definitely be the most flexible option (but the number of query plans
increases exponentially if you have nested FLWOR expressions and
queries with numerous predicates or where clauses).

Cheers,
Christian

[1] http://basex.org/about-us/publications/



Re: [basex-talk] Is there any documentation on the narrow limits of XQuery index optimization in BaseX?

2017-09-18 Thread Christian Grün
Hi Omar,

Our current XQuery optimizer opens the addressed database in order to
find out if it has the required index structures, and if these are
up-to-date. Moreover, the cheapest index lookup will be selected if
there are several index candidates. For example, in the following
query, it is likely that the second predicate will be rewritten
for index access:

  db:open('persons')//person[country = 'Italy'][@id = 'id124']
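As a toy illustration of "the cheapest index lookup will be selected", here is a Java sketch. It assumes each candidate predicate has an estimated number of hits and that fewer hits means a cheaper lookup. The hit counts are made up, and the real BaseX cost model is not described in this thread, so the sketch only shows the selection step. If there are no candidates (for example, because the database is not statically known), nothing is chosen and no index rewrite takes place.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class IndexChoice {
    // A predicate that could be rewritten for index access, with an estimated hit count.
    static final class Candidate {
        final String predicate;
        final long estimatedHits;

        Candidate(String predicate, long estimatedHits) {
            this.predicate = predicate;
            this.estimatedHits = estimatedHits;
        }
    }

    // Returns the candidate expected to produce the fewest results, if any.
    static Optional<Candidate> cheapest(List<Candidate> candidates) {
        return candidates.stream().min(Comparator.comparingLong(c -> c.estimatedHits));
    }

    public static void main(String[] args) {
        // Hypothetical estimates for db:open('persons')//person[country = 'Italy'][@id = 'id124']
        List<Candidate> candidates = Arrays.asList(
            new Candidate("country = 'Italy'", 12000),
            new Candidate("@id = 'id124'", 1));
        System.out.println(cheapest(candidates).map(c -> c.predicate).orElse("no index rewrite"));
    }
}
```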

If the addressed database is not statically known, these checks cannot
be performed that easily. Further implications and in-depth
information can be found in »Storing and Querying Large XML Instances«
[1].

Here are two ideas how this could be tackled:

• We could add an XQuery pragma to enforce specific index rewritings. Examples:

  for $n in 1 to 10
  for $db in ('persons' || $n)
  for $person in db:open($db)//person
  where (# basex:index #) { $person/country = 'Italy' }
  where $person/@id = 'id124'
  return $person

  (1 to 10) ! db:open('persons' || .)//person
    [(# basex:index #) { country = 'Italy' }]
    [@id = 'id124']

• We could create multiple query plans at compile time (with and
without index, one rewriting for each index candidate) and choose the
one that is expected to be the cheapest at evaluation time. This would
definitely be the most flexible option (but the number of query plans
increases exponentially if you have nested FLWOR expressions and
queries with numerous predicates or where clauses).
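Here is a back-of-the-envelope Java sketch of why the number of plans explodes. It uses a simplified counting model that is an assumption, not BaseX's actual planner. An expression with k index candidates has k + 1 variants: no index, or one rewrite per candidate. Nested FLWOR levels are planned independently, so their variants multiply. With k candidates at each of n levels, this gives (k + 1)^n plans.

```java
public class PlanCount {
    // Variants for one expression: no index, or one index rewrite per candidate.
    static long variants(int candidates) {
        return candidates + 1L;
    }

    // Nested levels multiply: total = product of (k_i + 1) over all levels.
    static long plans(int... candidatesPerLevel) {
        long total = 1;
        for (int k : candidatesPerLevel) {
            total *= variants(k);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(plans(2));          // one level, two candidates: 3
        System.out.println(plans(2, 2, 2));    // three nested levels: 3^3 = 27
        System.out.println(plans(3, 3, 3, 3)); // four nested levels: 4^4 = 256
    }
}
```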

Cheers,
Christian

[1] http://basex.org/about-us/publications/



On Wed, Sep 6, 2017 at 4:02 PM, Omar Siam  wrote:
> Hi list!
>
> Recently I started to wonder why functions in my XQuery modules make no use
> of indexes unless I force them to by using the respective function, like
> db:text(). Now I have made what I think are minimal changes to the example for
> the text index at: http://docs.basex.org/wiki/Indexes#Text_Index.
>
> If I just change it like below, I lose almost all index optimization.
>
> The only optimizations left are:
>
> * Using db:open or collection with a string literal, but not with one held in
> a variable of a FLWOR `for` clause or the simple map operator.
>
> * Using xquery:eval and passing a context with db:open or collection
>
> Is there any chance that this will change any time soon? Is this a
> fundamental restriction?
>
> Best regards
>
> Omar
>
> xquery version "3.1";
>
> declare namespace _ = "urn:local:namespace:_";
> import module namespace functx = "http://www.functx.com";
>
> declare function _:_1st_example($ctx as document-node()) {
>   $ctx//*[text() = 'Germany']
> };
>
> declare function _:_2nd_example($file as xs:string) {
>   doc($file)//name[. = 'Germany']
> };
>
> declare function _:_3rd_example($dbname as xs:string+) {
>   (
>   for $c in ($dbname!collection(.))//country
>   where $c//city/name = 'Hanoi'
>   return $c/name,
>   $dbname!xquery:eval("//*[text() = 'Vietnam']", map {'': db:open(.)}))
> };
>
> (
> (: 1st example :)
> _:_1st_example(.),
> (: 2nd example :)
> _:_2nd_example('factbook.xml'),
> (: 3rd example :)
> _:_3rd_example(('factbook', 'factbook')),
> xs:string(_:_1st_example)
> )
>
> Optimized Query:
> (let $ctx_314 := . return $ctx_314/descendant::*[(text() = "Germany")],
> db:open-pre("factbook",0)/descendant::name[(. = "Germany")], (for $c_317 in
> (("factbook", "factbook") !
> collection(.))/descendant::country[(descendant::city/name = "Hanoi")] return
> $c_317/name, (("factbook", "factbook") ! xquery:eval("//*[text() =
> 'Vietnam']", map { "":db:open(.) }))), _:_1st_example cast as xs:string?)
>
> Compiling:
>
> [...]
>
> - RUNTIME: pre-evaluate root() to document-node()
> - RUNTIME: rewrite descendant-or-self step(s)
> - RUNTIME: apply text index for "Vietnam"
> - RUNTIME: pre-evaluate root() to document-node()
> - RUNTIME: rewrite descendant-or-self step(s)
> - RUNTIME: apply text index for "Vietnam"
>


Re: [basex-talk] Simple Map Operator in oXygen XML (XQJ)

2017-09-18 Thread Christian Grün
Hi Omar,

> Ok. Interesting. Please someone note this in the Wiki so there are no
> surprises.

I have added a little hint in our Wiki [1].

Cheers,
Christian

[1] http://docs.basex.org/wiki/Developing



> For the ! syntax I can just stick with for/in/return, but missing
> map {} is a reason to switch to DBA.
>
> Best regards
>
> Omar
>
>
> On 04.09.2017 at 17:14, Charles Foster wrote:
>
> Hi Michael,
>
> BaseX XQJ supports XQuery 3.0 syntax I believe, but not XQuery 3.1 specific
> syntax.
>
> It would be possible to update the XQJ driver to support XQuery 3.1 syntax,
> but it would mean upgrading the internal XQuery parser and re-writer.
>
> Omar, as a workaround you could potentially use equivalent functions in the
> standard function library (fn).
>
> Kind Regards,
>
> Charles
>
>
> On 4 Sep 2017, at 22:02, Michael Seiferle  wrote:
>
> Hi Omar,
> /cc Hi @Charles,
>
> you are right, this has nothing to do with oXygen in particular!
> Yet: I am not even sure if XQJ is meant to support anything but XQuery 1.0.
> Charles might correct me if I am wrong, but I don’t think there’s an easy
> way to add 3.1 features to the current XQJ implementation.
>
> Best from Konstanz
>
> Michael
>
>
>
>
>
> On 04.09.2017 at 16:42, Omar Siam wrote:
>
> Good idea, but I was pretty sure that this comes from within BaseX, not
> oXygen. You can configure the parser and the executor of your XQuery in
> oXygen, and I used BaseX for both.
>
> So I found an XQJ tutorial in basex-examples. Just by changing Part1.java
> a little and filling in the example query from the Wiki, I also get
>
>   XQJQS001 - Invalid XQuery syntax, syntax does not pass static validation.
>   Root Cause:
>   net.xqj.basex.bin.bB: Lexical error at line 1, column 12.  Encountered: "
> " (32), after : "!"
>   at
> org.basex.examples.xqj.tutorial.simplemaptest.main(simplemaptest.java:34)
>   at
> org.basex.examples.xqj.tutorial.simplemaptest.main(simplemaptest.java:34)
>
> So most probably the BaseX XQJ parser does not understand the simple map
> operator.
>
> XQSequence xqjs = xqje.executeQuery("xquery version \"3.1\"; (1 to 10) ! element node { . }");
>
>
> On 04.09.2017 at 15:37, Michael Seiferle wrote:
>
> Hi Omar,
>
> It looks like that error message is generated by oXygen; you might want to
> cross-post to their list.
>
> Best
> Michael
>
> On 04.09.2017 at 14:31, Omar Siam wrote:
>
> Hi!
>
> I just tried to use the Simple Map Operator while writing an XQuery in
> oXygen XML and executing the query using a client/server BaseX 8.6.6 data
> source. I configured it as described here:
> http://docs.basex.org/wiki/Integrating_oXygen
>
> When I use the Simple Map Operator (!) I only get "Invalid XQuery syntax,
> syntax does not pass static validation.". For example using "(1 to 10) !
> element node { . }"
>
> If I run that XQuery in DBA for example it works.
>
> Any ideas?
>
> Best regards
>
> Omar Siam
>
>
>
>
>


Re: [basex-talk] Fwd: Re: using apply-function within an updating function

2017-09-18 Thread Christian Grün
Hi Rob,

the official XQuery functions are limited to non-updating function
arguments. Enabling MIXUPDATES is the only way out, because you would
otherwise be able to write code that is both updating and
non-updating:

  for $f in (db:output#1, count#1)
  return apply($f, [1])

Hope this helps,
Christian



On Wed, Aug 30, 2017 at 7:37 AM, Rob Stapper  wrote:
> Hi Michael and Marco,
>
> Thanks for the feedback.
>
> @Marco, setting MIXUPDATES is an efficient workaround for suppressing the
> error message, but the cost is that you lose the updating checks completely.
> Updating checks are very useful.
> @Michael, I'll dive into XQUF 3.0. Sounds interesting.
>
> The fact remains that, as I see it, the updating check doesn't work properly
> with the apply function. Something for Christian to dive into when he is back
> from his holidays ;-)
>
> Thanks again for the replies.
>
> Have fun,
>
> Rob
>
>
> -Original Message-
> From: basex-talk-boun...@mailman.uni-konstanz.de
> [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Michael Seiferle
> Sent: Tuesday, 29 August 2017 21:08
> To: Marco Lettere
> CC: basex-talk@mailman.uni-konstanz.de
> Subject: Re: [basex-talk] Fwd: Re: using apply-function within an updating
> function
>
> Hi Rob,
> hi Marco,
>
>
> First: sorry for not getting back to you earlier.
>
> Glad it worked using MIXUPDATES, but I think there might be a way to make it
> work using the new XQUF 3.0 "invoke updating" capabilities :-) Christian is
> currently on holiday; I am sure he will report back on this.
>
>
> Best from Konstanz,
>
> Michael
>
>
>> On 29.08.2017 at 15:17, Marco Lettere wrote:
>>
>> Hi Rob,
>> I suppose that if you add MIXUPDATES=true below # Local Options in the .basex
>> file that you will find in your BaseX install directory, and restart your
>> GUI, the error will no longer show up in the GUI itself.
>> Regards,
>> Marco.
>>
>>> On 29/08/2017 15:07, r.stap...@lijbrandt.nl wrote:
>>> Hi Michael,
>>>
>>> Though the GUI still gives an error on the use of the apply function
>>> within an updating function, my web services do work correctly. I just added
>>> the MIXUPDATES option to the web.xml file.
>>>
>>> Best Rob.
>>>
>>> Original message
>>> Subject: Re: [basex-talk] using apply-function within an updating
>>> function
>>> Date: 29.08.2017 08:06
>>> From: r.stap...@lijbrandt.nl
>>> To: Michael Seiferle 
>>> CC: BaseX 
>>>
>>> Hi Michael,
>>>
>>> The point is that I am looking for a generic solution, which I thought I had
>>> found with the use of the apply function.
>>> Most of my web services take a record with more than one field value. By
>>> putting the field values in a JSON-formatted array on the client side and
>>> using the apply function on the server side, see [1], I thought I had
>>> found a nice, clean, generic solution. Unfortunately, this solution gives
>>> me the error: "Function body must be an updating expression".
>>>
>>> The issue is that the error is, in my opinion, falsely triggered by the use
>>> of the apply function within an updating function. Whether an updating error
>>> occurs should, in my opinion, be determined by the function that is called
>>> by apply. In this case, even though apply calls an updating function,
>>> BaseX still raises an updating error.
>>>
>>> I can't use MIXUPDATES because my web services are in a module.
>>>
>>> I really would like to use the apply function here. What to do?
>>>
>>> [1]
>>> declare
>>>   %rest:path("/cFactBank/request/new")
>>>   %rest:PUT("{$dataRec}")
>>>   %input:json("format=map")
>>>   %updating
>>> function _:request.new(
>>>   $dataRec as array(*)
>>> ) {
>>>   apply(request:new#6, $dataRec)
>>> };
>>>
>>>
>>> Greetings,
>>> Rob
>>>
>>>
>>> Michael Seiferle wrote on 28.08.2017 18:20:
 Hi Rob,

 may I ask what you intended to do? It looks like you expect $dataRec to
 contain exactly one value, right?
 At least I think so, because you called `db:create#1` in `fn:apply`,
 which implies you expect the array to contain a single value.

 To create a single database, use:

> ```
> db:create($dataRec => array:get(1))
> ```

 …or… if you want to create a database for each of the array
 values:

> for $db in ($dataRec => array:flatten())  (: flatten the array to a sequence :)
> return db:create($db)                      (: create one database per array item :)

 …and… last but not least, for "Dynamic Updating Function
 Invocation" [1] you might use:

> let $create := db:create#1
> for $db in ($dataRec => array:flatten())
> return invoke updating $create($db)
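 A minimal sketch of Rob's `_:request.new` endpoint without `apply`, under two assumptions not confirmed in the thread: `request:new` is an updating function of arity 6 in an imported module, and the client always sends a JSON array with exactly six members in the order `request:new` expects. Unpacking the members into a static call makes the function body itself an updating expression, which is what the "Function body must be an updating expression" check asks for; it does give up the generic, arity-independent dispatch Rob was after.

```xquery
(: Sketch only. Assumes request:new#6 is an updating function in an
   imported module and that $dataRec always holds exactly six members. :)
declare
  %rest:path("/cFactBank/request/new")
  %rest:PUT("{$dataRec}")
  %input:json("format=map")
  %updating
function _:request.new(
  $dataRec as array(*)
) {
  (: A static call to an updating function is an updating expression,
     so the %updating body check is satisfied without MIXUPDATES. :)
  request:new(
    $dataRec(1), $dataRec(2), $dataRec(3),
    $dataRec(4), $dataRec(5), $dataRec(6)
  )
};
```

 If the array has fewer than six members, `$dataRec(n)` raises an out-of-bounds error instead of passing an empty sequence, so a check with `array:size($dataRec)` may be worth adding.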

 I could not wrap my head around using 

Re: [basex-talk] Basex Inner Workings

2017-09-18 Thread Anastasiou A .
Must be the day today, sorry, please see below:

No, but it did not make any difference.

But I will tell you what did make a difference, forcing everything to be a 
string and hard coding the names of the tags. That’s a ~3-4 sec query to return 
~5 million items.

I was led to this by what you said about computed elements because it makes 
perfect sense if BaseX has to create the document it returns, in memory, as a 
“proper” XML tree data structure.

I am not particularly jumping up and down about this, but it works for the
moment for such a simple use case. It's not best practice, though, so I would
rather speed this query up the right way if possible.

By the way, there are no "computed" (in the sense of derived) fields in this
query, in case you meant it that way.

All the best






From: Anastasiou A.
Sent: 18 September 2017 14:29
To: 'Graydon Saunders'; basex-talk@mailman.uni-konstanz.de
Subject: RE: [basex-talk] Basex Inner Workings

No, but it did not make any difference.

But I will tell you what did make a difference, forcing everything to be a 
string and hard coding the names of the tags. That’s a ~3-4 sec query to return 
~5 million items.

I was led to this by what you said about computed elements because it makes 
perfect sense if BaseX
has to create the “document” it returns, in memory, as a prop




From: basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Graydon Saunders
Sent: 18 September 2017 14:01
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

Sorry for the fumble-fingers; let me try that again.

Have you tried creating literal elements?

Computed elements have overhead; it's presumptively akin to why you don't want 
to create untyped variables in XSLT 2.0 and 3.0 (an untyped variable might be 
anything and needs a whole document node to exist in, and this is expensive).  
In this case, I'd be darkly suspicious the computed elements are computing 
their contents every time.

I'd be trying
for ...
let $elem1 as element() := ...
let $elem2 as element() := ...

{$elem1,$elem2}

instead of the computed element.

The optimizer is really good in BaseX but it's also really complicated; the 
local maxima can be quite narrow.


On Mon, Sep 18, 2017 at 8:58 AM, Graydon Saunders wrote:
Have you tried creating literal elements?

Computed elements have overhead; it's presumptively akin to why you don't want 
to create untyped variables in XSLT 2.0 and 3.0 (an untyped variable might be 
anything and needs a whole document node to exist in, and this is expensive).  
In this case, I'd be darkly suspicious the computed elements are computing 
their contents every time.

I'd be trying
for ...
let $elem1 as element() := ...
let $elem2 as element() := ...


On Mon, Sep 18, 2017 at 8:46 AM, Anastasiou A. wrote:
Hello

Many thanks, Dirk, Fabrice and Graydon.

I was going to look up ways of making the server run as fast as possible
later on anyway, so it is always good to know how BaseX is "thinking".

I can see what you mean Graydon. This is a simple nested `for` to denormalise 
some of the structures of the XML file, where “some” is defined by
an XPath expression.

As far as I can tell, there is nothing being re-evaluated repeatedly within the 
inner loop that could be brought outside.

I have gone through the dot plans of the quickest and slowest versions of the
query, and the only difference between them is the addition of the CElems.

The “scaling” of the timings, in case it helps, is as follows:

Simple query, returning elements: 1100-1500 ms

Adding an `element` to what is returned just by the innermost `for`: 7500-9311 ms
This means:
for …
  for …
  return element item {someElement|someOtherElement}

Adding an `element` to the whole block (no `element` in the innermost `for`): 49000-67000 ms
This means:
element Items {
  for …
  for …
  return someElement|someOtherElement
}

Adding an `element` to both places: 5-8 ms
This means:
element Items {
  for …
  for …
  return element Item {someElement|someOtherElement}
}


I don’t mind the ~8sec time but when we get to 1.5min, then yes…that’s going to 
be a bit annoying.

All the best







From: Graydon Saunders [mailto:graydon...@gmail.com]
Sent: 15 September 2017 17:04
To: Anastasiou A.; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

As a follow-on to Dirk, it's amazing how much of a performance difference it
can make to use typed variables when you're constructing something for output.
(So far as I can tell, variable declarations function as an "optimize this!"
flag for

Re: [basex-talk] Startup hooks or persisting jobs

2017-09-18 Thread Christian Grün
Hi Kendall,

Coincidentally, we had a similar discussion in our team. I have added
a little issue [1]; more feedback is welcome.

Cheers,
Christian

PS: Thanks everyone for keeping the mailing list alive!

[1] https://github.com/BaseXdb/basex/issues/1498



On Mon, Aug 28, 2017 at 9:28 PM, Kendall Shaw wrote:
> Thanks. I would think that being able to schedule jobs would fit nicely with
> having scheduled jobs persist after restart.
>
>
>
> Kendall
>
>
>
> From: "Kirsten, Dirk"
> Date: Monday, August 28, 2017 at 12:21 PM
> To: Kendall Shaw, BaseX
> Subject: RE: Startup hooks or persisting jobs
>
>
>
> Hi Kendall,
>
>
>
> there is currently no way to do this using BaseX itself. But I also don't
> think that should be the job of BaseX. Instead, you can write a servlet,
> deployed in Tomcat, that runs some Java code at startup, e.g. to trigger
> a BaseX command. See
> http://crunchify.com/how-to-run-java-program-automatically-on-tomcat-startup/
> for an example of how to do this.
>
>
>
> Cheers
>
> Dirk
>
>
>
>
> Senacor Technologies Aktiengesellschaft - Registered office: Eschborn - Local Court
> Frankfurt am Main - Reg. no.: HRB 105546
> Executive Board: Matthias Tomann, Marcus Purzer - Chairman of the Supervisory Board:
> Daniel Grözinger
>
> From: basex-talk-boun...@mailman.uni-konstanz.de
> [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Kendall
> Shaw
> Sent: Monday, 28 August 2017 06:46
> To: BaseX
> Subject: [basex-talk] Startup hooks or persisting jobs
>
>
>
> Am I missing an existing way to run XQuery at startup (BaseX web service
> running under Tomcat)? I have jobs that I schedule, but I have to schedule
> them again if BaseX is shut down.
>
> I can check from outside BaseX whether it has started and then execute
> queries, but if there is already a way to do this within BaseX, I would
> rather do that.
>
>
>
> Kendall
>
>


[basex-talk] retrieving the name of the archive?

2017-09-18 Thread Graydon Saunders
Hello --

BaseX will happily consume zip archives; this is just splendid for loading
up a bunch of docx files.

Now I find myself wanting the name of the docx file -- the original name of
the archive -- and I don't know how to retrieve it (or whether it can be
retrieved at all!). But I think it must be there somewhere, because db:path
repeats the standard OOXML file paths:

[Content_Types].xml
word/document.xml
word/footnotes.xml
word/footer1.xml
word/endnotes.xml
word/theme/theme1.xml
word/settings.xml
docProps/custom.xml
customXml/itemProps2.xml
docProps/app.xml
customXml/item2.xml
customXml/itemProps1.xml
word/fontTable.xml
customXml/item1.xml
customXml/item3.xml
customXml/itemProps3.xml
customXml/item4.xml
customXml/itemProps4.xml
word/numbering.xml
word/styles.xml
word/webSettings.xml
docProps/core.xml
word/people.xml

over and over. If they were all stored at exactly those paths, there would be
only one copy, yet all several hundred docx files are there by content
(db:list-details tells me about more than 4000 individual XML files).

If I can get the name of the original archive, how do I do that?

Thanks!
Graydon


Re: [basex-talk] Basex Inner Workings

2017-09-18 Thread Anastasiou A .
Hello

Many thanks, Dirk, Fabrice and Graydon.

I was going to look up ways of making the server run as fast as possible
later on anyway, so it is always good to know how BaseX is "thinking".

I can see what you mean Graydon. This is a simple nested `for` to denormalise 
some of the structures of the XML file, where “some” is defined by
an XPath expression.

As far as I can tell, there is nothing being re-evaluated repeatedly within the 
inner loop that could be brought outside.

I have gone through the dot plans of the quickest and slowest versions of the
query, and the only difference between them is the addition of the CElems.

The “scaling” of the timings, in case it helps, is as follows:

Simple query, returning elements: 1100-1500 ms

Adding an `element` to what is returned just by the innermost `for`: 7500-9311 ms
This means:
for …
  for …
  return element item {someElement|someOtherElement}

Adding an `element` to the whole block (no `element` in the innermost `for`): 49000-67000 ms
This means:
element Items {
  for …
  for …
  return someElement|someOtherElement
}

Adding an `element` to both places: 5-8 ms
This means:
element Items {
  for …
  for …
  return element Item {someElement|someOtherElement}
}


I don’t mind the ~8sec time but when we get to 1.5min, then yes…that’s going to 
be a bit annoying.

All the best







From: Graydon Saunders [mailto:graydon...@gmail.com]
Sent: 15 September 2017 17:04
To: Anastasiou A.; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

As a follow-on to Dirk, it's amazing how much of a performance difference it
can make to use typed variables when you're constructing something for output.
(So far as I can tell, variable declarations function as an "optimize this!"
flag for BaseX.)

If you get good performance when you're just returning the resulting nodes and
lose it massively by adding structure, as you relate up there somewhere:
The change was to go from simply returning the nodes themselves with a `return 
thisnode | thatnode |theothernode` to a "formatted" document that has an outer 
 with a number of `return 
{thisNode|thatNode|theOtherNode}` inside it.

my immediate thought was "it's querying the same thing multiple times".

In most programming languages it's good practice not to create variables when
you can inline. XQuery does not appear to be one of those languages. :) I try
to think of this as "how can I make things easy for the optimizer?"

-- Graydon

On Fri, Sep 15, 2017 at 11:55 AM, Kirsten, Dirk wrote:
Hello Athanasios,

I think you should really check the actual query plan that is executed. If you
see such a huge difference in performance, surely the processor is executing
the queries differently. I don't think looking into the file access patterns
BaseX uses internally is very useful for an end user. You should let BaseX
handle that (but of course, if you find better/more efficient ways, I am sure
Christian gladly accepts pull requests). But the pattern you describe sounds
very much as expected: reads when you open databases seem logical, and short
write operations are also expected when just reading a database, because e.g.
BaseX has to lock the databases.

So I think it would be more useful to look into the query plan. Of course you
are more than welcome to ask about what is going on there on this list. I would
expect that, because of your rewrite, some indexes are no longer applied (or,
if the result of your rewrite is simply very big, that most of the time is
spent serializing the data).

Cheers
Dirk



-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Fabrice ETANCHAUD
Sent: Friday, 15 September 2017 17:35
To: 'Anastasiou A.'; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings


You can find the time spent in each step in the query info bar graph.

If you are looking for the schema and the facets of your dataset, you should
have a look at the index module, and especially at index:facets().
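
Fabrice's pointer can be tried as a one-liner. `mydb` is a placeholder database name; as far as I know, index:facets from the BaseX Index Module returns a summary of the database's element structure together with statistics about the values it contains.

```xquery
(: 'mydb' is a placeholder; replace it with the name of your database :)
index:facets('mydb')
```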

Best regards,
Fabrice

-Original Message-
From: Anastasiou A. [mailto:a.anastas...@swansea.ac.uk]
Sent: Friday, 15 September 2017 17:23
To: Fabrice ETANCHAUD; basex-talk@mailman.uni-konstanz.de
Subject: RE: Basex Inner Workings

Thank you Fabrice. I understand.

I have not tried querying from the command prompt or sending the output to a 
file directly, which I could also work with.