Re: [basex-talk] Different interpretation of regex in eXist, Saxon and BaseX

2018-08-09 Thread Andy Bunce
Great!
I believe the "!" option is best ignored...:)

>Note: On the Java platform, this can also be achieved using the flag "!";
this was never formally supported and is likely to be withdrawn in a future
Saxon version. [1]

/Andy
[1] https://www.saxonica.com/html/documentation/functions/fn/matches.html

On 9 August 2018 at 18:54, Christian Grün  wrote:

> > A new snapshot will be available later tonight.
>
> …which is now.
>
> On Thu, Aug 9, 2018 at 7:02 PM Christian Grün 
> wrote:
> >
> > > +1 for the Java flag as this enables \b for word boundaries as
> > > mentioned here [1]
> >
> > True, I missed that one as well more than once.
> >
> > I’ve just added support for Java’s default parser [1,2]. Apart from 'j'
> > (which doesn’t need to be prefixed with a semicolon, as in Saxon), '!'
> > is available as alternative. As it’s not officially documented in
> > Saxon, just keep this one as a secret :)
> >
> > A new snapshot will be available later tonight.
> >
> > [1] https://github.com/BaseXdb/basex/issues/1608
> > [2] http://docs.basex.org/wiki/XQuery_Extensions#Regular_expressions
>


Re: [basex-talk] Add Command: Resource not found

2018-08-09 Thread Christian Grün
Hi Florian,

Thanks for the hint. In fact, it wasn’t you but BaseX that was at
fault: client-side command parsing was faulty due to the whitespace
in the string input.

Things have improved in the latest stable snapshot [1]. The release of
BaseX 9.1 is currently scheduled for September.

Best,
Christian

[1] http://files.basex.org/releases/latest/



On Thu, Aug 9, 2018 at 9:28 AM Florian Peschka
 wrote:
>
> Hi all,
>
> I am trying to add content to an index directly (not from a file).
>
> Given the following setup:
>
> ClientSession session = new ClientSession("localhost", 1984, "admin", "admin");
> session.execute(new CreateDB("test"));
> session.execute(new Open("test"));
> session.execute(new Add("test", "te st"));
>
> I receive this error:
>
> Exception in thread "main" org.basex.core.BaseXException: Resource 
> "C:/Users/FloPes/"te st"" not found.
> at org.basex.api.client.ClientSession.receive(ClientSession.java:191)
> at org.basex.api.client.ClientSession.execute(ClientSession.java:160)
> at org.basex.api.client.ClientSession.execute(ClientSession.java:165)
> at org.basex.api.client.Session.execute(Session.java:36)
> at basexerrordemo.BaseXErrorDemo.main(BaseXErrorDemo.java:45)
>
> However, when I change the Add command to send "test" (no space), it 
> works as expected.
>
> What am I doing wrong?
>
> --
> There are two hard things in computer science:
> cache invalidation, naming things, and off-by-one errors.
>
>
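Christian's diagnosis is that client-side command parsing went wrong because of the whitespace in the input string. As a purely hypothetical illustration (this is not BaseX's parser, and the command string `ADD TO test "te st"` is only an assumed serialization), the following Java sketch contrasts a naive whitespace split, which breaks "te st" into two tokens, with a quote-aware tokenizer that keeps it as one argument:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommandTokens {
    // Naive approach: split on runs of whitespace and ignore quotes entirely.
    static List<String> naiveSplit(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    // Quote-aware approach: whitespace separates tokens only outside double quotes,
    // and the quote characters themselves are not part of the resulting token.
    static List<String> quotedSplit(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
                hasToken = true; // an empty "" still counts as a token
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical serialized command; not necessarily what the BaseX client sends.
        String line = "ADD TO test \"te st\"";
        System.out.println(naiveSplit(line));  // [ADD, TO, test, "te, st"]
        System.out.println(quotedSplit(line)); // [ADD, TO, test, te st]
    }
}
```

Either way, the actual fix is the one in the BaseX snapshot Christian linked; the sketch only shows why an unquoted space in an argument is easy to mishandle.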


Re: [basex-talk] Different interpretation of regex in eXist, Saxon and BaseX

2018-08-09 Thread Christian Grün
> A new snapshot will be available later tonight.

…which is now.

On Thu, Aug 9, 2018 at 7:02 PM Christian Grün  wrote:
>
> > +1 for the Java flag as this enables \b for word boundaries as mentioned 
> > here [1]
>
> True, I missed that one as well more than once.
>
> I’ve just added support for Java’s default parser [1,2]. Apart from 'j'
> (which doesn’t need to be prefixed with a semicolon, as in Saxon), '!'
> is available as alternative. As it’s not officially documented in
> Saxon, just keep this one as a secret :)
>
> A new snapshot will be available later tonight.
>
> [1] https://github.com/BaseXdb/basex/issues/1608
> [2] http://docs.basex.org/wiki/XQuery_Extensions#Regular_expressions


Re: [basex-talk] Different interpretation of regex in eXist, Saxon and BaseX

2018-08-09 Thread Christian Grün
> +1 for the Java flag as this enables \b for word boundaries as mentioned here 
> [1]

True, I missed that one as well more than once.

I’ve just added support for Java’s default parser [1,2]. Apart from 'j'
(which doesn’t need to be prefixed with a semicolon, as in Saxon), '!'
is available as alternative. As it’s not officially documented in
Saxon, just keep this one as a secret :)

A new snapshot will be available later tonight.

[1] https://github.com/BaseXdb/basex/issues/1608
[2] http://docs.basex.org/wiki/XQuery_Extensions#Regular_expressions
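As noted in the thread, switching to Java's regex parser makes `\b` word boundaries available, which the XPath/XML Schema regex dialect does not offer. A minimal Java sketch of what `\b` does, using only `java.util.regex` (the helper name `countWholeWord` is ours, for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundary {
    // Counts occurrences of `word` as a whole word, using Java's \b anchor
    // (a position between a word character and a non-word character).
    static int countWholeWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "concat" and "cats" contain "cat" but not as a whole word.
        System.out.println(countWholeWord("cat concat cats cat.", "cat")); // 2
    }
}
```

Per Christian's description, BaseX should presumably accept the same kind of pattern in XQuery once the 'j' flag is passed, e.g. `matches($text, '\bcat\b', 'j')`; we have not checked this against a specific BaseX version.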


Re: [basex-talk] Different interpretation of regex in eXist, Saxon and BaseX

2018-08-09 Thread Omar Siam
Sorry, I got that wrong. I meant that XQuery has greedy (the default) and
reluctant quantifiers, but not possessive ones.
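To make the distinction concrete: `java.util.regex` supports greedy, reluctant and possessive quantifiers, while, per Omar's correction, XQuery supports only the first two. A small Java sketch (the helper `firstMatch` is illustrative) shows how each variant behaves on the same input, including greedy `?` versus reluctant `??`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Quantifiers {
    // Returns the first match of `regex` in `input`, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .* grabs as much as possible, then backtracks to the last '>'.
        System.out.println(firstMatch("<.*>", input));   // <a><b>
        // Reluctant: .*? grabs as little as possible, stopping at the first '>'.
        System.out.println(firstMatch("<.*?>", input));  // <a>
        // Possessive (Java only): .*+ never gives characters back, so '>' cannot match.
        System.out.println(firstMatch("<.*+>", input));  // null
        // The same contrast for '?': greedy includes the optional item if possible.
        System.out.println(firstMatch("ab?", "abc"));    // ab
        System.out.println(firstMatch("ab??", "abc"));   // a
    }
}
```

A pattern such as `<.*+>` should be rejected by a standards-conformant XQuery regex parser, since a quantifier may not follow another quantifier there; it only makes sense with a Java-regex mode like the 'j' flag discussed in this thread.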




Re: [basex-talk] Different interpretation of regex in eXist, Saxon and BaseX

2018-08-09 Thread Murray, Gregory
In https://www.w3.org/TR/xpath-functions-31/#regex-syntax you won't find the 
words "greedy" or "greediness" because the term used is "reluctant 
quantifiers." See section 5.6.1.2.


On 8/9/18, 11:59 AM, "BaseX-Talk on behalf of Omar Siam" 
 
wrote:

Hi!

My point was that greediness is *not* part of the XQuery RegExp 
standard. Java on the other hand has this feature: 

https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#greedy 
and others. And I don't know about Perl, PHP, Python and so on.

What I want to stress is: A beautiful RegExp from the internet may or 
may not work with a particular RegExp implementation.

Nevertheless, as Saxon is well integrated with BaseX, you can use it to do
some RegExp work. Just getting data to and from Saxon may not be
possible, depending on the size of what you want to process. As far as I
know, Saxon always works on an in-memory representation of the data,
and that is not an option with a 2.5 GB XML file, for example.

Best regards

Omar


On 09.08.2018 at 16:32, Andreas Mixich wrote:
> Omar Siam wrote:
>> Using the Java regular expression implementation you can use greedy
>> and some other things. The XSL and XQuery implementation according to
>> the standards does not allow this and so misinterprets the regular
>> expression. See here:
> I checked
>
>> https://www.w3.org/TR/xpath-functions-31/#regex-syntax
> and also https://www.w3.org/TR/xmlschema-2/#regexs but did not find
> any mention of greediness. But then, I am not sure whether I understood
> this from the latter document:
>
>  A ·regular expression· R is a sequence of characters that denote a
>  set of strings  L(R). When used to constrain a ·lexical space·, a
>  regular expression  R asserts that only strings in L(R) are valid
>  literals for values of that type.
>
> For all ·atom·s S and non-negative integers n, m such that n <= m, valid
> ·piece·s R are:
>
>   R     Denoting the set of strings L(R) containing:
>   S?    the empty string, and all strings in L(S).
>
>
>
> Now I am not quite sure what L(S) means.
>
>> You can tell Saxon to use a different regexp engine such as the
>> standard Java one:
>> https://www.saxonica.com/html/documentation/functions/fn/matches.html
> The hint is much appreciated, though BaseX is my actual development
> target. I only mentioned Saxon and eXist because I cross-checked them
> and found the result interesting enough to bring to the list
> (and I still hope that Christian chimes in and finds a good reason to
> do it the other way around from how it is now).
>





Re: [basex-talk] Different interpretation of regex in eXist, Saxon and BaseX

2018-08-09 Thread Andreas Mixich
On 09.08.2018 at 16:35, Christian Grün wrote:
> Thanks, Omar, for the hint to the 'j' flag in Saxon. Sounds enticing;
> I think we can include it in BaseX as well.

Very good news! Thanks a lot!

-- 
Goody Bye, all the best, kind regards,
Andreas Mixich



Re: [basex-talk] Different interpretation of regex in eXist, Saxon and BaseX

2018-08-09 Thread Omar Siam

Hi!

My point was that greediness is *not* part of the XQuery RegExp 
standard. Java on the other hand has this feature: 
https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#greedy 
and others. And I don't know about Perl, PHP, Python and so on.


What I want to stress is: A beautiful RegExp from the internet may or 
may not work with a particular RegExp implementation.


Nevertheless, as Saxon is well integrated with BaseX, you can use it to do
some RegExp work. Just getting data to and from Saxon may not be
possible, depending on the size of what you want to process. As far as I
know, Saxon always works on an in-memory representation of the data,
and that is not an option with a 2.5 GB XML file, for example.


Best regards

Omar


On 09.08.2018 at 16:32, Andreas Mixich wrote:
> Omar Siam wrote:
>> Using the Java regular expression implementation you can use greedy
>> and some other things. The XSL and XQuery implementation according to
>> the standards does not allow this and so misinterprets the regular
>> expression. See here:
> I checked
>
>> https://www.w3.org/TR/xpath-functions-31/#regex-syntax
> and also https://www.w3.org/TR/xmlschema-2/#regexs but did not find
> any mention of greediness. But then, I am not sure whether I understood
> this from the latter document:
>
>  A ·regular expression· R is a sequence of characters that denote a
>  set of strings L(R). When used to constrain a ·lexical space·, a
>  regular expression R asserts that only strings in L(R) are valid
>  literals for values of that type.
>
> For all ·atom·s S and non-negative integers n, m such that n <= m, valid
> ·piece·s R are:
>
>   R     Denoting the set of strings L(R) containing:
>   S?    the empty string, and all strings in L(S).
>
> Now I am not quite sure what L(S) means.
>
>> You can tell Saxon to use a different regexp engine such as the
>> standard Java one:
>> https://www.saxonica.com/html/documentation/functions/fn/matches.html
> The hint is much appreciated, though BaseX is my actual development
> target. I only mentioned Saxon and eXist because I cross-checked them
> and found the result interesting enough to bring to the list
> (and I still hope that Christian chimes in and finds a good reason to
> do it the other way around from how it is now).





[basex-talk] Strategies for using BaseX & friends as XML CMS?

2018-08-09 Thread Andreas Jung
Hi there,

we are currently investigating options for using an XML database as a backend in a
publishing project related to technical documentation, DITA in particular. The broad
customer requirements are check-in and check-out of documents, locking of documents,
and versioning. Has anyone worked with an XML database in similar scenarios? Any pointers?

Andreas



Re: [basex-talk] BaseX-Talk Digest, Vol 104, Issue 15

2018-08-09 Thread Martin Lourduswamy
IZE: false
> >  MAXCATS: 100
> >  MAXLEN: 96
> >  SPLITSIZE: 0
> >
> > Best regards,
> > Michael
> >
>
> --
>
> Message: 3
> Date: Wed, 8 Aug 2018 22:31:42 +0200 (CEST)
> From: Marc Coenegracht 
> To: "basex-talk@mailman.uni-konstanz.de"
> 
> Subject: [basex-talk] Transaction management in BaseX 8.6.4
>
> Hi,
>
> A CMS occasionally recreates some existing databases of a production site.
> The databases are deleted and again created with the new content within
> a few seconds.
>
> What happens if a read operation is taking place during this process? Can
> it cause problems with the recreation of the DB or with the BaseX server
> instance?
>
> Of course it is possible to update the databases instead, but this process
> is a lot simpler and probably faster too.
>
> All operations are executed running xquery scripts with REST using the
> BaseX http server.
>
>
> Marc
>
>
> --
>
> Message: 4
> Date: Wed, 8 Aug 2018 23:51:47 +0200
> From: Christian Grün
> To: m...@crosseyed.nl
> Cc: BaseX 
> Subject: Re: [basex-talk] Transaction management in BaseX 8.6.4
>
> Hi Marc,
>
> As one XQuery expression is one transaction, the best approach is to
> define your operations in a single query. If you call db:create, an
> existing database will be overwritten, and the function allows you to
> specify some initial input.
>
> Hope this helps,
> Christian
>
>
>
> On Wed, Aug 8, 2018 at 10:31 PM Marc Coenegracht 
> wrote:
> >
> > Hi,
> >
> > A CMS occasionally recreates some existing databases of a production
> > site. The databases are deleted and again created with the new content
> > within a few seconds.
> >
> > What happens if a read operation is taking place during this process? Can
> > it cause problems with the recreation of the DB or with the BaseX server
> > instance?
> >
> > Of course it is possible to update the databases instead, but this
> > process is a lot simpler and probably faster too.
> >
> > All operations are executed running xquery scripts with REST using the
> > BaseX http server.
> >
> >
> > Marc
>
>
> --
>
> Message: 5
> Date: Thu, 9 Aug 2018 08:28:59 +0200
> From: Christian Grün
> To: Marc Coenegracht ,   BaseX
> 
> Subject: Re: [basex-talk] Transaction management in BaseX 8.6.4
>
> Hi Marc (cc to the list),
>
> If the database replacement is defined as a single REST operation, you
> won’t encounter any problems; other transactions will need to wait until
> your database has been fully created.
>
> Best,
> Christian
>
>
>
> Marc Coenegracht wrote on Thu, 9 Aug 2018, 00:21:
>
> > Hi Christian,
> >
> > Thanks for the quick answer.
> >
> > Defining the operations in a single query would be preferable but isn't
> > possible, since the read operation is simply triggered by a website
> > visitor, and the creation of the new DB (indeed overwriting the old DB
> > with db:create) is performed by an action of the CMS admin.
> >
> > So, these operations can happen at the exact same moment. The
> > inconvenience at the front-end will be minimal, I'm just wondering if
> > these concurrent operations can cause problems with BaseX or the
> > db:create operation.
> >
> > best,
> > Marc
> >
> > On Wed, 8 Aug 2018, Christian Grün wrote:
> >
> > > Hi Marc,
> > >
> > > As one XQuery expression is one transaction, the best approach is to
> > > define your operations in a single query. If you call db:create, an
> > > existing database will be overwritten, and the function allows you to
> > > specify some initial input.
> > >
> > > Hope this helps,
> > > Christian
> > >
> > >
> > >
> > > On Wed, Aug 8, 2018 at 10:31 PM Marc Coenegracht
> > > wrote:
> > >>
> > >> Hi,
> > >>
> > >> A CMS occasionally recreates some existing databases of a production
> > >> site. The databases are deleted and again created with the new content
> > >> within a few seconds.
> > >>
> > >> What happens if a read operation is taking place during this process?
> > >> Can it cause problems with the recreation of the DB or with the BaseX
> > >> server instance?
> > >>
> > >> Of course it is possible to update the databases instead, but this
> > >> process is a lot simpler and probably faster too.
> > >>
> > >> All operations are executed running xquery scripts with REST using the
> > >> BaseX http server.
> > >>
> > >>
> > >> Marc
> > >
>
> End of BaseX-Talk Digest, Vol 104, Issue 15
> ***
>


Re: [basex-talk] Different interpretation of regex in eXist, Saxon and BaseX

2018-08-09 Thread Christian Grün
Thanks, Omar, for the hint to the 'j' flag in Saxon. Sounds enticing; I
think we can include it in BaseX as well.


Omar Siam wrote on Wed, 8 Aug 2018, 12:58:

> Hi
>
> I think the problem is: There are numerous implementations of regular
> expressions which have a common subset but are different in the more
> advanced features.
>
> Using the Java regular expression implementation you can use greedy and
> some other things. The XSL and XQuery implementation according to the
> standards does not allow this and so misinterprets the regular
> expression. See here:
> https://www.w3.org/TR/xpath-functions-31/#regex-syntax
>
> You can tell Saxon to use a different regexp engine such as the standard
> Java one:
> https://www.saxonica.com/html/documentation/functions/fn/matches.html
>
> Best regards
>
> Omar
>
>
> On 07.08.2018 at 21:38, Andreas Mixich wrote:
> > Hi
> >
> > [rfc3986](https://tools.ietf.org/html/rfc3986#appendix-B) defines a nice
> > regular expression, which groups any URI, including URNs, by URI
> > component.
> >
> > What is interesting about this regex is the use of the '?' quantifier,
> > which makes every preceding group/component optional, thus matching
> > either a URI or any other(!) string: anything that does not match one
> > of the special groups goes into a catch-all group (no. 5), which keeps
> > either the path or the full, arbitrary string. This is negligible,
> > since the input to this regex is guaranteed to be of the right type
> > (a/@href/string()).
> >
> > Here is the relevant part from the RFC.
> >
> >Appendix B
> >
> >^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
> >   123  4  5   6  78 9
> >
> >   The numbers in the second line above are only to assist
> >   readability; they indicate the reference points for each
> >   subexpression (i.e., each paired parenthesis).  We refer to the
> >   value matched for subexpression  as $.  For example, matching
> >   the above expression to
> >
> >  http://www.ics.uci.edu/pub/ietf/uri/#Related
> >
> >   results in the following subexpression matches:
> >
> >  $1 = http:
> >  $2 = http
> >  $3 = //www.ics.uci.edu
> >  $4 = www.ics.uci.edu
> >  $5 = /pub/ietf/uri/
> >  $6 = 
> >  $7 = 
> >  $8 = #Related
> >  $9 = Related
> >
> >   where  indicates that the component is not present,
> >   as is the case for the query component in the above example.
> >   Therefore, we can determine the value of the five components as
> >
> >  scheme= $2
> >  authority = $4
> >  path  = $5
> >  query = $7
> >  fragment  = $9
> >
> >   Going in the opposite direction, we can recreate a URI reference
> >   from its components by using the algorithm of Section 5.3.
> >
> >
> > I tested this regex with Saxon, eXist and BaseX. eXist successfully
> > parsed all the test cases I threw at it into the right groups; Saxon
> > and BaseX did not. The failure is:
> >
> >  [FORX0003] Pattern matches empty string..
> >
> > That baffled me, since all three processors run on Java,
> > and the definition of the '?' quantifier, when used like this,
> > seems to be:
> >
> >  Makes the preceding item optional. Greedy, so the optional item
> >  is included in the match if possible.
> >
> > This means that *if* any of the group's contents match, they should be
> > included rather than producing an empty string.
> >
> > Why is it like that? And what can I do about it? I found no other URI
> > parsing regex that componentizes this way and would be compatible with
> > XQuery.
> >
> > See the attached test case.
> >
>
>
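A Java sketch of the RFC 3986 Appendix B expression applied to the RFC's own example, using `java.util.regex` (the class name is ours). It also shows that the pattern matches the empty string, because every part is optional or starred. Presumably this, rather than greediness, is what the FORX0003 error reports: in XPath/XQuery 3.1, functions such as `fn:replace`, `fn:tokenize` and `fn:analyze-string` raise FORX0003 when the pattern can match a zero-length string, whereas `fn:matches` does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rfc3986Split {
    // Regular expression from RFC 3986, Appendix B (backslash doubled for Java).
    static final Pattern URI_REF = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    public static void main(String[] args) {
        Matcher m = URI_REF.matcher("http://www.ics.uci.edu/pub/ietf/uri/#Related");
        if (m.matches()) {
            System.out.println("scheme    = " + m.group(2)); // http
            System.out.println("authority = " + m.group(4)); // www.ics.uci.edu
            System.out.println("path      = " + m.group(5)); // /pub/ietf/uri/
            System.out.println("query     = " + m.group(7)); // null: group did not participate
            System.out.println("fragment  = " + m.group(9)); // Related
        }
        // Every component is optional or starred, so "" is also a full match.
        System.out.println(URI_REF.matcher("").matches()); // true
    }
}
```

Java reports a non-participating group as `null`, which corresponds to the RFC's `<undefined>`.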


Re: [basex-talk] Different interpretation of regex in eXist, Saxon and BaseX

2018-08-09 Thread Andreas Mixich
Omar Siam wrote:
> Using the Java regular expression implementation you can use greedy
> and some other things. The XSL and XQuery implementation according to
> the standards does not allow this and so misinterprets the regular
> expression. See here:

I checked

> https://www.w3.org/TR/xpath-functions-31/#regex-syntax

and also https://www.w3.org/TR/xmlschema-2/#regexs but did not find
any mention of greediness. But then, I am not sure whether I understood
this from the latter document:

A ·regular expression· R is a sequence of characters that denote a
set of strings  L(R). When used to constrain a ·lexical space·, a
regular expression  R asserts that only strings in L(R) are valid
literals for values of that type.

For all ·atom·s S and non-negative integers n, m such that n <= m, valid
·piece·s R are:

  R     Denoting the set of strings L(R) containing:
  S?    the empty string, and all strings in L(S).



Now I am not quite sure what L(S) means.
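On the L(S) question: in the XML Schema definition, L(S) is the set of strings (the "language") denoted by S, and L(S?) is that set plus the empty string. The definition only says which strings belong to the set, so it says nothing about greediness. A small Java sketch that checks membership for S = `(ab)` over a few candidate strings, using full-string matching as XML Schema pattern facets do (the helper `inLanguage` is illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LanguageOfPiece {
    // Returns the candidates that belong to L(regex), i.e. that the regex
    // matches in full (not just as a substring).
    static List<String> inLanguage(String regex, List<String> candidates) {
        Pattern p = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String s : candidates) {
            if (p.matcher(s).matches()) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("", "ab", "abab", "a", "b");
        // S = (ab): L(S) = {"ab"}, so L(S?) = {""} together with L(S) = {"", "ab"}.
        System.out.println(inLanguage("(ab)", candidates));  // [ab]
        System.out.println(inLanguage("(ab)?", candidates)); // [, ab]
    }
}
```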

> You can tell Saxon to use a different regexp engine such as the
> standard Java one:
> https://www.saxonica.com/html/documentation/functions/fn/matches.html

The hint is much appreciated, though BaseX is my actual development
target. I only mentioned Saxon and eXist because I cross-checked them
and found the result interesting enough to bring to the list
(and I still hope that Christian chimes in and finds a good reason to
do it the other way around from how it is now).

-- 
Goody Bye, all the best, kind regards,
Andreas Mixich



[basex-talk] Add Command: Resource not found

2018-08-09 Thread Florian Peschka

Hi all,

I am trying to add content to an index directly (not from a file).

Given the following setup:

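import org.basex.api.client.ClientSession;
import org.basex.core.cmd.Add;
import org.basex.core.cmd.CreateDB;
import org.basex.core.cmd.Open;

// inside main(), which must declare "throws IOException":
ClientSession session = new ClientSession("localhost", 1984, "admin", "admin");
session.execute(new CreateDB("test"));
session.execute(new Open("test"));
session.execute(new Add("test", "te st")); // input containing a space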

I receive this error:

Exception in thread "main" org.basex.core.BaseXException: Resource "C:/Users/FloPes/"te st"" not found.
    at org.basex.api.client.ClientSession.receive(ClientSession.java:191)
    at org.basex.api.client.ClientSession.execute(ClientSession.java:160)
    at org.basex.api.client.ClientSession.execute(ClientSession.java:165)
    at org.basex.api.client.Session.execute(Session.java:36)
    at basexerrordemo.BaseXErrorDemo.main(BaseXErrorDemo.java:45)

However, when I change the Add command to send
"test" (no space), it works as expected.

What am I doing wrong?

--
  There are two hard things in computer science:
cache invalidation, naming things, and off-by-one errors.


Re: [basex-talk] Transaction management in BaseX 8.6.4

2018-08-09 Thread Christian Grün
Hi Marc (cc to the list),

If the database replacement is defined as a single REST operation, you
won’t encounter any problems; other transactions will need to wait until
your database has been fully created.

Best,
Christian
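Christian's answer amounts to this: a concurrent reader either sees the old database or waits for the new one, never a half-built one. As a rough analogy only (this is not how BaseX implements its transaction management), the same guarantee can be sketched in Java with a `ReentrantReadWriteLock`, where the hypothetical `recreate` step runs under the write lock and reads run under the read lock:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReplaceUnderLock {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> docs = new ArrayList<>();

    // Writer: drop and rebuild the "database" while holding the write lock,
    // so no reader can observe the intermediate, partially built state.
    void recreate(int size, String tag) {
        lock.writeLock().lock();
        try {
            docs.clear();
            for (int i = 0; i < size; i++) {
                docs.add(tag + i);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader: blocks while a writer holds the lock, then sees a complete state.
    int count() {
        lock.readLock().lock();
        try {
            return docs.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReplaceUnderLock db = new ReplaceUnderLock();
        db.recreate(1000, "old");
        Thread writer = new Thread(() -> {
            for (int round = 0; round < 100; round++) {
                db.recreate(1000, "new" + round);
            }
        });
        writer.start();
        boolean consistent = true;
        for (int i = 0; i < 10_000; i++) {
            if (db.count() != 1000) {
                consistent = false; // would indicate a half-built state was visible
            }
        }
        writer.join();
        System.out.println(consistent); // true
    }
}
```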



Marc Coenegracht wrote on Thu, 9 Aug 2018, 00:21:

> Hi Christian,
>
> Thanks for the quick answer.
>
> Defining the operations in a single query would be preferable but isn't
> possible, since the read operation is simply triggered by a website
> visitor, and the creation of the new DB (indeed overwriting the old DB
> with db:create) is performed by an action of the CMS admin.
>
> So, these operations can happen at the exact same moment. The
> inconvenience at the front-end will be minimal, I'm just wondering if
> these concurrent operations can cause problems with BaseX or the
> db:create operation.
>
> best,
> Marc
>
> On Wed, 8 Aug 2018, Christian Grün wrote:
>
> > Hi Marc,
> >
> > As one XQuery expression is one transaction, the best approach is to
> > define your operations in a single query. If you call db:create, an
> > existing database will be overwritten, and the function allows you to
> > specify some initial input.
> >
> > Hope this helps,
> > Christian
> >
> >
> >
> > On Wed, Aug 8, 2018 at 10:31 PM Marc Coenegracht
> > wrote:
> >>
> >> Hi,
> >>
> >> A CMS occasionally recreates some existing databases of a production
> >> site. The databases are deleted and again created with the new content
> >> within a few seconds.
> >>
> >> What happens if a read operation is taking place during this process?
> >> Can it cause problems with the recreation of the DB or with the BaseX
> >> server instance?
> >>
> >> Of course it is possible to update the databases instead, but this
> >> process is a lot simpler and probably faster too.
> >>
> >> All operations are executed running xquery scripts with REST using the
> >> BaseX http server.
> >>
> >>
> >> Marc
> >