[basex-talk] Prolog and XQuery

2023-06-28 Thread Ben Engbers

Hi,

My ultimate goal is to investigate the advantages of using Prolog and 
XQuery together when querying XML files. In doing so, I want to take 
advantage of BaseX's XQuery engine.
Since there is no Prolog client, I initially started writing one. But in 
SWI-Prolog, I did not manage to connect using the web connection tools. 
So I decided to first write a C++ client and then write a bridge in 
Prolog that uses that client. The starting point for the C++ client was 
the RBaseX client I published earlier. I converted all functionality 
from that client to C++. A preliminary version of the C++ client can be 
found at https://github.com/BenEngbers/BasexCpp. As soon as I manage to 
create a real shared object using CMake I will publish a final version.


Regarding the server protocol for BaseX, however, I still have a 
question. According to that protocol, the ADD command accepts 4 
arguments (\09 {name} {path} {input}). However, neither in R nor in C++ 
have I succeeded in using the {name} argument. My question is: is this 
a bug in the protocol?


I don't know if there is a need for a C++ client that implements the 
full server protocol but in any case I enjoyed working on this project.


Have fun,

Ben Engbers


Re: [basex-talk] BaseX and Fedora 38

2023-04-28 Thread Ben Engbers
After entering "~/basex/bin/basexgui &" in a terminal, the BaseX GUI 
started as usual.
After quitting BaseX I could restart BaseX in the usual way (from the 
GNOME Shell). But after a cold reboot, starting BaseX from the GNOME 
Shell again resulted in a logoff.


I've reported this as a bug to Fedora.

Greetings,
Ben



Re: [basex-talk] BaseX and Fedora 38

2023-04-27 Thread Ben Engbers

My linux-session.

Since BaseX 103 gave no problems I guess that maybe they changed the 
Java version that is installed? I don't know. I filed a bug report in 
Bugzilla and will let you know what they say.


For all other Fedora users, be warned!

Ben

On 27-04-2023 at 18:56, Christian Grün wrote:

Hi Ben,


my session logs off.


What kind of session is this?

Greetings,
Christian


[basex-talk] BaseX and Fedora 38

2023-04-27 Thread Ben Engbers

Hi,

Today I upgraded my PC to Fedora Linux 38. The only problem I 
encountered is that when I try to start basexgui, my session logs off.
I can start the server and the RBaseX client works well, so it probably 
is only the GUI that's giving problems.


Is it possible to see what's causing the failure?

Ben Engbers


[basex-talk] Binding multiple items to 1 variable (server protocol)

2023-04-27 Thread Ben Engbers

Hi,

My ultimate goal is to investigate, using a SWI-Prolog client and BaseX, 
whether a combined use of Prolog and XQuery offers advantages. To write 
a Prolog client, I first had to learn C++, and the spin-off from that is 
that I am now almost done writing a C++ client.
As with writing the R client at the time, I have questions about 
applying the 'Bind' command.

The server protocol contains the following sentence:
"the two items xs:integer(123) and xs:string('ABC') are encoded as 123, 
\02, xs:integer, \01, ABC, \02, xs:string and \00"
Does this mean that multiple items of different types can be bound to 
one (1) {name} variable?
If so, in what situation could this be applied? Where can I find an 
example FLWOR query that uses this?
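
To make the quoted sentence concrete, here is a minimal Python sketch of how such a value could be serialized. It only follows the byte layout given in the quotation (value, \02, type, with \01 between items and \00 at the end). The function name encode_items is made up for this illustration and is not part of any BaseX client.

```python
def encode_items(items):
    """Serialize (value, type) pairs using the layout from the quoted sentence.

    Hypothetical helper: each item becomes value + 0x02 + type, items are
    separated by 0x01, and the whole sequence is terminated by 0x00.
    """
    encoded = [
        value.encode("utf-8") + b"\x02" + xs_type.encode("utf-8")
        for value, xs_type in items
    ]
    return b"\x01".join(encoded) + b"\x00"


# The two items from the quotation: xs:integer(123) and xs:string('ABC').
payload = encode_items([("123", "xs:integer"), ("ABC", "xs:string")])
print(payload)
```

Whether the server accepts such a sequence for a single {name} is exactly the open question; the sketch only shows what would be put on the wire.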


Ben Engbers


Re: [basex-talk] Socket specifications?

2023-01-21 Thread Ben Engbers

Thanks,

In the meantime I have narrowed my problem down to the code that reads 
from the socket.

I'll see if this works better when using poll() instead of select().

Ben

On 20-01-2023 at 19:20, Liam R. E. Quin wrote:

On Fri, 2023-01-20 at 18:31 +0100, Ben Engbers wrote:



Whether reading from a socket is non-blocking is a function of the API
you use on the client, not the server end.

I didn't know



[basex-talk] Socket specifications?

2023-01-20 Thread Ben Engbers

Hi,

When I developed the BaseX client for R, I struggled with the socket 
for a long time. In the end it turned out that in R I had to 
configure the socket as a non-blocking socket. This solved all 
performance issues!
I am now trying to develop a client for SWI-Prolog. Because the 
low-level socket support in SWI-Prolog is too limited for this, I need 
to develop a library in C++ first. And in doing so, I again run into 
problems with the socket.
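
As a generic illustration of a client-side read loop (not BaseX-specific, and not the R or C++ code discussed here), the following Python sketch puts the socket in non-blocking mode and lets a selector wait for data. The name recv_available and the timeout values are assumptions made for this example. Python's DefaultSelector picks poll/epoll where the platform offers it and falls back to select otherwise.

```python
import selectors
import socket


def recv_available(sock, timeout=0.5, chunk_size=4096):
    """Collect bytes from sock until it stays quiet for `timeout` seconds."""
    sock.setblocking(False)  # recv() never blocks; the selector does the waiting
    selector = selectors.DefaultSelector()
    selector.register(sock, selectors.EVENT_READ)
    received = bytearray()
    try:
        # select() returns an empty list when nothing arrived within the timeout
        while selector.select(timeout):
            chunk = sock.recv(chunk_size)
            if not chunk:  # peer closed the connection
                break
            received.extend(chunk)
    finally:
        selector.unregister(sock)
        selector.close()
    return bytes(received)


# Local demonstration with a connected socket pair instead of a real server.
left, right = socket.socketpair()
left.sendall(b"hello\x00")
reply = recv_available(right, timeout=0.1)
left.close()
right.close()
print(reply)
```

A real client would stop on the protocol's terminating byte rather than on a quiet period; the timeout-based loop is only meant to show where blocking can and cannot occur.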


The BaseX documentation just says to use a socket. But there is no 
information on how to configure the socket itself.


My question is how do I configure the client side of the socket for 
optimal use?


Ben Engbers


[basex-talk] RBaseX version 1.1.2

2022-12-02 Thread Ben Engbers

Hi,

Version 1.1.2 of the RBaseX client has been accepted by CRAN.
The differences from version 1.1.1 are that 'Store' and 'Replace' have 
been replaced by 'put' and 'putBinary', and that the tests now have to 
be executed with Test/testBasex credentials.


The daily download average of RBaseX is 10. But since I haven't received 
any feedback yet, I don't know to what extent this package contributes 
to BaseX's popularity.


Ben Engbers


[basex-talk] Client protocol updated

2022-11-04 Thread Ben Engbers

Hi,

Thanks to the mail from Erik Peterson on the basex client 
implementation, I learned that the client protocol has been updated and 
that 'replace/store' have been renamed to 'put/putBinary'.


In my RBaseX-client, I used the 'retrieve' command to read binary data. 
I can't remember where I found that 'retrieve' was used for retrieving 
binary data. But while implementing 'put/putBinary' I noticed that 
'retrieve' is no longer accepted as command.


I have added a remark on this to the Clients page.

Ben Engbers
PS. RBaseX is downloaded on average 10 times a day. I have never had any 
feedback on this package so I don't know to what extent it is really used.

Any feedback is welcome!


Re: [basex-talk] Client auth debugging

2022-10-25 Thread Ben Engbers
Any information on the platform, programming language and the way in which the 
socket is opened would be helpful.

Ben

On 26 October 2022 at 00:42:41 CEST, Erik Peterson wrote:
>I'm implementing a client per basex api shown here:
>https://docs.basex.org/wiki/Server_Protocol#Authentication
>
>I'm working on digest auth and I can get back the realm and timestamp. I'm
>getting an access denied however when I send the username and token. My
>questions are:
>
>1) What tips are there for debugging authentication implementation? I've
>set logs.debug to true in .basex and and tail them. I can see access denied
>but there's no info to help me debug. I'd like to at least see what the
>server is receiving. Any way to do that? BTW, my token creation is correct
>per the example in the docs.
>
>2) It's taking a long time, 60s, for the test to run. Any way to speed that
>up?

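A sketch for cross-checking a digest implementation like the one asked about above. It assumes the token is md5(md5(user:realm:password) + nonce), with a bare nonce and md5(md5(password) + nonce) for the older CRAM-MD5 variant, which is how the BaseX example clients appear to compute it. Please verify this against the Server Protocol page before relying on it. The credentials and the greeting string below are made up.

```python
import hashlib


def md5_hex(text):
    """Return the lowercase hexadecimal MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_token(user, password, server_greeting):
    """Compute the login token from the server's first message.

    Assumption: the greeting is "realm:nonce" for digest authentication,
    or a bare nonce (timestamp) for the older CRAM-MD5 variant.
    """
    parts = server_greeting.split(":")
    if len(parts) > 1:
        realm, nonce = parts[0], parts[1]
        code = f"{user}:{realm}:{password}"
    else:
        nonce = parts[0]
        code = password
    return md5_hex(md5_hex(code) + nonce)


# Made-up credentials and greeting, for illustration only.
token = digest_token("admin", "secret", "BaseX:1234567890")
print(token)
```

Printing the intermediate md5_hex(code) value on the client and comparing it with a second implementation is a cheap way to find out which of the two hashing steps goes wrong.
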

[basex-talk] Closing a socketconnection

2022-04-13 Thread Ben Engbers

Hi,

While reading the basexdbc.c code from Alexander Holupirek, I saw that 
he explicitly sends the 'exit' command to the server before closing the 
socket. I couldn't find anything on this command in the client server 
protocol.


Is it necessary to send this command to the server? If so what is the 
effect of sending this command?


Ben Engbers


[basex-talk] RBaseX-client

2022-03-25 Thread Ben Engbers

Hi,

Version 1.1.1 is finally stable and available at CRAN 
(https://cran.r-project.org/package=RBaseX).
Hopefully this blog post, 
"https://r-posts.com/rbasex-a-basex-client-written-in-r/", will 
contribute to more support for BaseX from the R community.


Ben Engbers


Re: [basex-talk] How to return/use the value of a nested counter?

2022-03-10 Thread Ben Engbers

Hi,

In the GUI, I couldn't see whether all the //al/text() elements were 
really displayed as one (1) concatenated object or just repeated.
Only after importing the result into an R data frame did I see that 
//al/text() was returned as separate elements.
Adding 'fn:string-join($Beurt//al/text(), "")' to the statement did 
the trick.


Ben

for $Debat in collection("Parl_Test")
  let $debate-id := fn:analyze-string(
    $Debat/officiele-publicatie/metadata/meta/@content,
    "(\d{8}-\d*-\d*)")//fn:match/*:group[@nr="1"]/text()
  for $Beurt at $CountInner in $Debat//spreekbeurt
    let $tekst := fn:string-join($Beurt//al/text(), "")
    order by $debate-id
    return ($debate-id, $CountInner, $tekst)



Re: [basex-talk] How to return/use the value of a nested counter?

2022-03-09 Thread Ben Engbers

Hi

for $Debat at $CountOuter in collection("Parl_Test")
  (: where $CountOuter <= 3 :)
  let $debate-id := fn:analyze-string(
    $Debat/officiele-publicatie/metadata/meta/@content,
    "(\d{8}-\d*-\d*)")//fn:match/*:group[@nr="1"]/text()
  order by $debate-id
  for $Beurt at $CountInner in $Debat//spreekbeurt
    let $tekst := $Beurt//al/text()
    return ($debate-id, $CountInner, $tekst)

:-)

Ben
On 09-03-2022 at 17:45, Zimmel, Daniel wrote:

Are you simply counting the wrong items?

It seems to me you wanted to count:
  
	for $Beurt at $CountInner in $Debat//spreekbeurt


Daniel


[basex-talk] XQuery versus XSL

2022-03-09 Thread Ben Engbers

Hi,

I have a collection of 740 documents with the following structure:


xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="http://technische-documentatie.oep.overheid.nl/schema/op-xsd-2012-1">

  
content="https://zoek.officielebekendmakingen.nl/h-tk-20202021-102-2/metadata.xml" 
/>

  
  

  

  
Allereerst hebben we het traditionele mondelinge 
vragenuur. 

  

  
  

  
Voorzitter. Het was altijd al een eer om hier te 
staan.

  
  
De vragen die ik ga stellen, gaan over stikstof.
  
  
We zijn allemaal 100 kilometer per uur gaan rijden, 
maar er is nog geen gram ammoniak uit de veehouderij minder 
uitgestoten.

  

  
  
  

  
U heeft helaas maar één vraag, meneer Ephraim, als 
Groep Van Haga.

  
  
Ik wil de minister bedanken voor haar beantwoording.
  

  

  


I want to experiment with text mining and for these experiments, it 
would be useful if, for every spreekbeurt, all al/text() elements were 
concatenated. The first option is to use XQuery for concatenating.


Another option is to use XSL to transform the original documents to the 
following structure:



xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="http://technische-documentatie.oep.overheid.nl/schema/op-xsd-2012-1">

  
content="https://zoek.officielebekendmakingen.nl/h-tk-20202021-102-2/metadata.xml" 
/>

  
  

  

  Allereerst hebben we het traditionele mondelinge vragenuur.

  
  

  Voorzitter. Het was altijd al een eer om hier te staan.
  De vragen die ik ga stellen, gaan over stikstof.
  We zijn allemaal 100 kilometer per uur gaan rijden, maar er 
is nog geen gram ammoniak uit de veehouderij minder uitgestoten.


  
  
  

  U heeft helaas maar één vraag, meneer Ephraim, als Groep Van 
Haga.

  Ik wil de minister bedanken voor haar beantwoording.

  

  


Question:
What are the pros and cons of both methods?
Is it difficult to do this in XSL (I have only used very simple 
transformations)?


Ben



[basex-talk] How to return/use the value of a nested counter?

2022-03-09 Thread Ben Engbers

Hi,

I have a collection of 740 documents with the following structure:


xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="http://technische-documentatie.oep.overheid.nl/schema/op-xsd-2012-1">

  
content="https://zoek.officielebekendmakingen.nl/h-tk-20202021-102-2/metadata.xml" 
/>

  
  

  

  
Allereerst hebben we het traditionele mondelinge 
vragenuur. 

  

  
  

  
Voorzitter. Het was altijd al een eer om hier te 
staan.

  
  
De vragen die ik ga stellen, gaan over stikstof.
  
  
We zijn allemaal 100 kilometer per uur gaan rijden, 
maar er is nog geen gram ammoniak uit de veehouderij minder 
uitgestoten.

  

  
  
  

  
U heeft helaas maar één vraag, meneer Ephraim, als 
Groep Van Haga.

  
  
Ik wil de minister bedanken voor haar beantwoording.
  

  

  


I am trying to concatenate all the //al children of the spreekbeurt 
elements. Together with an ID that I construct from //meta/@content and 
a counter for spreekbeurt, I want this output:


20202021-102-2, 1, Allereerst ...
20202021-102-2, 2, Voorzitter... + De vragen ... + We zijn ...
20202021-102-2, 3, U heeft ... + Ik wil..
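
The intended result can also be sketched outside XQuery. The Python snippet below is only an illustration of the desired rows, not a replacement for the query: the sample document is hypothetical and heavily shortened, and its element names are taken from the paths used in the queries in this thread.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, shortened document; only the elements used by the queries are present.
SAMPLE = """
<officiele-publicatie>
  <metadata>
    <meta content="https://zoek.officielebekendmakingen.nl/h-tk-20202021-102-2/metadata.xml"/>
  </metadata>
  <handelingen>
    <agendapunt>
      <spreekbeurt><tekst><al>Allereerst</al></tekst></spreekbeurt>
      <spreekbeurt><tekst><al>Voorzitter.</al><al>De vragen</al></tekst></spreekbeurt>
    </agendapunt>
  </handelingen>
</officiele-publicatie>
"""


def rows(xml_text):
    """Return one (debate-id, counter, joined text) tuple per spreekbeurt."""
    root = ET.fromstring(xml_text.strip())
    content = root.find("./metadata/meta").get("content")
    debate_id = re.search(r"(\d{8}-\d*-\d*)", content).group(1)
    result = []
    # The counter belongs to the loop over spreekbeurt, not to the loop over documents.
    for counter, beurt in enumerate(root.iter("spreekbeurt"), start=1):
        text = " + ".join(al.text or "" for al in beurt.iter("al"))
        result.append((debate_id, counter, text))
    return result


for row in rows(SAMPLE):
    print(row)
```

The point of the sketch is the placement of the counter: it is attached to the iteration over the spreekbeurt elements, and the al texts are joined per spreekbeurt.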

I expected that the following XQuery statement would do it.

import module namespace functx = "http://www.functx.com";

for $Debat at $CountOuter in collection("Parliament")
(: where $CountOuter <= 3:)
  let $debate-id := fn:analyze-string(
$Debat/officiele-publicatie/metadata/meta/@content, 
"(\d{8}-\d*-\d*)")//fn:match/*:group[@nr="1"]/text()

order by $debate-id
for $Beurt at $CountInner in $Debat
   let $tekst := $Beurt//spreekbeurt//al/text()
return($debate-id, $CountInner, $tekst)

Instead it returns:
20202021-102-2, 1, Allereerst ...+ Voorzitter... + De vragen ... + We 
zijn ... + U heeft ... + Ik wil..


How can I use the value of $CountInner?

Ben


Re: [basex-talk] Develop a module, HOWTO

2022-02-25 Thread Ben Engbers

The link to expath is new to me!

Thanx

Ben

On 25-02-2022 at 14:35, Bridger Dyson-Smith wrote:

Hi Ben,


On Fri, Feb 25, 2022 at 8:25 AM Ben Engbers wrote:


Hi,

I know that it is possible to create a module with functions (I have
even done that once), but I can't find the documentation anymore on how
to do that.
Could someone please provide the URL to this information?

Here is the specification: http://expath.org/spec/pkg
and the BaseX-specific documentation: 
https://docs.basex.org/wiki/Repository


If those aren't the pages you're thinking of, please say! :)

Thanks,

Ben Engbers


HTH

best,
Bridger


[basex-talk] Develop a module, HOWTO

2022-02-25 Thread Ben Engbers

Hi,

I know that it is possible to create a module with functions (I have 
even done that once), but I can't find the documentation anymore on how 
to do that.

Could someone please provide the URL to this information?

Thanks,

Ben Engbers


Re: [basex-talk] string-join with a newline separator?

2022-02-24 Thread Ben Engbers

OK, at least in the GUI, using the newline character reference as 
separator works.
Is this an HTML-specific separator?


I use 'string-join' in R in the following statement:
Query_Stmt <- paste(
  'import module namespace functx = "http://www.functx.com";',
  'for $Debat at $CountOuter in collection("Parliament"),',
  '    $Turn in collection("Parliament")',
  'where $Turn/officiele-publicatie/metadata/meta/@content = $Debat/officiele-publicatie/metadata/meta/@content',
  '  and $CountOuter <= 2',
  'let $debate-id := fn:analyze-string(',
  '  $Debat/officiele-publicatie/metadata/meta/@content, "(\\d{8}-\\d*-\\d*)")//fn:match/*:group[@nr="1"]/text()',
  'for $Speach at $CountInner in $Turn/officiele-publicatie/handelingen/agendapunt/spreekbeurt',
  'let $Spreker := $Speach/spreker/naam/achternaam/text()',
  'let $Pol := $Speach/spreker/politiek/text()',
  'order by $debate-id, $CountInner',
  'for $par at $CountPar in $Turn/officiele-publicatie/handelingen/agendapunt/spreekbeurt/tekst',
  'let $tekst := fn:string-join(fn:data($par//al/text()), "")',
  'return($debate-id, $Spreker, ($Pol, "n.v.t")[1], $CountPar, $tekst)'
)

When I use "." as item separator, this statement returns:
$debate-id, $Spreker, ($Pol, "n.v.t")[1], $CountPar, $tekst1.$tekst2
But when I use the newline separator it returns:
$debate-id, $Spreker, ($Pol, "n.v.t")[1], $CountPar, $tekst1
NA NA NA NA $tekst2

(NA means Not Available)

So in R the newline is interpreted as a splitter. ;-(

I'll take a look at this and will let you know if I can find a solution.

Ben



[basex-talk] string-join with a newline separator?

2022-02-24 Thread Ben Engbers

Hi,

My XML has the structure

  <tekst>
    <al>
      bla
    </al>
  </tekst>
  <tekst>
    <al>
      bla
    </al>
    <al>
      bla
    </al>
  </tekst>


The <tekst> element contains one or more <al> elements.

let $tekst := fn:string-join(fn:data($par//al/text()), ".") concatenates 
this to:

bla.bla.bla
But I want it to return:
bla
bla
bla

Is it possible to add a newline item-separator to fn:string-join?

Ben Engbers


Re: [basex-talk] Content is not allowed in prolog

2022-02-23 Thread Ben Engbers

Hi Christian,

I have added “The input can be a UTF-8 encoded XML document, a binary 
resource, or any other data (such as JSON or CSV) that can be 
successfully converted to a resource by the server.” to my documentation.

Create(), Add(), Replace() and Store() now all use
exec <- c(as.raw(), addVoid(name), addVoid(input_to_raw(input))) 
as basic pattern.
The 'Execute' command has been renamed to 'Command' (better alignment 
with the server protocol).


On 23-02-2022 at 11:18, Christian Grün wrote:

Hi Ben,



(Writing a test took half an hour ;-()


Good tests are sometimes more valuable than the implementation itself ;)
I have looked at QueryParser.java and probably it should not be that 
difficult to convert to an R-version (it is still a lot of work).


Do you have a test-set with xquery-statements?

Cheers, Ben


Re: [basex-talk] Content is not allowed in prolog

2022-02-22 Thread Ben Engbers



On 22-02-2022 at 18:39, Christian Grün wrote:

So you distinguish a XML-DOCUMENT from a XML-FILE and that was something
I didn't know.


I guess so. Do we use these two terms in our documentation? 
I don't know. If I find places where it is confusing (at least for me), 
I'll let you know


Or did you

want to point out that you used “document” and “files” for describing
the same thing in our conversation?
No, they are different. A 'file' lives on the file system (and a 
file pointer points to a file). A 'document', however, lives in 
memory. It can for example be a string which is constructed by XQuery by 
adding elements or attributes to the result of a query, or by writing 
valid XML code with a text editor.

I thought that the client could deal with both files and documents.




Are there more places in the server protocol where this difference is
relevant?
Could you please make a note of this in the documentation for the server
protocol?


We’ll be glad to improve the documentation. I’m not sure which of the
formulations were misleading to you, so feel free to share them with
us.

From the server protocol (https://docs.basex.org/wiki/Server_Protocol)
Command Protocol

The following byte sequences are sent and received from the client 
(please note that a specific client may not support all of the presented 
commands):

Command   Client Request              Description
COMMAND   {command}                   Executes a database command.
QUERY     \00 {query}                 Creates a new query instance and returns its id.
CREATE    \08 {name} {input}          Creates a new database with the specified input (may be empty).
ADD       \09 {name} {path} {input}   Adds a new resource to the opened database.
REPLACE   \0C {path} {input}          Replaces a resource with the specified input.
STORE     \0D {path} {input}          Stores a binary resource in the opened database.


Wherever you use 'input', it is unclear what counts as valid input: a 
file or a document?



I already have this function which checks if input is already a raw

vector or if the input can be transformed into a vector.


Is "raw vector" a byte array or something else? What does is.VALID do?

A raw vector is a byte array.
is.VALID is a set of regular expressions. It checks whether a URL is valid 
(https://asf.dfg.dfhg/ is valid, htp:/ery/ery is not). In R, before 
being able to read from the URL (httr::GET(input)) I had to check whether 
the URL was valid.
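
For readers who do not use R, a comparable check can be sketched in Python. This is not the is.VALID implementation (which is regex-based and not shown in this thread); it is a rough stand-in that uses the standard library's URL parser, and the function name looks_like_url is made up.

```python
from urllib.parse import urlparse


def looks_like_url(text):
    """Rough URL check: an http(s) scheme plus a non-empty host part."""
    parsed = urlparse(text)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# The two examples from the message above.
print(looks_like_url("https://asf.dfg.dfhg/"))
print(looks_like_url("htp:/ery/ery"))
```

Such a check only says that the string has the shape of a URL; whether the host exists is only known after the actual request.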





Feature request: Could you implement the same functionality in the
server protocol?


I’m hesitant to change the server protocol at this stage, as almost
all other client bindings are based on the current definitions, and
would possibly need to be updated. But maybe you need to get more
specific in your wording (or it’s my task to spend more time and find
out what you mean):

The "protocol" is the set of rules that are implemented by the various
bindings to communicate with the server. If you say we should
implement the functionality in the protocol, would you like to see new
rules added? Or would you expect the server-side implementation of the
protocol rules to check if the input for a CREATE command can be
interpreted as file reference?
I understand. I don't believe you really have to update the protocol. It 
is only the client that needs to be updated.


As said before, I consistently use this pattern:
exec <- c(as.raw(0x08), addVoid(name), addVoid(input))

It took me 2 minutes to change this into:
raw_input <- input_to_raw(input)
exec <- c(as.raw(0x08), addVoid(name), addVoid(raw_input))

Now I can use Session$Create() with a document, a URL or a file-descriptor.

(Writing a test took half an hour ;-()



I think we shouldn’t resolve client file references on the server, as
clients and servers usually reside on different machines. You can
provide file paths with CREATE DB, but the only reason is that this
command was initially designed to work with the standalone version of
BaseX. We even had thoughts on rejecting local file references if they
are passed on by a client.


I think BaseX is an excellent standalone tool for xquery and xml-related 
applications...


Hope this helps,

Cheers,
Ben


Re: [basex-talk] Content is not allowed in prolog

2022-02-22 Thread Ben Engbers



On 22-02-2022 at 16:58, Christian Grün wrote:

(On close reading I see that "Session$Execute(paste("Create

db", DB_Name, Single_File))" should have been
"Session$Execute(paste("Create db", DB_Name, "Single", Single_File))".
The paste() function just concatenates the strings.)

Does that solve your problem?

No, this line executed without problems.



The BaseX user command CREATE DB differs from the technical CREATE
command that’s defined in the server protocol. With the latter one,
the optional input must be a (single) XML document. The reason is that
the client usually resides on a different system than the server, and
specifying a file path wouldn’t work.
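
The consequence for a client can be sketched as follows: the client reads the file itself and sends the document content, never the path. The Python snippet is an illustration under assumptions. create_request is a made-up name, the byte layout follows the CREATE row of the protocol (\08 {name} {input}), and the escaping of \00 and \FF bytes is left out for brevity.

```python
import tempfile
from pathlib import Path


def create_request(db_name, source=b""):
    """Build the bytes for CREATE: 0x08, name, 0x00, input, 0x00.

    `source` may be a pathlib.Path (read on the client side), a string
    holding the document, or raw bytes. Escaping is omitted in this sketch.
    """
    if isinstance(source, Path):
        content = source.read_bytes()      # the path never leaves the client
    elif isinstance(source, str):
        content = source.encode("utf-8")
    else:
        content = bytes(source)
    return b"\x08" + db_name.encode("utf-8") + b"\x00" + content + b"\x00"


with tempfile.TemporaryDirectory() as folder:
    xml_file = Path(folder) / "doc.xml"
    xml_file.write_text("<root>Content 1</root>", encoding="utf-8")
    from_file = create_request("Parl_Test", xml_file)

from_text = create_request("Parl_Test", "<root>Content 1</root>")
print(from_file == from_text)
```

Both calls produce the same request, which is why the server does not need to know where the document came from.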


 That sounds better!!!

This works:
Session$Create(DB_Name, "Content 1")

"Database 'Parl_Test' gemaakt in 8.64 ms."

So you distinguish a XML-DOCUMENT from a XML-FILE and that was something 
I didn't know.
Are there more places in the server protocol where this difference is 
relevant?


Could you please make a note of this in the documentation for the server 
protocol?


I already have this function which checks if input is already a raw 
vector or if the input can be transformed into a vector. Even with 
limited R-knowledge this should be readable ;-)


input_to_raw <- function(input) {
  type <- typeof(input)
  switch(type,
    "raw" = raw_input <- input,             # Already a raw vector
    "character" = {
      if (input == "") {                    # Empty input
        raw_input <- raw(0)
      } else if (file.exists(input)) {      # File on the filesystem
        finfo <- file.info(input)
        toread <- file(input, "rb")
        raw_input <- readBin(toread, what = "raw", size = 1, n = finfo$size)
        close(toread)
      } else if (is.VALID(input)) {         # URL
        get_URL <- httr::GET(input)
        raw_input <- get_URL$content
      } else {                              # Plain string
        raw_input <- charToRaw(input)
      }
    },
    # The unnamed last argument is the fall-through case of switch()
    stop("Unknown input-type, please report the type of the input.")
  )
  return(raw_input)
}

I'll see if I can use this function in Session$Create().

Feature request: Could you implement the same functionality in the 
server protocol?


Cheers,
Ben


Re: [basex-talk] Content is not allowed in prolog

2022-02-22 Thread Ben Engbers
I don't believe that the problem is R-related. It is probably more a 
misunderstanding on my side.


I looked at https://docs.basex.org/wiki/Commands#CREATE_DB.
According to that page, it is possible to create a db with all the 
documents in the input directory (i.e. XML_Files) or with one initial 
document. (On close reading I see that "Session$Execute(paste("Create 
db", DB_Name, Single_File))" should have been 
"Session$Execute(paste("Create db", DB_Name, "Single", Single_File))".

The paste() function just concatenates the strings.)

My guess was that the same conventions for specifying input would also 
be valid for the Session$Create() command.


That is still my question.

Ben

On 22-02-2022 at 16:30, Christian Grün wrote:

My R knowledge is very limited, so it’s difficult to give you advice
(maybe someone else can).

Does "XML_Files" mean that you are trying to pass on more than a
single document?


Re: [basex-talk] Content is not allowed in prolog

2022-02-22 Thread Ben Engbers

Yes, I did ;-)

Both commands use the same set of XML files. 
Session$Execute(paste("Create db", DB_Name, XML_Files)) accepts them.

Session$Create(DB_Name, XML_Files) does not.

Ben

On 22-02-2022 at 16:15, Christian Grün wrote:

Hi Ben,


The server protocol does not specify the format that is to be used for input.


In order to understand the syntax of "{input}", you can have a look at
the Conventions paragraph:

   {...}: utf8 strings or raw data, suffixed with a \00 byte. To avoid
confusion with this end-of-string byte, all transferred \00 and \FF
bytes are prefixed by an additional \FF byte.

Maybe you don’t take care of 00 and FF bytes in the input yet?
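
The convention quoted above can be written down in a few lines. The Python sketch below is an illustration, not code from any BaseX client: encode_argument and decode_argument are made-up names, and the only rule applied is the quoted one (prefix every \00 and \FF byte with \FF, then terminate with \00).

```python
def encode_argument(data):
    """Escape 0x00 and 0xFF bytes with a leading 0xFF and append the 0x00 terminator."""
    out = bytearray()
    for byte in data:
        if byte in (0x00, 0xFF):
            out.append(0xFF)
        out.append(byte)
    out.append(0x00)
    return bytes(out)


def decode_argument(encoded):
    """Reverse encode_argument: stop at the first unescaped 0x00."""
    out = bytearray()
    position = 0
    while position < len(encoded):
        byte = encoded[position]
        if byte == 0xFF:      # escape marker: keep the following byte as data
            position += 1
            out.append(encoded[position])
        elif byte == 0x00:    # unescaped terminator ends the argument
            break
        else:
            out.append(byte)
        position += 1
    return bytes(out)


sample = b"a\x00b\xffc"
wire = encode_argument(sample)
print(wire, decode_argument(wire) == sample)
```

Plain UTF-8 XML normally contains neither byte, so the escaping matters mostly for binary input.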

Best,
Christian


Re: [basex-talk] Content is not allowed in prolog

2022-02-22 Thread Ben Engbers

Hi Christian,

There are two differences between the server protocol and my implementation.
1. I use "Execute" instead of "Command" as in the command protocol (when 
I started this project I thought of it as "executing" a command. It 
is still possible to change Execute to Command if you prefer that).
2. I introduced a little bit of scripting. The last byte of the response 
indicates success or failure. When the 'intercept' that I introduced is 
set to TRUE, the success indicator can be used in an R script to keep 
the script from aborting (a very basic form of exception handling and 
scripting).


Apart from that I have followed the server protocol to the letter.
ALL commands from the command - and the query protocol are implemented 
and follow this pattern:

exec <- c(as.raw(0x09), addVoid(path), addVoid(input_to_raw(input)))
response <- private$sock$handShake(exec) %>% split_Response()

All input parameters are converted to a raw vector and each parameter 
has a 00 appended. Together with the preceding byte, this is sent to the 
server.
The server returns a raw vector. This vector is split on 00. The last 
byte of the response indicates success.
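
The request/response pattern described above can be sketched in Python as follows. This is an illustration, not the RBaseX code: build_request and split_response are made-up names, the status convention (\00 means success) is taken from the server protocol as I understand it, and the naive split ignores escaped \00 bytes inside a field.

```python
def build_request(command_byte, *arguments):
    """Command byte followed by each argument with a 0x00 terminator."""
    request = bytearray([command_byte])
    for argument in arguments:
        request.extend(argument)
        request.append(0x00)
    return bytes(request)


def split_response(raw):
    """Split a response into its fields and a success flag.

    Assumption: the final byte is the status (0x00 = success) and the
    fields before it are each terminated by 0x00.
    """
    status, body = raw[-1], raw[:-1]
    fields = body.split(b"\x00")
    if fields and fields[-1] == b"":   # drop the empty piece after the last terminator
        fields.pop()
    return fields, status == 0x00


request = build_request(0x09, b"doc.xml", b"<a/>")
fields, ok = split_response(b"result\x00info\x00\x00")
print(request, fields, ok)
```

Returning the success flag instead of raising an error is what makes the 'intercept' style of scripting possible: the caller decides whether a failure should stop the script.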


R6, the R object orientation system I used does not know polymorphism 
but copying the Java source to R6 was not very difficult.


I am now really using the package, and it is only now that I sometimes 
see bugs, but this is the first bug I don't understand.
According to the protocol and the general BaseX documentation, there are 
two ways to create a database: 1) you can send a specific "Create" 
command (preceding byte is \08) or 2) you can execute a "Create db" 
command (no preceding byte).


These variables are used in the examples:
DB_Name <- "Parl_Test"
XML_Files <- system.file("extdata", "xml_files", package="RBaseX")
Single_File <- paste(XML_Files, "h-tk-20202021-102-12.xml", sep="/")

Session$Execute(paste("Create db", DB_Name, Single_File)) # => success
Session$Execute(paste("Create db", DB_Name, XML_Files))   # => success

Session$Create(DB_Name)   # => success

Session$Create(DB_Name, Single_File) # => error
Session$Create(DB_Name, XML_Files)   # => error

The server protocol does not specify the format that is to be used for 
input. It only says that input may be empty.

Do I use the wrong format?

Regards,
Ben

On 22-02-2022 at 14:07, Christian Grün wrote:

Hi Ben,

I guess this could be caused by a little error in your implementation
of the R client. Did you already have a look at the documentation of
the server protocol [1] and an alternative implementation [2]?

Cheers,
Christian

[1] https://docs.basex.org/wiki/Server_Protocol
[2] 
https://github.com/BaseXdb/basex/blob/master/basex-examples/src/main/java/org/basex/examples/api/BaseXClient.java




[basex-talk] Content is not allowed in prolog

2022-02-21 Thread Ben Engbers

Hi,

I have a directory with 12 testfiles. In the BaseX-GUI, the command:
CREATE DB Parl_Test 
/home/bengbers/R/x86_64-redhat-linux-gnu-library/4.1/RBaseX/extdata/xml_files/

Creates database "Parl_Test" and loads the xml-files.

In my R client,
Session$Create("Parl_Test") creates database "Parl_Test" => OK

I want to create the same database with my client.

I initialize the variable "XML_Files" with 
"/home/bengbers/R/x86_64-redhat-linux-gnu-library/4.1/RBaseX/extdata/xml_files". 


The client translates the command:
Session$Create("Parl_Test", XML_Files) into a raw vector:
'\bParl_Test\0/home/bengbers/R/x86_64-redhat-linux-gnu-library/4.1/RBaseX/extdata/xml_files'
which is sent to the server.
But the server responds with:
"\"Parl_Test.xml\" (Regel 1): Content is not allowed in prolog."

I didn't touch the xml-files. Where is the content inserted?

Ben Engbers


Re: [basex-talk] Syntax-checker

2022-02-17 Thread Ben Engbers

Hi Christian

I know that the R community is still looking for an XQuery tool. I won't 
say that RBaseX is the best, but for the moment it is the best option I 
know of ;-). And bugs are becoming more and more rare (and difficult to 
resolve ;-().
Even after my retirement I spend a lot of time programming, and working 
on the client gives me great fun. And introducing a syntax checker would 
improve the usability.
I'll take a look at the QueryParser class and see if I can manage to 
implement it in R.


Regards,

Ben

PS. I've nearly completed a text that I mean to submit to R-bloggers 
and in which I present my client. Would you care to give it a look?


On 17-02-2022 at 14:38, Christian Grün wrote:

Hi Ben,

An XQuery string is parsed by the QueryParser class [1]. It’s the
largest Java class in the project, so it might take some time to get
it reimplemented in R.

Greetings,
Christian




[basex-talk] Syntax-checker

2022-02-17 Thread Ben Engbers

Hi,

After I had formulated the query in Basex-GUI, I tried to execute the 
same multi-line query in R/RbaseX. Nada ;-(
In R, one can use the paste function to concatenate strings. I use this 
function to build a string which is passed to the RbaseX-client.

Example:
Stmt_1 <- "for $i in 1 to 2 return $i" => OK
Stmt_2 <- paste0("for $i in 1 to 2",   => ERROR
 "return $i")
It took 2 days of debugging before I found the error. Stmt_2 is 
concatenated to "for $i in 1 to 2return $i" and it is clear that this 
can't be executed. Instead of using the "paste0"-function I should have 
used "paste" which introduces a space between the strings to be 
concatenated. This works fine.
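
The pitfall is not specific to R. A minimal Python sketch of the same mistake, where plain string joins stand in for paste0 and paste (the fragments are the ones from the example above):

```python
# Fragments of the XQuery statement from the example above.
fragments = ["for $i in 1 to 2", "return $i"]

# Joining without a separator mirrors paste0(): the keywords run together.
broken = "".join(fragments)

# Joining with a space mirrors paste(): the query stays well-formed.
working = " ".join(fragments)

print(broken)   # for $i in 1 to 2return $i
print(working)  # for $i in 1 to 2 return $i
```

Printing the assembled string before sending it is a cheap check that would have exposed the missing space.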


My problem is that the server/my client does not give an error-message. 
And this leads me to following question:
It would be helpful if the syntax of the XQuery statement was checked 
before sending it to the server. Where in the BaseX sources can I find the 
code for XQuery checking? Is it possible to translate this code into R 
or is that way too difficult?


Ben Engbers


Re: [basex-talk] 'Flatten' a collection

2022-02-14 Thread Ben Engbers




On 14-02-2022 at 19:30, Sebastian Albert wrote:

Hi Ben,

I learned about the `count` feature just from your example. It does not
seem to do what you want; I would try the "at" in a "for" loop.

According to XQuery 2nd Edition, Priscilla Walmsley pg. 135, 'count' was 
introduced with XQuery version 3.0.



2: How can I formulate the query for getting the correct output?


Your example is not well-formed, you're probably missing a closing
 in the second  around the second .
No, the missing  was due to a typo while composing the 
mail ;-)



Anyway, I think what you want is to iterate over $Turn//spreker/text(),
not just use the entire sequence. Here's how I transformed your first
query (I stored your example in a variable called $file for
experimentation):


My intention was to iterate over 2 sequences; $Blog and $Turn. Why do 
you see this as 1 sequence?



Hope this helps,
Sebastian


for $Blog in collection("Blog"),
$Turn in collection("Blog")
where $Turn//datum/@date = $Blog//datum/@date
order by $Blog//datum/@date
count $CountOuter
  let $Id := $Blog/handeling/@id
  let $Datum := $Blog//datum/@date

  for $Speaker at $CountInner in $Turn//spreker/text()
return($CountOuter, $Id, $Datum, $CountInner, $Speaker)

returns =>
1, id="h_1", date="d_1", 1, spreker_1
1, id="h_1", date="d_1", 2, spreker_3
2, id="h_2", date="d_2", 1, spreker_2
2, id="h_2", date="d_2", 2, spreker_1
2, id="h_2", date="d_2", 3, spreker_4
3, id="h_3", date="d_3", 1, spreker_2
3, id="h_3", date="d_3", 2, spreker_3
3, id="h_3", date="d_3", 3, spreker_2
3, id="h_3", date="d_3", 4, spreker_1

OK!

With
  for $Speaker in $Turn//spreker/text()
count $CountInner
return($CountOuter, $Id, $Datum, $CountInner, $Speaker)

it returns =>
1, id="h_1", date="d_1", 1, spreker_1
1, id="h_1", date="d_1", 2, spreker_3
2, id="h_2", date="d_2", 3, spreker_2
2, id="h_2", date="d_2", 4, spreker_1
2, id="h_2", date="d_2", 5, spreker_4
3, id="h_3", date="d_3", 6, spreker_2
3, id="h_3", date="d_3", 7, spreker_3
3, id="h_3", date="d_3", 8, spreker_2
3, id="h_3", date="d_3", 9, spreker_1

ERROR :-(
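
The difference between the two results can be mimicked in Python. This is only a sketch under assumptions: the list below is a hypothetical stand-in for the three example documents, and enumerate plays the role of both counters. A positional "at" variable restarts for every inner loop, while a "count" clause numbers the tuples of the already flattened stream.

```python
# Hypothetical stand-in for the three documents in the example collection.
blogs = [
    ("h_1", "d_1", ["spreker_1", "spreker_3"]),
    ("h_2", "d_2", ["spreker_2", "spreker_1", "spreker_4"]),
    ("h_3", "d_3", ["spreker_2", "spreker_3", "spreker_2", "spreker_1"]),
]

# Like "for $Speaker at $CountInner": the position restarts for every blog.
rows_at = [
    (outer, blog_id, date, inner, speaker)
    for outer, (blog_id, date, speakers) in enumerate(blogs, start=1)
    for inner, speaker in enumerate(speakers, start=1)
]

# Like "count $CountInner" after the inner "for": every tuple of the
# flattened stream gets the next number, so the counter never restarts.
rows_count = [
    (outer, blog_id, date, running, speaker)
    for running, (outer, blog_id, date, _, speaker) in enumerate(rows_at, start=1)
]

for row in rows_at:
    print(row)
```

The first list reproduces the inner counters 1, 2, 1, 2, 3, 1, 2, 3, 4 and the second one the running 1 to 9 shown above.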

While searching for a solution I also tried the following with a nested 
FLWOR: (It does not return what I want)


for $Blog at $countOuter in collection("Blog")
order by $Blog//datum/@date
  let $BlogId := $Blog/handeling/@id
  let $BlogDatum := $Blog//datum/@date
  count $countOuter
  return
   ( for $Turn at $countInner in collection("Blog")
   where $Turn//datum/@date = $Blog//datum/@date
   let $Speaker := $Turn//spreker/text()
   return ($countOuter, $BlogId, $BlogDatum, $countInner, $Speaker)
   )

I see your solution also as a nested 'for' loop but in your solution I 
am missing the 'LWO'.
Do you know what the fundamental difference is between the two nested 
FOR-loops?


Ben
(Thanks for the help)


[basex-talk] 'Flatten' a collection

2022-02-14 Thread Ben Engbers

Hi,

I have a collection of 740 XML-documents which I want to flatten. The 
files all have the same structure:



  
  spreker_1
  spreker_3



  
  
spreker_2
spreker_1
spreker_4



  
  spreker_2
  spreker_3
  spreker_2
  spreker_1


The following query gives this result:
import module namespace functx = "http://www.functx.com";

let $Blogs := collection("Blog")
let $Turns := collection("Blog")

for $Blog in collection("Blog"),
$Turn in collection("Blog")
where $Turn//datum/@date = $Blog//datum/@date
order by $Blog//datum/@date
count $Count
  let $Id := $Blog/handeling/@id
  let $Datum := $Blog//datum/@date

  let $Speaker := $Turn//spreker/text()

return($Id, $Datum, $Speaker, $Count)

id="h_1"
date="d_1"
spreker_1
spreker_3
1
id="h_2"
date="d_2"
spreker_2
spreker_1
spreker_4
2
id="h_3"
date="d_3"
spreker_2
spreker_3
spreker_2
spreker_1
3

But what I eventually need is this (for clarity shown as a table):

1, id="h_1", date="d_1", 1, spreker_1
1, id="h_1", date="d_1", 2, spreker_3
2, id="h_2", date="d_2", 1, spreker_2
2, id="h_2", date="d_2", 2, spreker_1
2, id="h_2", date="d_2", 3, spreker_4
3, id="h_3", date="d_3", 1, spreker_2
3, id="h_3", date="d_3", 2, spreker_3
3, id="h_3", date="d_3", 3, spreker_2
3, id="h_3", date="d_3", 4, spreker_1

The first counter indicates the position in $Blog. and the second 
counter indicates the position in $Turn


I expected that the following query would return what I was looking for:

for $Blog in collection("Blog")
order by $Blog//datum/@date
  let $Id := $Blog/handeling/@id
  let $Datum := $Blog//datum/@date
  count $countOuter
  return (
  for $Turn in collection("Blog")
where $Turn//datum/@date = $Blog//datum/@date
let $Speaker := $Turn//spreker/text()
return ($countOuter, $Id, $Datum, $Speaker))

Instead it returns
1, id="h_1", date="d_1", 1, spreker_1, spreker_3
2, id="h_2", date="d_2", 1, spreker_2, spreker_1, spreker_4
3, id="h_3", date="d_3", 1, spreker_2, spreker_3, spreker_2, spreker_1


I have 2 questions:
1: Is it possible to use separate counters for the inner and the outer 
loop? (How should I define the $countInner?)

2: How can I formulate the query for getting the correct output?

Ben Engbers


Re: [basex-talk] How to extract value from fn:analyze-string

2022-02-10 Thread Ben Engbers
I'll experiment a little with the namespace but for the moment adding *: 
works!


Thanks,
Ben

On 10-02-2022 at 18:46, Imsieke, Gerrit, le-tex wrote:
It’s a namespace thing. The analyze-string() result is in the 
http://www.w3.org/2005/xpath-functions namespace, which is bound to the 
fn prefix. So you should write fn:match etc. instead of match, or, as 
Bridger suggested, *:match. But such a wildcard always seems a bit 
desperate to me (no offense, Bridger ;).
Whereas you don’t need to use the privileged fn prefix when you invoke 
analyze-string(), it’s only important when you select the namespaced 
results.
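
The same effect can be reproduced outside XQuery. The sketch below uses Python's xml.etree.ElementTree; the XML literal is an assumed reconstruction of an analyze-string() result for the regex in this thread, not output copied from BaseX. An unqualified path finds nothing, while a path bound to the xpath-functions namespace finds both groups.

```python
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"

# Assumed shape of the analyze-string() result discussed in this thread.
RESULT = f"""
<analyze-string-result xmlns="{FN_NS}">
  <non-match>https://zoek.officielebekendmakingen.nl/h-tk-</non-match>
  <match><group nr="1">20202021-102</group>-<group nr="2">1</group></match>
  <non-match>/metadata.xml</non-match>
</analyze-string-result>
""".strip()

root = ET.fromstring(RESULT)

# Without a namespace the path matches nothing, like //match/group in XQuery.
unqualified = root.findall(".//match/group")

# With the prefix bound, the groups are found, like //fn:match/fn:group.
ns = {"fn": FN_NS}
groups = {g.get("nr"): g.text for g in root.findall(".//fn:match/fn:group", ns)}

print(len(unqualified))  # 0
print(groups)            # {'1': '20202021-102', '2': '1'}
```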


Gerrit

On 10.02.2022 18:30, Ben Engbers wrote:

Hi,

This query produces the following result:

let $debates := collection("Parliament")
for $debate-item in $debates
   let $item-file := $debate-item/officiele-publicatie//meta/@content
   let $debate-id := fn:analyze-string(
 $debate-item/officiele-publicatie//meta/@content, 
"(\d{8}-\d*)-(\d*)")

   return ($debate-id)

=>

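<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
   <non-match>https://zoek.officielebekendmakingen.nl/h-tk-</non-match>
   <match>
     <group nr="1">20202021-102</group>-<group nr="2">1</group>
   </match>
   <non-match>/metadata.xml</non-match>
</analyze-string-result>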
...

I am trying to extract the values from group 1 and 2 but this query 
returns 0 results:


let $debates := collection("Parliament")
for $debate-item in $debates
   let $item-file := $debate-item/officiele-publicatie//meta/@content
   let $debate-id := fn:analyze-string(
 $debate-item/officiele-publicatie//meta/@content, 
"(\d{8}-\d*)-(\d*)")


 let $debate-nr := $debate-id//match/group[@nr="1"]/text()
 let $item-nr := $debate-id//match/group[@nr="2"]/text()

   return ($debate-nr, $item-nr)

My guess is that analyze-string inserts new elements in the query and 
that that is the reason why this does not work.


How can I extract debate-nr and item-nr from $debate-id?

Ben Engbers




[basex-talk] How to extract value from fn:analyze-string

2022-02-10 Thread Ben Engbers

Hi,

This query produces the following result:

let $debates := collection("Parliament")
for $debate-item in $debates
  let $item-file := $debate-item/officiele-publicatie//meta/@content
  let $debate-id := fn:analyze-string(
$debate-item/officiele-publicatie//meta/@content, 
"(\d{8}-\d*)-(\d*)")

  return ($debate-id)

=>

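<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
  <non-match>https://zoek.officielebekendmakingen.nl/h-tk-</non-match>
  <match>
    <group nr="1">20202021-102</group>-<group nr="2">1</group>
  </match>
  <non-match>/metadata.xml</non-match>
</analyze-string-result>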
...

I am trying to extract the values from group 1 and 2 but this query 
returns 0 results:


let $debates := collection("Parliament")
for $debate-item in $debates
  let $item-file := $debate-item/officiele-publicatie//meta/@content
  let $debate-id := fn:analyze-string(
$debate-item/officiele-publicatie//meta/@content, "(\d{8}-\d*)-(\d*)")

let $debate-nr := $debate-id//match/group[@nr="1"]/text()
let $item-nr := $debate-id//match/group[@nr="2"]/text()

  return ($debate-nr, $item-nr)

My guess is that analyze-string inserts new elements in the query and 
that that is the reason why this does not work.


How can I extract debate-nr and item-nr from $debate-id?

Ben Engbers


[basex-talk] BaseX and 'view'?

2022-02-08 Thread Ben Engbers

Hi,

I am writing a blog for R-bloggers with the aim of raising awareness of 
my RbaseX package. The latest version is super fast! Loading and saving 
740 xml documents from the R environment took only 34 seconds!
Before I can use those documents I will probably have to convert them 
using xsl.
I was wondering if BaseX, like Oracle, also has a 'view' over the 
original data? I can then define multiple views that are better adapted 
to the intended use.


ben


Re: [basex-talk] Copy data from MariaDB into BaseX

2021-12-23 Thread Ben Engbers
Sorry, I should have been more precise in my question (and it would have 
been better not to talk about parity ;-()


I have 4 schemas/databases in MariaDB which I want to copy to BaseX.
The first one (schema = 'Innovate') uses 4 tables.
The first table (='Dienst') has 2 attributes and 2 rows. In total there 
are 6 tables in this schema.
I try to copy this first schema to BaseX - "Relational" which is created 
as an empty database (create db Relational)


The result from :

let $doc := element { $db } {
  for $table in $tables
return element { $table } {
  let $rows := sql:execute($con, 'select * from ' || $table)
  for $row in $rows
  return element row {
for $col in $row/sql:column
return element { $col/@name } { $col/data() }
  }
}
}
return $doc

is:

  

  1
  CIO Office


  2
  Dictu

  
  .. other tables ..


But the result from:
return db:add($MariaBase, $doc, $db)
is:
BaseX database "Relational"

  

  
..
  
  .. other tables ..
  


At the end It was my intention to have created:
BaseX database "Relational"
  

.. other schemas/tables ..
  
  <3 other schemas>

On 22-12-2021 at 20:53, Christian Grün wrote:

With your query, you seem to add a single document into your database
that contains the contents of all tables. 


Correct. One document in database "Relational" should represent a 
complete schema in MariaDB.


That’s fine in general; but

is it what you are trying to achieve, or would it probably be better
to represent a single table as document?


What would be the advantage of representing single tables as a document? 
Aren't both approaches equivalent?


Ben



Re: [basex-talk] Copy data from MariaDB into BaseX

2021-12-22 Thread Ben Engbers

Hi Christian,

For the time being, I ended up with this:

sql:init("org.mariadb.jdbc.Driver"),

let $MariaBase := 'Relational'
let $db    := 'Innovate'
let $user  := ''
let $pass  := ''
let $con   := sql:connect('jdbc:mariadb://localhost:3306/' || $db, $user, $pass)

let $tables := sql:execute($con, 'show tables')/sql:column/text()

let $doc := element { $db } {
  for $table in $tables
return element { $table } {
  let $rows := sql:execute($con, 'select * from ' || $table)
  for $row in $rows
  return element row {
for $col in $row/sql:column
return element { $col/@name } { $col/data() }
  }
}
}

(: return db:add($MariaBase, $doc, $db) :)
return $doc

gives:


  

  1
  CIO Office


But
return db:add($MariaBase, $doc, $db)
results in my database in Relational -> Innovate/1 -> Innovate/1 -> 
Dienst/1 -> row/n (1 and n indicate parity)


I expected that return db:add($MariaBase, $doc) would add $doc at the 
top-level, resulting in Relational -> Innovate/1 -> Dienst/1 -> row/n 
but this results in an error (path is missing)


According to the documentation, omitting the third parameter in db:add 
should be allowed, or did I misinterpret something?


Cheers, Ben


On 21-12-2021 at 13:20, Christian Grün wrote:

Thanks. Does the query do what you are looking for?

On Tue, Dec 21, 2021 at 12:47 PM  wrote:


Christian Grün wrote on 21-12-2021 10:18:

Hi Ben,



return db:add($db, $doc, $table || '.xml')

Could you give us little examples for ,  and
 ?

Best,
Christian


To the best of my knowledge in MySQL and/or MariaDB DB-name and
DB-schema are identical? The schema-name I use is 'Innovate'.
Table-names are
+--------------------+
| Tables_in_Innovate |
+--------------------+
| Dienst             |
| Mdw_Probleem       |
| Mdw_Wens           |
| Medewerker         |
| Medewerker_dienst  |
| Probleem           |
| Wens               |
+--------------------+

Ben
PS. I hope you'll see this reply. Since a few days all mail from
basex-talk is refused by Thunderbird. At least I don't see them anymore



Re: [basex-talk] Copy data from MariaDB into BaseX

2021-12-21 Thread Ben Engbers
At least this is a very good start! I'll see if I can manage to transfer 
all the tables in one nested command. But first I'll have to refresh my 
XPath or XQuery knowledge.

I'll let you know about the results.

Have a nice holiday,
Ben

On 21-12-2021 at 13:20, Christian Grün wrote:

Thanks. Does the query do what you are looking for?


[basex-talk] Copy data from MariaDB into BaseX

2021-12-20 Thread Ben Engbers

Hi,

After completing my work on the R-client, I started working on a 
Prolog-client.
Long ago I wrote an application in SWI-Prolog which operated on data 
from a MySQL-database. (In the meantime I changed from MySQL to 
MariaDb). My goal is to write a new version of that application but now 
based on data which is stored in Basex.


In the basexgui, I created an empty database "MariaBases"

The following code can be used to select data in MariaDb:
sql:init("org.mariadb.jdbc.Driver"),
let $con := sql:connect('jdbc:mariadb://localhost:3306/', 
'', '')

return sql:execute($con, "select * from Mdw_Wens")

returns:
http://basex.org/modules/sql;>
  1
  5
  1


Is it possible to change the query-statement in such a way that the 
results are added to MariaBases//?


--
Ben Engbers


Re: [basex-talk] Authentication in server protocol

2021-12-08 Thread Ben Engbers

Hi Christian,

As far as I now understand, a socketConnection is not a single 
connection but in fact a pool of connections. And I believe this is 
language-independent.
In R, the command socketSelect(list()) waits for the first 
of several socket connections and server sockets to become available.
After inserting this command in my code, there is no need anymore to 
explicitly insert a sleep. Execution-time for all the results has been 
reduced to 1.4 seconds instead of 120 as before.
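
For readers outside R, the same idea can be sketched in Python with select.select, which blocks until a socket is readable instead of sleeping for a fixed time. This is only an illustration under assumptions: a local socket pair stands in for the client and the BaseX server, and the helper name read_status_byte is made up for this example.

```python
import select
import socket
import threading


def read_status_byte(conn: socket.socket, timeout: float = 5.0) -> bytes:
    """Wait until one byte can be read from conn, or raise on timeout."""
    readable, _, _ = select.select([conn], [], [], timeout)
    if not readable:
        raise TimeoutError("no reply from server")
    return conn.recv(1)


# A connected socket pair stands in for client and server (assumption:
# a real client would connect to the BaseX port instead).
client, server = socket.socketpair()
client.setblocking(False)

# The fake server answers after 50 ms with a 0x00 status byte.
timer = threading.Timer(0.05, server.sendall, args=(b"\x00",))
timer.start()

accepted = read_status_byte(client) == b"\x00"
timer.join()
client.close()
server.close()
print(accepted)  # True
```

The client waits exactly as long as the reply takes, so no hand-tuned delay is needed.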


Now I can really start using RbaseX!

Ben

On 08-12-2021 at 12:55, Christian Grün wrote:

Hi Ben,

I assume this challenge needs to be tackled in the R realm: If the
Java client is used, no sleep is required at all.

Hope this helps,
Christian


[basex-talk] Authentication in server protocol

2021-12-08 Thread Ben Engbers

Hi Christian,

All my previous packages for RBaseX were based on using a blocking 
socket. Every attempt to use a non-blocking socket failed because I 
couldn't authenticate.
In R each read-operation on a blocking socket uses a timeout of at least 
1 second. As a consequence, executing the 53 tests on my package took at 
least 116 seconds on my machine.


I finally managed to use a non-blocking socket. Executing the same 
tests now takes 3.8 seconds.
It turned out that the crucial step was to introduce a sleep/wait 
between sending the authentication nonce and checking the status byte:


  code <- md5(paste(md5(code), nonce, sep = "")) %>% charToRaw()
  # send username + code
  auth <- c(charToRaw(username), as.raw(0x00), code, as.raw(0x00))
  writeBin(auth, private$conn)
==>  Sys.sleep(.1)
  Accepted <- readBin(conn, what = "raw", n = 1) ==0

My knowledge of working with sockets is limited so maybe you can answer 
my question.
Does the need for a sleep mean that I have to fix a bug in the R code, 
or should I use a setting in BaseX that takes the required 
delay into account?
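
For comparison, the hashing step in the R snippet above can be written in Python as follows. It is a sketch, not the client's actual code: it assumes that `code` starts out as "username:realm:password", as in the digest scheme of the server protocol, and the user, realm and nonce values are made up.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as a lowercase hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def auth_message(username: str, password: str, realm: str, nonce: str) -> bytes:
    """Build the bytes sent after the server has supplied realm and nonce."""
    # Assumption: the inner hash covers "username:realm:password".
    code = md5_hex(md5_hex(f"{username}:{realm}:{password}") + nonce)
    # Username and hash are each terminated by a 0x00 byte.
    return username.encode("utf-8") + b"\x00" + code.encode("ascii") + b"\x00"


# Hypothetical credentials and nonce, for illustration only.
message = auth_message("admin", "secret", "BaseX", "1234567890")
print(len(message))  # 39: 5 name bytes + 0x00 + 32 hex digits + 0x00
```

After sending these bytes, the client reads one status byte; 0x00 means the login was accepted.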


Ben Engbers


[basex-talk] R-client RBaseX version 0.9.2

2021-12-06 Thread Ben Engbers

Hi,
I have completely rewritten my R-client for BaseX. This new version can 
be downloaded from 
https://cran.r-project.org/web/packages/RBaseX/index.html or 
https://github.com/BenEngbers/RBaseX.
This version should comply more with the server specification. Compared 
to the previous version, there are (only) a few changes to the interface.


Ben Engbers


Re: [basex-talk] Access to "https://raw.githubusercontent.com/BaseXdb/basex/master/basex-api/src/test/resources/first.xml" blocked?

2021-11-11 Thread Ben Engbers
They are both rejected. I also tried with "https://www.cnn.com" and 
"https://nos.nl". The first is accepted, the second is rejected.


In firefox all URL's are accepted.

Ben

On 11-11-2021 at 16:24, Imsieke, Gerrit, le-tex wrote:

Hi Ben,

What about other resources there, like:

https://raw.githubusercontent.com/BaseXdb/basex/master/basex-api/src/test/resources/input.xml 

https://raw.githubusercontent.com/BaseXdb/basex/master/basex-api/src/test/resources/response.txt 



Do they pass on Windows?

Gerrit


On 11.11.2021 16:14, Ben Engbers wrote:

Hi,

Although I have never had any feedback on my R-client to BaseX, I 
have steadily been working on a new version (even while being retired, 
I still like to program :-).
At present all tests are passed, except one test on adding content to 
a database.


On my Linux-machine 'url 
exists("https://raw.githubusercontent.com/BaseXdb/basex/master/basex-api/src/test/resources/first.xml")' 
returns TRUE and 'first.xml' is added to a database.


When executed on a Windows machine, the same test returns FALSE.
I have tested other URL's - starting with https or http - and they are 
all accepted.


Any clue why 
"https://raw.githubusercontent.com/BaseXdb/basex/master/basex-api/src/test/resources/first.xml" 
is blocked?


Ben




Re: [basex-talk] Access to "https://raw.githubusercontent.com/BaseXdb/basex/master/basex-api/src/test/resources/first.xml" blocked?

2021-11-11 Thread Ben Engbers

On both machines I use the most recent versions of R and RStudio.
I'll also drop a question in the R-community and (tomorrow) I'll look at 
the URL's you suggested.


Ben

On 11-11-2021 at 16:45, Imsieke, Gerrit, le-tex wrote:

Maybe related to the HTTP header field x-content-type-options: nosniff

https://docs.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/compatibility/gg622941(v=vs.85)?redirectedfrom=MSDN 



What is the tool/library you are using on Windows? Is it an R HTTP 
client that is interfacing some Windows DLL? Maybe they put this 
rejection in the DLL.


Maybe you can find another proxying server for raw Github files?

https://stackoverflow.com/questions/40728554/resource-blocked-due-to-mime-type-mismatch-x-content-type-options-nosniff 



Gerrit


[basex-talk] Access to "https://raw.githubusercontent.com/BaseXdb/basex/master/basex-api/src/test/resources/first.xml" blocked?

2021-11-11 Thread Ben Engbers

Hi,

Although I have never had any feedback on my R-client to BaseX, I have 
steadily been working on a new version (even while being retired, I 
still like to program :-).
At present all tests are passed, except one test on adding content to a 
database.


On my Linux-machine 'url 
exists("https://raw.githubusercontent.com/BaseXdb/basex/master/basex-api/src/test/resources/first.xml")' 
returns TRUE and 'first.xml' is added to a database.


When executed on a Windows machine, the same test returns FALSE.
I have tested other URL's - starting with https or http - and they are 
all accepted.


Any clue why 
"https://raw.githubusercontent.com/BaseXdb/basex/master/basex-api/src/test/resources/first.xml" 
is blocked?


Ben


Re: [basex-talk] Call for install/setup stories from users

2021-03-17 Thread Ben Engbers
Since I never had the need to keep older versions alive, apart from step 
1 my install and upgrade procedure are the same:


1: mkdir ~/Programs/basex
2: Download BaseX.zip
3: Unzip to ~/Programs

I never needed to (re)create the data directory or the symbolic link.
After adding ~/Programs/basex/bin: to my path, I can start basexgui or 
basexserver & from the command line.


Ben

On 17-03-2021 at 20:26, Graydon wrote:

On Wed, Mar 17, 2021 at 03:05:58PM -0400, Bridger Dyson-Smith scripsit:

Per the recent thread about installing, I was hoping to convince some of
you to share your experiences installing and running BaseX. Whether you use
Mac OS, Windows, a Linux, or something else: how are you installing and
running BaseX?


This is on Fedora; it's pretty much strictly an update process by now,
though the install process only skips step 3.

1. Download BaseX.zip from the website into ~/bin/basex
2. cd ~/bin/basex
3. mv basex BaseX$VERSION
4. unzip BaseX$VERSION.zip
5. cd basex
6. rmdir data
7. ln -s ../data .

Because the executables are always on the same path --
~/bin/basex/basex/bin -- I don't have to update the shortcut icons when
I update versions.

Every now and again I'll go through and prune old versions from
bin/basex.

It would be _better_ if there was a Fedora package and I didn't have to
think about performing the update, but, well, BaseX is on a very
short list of software that's useful enough to use even if it's not
available as a Fedora RPM via dnf.




[basex-talk] Unsubscribe

2020-11-01 Thread Ben Engbers
I am not aware of ever having subscribed to 'redhetpensioenstelsel', nor 
can I imagine that I created an account. I do know that I am regularly 
bothered by this list.


Could you remove the address 'ben.engb...@be-logical.nl' from the list?

Ben Engbers


Re: [basex-talk] RbaseX Client software, reading from a socket

2020-06-30 Thread Ben Engbers
Hi Christian,

R provides a package which makes it rather easy to use C++ code. That is
why I focused on C++.
I first tried to understand the BaseXCPPAPI as provided by Jean-Marc
Mercier but for a complete novice on C++, that code was way too
complicated for an old man like me (I'm retiring TODAY ;-)).
The C-code from Alexander Holupirek is much easier to understand and for
the moment I'm trying to convert his code to a C++-variant that can be
both used by my RbaseX and a new C++-client.

Usually, I first experiment in the GUI to learn which statements I have
to use for a query. After that I use the same statements in my client. I
noticed that execution in the GUI often took only milliseconds while
execution in the client could take minutes (depending on the size of the
input or the results). It is my guess that this boils down to
read/write operations on the connection.

In R, I have isolated all actions upon the stream into one R-class. And
my first goal is to create a C++ class that is functionally equivalent.
Hopefully that will improve performance.
If I manage that, I am halfway to building a C++ client that
offers the same functionality as my RbaseX-client. Who knows if I'll
succeed in that ;-) .

Cheers,
Ben

On 30-06-2020 at 11:56, Christian Grün wrote:
> Hi Ben,
> 
> The BaseX server protocol was specified without focus on any
> particular programming language.
> 
> If there is no way to speed up stream processing with R, you could
> have a look at the existing C++ client implementation [1]. Maybe
> you’ve done so already?
> 
> Cheers,
> Christian
> 
> [1] https://docs.basex.org/wiki/Clients



[basex-talk] RbaseX Client software, reading from a socket

2020-06-29 Thread Ben Engbers
Hi,

I have no idea if it is used by others, but last March my most recent
version of my RbaseX library was accepted by CRAN. To my knowledge there
are no errors (all tests are passed). The only problem is that
performance is bad ;-(. Uploading a file or downloading the result from
a query can take several minutes.
I can understand why it takes so long. According to the server protocol,
the end of a stream is indicated by a terminating 0-byte. And to
distinguish a 'regular' 0-byte in a binary stream from the stop-0,
0-bytes (and FF-bytes) are preceded by an extra FF-byte. The only way to
deal with these FF-bytes in R was to process each character/byte
separately, and that takes much time.
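
The escaping rule described above can be sketched in a few lines of Python. This is an illustration rather than client code, and the function names escape and unescape are made up. Encoding needs no per-byte loop if the FF bytes are doubled first and the 0 bytes prefixed afterwards; decoding walks the stream until the first unescaped 0 byte.

```python
def escape(payload: bytes) -> bytes:
    """Prefix every 0xFF and 0x00 byte with an extra 0xFF byte."""
    # Order matters: doubling 0xFF first keeps the prefixes that are
    # added for 0x00 in the second step from being doubled again.
    return payload.replace(b"\xff", b"\xff\xff").replace(b"\x00", b"\xff\x00")


def unescape(stream: bytes) -> bytes:
    """Read escaped bytes up to the first unescaped 0x00 terminator."""
    out = bytearray()
    it = iter(stream)
    for byte in it:
        if byte == 0xFF:
            byte = next(it)   # the next byte is literal data
        elif byte == 0x00:
            break             # unescaped 0x00 ends the stream
        out.append(byte)
    return bytes(out)


payload = b"a\x00b\xffc"
wire = escape(payload) + b"\x00"   # escaped data plus terminating 0 byte
print(wire)                        # b'a\xff\x00b\xff\xffc\x00'
print(unescape(wire) == payload)   # True
```

In R or C++ the same two-pass replacement avoids inspecting every byte individually on the sending side.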

I am trying to speed up everything by using C++ for all direct
read/write operations. But I never have worked with C++ before. And
neither do I understand exactly how streams are to be used.
According to some posts on internet, when reading from a stream the
first 8 bytes are used to pass information on the length of the stream.

My question is if this a standard way to pass information on that
length? Or is it specific to C++ or Java?

Ben


[basex-talk] Error in the text for the server protocol?

2020-04-16 Thread Ben Engbers
Hi Michael,

I only noticed today that in the server protocol
(https://docs.basex.org/wiki/Server_Protocol) there are 2 different
instructions for adding a new resource.

On page 3:
ADD \09 {name} {path} {input}

On page 6:
void add(String path, InputStream input)

It has been pure luck that I mixed both instructions. I have implemented
ADD as \09 {path} {input} and that works perfectly.
Session$Add("Test.xml", "Content 1") adds
resource Test.xml with the xml as content.

Meanwhile it is very easy for me to change the instructions that are
sent to the server so I tried to add {name} to the ADD command. That
results in errors.

My guess is that you should use either {name} or {path} and that ADD
only works on the database in use. Is that correct?

Cheers,
Ben


[basex-talk] Binding a variable to a sequence

2020-04-14 Thread Ben Engbers
Hi,

I am still trying to improve my R-package and at the moment I am working
(again) on the 'Binding' command.

I have added this XML to a database:



This query:
for $t in collection("TestDB/Books")/book
where $t/@author = "Walmsley"
return $t/@title/string()

returns:
XQuery

My command:
  Bind(Query_3, "$name", list("Walmsley", "Wickham"))

sends the following byte-sequence to the server:
57 61 6c 6d 73 6c 65 79 01 57 69 63 6b 68 61 6d 00 00

And this sequence is accepted by the server.
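
To make the byte sequence easier to read, here is a small Python sketch that decodes it and builds it again. It assumes what the bytes themselves suggest: the values are separated by a 0x01 byte, the value field ends with 0x00, and the second 0x00 closes an empty type field. The helper names are invented for this example.

```python
# The bytes quoted above, exactly as sent to the server.
sent = bytes.fromhex("57 61 6c 6d 73 6c 65 79 01 57 69 63 6b 68 61 6d 00 00")


def encode_values(values: list[str]) -> bytes:
    """Join the values with 0x01, end the field, then add an empty type."""
    value_field = b"\x01".join(v.encode("utf-8") for v in values)
    return value_field + b"\x00" + b"\x00"


def decode_values(payload: bytes) -> list[str]:
    """Split the value field (up to the first 0x00) on the 0x01 separator."""
    value_field = payload.split(b"\x00", 1)[0]
    return [part.decode("utf-8") for part in value_field.split(b"\x01")]


print(decode_values(sent))                              # ['Walmsley', 'Wickham']
print(encode_values(["Walmsley", "Wickham"]) == sent)   # True
```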
In Query_3, I try to bind $name to the sequence:
declare variable $name external;
for $t in collection('TestDB/Books')/book
where $t/@author = $name
return $t/@title/string()

The query is executed but no results are given. (I had expected that it
would return the sequence {"XQuery", "Advanced R"}.)

How should I correct the query-statement?

Cheers,
Ben


Re: [basex-talk] How to apply array:for-each on a - sequence - of arrays? SOLVED

2020-03-31 Thread Ben Engbers
Hi,

> To insert the third value into each array I think you want
> 
>   let $result := $idf ! array:append(., math:log($count div .(2) ))

This works!

Martin and Graydon, thanks for the help and the explanation.

Ben

import module namespace tidyTM = 'http://www.be-logical.nl';

declare function local:step_one($nodes as node()*) as array(*)*
{
  let $text := for $node in $nodes
 return $node/text() =>
 tokenize() => distinct-values()
  let $idf := $text   =>
 tidyTM:wordCount_arr()
  return $idf
};

declare function local:wordFreq_idf($nodes as node()*)  as array(*)
{
  let $count := count($nodes)
  let $idf := local:step_one($nodes)
  let $result := $idf ! array:append(., math:log($count div .(2) ))
  return $result
};

let $nodes :=
collection('IncidentRemarks/Incidenten-180101-190630.csv')/csv/record/INC_RM
let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()

return local:wordFreq_idf(
  tidyTM:remove_Stopwords($nodes, "Stp", $Stoppers))

--

declare function tidyTM:wordCount_arr(
  $Words as xs:string*)
  as array(*)* {
for $w in $Words
  let $f := $w
  group by $f
  order by count($w) descending
return ([$f, count($w)])
} ;

---

["probleem", 703, 9.362885817944681e-1]
["opgelost.", 248, 1.9782167274401508e0]
...



Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Ben Engbers
Hi,

For (my personal) clarity, I have split up the original function in two
parts:

declare function local:step_one($nodes as node()*) as array(*)*
{
  let $text := for $node in $nodes
 return $node/text() =>
 tokenize() => distinct-values()
  let $idf := $text   =>
 tidyTM:wordCount_arr()
  return $idf
};

In local:step_one(), I first create a sequence with the distinct tokens
for each $node. All the sequences are joined in $text.
I then call wordCount_arr to count the occurrences of each word in $text:

declare function tidyTM:wordCount_arr(
  $Words as xs:string*)
  as array(*) {
for $w in $Words
  let $f := $w
  group by $f
  order by count($w) descending
return ([$f, count($w)])
} ;

I would say that tidyTM:wordCount_arr returns a sequence of arrays but I
am not certain if I have specified the correct return-type?

Calling local:step_one(tidyTM:remove_Stopwords($nodes, "Stp", $Stoppers))
returns:
["probleem", 703]
["opgelost.", 248]


I had hoped that calling the following local:wordFreq_idf would add the
idf to each element, but instead I get an error:

declare function local:wordFreq_idf($nodes as node()*)  as array(*)
{
  let $count := count($nodes)
  let $idf := local:step_one($nodes)
  let $result := for-each( $idf,
function($z) {array:append ($z, math:log($count div $z(2) ) ) } )
  return $result
};
[XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): $idf
:= ([ "probleem", 703 ], [ "opgelost.", 248 ], ...).


Cheers, Ben

On 31-03-2020 at 16:29, Martin Honnen wrote:
> So does the working function return a sequence of arrays? That doesn't
> match the
>   as array(*)
> return type declaration, it seems.
> 
> What does tidyTM:wordCount_arr() return, a single array (of atomic items)?




Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Ben Engbers
Hi,

> => means "take the thing on the left and substitute it for the first
> parameter of the function on the right, so 
I thought it meant "The first parameter on the right will be substituted
with the thing on the left"?

> ('weasels') => replace('weasels','mustelids')  works
> 
> ('weasels','badgers') => replace('weasels','mustelids')  DOES NOT work
> 
> This is because a one-item sequence can be treated as the single string
> value the first parameter of replace() requires, but a
> greater-than-one-item sequence can't be.  (This one gives you "item
> expected, sequence found" if you try it from the GUI.)

The following is quite similar to the 'piping' mechanism in R.
I'll start experimenting with it.

Thanx,
Ben
> ! means "take each item of the sequence on the left and pass it to the
> thing on the right in turn", so
> 
> ('weasels','badgers') ! replace(.,'weasels','mustelids')  works.
> 
> (note that replace() got its first parameter back as the context item
> dot.)
> 
> so if you take
> 
> => array:for-each(function($idf) {array:append($idf,math:log($count div 
> $idf[2]) )})
> 
> and replace it with 
> ! array:for-each(.,function($idf) {array:append($idf,math:log($count div 
> $idf[2]) )})
> 
> (note the context-item dot!)
> 
> you should at least get a different error message.
> 
> -- Graydon
> 



Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Ben Engbers
On 31-03-2020 at 01:18, Graydon wrote:
> On Mon, Mar 30, 2020 at 11:16:23PM +0200, Ben Engbers scripsit:
> [snip]
>> For "probleem", the idf should be calculated as ln($count/703). Since
>> there are 1780 nodes this would result in 0.929011751.
>> I tried to extend the 'let $idf' line with:
>>=> array:for-each(function($idf) {array:append($idf,
>> math:log($count div $idf[2]) )})
>> which should result in ["probleem", 703, 0.929011751]
>>
>> but no mather what I do, every time I get this error:
>> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
>> "probleem", 703 ], [ "opgelost.", 248 ], ...).
> 
> The error says you're trying to feed a sequence of arrays to an array
> function; maybe you want ! where you have => ?
> 
> -- Graydon
> 

Hi,
Upon your remark about feeding a sequence of arrays, I first tried to
apply 'for-each' instead of 'array:for-each'. Alas, that didn't help
;-(, the error was still the same.
I then tried to understand what you mean by the '!'.
In the book by Priscilla Walmsley, the ! is mentioned as a simple map
operator. How is that related to this problem?

Cheers,
Ben


[basex-talk] How to apply array:for-each on an array of arrays?

2020-03-30 Thread Ben Engbers
Hi,

In textmining, the 'idf' or inverse document frequency is defined as
idf(term)=ln(ndocuments / ndocuments containing term). I am working on a
function that should return this idf.

This function:

declare function local:wordFreq_idf($nodes as node()*) as array(*) {
  let $count := count($nodes)
  let $text := for $node in $nodes
               return $node/text() => tokenize() => distinct-values()
  let $idf := $text => tidyTM:wordCount_arr()
  return $idf
};

returns:

["probleem", 703]
["opgelost.", 248]
["dictu", 235]
["opgelost", 217]
["medewerker", 193]
...

For "probleem", the idf should be calculated as ln($count/703). Since
there are 1780 nodes this would result in 0.929011751.
I tried to extend the 'let $idf' line with:
   => array:for-each(function($idf) {array:append($idf,
math:log($count div $idf[2]) )})
which should result in ["probleem", 703, 0.929011751]

but no matter what I do, every time I get this error:
[XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
"probleem", 703 ], [ "opgelost.", 248 ], ...).

Is it possible to apply array:for-each on an array of arrays?

Ben
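
A possible way to add the idf to each [term, count] pair, as a minimal sketch. The helper name local:add-idf is hypothetical, the sample pairs and the node count of 1780 come from the message above, and the return type is declared as a sequence of arrays (array(*)*) rather than a single array.

```xquery
(: Hypothetical helper: appends ln($n div count) to each [term, count] array. :)
declare function local:add-idf(
  $counts as array(*)*,
  $n      as xs:integer
) as array(*)* {
  for $c in $counts
  (: $c(2) looks up the second member of the array; $c[2] would be a
     positional filter on the one-item sequence and return nothing :)
  return array:append($c, math:log($n div $c(2)))
};

local:add-idf((['probleem', 703], ['opgelost.', 248]), 1780)
```

For "probleem" the third member should come out at roughly 0.929, matching the value calculated above. If a single array of arrays is really wanted, the result could be wrapped in array { ... }.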



[basex-talk] New version for RBaseX

2020-03-13 Thread Ben Engbers
Hi,

I am glad that version 0.2.4 of my R package 'RBaseX' has been
accepted by CRAN (https://cran.r-project.org/package=RBaseX)!

Large parts of the earlier version of the R package 'RBaseX' have been
rewritten and the resulting code is much cleaner. I have added tests and,
thanks to those tests, I found (and fixed) several bugs. To my knowledge,
the full server protocol has now been implemented.

One of the main differences concerns error handling.
Every response of the basexserver to a client request ends with either a
\00 byte (success) or a \01 byte (error). I have used this feature to add
an extra layer of error handling.
The default is still the regular tryCatch method. But after setting
'intercept' to TRUE, you can now define your own reaction to errors.

See the following example:
Session <- BasexClient$new("localhost", 1984L, username = "admin",
password = "admin")

Session$set_intercept(TRUE)
Session$Execute("drop DB TestDB")
Session$Execute("Open TestDB")
if (!Session$get_success()) {
  Session$Create("TestDB")
  Session$Add("Test.xml", "Content 1")
}
Session$Execute("Close")
Session$restore_intercept()

I am already working on a new version in which I will implement more
specific R-related topics (such as populating data frames with XQuery results).

Ben




Re: [basex-talk] Is it possible to use 'Stopwords' in a query?

2020-03-03 Thread Ben Engbers
On 02-03-2020 at 13:27, Christian Grün wrote:
> Hi Ben,
> 
> Here is an alternative version that, as I believe, should match your
> requirements better:
> 
>   let $words := distinct-values(
>     for $text in db:open('Incidents')/csv/record/INC_RM
>     return ft:tokenize($text)
>   )
>   let $stopwords := db:open('Stopwords')/text/line
>   let $result := $words[not(. = $stopwords)]
>   return sort($result)
>
Hi Christian,

I don't have a separate database 'Stopwords'. The file 'Stopwoorden.txt'
was used as an option while creating the 'Incidents' database. Since I have
several lists with stopwords and several lists that can be used for
sentiment analysis, I have stored all those files in a 'Textmining'
database.

Without caring about stopwords, this query works:

let $words :=
  for $text in collection('IncidentRemarks/Incidents')/csv/record/INC_RM
  return ft:tokenize($text)
return $words

("sort($words)" returns a long list of numbers)

In an article ("Full-Text Search in XML Databases" by Robin Skoglund,
2009), I saw this example on page 23:
(: will match "propagating few errors" :)
/books/book[@number="1"]//p ftcontains "propagation of errors"
with stemming with stop words ("a", "the", "of")

The query may be changed to "stemming without stop words".

What I would like to see in BaseX is that, as in the XQuery example above,
'Stopwords' could be used as if it were a separate resource in the
'Incidents' database, and that it could be used as follows in the query:

let $words :=
  for $text in collection('IncidentRemarks/Incidents')/csv/record/INC_RM
with stemming without stop words
  return ft:tokenize($text)
return $words

As far as I understand, 'stemming' has already been made available in the
ft module.
Would it also be possible to use STOPWORDS in a similar way?

Cheers,
Ben



Re: [basex-talk] Should it be possible to declare a function in the client?

2020-03-03 Thread Ben Engbers
On 02-03-2020 at 13:27, Christian Grün wrote:
> Hi Ben,
>  
> Here is an alternative version that, as I believe, should match your
> requirements better:
> 
>   let $words := distinct-values(
>     for $text in db:open('Incidents')/csv/record/INC_RM
>     return ft:tokenize($text)
>   )
>   let $stopwords := db:open('Stopwords')/text/line
>   let $result := $words[not(. = $stopwords)]
>   return sort($result)
> 
> There is no need to remove nbsp substrings as they’ll never occur in
> your input, and the ft:tokenize function will ensure that your input
> (case, special characters, diacritics) will be normalized (see [1,2]
> for more details). Using functx is perfectly valid; I only removed the
> reference to make the code a bit shorter.
> 
> Hope this helps,
> Christian
> 
> [1] http://docs.basex.org/wiki/Full-Text_Module#ft:tokenize
> [2] http://docs.basex.org/wiki/Full-Text

Hi Christian,

Since my primary goal for the moment is to see how BaseX/XQuery can
be used for full-text analysis (and to compare the results and the effort
needed with similar tasks in R), I am very glad that you brought the
ft:tokenize() function to my attention!

Ben

PS,
Just for fun, I created a repository with this tiny function:
declare function tidyTM:wordFreqs(
  $Words as xs:string*)
{
  for $w in $Words
  let $f := $w
  group by $f
  order by count($w) descending
  return ($f, count($w))
};

It took less than 10 minutes to create a repository and populate it with
this function.
Creating an R package takes much longer!!!



Re: [basex-talk] Should it be possible to declare a function in the client?

2020-02-28 Thread Ben Engbers
On 28-02-2020 at 14:39, Christian Grün wrote:
> I was wondering about nbsp as well. Maybe you don’t need it at all,
> but we’d need to have a look at your files.
> 
> Could you additionally provide us with minimized instances of your
> Incidents and Stopwoorden.txt XML documents? They should have the same
> structure, but contain only a few lines of contents.

It should be relatively easy to create a database with the
(approximately 500) stopwords and another database with the Incidents.
Shall I send you a backup of those two databases?

Ben



Re: [basex-talk] Should it be possible to declare a function in the client?

2020-02-28 Thread Ben Engbers
On 27-02-2020 at 22:03, Majewski, Steven Dennis (sdm7g) wrote:
> Also: is ‘(&nbsp;)’ what you want as part of your regex to also catch the 
> ampersand ?  
> I’m just guessing your intent here. 
> You could also try  ‘(\W|&nbsp;)+’ - i.e. non-word, but I’m kind of 
> assuming that it handles non-normalized unicode accented characters correctly 
> and reads them as word chars and not delimiters. That would be, of course, 
> the right thing, but I’d probably test it first. 
> 
> — Steve. 

I just copied the regular expression from this page
"https://en.wikibooks.org/wiki/XQuery/Tag_Cloud" (using regex always
gives me headaches ;-( ). But even after removing the "|[n][b][s][p][;]"
from the regex, basexgui still returns 5843.

Ben





Re: [basex-talk] Should it be possible to declare a function in the client?

2020-02-27 Thread Ben Engbers
On 27-02-2020 at 19:19, Christian Grün wrote:
> It’s difficult to understand what’s going on here. Could you please
> provide us self-contained queries without the R wrapper code?

Version 1:

import module namespace functx = 'http://www.functx.com';
(: Extract the text :)
let $txt := collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
(: Convert to lower-case and tokenize :)
let $INC_RM := tokenize(lower-case(string-join($txt)),
'(\\s|[,.!:;]|[n][b][s][p][;])+')
(: Read Stopwords :)
let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
(: Remove Stopwords :)
let $Stop :=  functx:value-except($INC_RM, $Stoppers)
return $Stop

My R-code first executes this as XQUERY and then calculates the length
of the returned list (=5842).

Version 2:

import module namespace functx = 'http://www.functx.com';
let $txt := collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
let $INC_RM := tokenize(lower-case(string-join($txt)),
'(\\s|[,.!:;]|[n][b][s][p][;])+')
let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
let $Stop :=  functx:value-except($INC_RM, $Stoppers)
return count($Stop)

Returns the length of the sequence (counts 5843 words).

The '\\' in the regular expression is intentional (R-specific). With a
single '\' the query can be executed in BaseXGUI.

Does this help?

Ben



Re: [basex-talk] Should it be possible to declare a function in the client?

2020-02-27 Thread Ben Engbers
On 27-02-2020 at 16:41, Christian Grün wrote:
> Hi Ben,
> 
> …create a query object, and attach the actual function call to your
> query string.
I already thought about that, but what would be the benefit of repeating
the function definition every time I want to call the function ;-( ?

> If you want to make XQuery code persistent for future invocations, you
> can include your function in an XQuery library module and install this
> module in the repository [1].
I will probably go for this.

While experimenting (I am trying to speed up the queries), I compared the
results from these 2 queries:

Word_Inc_Rm_Stop_txt <- "import module namespace functx =
'http://www.functx.com';
  let $txt :=
collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
  let $INC_RM := tokenize(lower-case(string-join($txt)),
'(\\s|[,.!:;]|[n][b][s][p][;])+')
  let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
  let $Stop :=  functx:value-except($INC_RM, $Stoppers)
  return $Stop"
Word_Inc_Rm_Stop <- Session$Execute(as.character(glue("xquery
{Word_Inc_Rm_Stop_txt}")))$result[[1]]
Word_Inc_Rm_Stop_Count <- length(Word_Inc_Rm_Stop)

Word_Inc_Rm_Stop_txt_2 <- "import module namespace functx =
'http://www.functx.com';
  let $txt :=
collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
  let $INC_RM := tokenize(lower-case(string-join($txt)),
'(\\s|[,.!:;]|[n][b][s][p][;])+')
  let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
  let $Stop :=  functx:value-except($INC_RM, $Stoppers)
  return count($Stop)"
Word_Inc_Rm_Stop_Count_2 <- Session$Execute(as.character(glue("xquery
{Word_Inc_Rm_Stop_txt_2}")))$result[[1]]

These are the processing-times:

Version 1:
> print(proc.time() - ptm)
   user  system elapsed
  2.903   0.022   3.160
Version 2:
> print(proc.time() - ptm)
   user  system elapsed
  0.041   0.004   1.089

I guess it makes sense to put effort in speeding up my code. But what
bothers me is the following.

The first query computes the length of the vector that is returned.
The result is 5842.
The second query returns the length as computed by BaseX. This result is
5843. The GUI also returns 5843 as result.

I copied the output from
  ..
  return $Stop

to a new LibreOffice-document. That document counts 5842 words.

Who is right?

Cheers,
Ben
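
One possible explanation for an off-by-one difference like this, not confirmed anywhere in this thread, is a zero-length token: the two-argument form of fn:tokenize() returns an empty string as first (or last) item when the input starts (or ends) with a separator. count() includes that empty string, while tools that count visible words do not. A minimal sketch with made-up input:

```xquery
(: The input starts with a separator, so the first token is the empty string. :)
let $tokens := tokenize(' one two', '\s+')
return (
  count($tokens),            (: 3: '', 'one', 'two' :)
  count($tokens[. != ''])    (: 2: the empty string is filtered out :)
)
```

Whether this applies to the Incidents data could be checked by comparing count($Stop) with count($Stop[. != '']).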






[basex-talk] Should it be possible to declare a function in the client?

2020-02-27 Thread Ben Engbers
Hi,

My RBaseX client is finally stable enough to use it for real
development. All regular commands are executed without errors.
But now I am facing another problem.

In a client-session, I want to use the following function:
fn_get_words_txt <- "declare function local:cloudWords( $Veld as
xs:string) as xs:string* {
  let $base := collection('IncidentRemarks/Incidents')/csv/record
  let $txt := string-join( $base/*[name() = $Veld]/text(), ' ')
  let $words := tokenize($txt,'(\\s|[,.!:;]|[n][b][s][p][;])+')
  return ($words)};"
(Doubling the '\' in the regular expression-string is R-specific.)

Session$Execute(fn_get_words_txt) returns:
Gestopt bij , 1/8:
Onbekend commando: declare. Probeer 'help'.
Error in Session$Execute(fn_get_words_txt) : Gestopt bij , 1/8:
Onbekend commando: declare. Probeer 'help'.

fn_get_words_Query <- Session$Query(fn_get_words_txt)
fn_get_words_Query$queryObject$ExecuteQuery() returns:
 Error in private$default_query_pattern(match.call()[[1]]) :
  Gestopt bij ., 5/20:
[XPST0003] Expecting expression.

Since fn_get_words_txt represents neither a regular command nor a
regular function call, I understand these errors.

Before I even start trying to implement this in my package, my question
is whether it should be possible to create local functions for that session.
If so, any idea how to tackle this problem? Could the problem be
generalized to the question of how a prolog can be added or changed?

Cheers,
Ben
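
A function declaration belongs to the query prolog, and a main query needs a body after the prolog, which is why a declaration on its own is rejected. A minimal sketch of a complete query string, with a hypothetical function local:words and made-up input (the regular expression is a shortened version of the one above, written with a single '\' as it would be outside R):

```xquery
(: Prolog: the function declaration. :)
declare function local:words($txt as xs:string) as xs:string* {
  tokenize($txt, '(\s|[,.!:;])+')
};

(: Query body: the actual function call. :)
local:words('one, two; three')
```

This should return "one", "two", "three". From R, the declaration and the call would be sent together as one query string.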


[basex-talk] Dynamic evaluation?

2020-02-26 Thread Ben Engbers
Hi,

I want to declare a function that can operate on various elements of a
record. It should be possible to pass the element-name as parameter to
the function.

I tried this:

declare function local:cloudWords(
  $Veld as xs:string
) as xs:string*
{
  let $base := collection('IncidentRemarks/Incidentsv')/csv/record
  let $txt := string-join( $base/$Veld/text(), " ")
  let $words := tokenize($txt,'(\s|[,.!:;]|[n][b][s][p][;])+')
  return ($words)
};

let $retValue := local:cloudWords("INC_RM")
return $retValue

But I get this error:
[XPTY0019] text(): node expected, xs:string found: "INC_RM".

Should I use xquery:eval to transform "$base/$Veld/text()" into
"$base/INC_RM/text()"?

Ben
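
In a path such as $base/$Veld the variable is evaluated as an expression that yields the string itself, not as an element name, hence the XPTY0019 error. One way to select children by a name held in a variable, without xquery:eval, is a name() predicate. A minimal sketch with a hypothetical function local:field-text and inline sample records instead of the database:

```xquery
declare function local:field-text(
  $records as element()*,
  $field   as xs:string
) as xs:string {
  (: select the child elements whose name equals the string in $field :)
  string-join($records/*[name() = $field]/text(), ' ')
};

let $records := (
  <record><INC_RM>printer broken</INC_RM><INC_ID>1</INC_ID></record>,
  <record><INC_RM>printer fixed</INC_RM><INC_ID>2</INC_ID></record>
)
return local:field-text($records, 'INC_RM')
```

This should return "printer broken printer fixed".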


[basex-talk] BaseX GUI language settings

2020-02-26 Thread Ben Engbers
Hi,

My default language for basexgui is Dutch but I want to create
screenshots from a GUI that uses English.

How can I switch the language temporarily?

Cheers,
Ben


Re: [basex-talk] update:apply, Context is undeclared. (Newbie)

2020-02-19 Thread Ben Engbers
On 19-02-2020 at 12:08, Ben Engbers wrote:
> Hi,
> 
> I have a database that contains several thousand records with elements
> that I will never need so I want to remove them.
> The following code snippet returns the expected element:
> 

> I tried to use update:apply to update the database but when I execute
> the following function, I get this message:
> [XPDY0002] element(functx:remove-elements): Context is undeclared
> 
---
> import module namespace functx = 'http://www.functx.com';
> declare %updating function local:clean_verbs(
>   $old  as node(),
>   $rem  as xs:string*
> ) as empty-sequence() {
>   update:apply(functx:remove-elements, [$old, $rem])
> };
> 
> let $p := collection("TextMining/nl-verbs.csv")/csv/record[1]
> let $remove := ("onbekend1", "onbekend2", "onbekend3", "onbekend4")
> 
> return local:clean_verbs($p, $remove)
--
> 
> I have two questions:
> 1: If I want to use the update module, how should I provide the context
> to the query?
> 2: How can I update all records without making use of update:apply or
> update:for-each (what is the benefit of the update-module)?

With this code, I managed to replace all records:
--
import module namespace functx = 'http://www.functx.com';

let $old := collection("TextMining/nl-verbs.csv")/csv/record
let $remove := ("onbekend1", "onbekend2", "onbekend3", "onbekend4")

for $o in $old
  return replace node $o with functx:remove-elements($o, $remove)
---

My questions remain:
1: How can I achieve the same task, using functx:remove-elements and
update:for-each?
2: What's the benefit of using the update module?

Cheers,
Ben


[basex-talk] update:apply, Context is undeclared. (Newbie)

2020-02-19 Thread Ben Engbers
Hi,

I have a database that contains several thousand records with elements
that I will never need so I want to remove them.
The following code snippet returns the expected element:

import module namespace functx = 'http://www.functx.com';
let $p := collection("Patterns/nl-verbs.csv")/csv/record[1]
let $remove := ("onbekend1", "onbekend2", "onbekend3", "onbekend4")
return functx:remove-elements($p, $remove)

I tried to use update:apply to update the database but when I execute
the following function, I get this message:
[XPDY0002] element(functx:remove-elements): Context is undeclared

import module namespace functx = 'http://www.functx.com';
declare %updating function local:clean_verbs(
  $old  as node(),
  $rem  as xs:string*
) as empty-sequence() {
  update:apply(functx:remove-elements, [$old, $rem])
};

let $p := collection("Patterns/nl-verbs.csv")/csv/record[1]
let $remove := ("onbekend1", "onbekend2", "onbekend3", "onbekend4")

return local:clean_verbs($p, $remove)

I have two questions:
1: If I want to use the update module, how should I provide the context
to the query?
2: How can I update all records without making use of update:apply or
update:for-each (what is the benefit of the update-module)?

Cheers,
Ben
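
Two hedged observations. Written without an arity suffix, functx:remove-elements is parsed as an element name test in a path, which would explain the "Context is undeclared" message; a function reference would be written functx:remove-elements#2. And for simply dropping elements, the plain XQuery Update expression delete node may be enough, without the update module. A minimal sketch on an in-memory copy; the sample record and the element name 'verb' are made up:

```xquery
let $remove := ('onbekend1', 'onbekend2')
let $record :=
  <record>
    <verb>lopen</verb>
    <onbekend1>x</onbekend1>
    <onbekend2>y</onbekend2>
  </record>
return
  (: work on a copy and delete all children whose name is listed in $remove :)
  copy $r := $record
  modify delete node $r/*[name() = $remove]
  return $r
```

Only the verb element should remain in the returned record. Applied to the database itself, the same predicate could be used directly, for example delete node collection("Patterns/nl-verbs.csv")/csv/record/*[name() = $remove], assuming the path from the message above.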


[basex-talk] Restore original lay-out

2020-02-18 Thread Ben Engbers
Hi,

I don't know how ;( but somehow I managed to change the layout of the
GUI. Now I have the Result panel on top of the Info panel.

How can I restore the original lay-out (Result to bottom-left and Info
to bottom-right)?

Cheers,
Ben


Re: [basex-talk] XDM metadata (SOLVED)

2020-02-13 Thread Ben Engbers
On 12-02-2020 at 10:23, Ben Engbers wrote:

While reading the online documentation, I saw that there was an internal
link to
http://www.docs.basex.org/wiki/Server_Protocol:_Types#XDM_Meta_Data which
describes that in most cases, the XDM meta data is nothing else
than the Type ID.

So my code is giving the expected results.

Ben


Re: [basex-talk] Full text and stopwords

2020-02-12 Thread Ben Engbers
On 12-02-2020 at 10:21, Ben Engbers wrote:
> Hi Christian,
> 
Would it be a good approach to create a separate database for stop words
and sentiments?

> 
> Cheers,
> Ben
> 



[basex-talk] XDM metadata

2020-02-12 Thread Ben Engbers
Hi Christian,

According to the server protocol, when first sending \04 to the server,
the resulting items from the query are returned as strings, prefixed by
a single byte.
With my RBaseX-package, the result from
"for $i in 1 to 2 return Text { $i }"
is
"0b" "Text 1" "0b" "Text 2"

When sending \1F the strings should be prefixed by the XDM metadata.

In my case however, the output is the same as with \04.

Is it possible to get the same output for queries in the basexgui so that
I can see which output should be expected?

This is the last remaining problem in my package. If I can resolve it, I
can upload a new version.

Cheers,
Ben



[basex-talk] Full text and stopwords

2020-02-12 Thread Ben Engbers
Hi Christian,

According to the docs, a stopword list can be used to decrease the size
of the full text index. I had no problems when using this list while
creating a database.

Is it also possible to use this list for other purposes?

1
According to XQueryX 3.1.pdf it is possible to use a sequence of
stopwords in a query:
/books/book[@number="1"]//p contains text "propagating of errors"
using stop words ("a", "the", "of").

How can I use this list in BaseX while building queries?

2
Is it possible to add words to the list after it has been loaded?
Suppose it turns out that my text contains a lot of names that I want
to exclude. How can I add those names to the stopwords list?

3
If I want to create a Wordcloud, I want to use all the words that remain
after tokenization and removing all the words from the stopwords list.

(I found this item 'https://en.wikibooks.org/wiki/XQuery/Tag_Cloud'. It
might be a good starting point for creating a wordcloud)

Cheers,
Ben
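
For the word cloud in point 3, a minimal sketch of the steps (tokenize, drop stopwords, count) using only standard functions. The text and the stopword list are made up; in practice they would come from the database and from the stopwords file, and ft:tokenize() could take the place of the plain tokenize() call.

```xquery
let $text := 'The problem is solved and the problem is closed'
let $stopwords := ('the', 'is', 'and')
(: lower-case, split on whitespace, and keep only non-stopwords :)
let $words := tokenize(lower-case($text), '\s+')[not(. = $stopwords)]
for $w in distinct-values($words)
let $n := count($words[. = $w])
order by $n descending, $w
return [$w, $n]
```

This should return ["problem", 2], ["closed", 1], ["solved", 1]. Extra words such as names could be excluded by appending them to $stopwords.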



Re: [basex-talk] basexclient "Failed to construct terminal"

2020-02-11 Thread Ben Engbers
On 11-02-2020 at 13:05, Graydon wrote:
> My guess -- stress "guess" -- is that lucene-stemmers is presumably
> Apache Lucene, which BaseX might well use -- writing your own stemmer
> seems like unnecessary suffering, and BaseX does do word stemming as
> part of the full-text capability -- and since current Apache Lucene is
> version 8.4.1 -- https://lucene.apache.org/ -- it seems likely you could
> be running into an error between a Lucene that BaseX expects to be using
> and the (way-old) version 3.4.0 on the CLASSPATH getting loaded instead.
> 
> But I don't know.
> 
> Can you take lucene-stemmers-3.4.0.jar off your CLASSPATH and see what
> happens?
> 
> -- Graydon
> 

If you want to use stemming in Dutch (as I do),
http://docs.basex.org/wiki/Full-Text tells that in addition to the
already present stemming-support, you have to add
http://files.basex.org/maven/org/apache/lucene-stemmers/3.4.0/lucene-stemmers-3.4.0.jar
to your CLASSPATH.
I have removed that jar from the CLASSPATH but that didn't make any
difference.
At a later time, I'll gradually remove all Java stuff from my PATH
and see what happens.

Ben


Re: [basex-talk] basexclient "Failed to construct terminal"

2020-02-11 Thread Ben Engbers
On 10-02-2020 at 15:40, Graydon wrote:

> In general, it looks like this is your environment rather than the
> package, but it'd be nice to be able to prove it.
> 
> Grab a fresh copy of current stable basex via the zip archive, unpack
> that in some other directory entirely -- ideally belonging to a
> different user, no reason you can't create a test user if you have root
> on the machine -- and see what it does there?
> 
> -- Graydon (who will admit to being rather baffled)

I created a new user, copied and unpacked Basex931.zip.
Out of the box, everything works fine. :-)

I switched back to my personal account. There I get the same errors as
before ;-(. What surprises me however is that despite the errors, basex
is fully operational.

So you are right, it is my environment that causes the error.

The only real difference between regular and test-account, is that for
the regular account, I have added in CLASSPATH an entry for
lucene-stemmers-3.4.0.jar

Could it be that this jar causes the error?

Ben


Re: [basex-talk] basexclient "Failed to construct terminal"

2020-02-10 Thread Ben Engbers
On 10-02-2020 at 14:59, Graydon wrote:
> On Mon, Feb 10, 2020 at 02:28:08PM +0100, Ben Engbers scripsit:

> What it looks like you've got going on is a situation where basex uses
> modules and the java it's getting doesn't, but that doesn't explain why
> the gui runs fine.  It makes me suspect that you're having an
> interaction with httpd somehow.
> 
> -- Graydon
> 
The only thing that might use httpd somehow, is my RBaseX-package (I
have again rewritten large portions, cleaned up the classes, added tests
and so on).
I can't think of anything else that uses httpd. And even after
rebooting, basex/basexclient still give errors.

What else should I try?

Ben


Re: [basex-talk] basexclient "Failed to construct terminal"

2020-02-10 Thread Ben Engbers
On 10-02-2020 at 14:07, Graydon wrote:
> On Mon, Feb 10, 2020 at 11:47:55AM +0100, Ben Engbers scripsit:
>> Whenever I try to start basex or basexclient on my Fedora 31 linux
>> distribution, I get this output:
> [error message snipped]
>>
>> Am I missing something?
> 
> I'm having no issues on Fedora 31, so I can at least say it's not
> inherent to the distro.  But then again I'm using the gui; let's check.

The GUI works fine (always has). It was only last week that I first had
to use the basexclient.

> 08:03 bin % ./basex
> /home/graydon/bin/basex/basex/.basex: writing new configuration file.
> BaseX 9.3.1 [Standalone]
> Try 'help' to get more information.
>>
> 
> That's from inside the basex bin directory.
> 
> How are you installing basex?  I always use the zip version and just
> unpack it in $HOME/bin.  Does the gui -- initialized from the basexgui
> script -- run for you?
> 
> -- Graydon
> 

I added the directory, containing basex/bin, to my path. 'basexserver
&', 'basexserver stop' and  'basexserverstop' execute without returning
errors.
This is the output from basexhttp (this is the first time I tried this):

BaseX 9.3.1 [HTTP Server]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/bengbers/Programs/basex/lib/slf4j-simple-1.7.26.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/bengbers/Programs/basex/lib/slf4j-simple-1.7.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/bengbers/Programs/basex/lib/slf4j-simple-1.7.28.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/bengbers/Programs/basex/lib/slf4j-simple-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
[main] INFO org.eclipse.jetty.util.log - Logging initialized @432ms to
org.eclipse.jetty.util.log.Slf4jLog
[main] INFO org.eclipse.jetty.util.TypeUtil - JVM Runtime does not
support Modules
java.lang.UnsupportedOperationException

Did you add something to CLASSPATH?

Ben


[basex-talk] basexclient "Failed to construct terminal"

2020-02-10 Thread Ben Engbers
Hi,

It probably has been asked before but I found nothing on this topic.

Whenever I try to start basex or basexclient on my Fedora 31 linux
distribution, I get this output:

BaseX 9.3.1 [Client]
Probeer 'help' om informatie te krijgen.
[ERROR] Failed to construct terminal; falling back to unsupported
java.lang.NumberFormatException: For input string: "0x100"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.valueOf(Integer.java:766)
at jline.internal.InfoCmp.parseInfoCmp(InfoCmp.java:59)
at jline.UnixTerminal.parseInfoCmp(UnixTerminal.java:233)
at jline.UnixTerminal.<init>(UnixTerminal.java:64)
at jline.UnixTerminal.<init>(UnixTerminal.java:49)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at jline.TerminalFactory.getFlavor(TerminalFactory.java:209)
at jline.TerminalFactory.create(TerminalFactory.java:100)
at jline.TerminalFactory.get(TerminalFactory.java:184)
at jline.TerminalFactory.get(TerminalFactory.java:190)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:240)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:232)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:220)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.basex.util.ConsoleReader$JLineConsoleReader.<init>(ConsoleReader.java:146)
at org.basex.util.ConsoleReader.get(ConsoleReader.java:55)
at org.basex.BaseX.console(BaseX.java:166)
at org.basex.BaseX.<init>(BaseX.java:152)
at org.basex.BaseXClient.<init>(BaseXClient.java:35)
at org.basex.BaseXClient.main(BaseXClient.java:22)

'basex help' gives this output:

Gestopt bij /home/bengbers/, 1/5:
[XPDY0002] element(help): Context is undeclared.

Am I missing something?

Cheers,
Ben


Re: [basex-talk] No difference for output from 'FULL' or 'RESULTS'

2020-02-04 Thread Ben Engbers
On 04-02-2020 at 08:12, Christian Grün wrote:
> Hi Ben,
> 
> The client API code hasn’t changed since BaseX 8. Maybe you need to
> revise your code.
> 
> If you believe something wrong happens in the API, I’d still need some
> more information on what you believe has changed exactly?
> 
> Best,
> Christian

Hi Christian,
It shouldn't be too difficult to read this code:

More = function() {
  if (is.null(private$cache)) { # The cache has to be filled
    in_stream <- private$sock$get_socket()
    private$write_code_ID(0x04)
    cache <- c()
    # Read one byte, then the string that follows it, until a 0 byte arrives
    while ((rd <- readBin(in_stream, what = "raw", n = 1)) > 0) {
      cache <- c(cache, as.character(rd))
      cache <- c(cache, private$sock$str_receive())
    }
    success <- private$parent$get_socket()$bool_test_sock()
    private$parent$set_success(success)
    private$cache <- cache
    private$pos <- 0
  }
  if (length(private$cache) > private$pos) return(TRUE)
  else {
    private$cache <- NULL
    return(FALSE)
  }
}

Next = function() {
  result <- NULL # Returned when there are no more items
  if (self$More()) {
    private$pos <- private$pos + 1
    result <- private$cache[private$pos]
  }
  return(result)
}

Full = function() {
  in_stream <- out_stream <- private$sock$get_socket()
  private$write_code_ID(0x1F)
  cache <- c()
  # Same read loop as in More(), but after sending 0x1F instead of 0x04
  while ((rd <- readBin(in_stream, what = "raw", n = 1)) > 0) {
    cache <- c(cache, as.character(rd))
    cache <- c(cache, private$sock$str_receive())
  }
  private$parent$get_socket()$bool_test_sock()
  result <- cache
  return(result)
}

Both More() and Full() start by filling the cache. Next() uses More()
to iterate over the results. The main difference is the code that
is sent to the database (0x04 versus 0x1F).

Query_1 <- Query(Session, "for $i in 1 to 2 return Text { $i }")
fullResult <- Full(Query_1)

results in:
"0b" "Text 1" "0b" "Text 2"

The result from:
iterResult <- c()
while (More(Query_1)) {iterResult <- c(iterResult, Next(Query_1))}

is identical but as far as I can remember, it should have been:
"Text 1" "Text 2"

Can you tell if the results should be identical or different? If
different, I'll have to install older versions from my code ;-(

Cheers,
Ben





Re: [basex-talk] Finalizing Query-Objects

2020-02-04 Thread Ben Engbers
On 04-02-2020 at 08:17, Christian Grün wrote:
> It makes no difference for the BaseX server if you close the session and
> have open query objects (query objects exclusively reside in the client).
> 
> It can make a difference in client implementations, though. If you have
> a chance to always close queries after the execution, I think you should
> do so. I assume you are caching the query results before iterating over
> them, as it’s done in the other client implementations?

Hi Christian,

I used the Java client as an example, so yes, I am caching the query results.
I will begin by explicitly closing all the queries, closing the
socket connection and removing the session objects. Hopefully this will
show what's causing the failure.

Ben
(The people from CRAN warned that this can be very difficult and can
cause severe headache ;-( )




[basex-talk] No difference for output from 'FULL' or 'RESULTS'

2020-02-03 Thread Ben Engbers
Hi,

As far as I can remember when using early versions from my
client-software, the main difference in output after sending \04 or \1F
to the database, was that in the latter case the output was preceded
with XDM Meta data.

# Full
query_txt <- "for $i in 1 to 2 return Text { $i }"
query_obj <- Query(Session, query_txt)
result <- Full(query_obj)

resulted in:
"0b" "Text 1" "0b" "Text 2"

# Iterate over query
query2 <- "for $i in 3 to 4 return Iter { $i }"
query_iterate <- Query(Session, query2)   # <== Alternative call to query-object
while (More(query_iterate)) {
  cat(Next(query_iterate), "\n")
}

resulted in:
Iter 3
Iter 4

Now, iterating over the same query gives:
0b
Iter 3
0b
Iter 4

Did something change in the client/server protocol or did I introduce an
error somewhere?

Ben


[basex-talk] Finalizing Query-Objects

2020-02-03 Thread Ben Engbers
Hi,

The people from CRAN strongly suggested to add tests (comparable to
Unit-tests) to my package (RBaseX). Their request led me to take another
critical look at my code.
So far the tests do not give an error message. But after completing the
last test, 'testthat' reports 1 failure without further explanation.
After changing the order in which the tests are executed, the failure is
always caused by the last test. Therefore I think that it is not the
tests that cause the error, but the finalize process.

At this moment, my code is based upon 3 classes: 'RBaseXClient' creates
a new client-session. This session uses 'SocketClass' to communicate with
basexserver.  When used in query-mode, the session uses 'QueryClass' to
create new query-objects. Due to this architecture, it is easy to
explicitly close a regular query-object, but (at least in R) it is
difficult to close query-objects when finalizing the session-object.

How does the basexserver respond to closing the session without first
explicitly closing all open queries? Does this result in an error?

Ben


[basex-talk] Load LibreOffice- and Word-documents?

2020-01-28 Thread Ben Engbers
Hi,

While we were discussing possible usecases for basex, a colleague asked
me if it is also possible to load libreoffice and Word documents into
Basex and then perform full-text analysis on them. In essence, these are
both XML files, so it should be possible.

Does anybody have experience with this?

Ben


Re: [basex-talk] Client software and command scripts

2020-01-06 Thread Ben Engbers
On 06-01-2020 at 11:39, Christian Grün wrote:
> What kind of arguments would you like to set?
> 
> The easiest option may be to prefix your script string with some
> additional SET commands.
> 
My suggestion/question on passing arguments to a command script was not
restricted to passing 'set' options with the client API, but was more
general.
Suppose I have a script with commands that add the content from a
csv-file to a basex-db. It would be nice if I could pass the name/path
of the file as argument to the script. That would make it easier to
automate the update-process.
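
One way to get the effect of a parameterised script from the client side is to build the command strings in R and fill in the file name there. The sketch below is only an illustration: the function name and its arguments are hypothetical, the commands are returned as a vector so they can be sent one at a time, and the exact CSVPARSER option string is an assumption that should be checked against the BaseX documentation.

```r
# Hypothetical helper (not part of any client): returns the BaseX commands
# needed to add one CSV file to an existing database.
build_csv_add_commands <- function(db, csv_path, separator = "semicolon") {
  stopifnot(is.character(db), length(db) == 1, nzchar(db))
  stopifnot(is.character(csv_path), length(csv_path) == 1, nzchar(csv_path))
  c(
    "SET PARSER csv",
    # Assumed option syntax: comma-separated key=value pairs.
    sprintf("SET CSVPARSER header=yes,separator=%s", separator),
    sprintf("OPEN %s", db),
    sprintf("ADD %s", csv_path)
  )
}

cmds <- build_csv_add_commands("CSV_test", "/tmp/data.csv")
# Each element would then be passed to the client's command function,
# for example inside a loop over 'cmds'.
```

Paths that contain spaces would need quoting, which this sketch does not handle.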

Ben



Re: [basex-talk] Client software and command scripts

2020-01-06 Thread Ben Engbers
Hi Christian,

Best wishes and thanks for your answer.

Ben

On 03-01-2020 at 16:27, Christian Grün wrote:
> Hi Ben,
> 
>> I read in the documentation that a client should not only be able to
>> execute commands but should also be able to execute command scripts.
> 
> Could you give me a link to the part of the documentation you refer to?
I could not find it anymore. I guess that somewhere I mixed up
information about the API client bindings and the command-line interface.

>> My question is if a command script (a file with extension '.bxs') is
>> passed as a 'path' to the execute-command or is the client supposed to
>> read the file, line by line, and then execute each line separately?

> If you refer to the execute function in the API client bindings [1],
> the argument must be a BaseX command string. 
That is the way I have implemented it already.

I used my client to load a csv-file into a db-file. In order to get the
same result as with the GUI, I had to set some options. And at that
moment, I thought that it might be handy to save all the commands in a
script so that at a later moment, the script could be reused again.

A condition for the usability of this option is that there is a
possibility to pass arguments to the script.
Is that already possible in BaseX?



[basex-talk] Client software and command scripts

2020-01-03 Thread Ben Engbers
Hi Christian,

I read in the documentation that a client should not only be able to
execute commands but should also be able to execute command scripts.

My question is if a command script (a file with extension '.bxs') is
passed as a 'path' to the execute-command or is the client supposed to
read the file, line by line, and then execute each line separately?

Cheers,
Ben


Re: [basex-talk] Binding a variable

2019-10-29 Thread Ben Engbers
Hi Christian,

Thanks for the explanation.

I am glad to learn that - at least for this moment - I don't have to
change my code :-)

Ben

> Hi Ben,
> 
> If you...
> 
>> and bind $p to root
>>   Bind(query_obj, "$p", "root")
> 
> …you’ll need to add another external variable declaration in your query:
> 
>   "declare variable $p external;"
>   ...
> 
> Please note, in addition, that your query won’t be executable as you
> are trying to assign a dynamic path expression (e.g., 'root') to your
> query. If you need to build dynamic query strings, you’ll have to
> modify your original query string and send the result to the server.
> 
> Hope this helps,
> Christian
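
Since a path expression cannot be bound as a variable, the target has to be spliced into the query text before the query object is created. The following is a minimal sketch of that idea; the helper name is hypothetical, and it assumes the target comes from trusted code, because the value becomes part of the query itself.

```r
# Hypothetical helper: builds the query text with the insert target
# written directly into the string instead of being bound as $p.
build_insert_query <- function(target_path) {
  stopifnot(is.character(target_path), length(target_path) == 1,
            nzchar(target_path))
  paste(
    "declare variable $friend external;",
    "for $i in 1 to 3",
    "return insert nodes element { $friend } { $i }",
    sprintf("into %s", target_path),
    sep = " "
  )
}

query_txt <- build_insert_query("root")
# query_txt can now be passed to Query(); $friend is still bound with Bind().
```

Values such as $friend remain ordinary external variables; only the path expression is built into the string.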



[basex-talk] Binding a variable

2019-10-28 Thread Ben Engbers
Hi,

When experimenting with my RBaseX-package (I had hoped to submit it to
CRAN today), I use the following pattern:
1 Define a query
2 Create a query-object
3 Bind variables (optional)
4 Execute the query

When I use this pattern on the following query, everything functions as
expected:
declare variable $name external; for $i in 1 to 3 return element { $name
} { $i }

The following query is also functioning:
paste("declare variable $greet external;",
  "declare variable $friend external;",
  "declare variable $into external;",
  "let $greet := 'Greetings, my '",
  "let $friend := 'friend'",
  "for $i in 1 to 3",
  "let $friend_num := $greet || $friend || $i",
  "return insert nodes element { $friend } { $friend_num }",
  "into root",
  sep = " ")

When I modify
  "into root" to "into $p"
and bind $p to root
  Bind(query_obj, "$p", "root")

I get the following error:

[XPST0008] Undeclared variable $p

The Bind-function returns with code \00, indicating that it is executed
without errors.

Does this mean that there is a bug in my code or am I violating
XQuery-syntax?

Ben Engbers


[basex-talk] binding-types?

2019-09-12 Thread Ben Engbers
Hi,

While creating an R package, based on my R client implementation, I found
that the binding function was malfunctioning.

After rewriting that function, the following is accepted:
query_txt <- "declare variable $name external; for $i in 1 to 5 return
element { $name } { $i }"
query_obj_1 <- Query(Sess, query_txt)
success <- query_obj_1$queryObject$Bind("name", "number")
print(query_obj_1$queryObject$ExecuteQuery())

results in:
 "1" "2" "3"
"4" "5"

When I change the line
'success <- query_obj_1$queryObject$Bind("name", "number")' to
success <- query_obj_1$queryObject$Bind("name", "number", "xs:integer")
the following error is produced:
"[XPST0081] No namespace declared for '\002xs:integer'."

What types are accepted?

Ben


Re: [basex-talk] Test if basexserver is running? (Partially solved)

2019-09-02 Thread Ben Engbers
Hi  Michael,

When unit-testing a package, the first test should be to test if a
connection to basexserver can be established. This is not difficult, in
fact the first thing I do in my code, is opening a connection so I
already know that my code works.
But what does it mean when an attempt to open a connection fails? Does
this mean that there is an error in my code or does the attempt fail
because there is no Basexserver running? So if you want to test the
code, you first have to be certain that a server is running.

Using Google, I found this solution for Linux.
On Linux, you can use the 'ps -fC java' command to see which Java
processes are running. 'ps -fC java | grep -q basex; echo $?' prints 0
when a basexserver instance is running. I guess that it will be
easy to incorporate this command in an R function.
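
A platform-independent alternative to parsing process lists is to try to open the server port, as suggested in the quoted reply below. This is a rough sketch using base R only; the function name is hypothetical, 1984 is the default port, and a successful connection only shows that something is listening there, not that it is BaseX.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened.
is_server_listening <- function(host = "localhost", port = 1984L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+b",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL   # connection refused or timed out
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)                   # close immediately; we only probe the port
  TRUE
}

# Port 1 is normally closed, so this is expected to report FALSE.
closed_port_result <- is_server_listening("localhost", 1L, timeout = 1)
```

Because it relies only on a socket, the same check should work on Linux, macOS and Windows.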

Do you know if a similar command is available for Windows?

Ben
PS. What do you mean with BaseX:123456789?

On 30-08-19 at 14:44, Michael Seiferle wrote:
> Hi Ben, 
> 
> I maybe don’t fully get your question right (and I admit I do not know
> much about R), but I’d simply open the socket on the port I expect BaseX
> to be listening on and see whether or not I receive a `BaseX:123456789`
> response and close the connection immediately after.
> 
> Best
> Michael 
> 
>> On 29.08.2019 at 15:03, Ben Engbers <ben.engb...@be-logical.nl> wrote:
>>
>> Hi,
>>
>> Last year I wrote an R client for BaseX
>> (https://github.com/BaseXdb/basex/tree/master/basex-api/src/main/r/RbaseXClient.R).
>> The present version uses no exception handling and you have to include
>> the source file in your R code. A much cleaner solution would be to catch
>> all the errors and to pack the sources in a package. At this moment, I
>> am working on such an R package.
>>
>> The first test that should be executed in the package is to test if a
>> basexserver is available.
>>
>> How can I test on Linux, Apple and Windows if a basexserver is running?
>>
>> Ben
> 




[basex-talk] Test if basexserver is running?

2019-08-29 Thread Ben Engbers
Hi,

Last year I wrote an R client for BaseX
(https://github.com/BaseXdb/basex/tree/master/basex-api/src/main/r/RbaseXClient.R).
The present version uses no exception handling and you have to include
the source file in your R code. A much cleaner solution would be to catch
all the errors and to pack the sources in a package. At this moment, I
am working on such an R package.

The first test that should be executed in the package is to test if a
basexserver is available.

How can I test on Linux, Apple and Windows if a basexserver is running?

Ben


[basex-talk] EXECUTE syntax

2018-06-08 Thread Ben Engbers
Hi,

I want to use my R-client to insert csv in a database.

These lines work:

csv_add_run <- 'RUN "./DataScience/RBaseX/CSVexample.xq"'
Session$command(csv_add_run)

When I take the content from CSVexample.xq and incorporate that into an
EXECUTE command, I get this:

csv_add_exe <- 'EXECUTE "let $root :=
'/home/bengbers/DataScience/RBaseX/Examples/Parse/;)'; for $path in
file:children($root)[ends-with(., '.csv')] return db:add('CSV_test',
$path, 'CSV_API', map { 'parser': 'csv', 'csvparser': map { 'header':
'yes', 'separator': ';' }) "'

Session$command(csv_add_exe)

Stopped at , 1/9:
Unknown command: 'EXECUTE. Did you mean 'EXECUTE'?


My question is how to define the input for the EXECUTE-command?

Cheers,
Ben



Re: [basex-talk] Insert CSV into database

2018-06-01 Thread Ben Engbers
Hi Christian,

The alternative worked so my first question is answered. But the second
question still remains.

Why does BaseX-GUI use an old path
(/home/bengbers/DataScience/Eindopdracht/Data/file), a path I didn't
even enter and does not use the path I entered in the query
(/home/bengbers/DataScience/RBaseX/Examples/Parse/)?

Cheers,
Ben

On 01-06-18 at 12:38, Christian Grün wrote:
> Hi Ben,
> 
> As file:list only returns relative file paths, you will have to prepend
> the root path later on:
> 
>   let $root := "/home/bengbers/DataScience/RBaseX/Examples/Parse/"
>   for $file in file:list($root, false(), "*.csv")
>   return db:add("CSV_test", $root || $file, "", map {
>     'parser': 'csv',
>     'csvparser': map { 'header': 'yes', 'separator': ';' }
>   })
> 
> Another alternative is to use the file:children function:
> 
>   let $root := "/home/bengbers/DataScience/RBaseX/Examples/Parse/"
>   for $path in file:children($root)[ends-with(., ".csv")]
>   return db:add("CSV_test", $path, "", map { ... })
> 
> Cheers,
> Christian


[basex-talk] Insert CSV into database

2018-05-31 Thread Ben Engbers
Hi,
My goal is to use my R client driver to insert CSV files into new
databases. But before that, I'm experimenting with the GUI.

From the documentation for the CSV parser, I have taken this code:

for $file in
file:list("/home/bengbers/DataScience/RBaseX/Examples/Parse", false(),
"*.csv")
return db:add("CSV_test", $file, "", map {
  'parser': 'csv',
  'csvparser': map { 'header': 'yes', 'separator': ';' }
})

BaseX returns:  
Error:
Stopped at /home/bengbers/DataScience/Eindopdracht/Data/file, 2/14:
[FODC0002] Resource 'Test_Parse.csv' does not exist.

I didn't enter this path. It was used yesterday when browsing to the
datafiles that were inserted into another test-database.

let $file :=
file:list("/home/bengbers/DataScience/RBaseX/Examples/Parse", false(),
"*.csv")
return $file

BaseX returns:
Test_Parse.csv
Test_Parse (exemplaar).csv

If I create a new database, it neatly adds the two csv-files.

My questions are:
- which query do I have to use to insert csv-files?
- obviously, BasexGUI uses the wrong path. How should I adjust this path?

Cheers,
Ben


[basex-talk] How to use BaseX on MacBook? (Urgent!)

2018-05-16 Thread Ben Engbers
Hi,

If we manage to install BaseX on a MacBook, chances are great that we
will use BaseX for our final project.

I know how to install BaseX on linux but I have no experience with
Apple. My fellow-students know how to use applications but don't know
how to deal with java-applications.

My question is if BaseX can be used on a MacBook. If so, where can I
find instructions?

Cheers,
Ben Engbers


[basex-talk] Unlock database

2018-04-30 Thread Ben Engbers
Hi,

Somehow, I managed to lock a (test)-database and now I can't get it
unlocked.

Is it possible to manually remove the lock? If so, how?

Cheers,
Ben Engbers


Re: [basex-talk] Missing 'DELETE' in server protocol?

2018-04-24 Thread Ben Engbers
Hi Christian,

I changed my code to:
add = function(path = path, input = input) {
  writeBin(as.raw(0x09), private$sock)
  writeBin(private$raw_terminated_string(path), private$sock)
  writeBin(private$raw_terminated_string(input), private$sock)
  private$info <- self$str_receive()
  return(list(info = private$info, success = self$bool_test_sock()))
}
and tested the new code with:
Path1 <- "Test1.xml"
Path2 <- "test/Test1.xml"
Simple1 <- "Hello World!"
Simple2 <- "/home/bengbers/DataScience/RBaseX/Test1.xml"
Simple3 <- "Test1.xml"
Added <- Session$add(path = Path1, input = Simple1)

(Simple2 is Simple1 written to Test1.xml)

When used with either Path1 or Path2, Added$info returns:
"Improper use? Potential bug? Your feedback is welcome:\nContact:
basex-talk@mailman.uni-konstanz.de\nVersion: BaseX 9.0\nJava: Oracle
Corporation, 1.8.0_162\nOS: Linux, amd64\nStack Trace:
\njava.lang.RuntimeException: Learn: lock file does not exist.\n\tat
org.basex.util.Util.notExpected(Util.java:61)\n\tat
org.basex.data.DiskData.finishUpdate(DiskData.java:246)\n\tat
org.basex.core.cmd.ACreate.update(ACreate.java:97)\n\tat
org.basex.core.cmd.Add.run(Add.java:56)\n\tat
org.basex.core.Command.run(Command.java:257)\n\tat
org.basex.core.Command.execute(Command.java:93)\n\tat
org.basex.core.Command.execute(Command.java:116)\n\tat
org.basex.server.ClientListener.execute(ClientListener.java:343)\n\tat
org.basex.server.ClientListener.add(ClientListener.java:314)\n\tat
org.basex.server.ClientListener.run(ClientListener.java:96)\n"

With Path1/Simple2 or Path1/Simple3:
"\"Test1.xml.xml\" (Line 1): Content is not allowed in prolog."

With Path2/Simple2 or Path2/Simple3:
"\"test/Test1.xml.xml\" (Line 1): Content is not allowed in prolog."

In all cases Added$success returns FALSE

In an old mail someone suggested that maybe this was caused by the used
encoding. I converted the encoding for Test1.xml from US-ASCII to UTF-8
but this had no effect.

Cheers,
Ben

On 24-04-18 at 13:46, Christian Grün wrote:
> Hi Ben,
> 
> I assume that this part of the server protocol is indeed outdated. I
> have just checked out our Java client, which only sends the target
> path to the server (which includes the name of the document) [1].
> 
> Could you check out if this solves the problem? If yes, I’ll be happy
> to update our documentation.
> 
> Best,
> Christian
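
To make the two-argument form discussed above concrete, the bytes of such an ADD message can be assembled without touching the socket. This is a sketch under assumptions: the helper name is hypothetical, the layout (0x09, path, 0x00, input, 0x00) follows the reply above, and the escaping of 0x00 and 0xFF bytes inside the input is left out.

```r
# Hypothetical helper: returns the raw bytes of an ADD message
# (command byte, zero-terminated path, zero-terminated input).
build_add_message <- function(path, input) {
  terminated <- function(s) c(charToRaw(enc2utf8(s)), as.raw(0x00))
  c(as.raw(0x09), terminated(path), terminated(input))
}

msg <- build_add_message("Test1.xml", "<x/>")
# The result could be written to the socket with a single writeBin() call.
```

The sample input is a small well-formed XML fragment, on the assumption that the server parses the input as XML by default; plain text such as "Hello World!" would then be rejected by the parser.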


Re: [basex-talk] Missing 'DELETE' in server protocol?

2018-04-24 Thread Ben Engbers
Hi Christian,

Thanks for your answer. It helped.

Now I have another question.

According to the server protocol, I have coded the 'add'-command as follows:
writeBin(as.raw(0x09), private$sock)
writeBin(private$raw_terminated_string(name), private$sock)
writeBin(private$raw_terminated_string(path), private$sock)
writeBin(private$raw_terminated_string(input), private$sock)
private$info <- self$str_receive()
return(list(info = private$info, success = self$bool_test_sock()))

When executing these lines:
Name1 <- "Name1.xml"
Path1 <- "path/test"
Simple <- "Hello World!"
test <- Session$add(name = "Name1.xml", path = "path/test", input = Simple)
I would expect that a new resource was created with name, path and
content as specified by the parameters.

However I receive:
> test
$info
[1] "\"Name1.xml.xml\" (Line 1): Content is not allowed in prolog."
$success
[1] FALSE

Using Name1 <- "Name1" produces no error but still fails.

Can you give any clue in which direction I should search? (Using the
debugger didn't help.)

Ben

On 23-04-18 at 16:08, Christian Grün wrote:
> Hi Ben,
> 
> You are right, there is no DELETE entry in the client binding. The
> reason is that you can simply send a DELETE command [1], as there is
> no need to transfer additional binary data.
> 
> Does this help?
> Christian


[basex-talk] Missing 'DELETE' in server protocol?

2018-04-23 Thread Ben Engbers
Hi,

It was only after starting to implement my R-client implementation in
examples, that I noticed there is no 'DELETE'-command specified in the
server protocol.

Is this a deliberate omission?

I would guess that implementing such a command would come down to
something like this:
delete = function(name = name) {
  writeBin(as.raw(---BYTE---), private$sock)
  writeBin(private$raw_terminated_string(name), private$sock)
  return(list(info = private$info, success = self$bool_test_sock()))
}

If this is correct, where can I find (a list with) the required byte-codes?

Ben Engbers




Re: [basex-talk] baseX vs ExistDB

2018-04-18 Thread Ben Engbers
Hi,

Look at http://vschart.com/compare/basex/vs/exist-db

If you want, you can add other comparisons

Cheers,
Ben

On 18-04-18 at 16:34, Alexander Holupirek wrote:
>> On 18. Apr 2018, at 15:39, Feargal Hogan  wrote:
>>
>> Hi
>>
>> Is anyone aware of any comparisons between baseX and Exist?
>> I have some familiarity with Exist and I’d like to understand what the
>> benefits of each are.
>>
>> Thanks
>>
>> Feargal
> 
> Both are, of course, excellent systems.
> Do you have something special in mind that you would like to compare?
> Besides, I'm not aware of a general feature comparison site or something like 
> that.
> 
> Cheers,
>   Alex
> 
> 



[basex-talk] RFC, Client for R

2018-04-18 Thread Ben Engbers
Hi,

Last month I have been working on an R client. The results of my work are
attached.

The 'add, replace and store'-commands should be working but haven't been
tested yet since I don't have any good example-commands at hand.

I'm looking forward to comments!

Ben Engbers
 
[[1]]
 [1] "General Information:"
 [2] " Version: 9.0"
 [3] " Used Memory: 38 MB"
 [4] ""
 [5] "Global options:"
 [6] " AUTHMETHOD: Basic"
 [7] " CACHETIMEOUT: 3600"
 [8] " DBPATH: /home/bengbers/Programs/basex/data"
 [9] " DEBUG: false"
[10] " FAIRLOCK: false"
[11] " HOST: localhost"
[12] " HTTPLOCAL: false"
[13] " IGNORECERT: false"
[14] " IGNOREHOSTNAME: false"
[15] " KEEPALIVE: 600"
[16] " LANG: English"
[17] " LANGKEYS: false"
[18] " LOG: true"
[19] " LOGMSGMAXLEN: 1000"
[20] " LOGPATH: .logs"
[21] " NONPROXYHOSTS: "
[22] " PARALLEL: 8"
[23] " PARSERESTXQ: 3"
[24] " PASSWORD: "
[25] " PORT: 1984"
[26] " PROXYHOST: "
[27] " PROXYPORT: 0"
[28] " REPOPATH: /home/bengbers/Programs/basex/repo"
[29] " RESTPATH: "
[30] " RESTXQPATH: "
[31] " SERVERHOST: "
[32] " SERVERPORT: 1984"
[33] " STOPPORT: 8985"
[34] " TIMEOUT: 30"
[35] " USER: "
[36] " WEBPATH: /home/bengbers/Programs/basex/webapp"
[37] ""
[38] "Local options"
[39] " ADDARCHIVES: true"
[40] " ADDCACHE: false"
[41] " ADDRAW: false"
[42] " ARCHIVENAME: false"
[43] " ATTRINCLUDE: "
[44] " ATTRINDEX: true"
[45] " AUTOFLUSH: true"
[46] " AUTOOPTIMIZE: false"
[47] " BINDINGS: "
[48] " CASESENS: false"
[49] " CATFILE: "
[50] " CHECKSTRINGS: true"
[51] " CHOP: true"
[52] " COMPPLAN: true"
[53] " COPYNODE: true"
[54] " CREATEFILTER: *.xml"
[55] " CREATEONLY: false"
[56] " CSVPARSER: "
[57] " DEFAULTDB: false"
[58] " DIACRITICS: false"
[59] " DOTCOMPACT: false"
[60] " DOTPLAN: false"
[61] " DTD: false"
[62] " ENFORCEINDEX: false"
[63] " EXPORTER: "
[64] " FORCECREATE: false"
[65] " FTINCLUDE: "
[66] " FTINDEX: false"
[67] " HTMLPARSER: "
[68] " INLINELIMIT: 100"
[69] " INTPARSE: false"
[70] " JSONPARSER: "
[71] " LANGUAGE: en"
[72] " LSERROR: 0"
[73] " MAINMEM: false"
[74] " MAXCATS: 100"
[75] " MAXLEN: 96"
[76] " MAXSTAT: 30"
