from:"George Sofianos"

Re: [basex-talk] Stack Overflow: Try tail recursion

2019-02-18 Thread George Sofianos


You can also increase the JVM stack size for the BaseX GUI.
For example, if you are on Linux, go to the BaseX bin directory, edit the 
basexgui file, and change this line:


# Options for virtual machine (can be extended by global options)
BASEX_JVM="-Xmx6g -Xss4m $BASEX_JVM"

There isn't really a specific number you need to use. The default is 1MB 
on a 64-bit PC.

More information about the -Xss JVM flag can be found at [1].

[1] 
https://stackoverflow.com/questions/3700459/how-to-increase-the-java-stack-size
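
The other route, which the error message itself hints at, is to rewrite the function so that the recursive call is the last thing it does. The following is only a minimal sketch of a cumulative sum in that style; the function name local:running-sum and its accumulator parameters are made up for illustration and are not taken from the linked question. It assumes the processor optimizes tail calls, as the BaseX message implies.

```xquery
(: hypothetical helper: $total carries the sum so far, $result the sums already emitted :)
declare function local:running-sum(
  $values as xs:decimal*,
  $total  as xs:decimal,
  $result as xs:decimal*
) {
  if (empty($values)) then $result
  (: the recursive call is in tail position: nothing is left to do after it returns :)
  else local:running-sum(
    tail($values),
    $total + head($values),
    ($result, $total + head($values))
  )
};

(: returns 1, 3, 6, 10 :)
local:running-sum((1, 2, 3, 4), 0, ())
```

Because no work is pending when the function calls itself, the stack does not need to grow with the number of input items.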


On 2/18/19 5:24 PM, Giuseppe G. A. Celano wrote:


I am writing a recursive function which is similar to the one here:

https://stackoverflow.com/questions/27702718/to-add-values-in-cumulative-format

Interestingly, local:sum() works if there are not many <book> elements. However, with 38000 <book> 
elements I get the error "Stack Overflow: Try tail recursion".

Any idea?

Ciao,
Giuseppe

Re: [basex-talk] If clause

2019-02-18 Thread George Sofianos

It is correct for BaseX; take a look at [1]. There is also an Elvis operator on 
the same wiki page. Personally, I don't like to deviate from the 
specification, so I try to avoid it.


[1] - http://docs.basex.org/wiki/XQuery_Extensions#If_Without_Else
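
For comparison, here is a small sketch of the two BaseX extensions next to the portable spelling. The behaviour in the comments is what I understand from the wiki page [1] (BaseX 9.1 or later is assumed); it is not standard XQuery, so other processors should reject the first and third expressions.

```xquery
(: BaseX extension: a missing else branch yields the empty sequence :)
(if (3) then 4),

(: portable XQuery 3.1 spelling of the same expression :)
(if (3) then 4 else ()),

(: BaseX Elvis operator: the left operand is returned unless it is empty :)
(() ?: 'fallback')
```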

On 2/18/19 4:19 PM, Giuseppe G. A. Celano wrote:

Hi,

I see that in BaseX 9.1.2 an expression such as "if (3) then 4 " does 
not raise an error, even if the "else" part is missing. Is this correct?


Ciao,
Giuseppe


Best,

George

Re: [basex-talk] starts-with in a satisfies statement fails to fail

2019-02-13 Thread George Sofianos


Hi again,

My mistake, I failed to see an error in my code which returned empty 
elements, which resulted in the quantifier not executing. Sorry for 
wasting your time and have a good evening!
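
For anyone running into the same thing, a minimal sketch of the effect; the function name local:starts-with-any is made up for this example. With an empty binding sequence the satisfies clause is never evaluated, so the invalid second argument of starts-with goes unnoticed.

```xquery
declare function local:starts-with-any(
  $data   as xs:string*,
  $prefix as xs:string*
) as xs:boolean {
  some $i in $data satisfies starts-with($i, $prefix)
};

(: empty input: by the XQuery rules the quantifier is simply false, no error :)
local:starts-with-any((), ("S1", "S2")),

(: non-empty input: starts-with now receives two strings and should raise a type error :)
try {
  local:starts-with-any("S1-abc", ("S1", "S2"))
} catch * {
  "error raised"
}
```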


On 2/13/19 6:19 PM, Christian Grün wrote:

I'm not sure if you can find anything from the Query plan for why this is 
happening. This fails either with or without inlining.
Regards,

Difficult to tell without investing some considerable time I guess…
How does the original query look like that produces this query plan?
Maybe we still have a chance to get this reproducible?

Re: [basex-talk] starts-with in a satisfies statement fails to fail

2019-02-13 Thread George Sofianos


Hi Christian,

On 2/13/19 6:19 PM, Christian Grün wrote:

Difficult to tell without investing some considerable time I guess…
How does the original query look like that produces this query plan?
Maybe we still have a chance to get this reproducible?


There are just two functions involved in this.

declare function common:string($elements) as xs:string* {
    let $values := data($elements)
    let $values := for $i in $values return lower-case(normalize-space(string($i)))
    return $values
};

declare function common:startsWith($elements, $string) as xs:boolean {
    let $string := common:string($string) => trace("STRING: ")
    let $data := common:string($elements)
    return some $i in $data satisfies starts-with($i, $string) => trace("SATISFY: ")
};

The main query is 5823 lines so it's not very easy to find out what's 
going on. My assumption is that for some reason the Quantifier 
expressions get optimized to ignore errors. What I noticed is that if I 
change the following:


satisfies starts-with($i, $string)
to
satisfies 1 = "a"

It will finally fail as expected, even with the optimized final query 
plan. I will keep trying to create a reproducible example. The 
alternative is to send you the main query along with the source XML file.

Best,

George

[basex-talk] starts-with in a satisfies statement fails to fail

2019-02-13 Thread George Sofianos


Hi,

I have a specific piece of code which fails to run as expected. 
Unfortunately, I haven't managed to create an MCVE so far.

The part of code that should fail in my script is:

    some $i in $data satisfies starts-with($i, $string)

Where $data is a list of strings, and $string is a sequence of two 
strings - example ("S1", "S2"). The Query executes without error, but 
the trace message is never visible.
The part of the Query plan that executes successfully (while it 
shouldn't) is:


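[Query plan XML not preserved in the archive. Surviving fragments: a function 
with the arguments "elements" and "string", intermediate values of type 
xs:string*, the trace labels "STRING: " and "SATISFY: ", and a result of type 
xs:boolean with size 1.]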

I'm not sure if you can find anything from the Query plan for why this 
is happening. This fails either with or without inlining.

Regards,

George

[basex-talk] fn:serialize() behaviour

2019-01-29 Thread George Sofianos


Hi,

This is probably a non issue, but I thought I should report it anyway. I 
was playing around with serialization options today and I noticed that:


let $head := 
let $body := 
return serialize(($head, $body), map { "method": "html", "version": "5.0"})

will return






I don't think fn:serialize() is defined in the xquery spec so it's 
implementation specific so I guess it also could be correct :)


Also, I have a question: I remember a past discussion about the need 
for extra testing (XQuery spec wise) in BaseX. Is this still an issue? 
Hopefully I can find some time and help out with that.


Regards,

George

Re: [basex-talk] BaseX 9.1.2: Maintenance Release

2019-01-25 Thread George Sofianos


Hi Christian,

If I'm not mistaken, it only works if you declare it inside the query, 
which is not an option for my use case, because the scripts can run in 
different environments with paths that are set up in environment variables.


This probably only affects the QueryProcessor constructor. But I will 
try to double-check that and send an email again if it's still doing the 
same.



On 1/25/19 2:40 PM, Christian Grün wrote:

Are you sure about that? If you run the following query (after
updating the base uri to your local environment)…

   declare base-uri 'file:///c:/users/user/desktop/';
   import module namespace a = 'a' at 'a.xqm';
   doc('x.xml')

…both the a.xqm module and the x.xml document will be looked up in the
specified directory. If you remove the trailing slash from the URI,
the files will be looked up in the parent directory (in accordance
with the official specs).

Feel free to provide us with such an example, we’ll be happy to
include it in our collection of Java examples.

Best,
Christian

Re: [basex-talk] BaseX 9.1.2: Maintenance Release

2019-01-23 Thread George Sofianos


Thanks Christian,

I think I managed to make it work, if I'm not missing anything. The 
problem was (the last time I tried it, over a year ago), that I need to 
work with relative paths, and location uris would not work properly. I 
managed to set it up today, by using a QueryProcessor constructor that 
includes a base-uri parameter.


The problem is, that when you call the QueryProcessor [1] the base URI 
should be set to a fake uri or file path, for example 
"/home/user/application/queries/fake". This way, both location URIs 
(import module declarations), and document requests to relative urls - 
doc(concat("../myxml.xml")) work. If base URI is just set as 
"/home/user/application/queries" - which I thought it would be correct - 
then only doc() function will look at the correct base-uri directory, 
while the import module statements, will try to import files from the 
parent of the base directory. I think it has something to do with how 
baseURI function works in StaticContext.java [2], but I'm not sure.
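
The parent-directory effect matches generic URI resolution (RFC 3986), where everything after the last slash of the base URI is replaced by the relative reference. A small sketch with fn:resolve-uri; the paths are the ones from this mail with a file: scheme added, and module.xqm is a made-up file name. Whether import module follows exactly this rule internally is my assumption.

```xquery
for $base in (
  'file:///home/user/application/queries',
  'file:///home/user/application/queries/',
  'file:///home/user/application/queries/fake'
)
return resolve-uri('module.xqm', $base)

(: results:
   file:///home/user/application/module.xqm
   file:///home/user/application/queries/module.xqm
   file:///home/user/application/queries/module.xqm :)
```

So a trailing slash on the directory should have the same effect as the fake last path segment.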


Maybe a Java example showing the correct usage of the base-uri, or a 
wiki page will help others in the future.


[1]: 
https://github.com/BaseXdb/basex/blob/master/basex-examples/src/main/java/org/basex/examples/local/BindVariables.java


[2]: 
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/StaticContext.java


I'm happy I got this out of the way, my next email will be about some 
issues I found with parallel xquery processing :)


Regards,

George

On 1/23/19 3:04 PM, Christian Grün wrote:

Hi George,

Sorry, we have no plans to reintroduce the old option (too much has 
changed in our architecture). Maybe there are other solutions for your 
use case in the latest version?


Best
Christian

Re: [basex-talk] BaseX 9.1.2: Maintenance Release

2019-01-23 Thread George Sofianos

Oops, I just noticed it already is available. I need to stop looking 
into mvnrepository first. Sorry about that!


George

On 1/23/19 12:42 PM, George Sofianos wrote:

Hi Christian,

thanks for the new version. Will it be available in maven soon?

Also, another question that I hate to ask, but I have to: are there 
any plans to re-introduce QUERYPATH? This has been keeping me from 
upgrading to a more recent version than 8.4.4, but we have another 
need that requires 9.1.2 at the moment.


Regards,

George

Re: [basex-talk] BaseX 9.1.2: Maintenance Release

2019-01-23 Thread George Sofianos


Hi Christian,

thanks for the new version. Will it be available in maven soon?

Also, another question that I hate to ask, but I have to: are there any 
plans to re-introduce QUERYPATH? This has been keeping me from upgrading 
to a more recent version than 8.4.4, but we have another need that 
requires 9.1.2 at the moment.


Regards,

George

On 1/22/19 11:06 AM, Christian Grün wrote:

Hi all,

A new BaseX maintenance release is available. Minor bugs have been
fixed, and some performance tweaks have been added. In particular,
access to large WebDAV directories should be faster now.

As usual, you find the latest version on our homepage basex.org. Maven
artifacts have been added, various other distributions will follow
soon.

Have fun everyone,
Christian, BaseX Team

Re: [basex-talk] file:read-text-lines performance

2019-01-16 Thread George Sofianos

Just posting to say I'm having a lot of fun with the updated 
read-text-lines function.


On 1/16/19 1:37 PM, Christian Grün wrote:

This code will potentially create thousands or millions of Java
threads. Maybe you are getting better results by splitting your input
into 4 or 8 parts, and process each part in a dedicated function.


I refactored the code to the following, and it completes in 60 seconds, 
of which 20 are for counting the lines and only 40 seconds for parsing 
and returning the correct data!!! So I get a 3x improvement from 
multiple threads. I have no idea if it stresses the SSD at all.


let $file := "/path/to/large.txt"
let $count := prof:time(count(file:read-text-lines($file, "UTF-8", false())), "COUNTING: ")

let $cpus := 15
let $parts := ($count div $cpus) => xs:integer() => trace("PER CORE: ")

let $all :=
  xquery:fork-join(
    for $cpu in 0 to $cpus
    return function() {
      let $offset := $cpu * $parts
      let $length := $parts
      for $line in file:read-text-lines($file, "UTF-8", false(), $offset, $length)
      return parse-json($line)?('obj1')?*?('obj2')?('obj3')
    }
  ) => prof:time("CALCULATING: ")
return distinct-values($all)


I would indeed assume that the following code…

distinct-values(
   for $line in file:read-text-lines($file, "UTF-8", false())
   return parse-json($line)?('object1')?*?('object2')?('object3')
)

…will be most efficient, even if you process files of 100 GB or more
(especially with the new, iterative approach).


Indeed, it also uses tiny amounts of memory and completes in the 
same time (120 seconds) as loading the whole file into memory on a 
single core :)


George.

Re: [basex-talk] file:read-text-lines performance

2019-01-16 Thread George Sofianos

Thanks Christian, I will check the code examples you posted tonight, 
your explanation makes it easier to understand.


I can see there is a list with the deterministic functions in the specs 
[1] but not so sure about the BaseX specific functions. Is it possible 
to know if a function is deterministic or not?


I tried file:read-text-lines("/path.txt")  is 
file:read-text-lines("/path.txt") but it doesn't work.


George.

[1] - https://www.w3.org/TR/xpath-functions-31/#dt-deterministic

On 1/16/19 1:41 PM, Christian Grün wrote:

The reason for that: file:read-text-lines is a non-deterministic
function. Each invocation might yield different results (as the file
contents may change in the background). This is different with
deterministic function calls, such as fn:doc('abc.xml'). If you
call such a function repeatedly, it will always access the same
document, which has been opened and parsed by the first call of this
function.


1) return count(file:read-text-lines($file, "UTF-8", false()))

Here, file processing will be iterative.


2) let $data := file:read-text-lines($file, "UTF-8", false())
  return count($data)

The file contents will be bound to $data, and counted in a second
step. If the expression of your let clause was deterministic, the
variable would be inlined, and the resulting query plan would be
identical to the one of your first query.
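
A small sketch of how this can be probed from within a query. The is operator compares node identity, so it only accepts nodes as operands; that is why it cannot be applied to the strings returned by file:read-text-lines. The file names are the ones used in this thread and are assumed to exist.

```xquery
(: fn:doc is deterministic: both calls return the very same document node, so this is true :)
doc('abc.xml') is doc('abc.xml'),

(: strings have no identity; only the contents of two reads can be compared :)
deep-equal(
  file:read-text-lines('/path.txt'),
  file:read-text-lines('/path.txt')
)
```

The second expression says nothing about determinism; it only tells whether the file happened to be unchanged between the two reads.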

Re: [basex-talk] file:read-text-lines performance

2019-01-15 Thread George Sofianos

There also looks to be a difference in how read-text-lines is used. 
The following similar queries produce different query plans, and have 
different memory usage. This is probably why I can't benefit from the 
update on more complex queries.


1) return count(file:read-text-lines($file, "UTF-8", false()))

Memory usage - about 20 megabytes

Query plan (XML markup not preserved in the archive; surviving content):

  function: read-text-lines(path[,encoding[,fallback[,offset[,length)  type: xs:string*
  arguments (path is xs:string):
    /home/lumiel/eworx/betmechs/bme/webservice/samples/betfair/September-2015/output.json
    UTF-8
    false


2) let $data := file:read-text-lines($file, "UTF-8", false())
    return count($data)

Memory  usage: 4.5GB

Query plan (XML markup not preserved in the archive; surviving content):

  function: read-text-lines(path[,encoding[,fallback[,offset[,length)  type: xs:string*
  arguments:
    /full/path/file.txt
    UTF-8
    false


On 1/15/19 1:48 PM, Christian Grün wrote:

Hi George,

I’m glad to announce that files are now processed in an iterative
manner [1,2]. That’s something I wanted to try a while ago, and your
mail was another motivation to get it done.

It works pretty fine: I reduced the JVM memory to a tiny maximum of
4mb, and I managed to count the line numbers of a file with several
gigabytes:

   count(file:read-text-lines('huge.txt'))

I’d be interested to hear if your code runs faster with the latest snapshot.
Christian

[1] http://files.basex.org/releases/latest/
[2] 
https://github.com/BaseXdb/basex/commit/cfb7a7965de85139ec9595a6e79a45d873da7c25

Re: [basex-talk] file:read-text-lines performance

2019-01-15 Thread George Sofianos


Hi Christian,

what I failed to mention last time was that I was using the offset / 
limit mode of file:read-text-lines. I never tried to load the whole 
file into memory with the previous version, because I thought it would 
be inefficient. I just tried now with the latest snapshot using a single 
core, and while the whole file is being loaded into memory (4GB+), the 
process completes in about 120 seconds, which is fine for me. Using the 
offset mode still looks to be more memory efficient (stays around 
1-1.3GB), but is very slow (both single core and multi core).


One issue: I can't make the non-offset version work with fork-join. It 
fills the whole memory quickly, so I guess it reads the whole file into 
memory for each thread(?) - I tried up to 12GB. I've also noticed that 
in both versions (old and new snapshot), interrupting the fork-join mode 
will keep the threads running until I manually kill the BaseX process. 
Maybe I'm doing something wrong, or maybe I'm asking too much from 
fork-join :) I will try with the window clause tomorrow, maybe it will 
help. I'm posting an example of my code to help explain my use case 
better. For now, it is fine because I'm only reading a 4GB file, but 
potentially I might have to read up to 200GB files, so having multi-core 
capabilities will help.


let $data := file:read-text-lines($file, "UTF-8", false())
let $count := count($data)

let $all :=
xquery:fork-join(
  for $i in $data return function() {
  parse-json($i)?('object1')?*?('object2')?('object3')
  }
)
return distinct-values($all)

Regards,

George


Re: [basex-talk] file:read-text-lines performance

2019-01-15 Thread George Sofianos


Wow, thanks for your fast response! I will give it a try tonight,

George.


Re: [basex-talk] file:read-text-lines performance

2019-01-15 Thread George Sofianos


Hi Christian,

On 1/15/19 12:43 PM, Christian Grün wrote:

What are your experiences with using a single thread? If memory
consumption is too exhaustive, you could play with the window clause
of the FLWOR expression [2,3]. It takes some time to explore the full
magic of this XQuery 3.0 extension (the syntax is somewhat verbose),
but it’s often a good alternative to complex functional code.
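
As a rough illustration of the window clause mentioned above, the following sketch cuts a sequence into chunks of four items; the numbers stand in for the lines of a file, and the chunk size is arbitrary. A tumbling window without "only end" also emits the last, shorter chunk.

```xquery
for tumbling window $chunk in 1 to 10
  start at $s when true()
  end at $e when $e - $s eq 3
return sum($chunk)

(: three windows, (1 to 4), (5 to 8) and (9, 10), giving 10, 26, 19 :)
```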


Using a single thread looks to be OK too, about 10k lines per second, 
and I'm not sure reading the same file with 16 threads (on SSD) is the 
way to go from an I/O point of view. Searching on stackoverflow there 
are many suggestions on how to read a file with one or multiple threads 
e.g [1]


I immediately return the data I need for each line (a small string for 
example) so the memory consumption is low, I have provided 12GB but I 
never see over 2-3GB of memory usage. My initial thoughts were that 
maybe garbage collection was causing delays but after profiling BaseX I 
don't think this is an issue. It's interesting to know about the window 
function though, I will certainly find a use for it. While I know most 
of these functions exist, I can always learn much more about a language. 
Only yesterday I managed to use fork-join successfully and I think it 
will save me a lot of time and effort for my use cases. I will post 
again if I have any updates, thanks again,


George.

[1]: 
https://stackoverflow.com/questions/40412008/how-to-read-a-file-using-multiple-threads-in-java-when-a-high-throughput3gb-s

[basex-talk] file:read-text-lines performance

2019-01-15 Thread George Sofianos


Hello,

I'm trying to read a 4GB text file with 5 million lines and parse its 
contents. I'm using the file:read-text-lines function to do 
that. I managed to use fork-join and use 16 CPU threads to read the 
whole file by reading 1 lines in each iteration, but it still takes 
500 seconds for parsing / analyzing the data. Using a profiler I can see 
that most of the time is wasted reading each line (method readline). 
I plan to make some changes to the code tonight and see if I can find a 
way to read it faster, but I thought I should also post it here in case 
you have any tips. I'm also very inexperienced with using profilers so I 
hope I read the output correctly :)


Regards,

George

Re: [basex-talk] prof:variables() and inlining

2019-01-14 Thread George Sofianos


On 1/14/19 5:08 PM, Christian Grün wrote:

Oh yes, many things are happening here; I am always surprised by
myself. Here is some background information:

• Up to now, the body of a function was always inlined if it is a
static value, no matter which value is currently assigned as inline
limit.
• BaseX 9 pre-evaluates your function body to a value (a so-called
"singleton sequence", to be exact; it is represented as “SingletonSeq”
in the query plan).
• As a result, the body was inlined with BaseX 9, whereas it was not
in previous versions.

The decision to ignore the inline limit if the function body is a
value was taken 6 years ago, so I cannot recollect what the
reason behind it was. I decided to change this optimization rule,
though (in practice, hardly anyone will notice this, as it’s only
observable if the function inlining limit is changed). The behavior of
the latest snapshot may be easier to understand now [1].

Hope this helps
Christian

[1] http://files.basex.org/releases/latest/


Thanks Christian, I will use this snapshot, always happy to help testing :)

Re: [basex-talk] prof:variables() and inlining

2019-01-14 Thread George Sofianos




Do you have a self-contained example that shows the behavior?

Thanks in advance,
Christian


Actually, looking at the compilation information, it looks like something 
more is going on there.


This example returns the following compilation information and optimized 
query:


declare %basex:inline(0) function local:test() {
  for $x in 1 to 200
  return 1
};

local:test()

Compiling:
- pre-evaluate range expression to range sequence: (1 to 200)
- pre-evaluate range sequence to singleton xs:string sequence: (1 to 
200) -> util:replicate("", 200)
- pre-evaluate util:replicate(items,count) to singleton xs:integer 
sequence: util:replicate(1, 200)
- pre-evaluate FLWOR expression to singleton xs:integer sequence: for 
$x_0 in util:replicate("", 200) retu... -> util:replicate(1, 200)

- inline local:test#0
Optimized Query:
util:replicate(1, 200)

-

But trying to return prof:variables() returns the following compilation 
information and optimized query:


for $x in 1 to 2 return prof:variables()

Compiling:
- pre-evaluate range expression to range sequence: (1 to 2)
- pre-evaluate range sequence to singleton xs:string sequence: (1 to 2) 
-> util:replicate("", 2)

Optimized Query:
for $x_0 in util:replicate("", 2) return prof:variables()


It looks like there is an issue both with prof:variables(), and maybe 
with inlining. Why does the compilation of the first example try to 
inline the function even though I have set the limit to 0? In BaseX 8.4.4 that 
doesn't happen.


Thanks again,

George

Re: [basex-talk] prof:variables() and inlining

2019-01-14 Thread George Sofianos


Do you have a self-contained example that shows the behavior?

Thanks in advance,
Christian


Sorry for not providing one, I thought it was easy to reproduce. Trying 
the example from the wiki:

for $x in 1 to 2 return prof:variables()

will return empty strings in 9.1.1, while it returns a valid result (1, 2) 
in version 8.4.4. So I guess it's not an issue with inlining but with 
prof:variables() itself.

George.

[basex-talk] prof:variables() and inlining

2019-01-14 Thread George Sofianos


Hello, just something I noticed.

I was trying to check the variables of one of my functions, but it only 
returned an empty string. Reading [1] gave me the 
impression that disabling inlining (locally or globally) would return 
these values. But it doesn't work for me in version 9.1.1 or the latest 
snapshot. I tested it on 8.4.4 and it works, so it is either a 
regression or a change in behaviour. I usually open an issue on GitHub 
directly, but I think you now prefer it on the mailing list first.


Regards,

George

Re: [basex-talk] Scaling and Jsoniq

2018-12-25 Thread George Sofianos


On 12/22/18 8:26 PM, Ben Pracht wrote:

Second, I read about Jsoniq and was intrigued by it. It seemed like it 
supported JSON without requiring an XML intermediary. Having good JSON 
support would really help with frameworks like Angular or React.

Funny, I was thinking about posting the same thing last week, but in the 
end I didn't. I've been working with some huge amounts of JSON data lately and 
I'm using BaseX to parse the data, understand it, then export it to HTML 
for others to see. XQuery 3.1 has support for JSON via the parse-json 
function, but it converts it to maps, so if you want to read a specific 
"node" you need to know the exact path to it, and it can be hard to 
request that. The alternative is to use the BaseX JSON module, which 
converts it to XML, and then use XPath to read what you need. I'm not really 
sure what the best solution is for something like this, but there are 
some XQuery engines / databases that use JSONiq, e.g.


1) VXQuery -> https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq

2) Zorba -> http://try.zorba.io/

3) Sparksoniq -> http://sparksoniq.org/
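
To make the two approaches described above concrete, here is a small sketch on a made-up JSON string. The first two expressions are plain XQuery 3.1; the third uses the BaseX JSON Module, whose json prefix is assumed to be predeclared by BaseX and whose default conversion is assumed to turn keys into element names.

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

let $json := '{"obj1": [{"obj2": {"obj3": "a"}}, {"obj2": {"obj3": "b"}}]}'
return (
  (: lookup operator: every step of the path has to be spelled out :)
  parse-json($json)?obj1?*?obj2?obj3,

  (: map:find searches nested maps and arrays for a key, returning an array :)
  map:find(parse-json($json), 'obj3'),

  (: BaseX JSON Module: convert to XML, then use a descendant step :)
  json:parse($json)//obj3/string()
)
```

map:find goes some way towards the "I don't know the exact path" problem without leaving the map representation.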

I really love Basex overall though and think it's underrated for what 
it can do.


I agree! It's a tool that is useful for many things and I recommend it 
every time.


Merry Christmas to everyone !

Regards,
George.

[basex-talk] Serialization options

2017-11-29 Thread George Sofianos


Hi,

What I'm trying to do is serialize a CSV with CRLF newlines in Linux 
using BaseX. It's not really important since my CSV parser supports both 
newlines, but maybe this discussion can help me understand how BaseX 
serialization works, or create an improvement for BaseX.


I'm running the BaseX GUI (latest snapshot). I have a sequence of strings 
that are CSV. I'm using fn:serialize with an item-separator consisting of an XML 
character entity. I'm then returning this output as the result of the script. 
This gives me about 200 lines of CSV. Copy-pasting these lines into an 
editor, or using the Save button from the GUI, saves these values with 
an LF newline character.


The RFC [2] for CSV files recommends a CRLF line break for CSV, so it 
would be nice if I could serialize this from BaseX directly. I tried some 
options from the wiki [1] but had no luck. The File module also uses a 
system-specific newline character [3]. Maybe this is something that 
could be a part of the CSV serialization issue [4], or maybe it is already 
possible to achieve this somehow.


Thanks,

George

[1] http://docs.basex.org/wiki/Serialization
[2] https://tools.ietf.org/html/rfc4180#section-2
[3] http://docs.basex.org/wiki/File_Module#file:write-text-lines
[4] https://github.com/BaseXdb/basex/issues/1518
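
One possible workaround, sketched under assumptions: join the lines with an explicit CRLF and write the resulting string with file:write-text, bypassing the GUI. The sample rows and the output path /tmp/out.csv are made up, and I am assuming that file:write-text writes the string without converting line breaks.

```xquery
let $rows := ('id,name', '1,alpha', '2,beta')
let $crlf := '&#xD;&#xA;'
let $csv  := string-join($rows, $crlf) || $crlf
return (
  (: sanity check: the separator really is CR + LF, i.e. 13, 10 :)
  string-to-codepoints($crlf),
  (: hypothetical target path :)
  file:write-text('/tmp/out.csv', $csv)
)
```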

Re: [basex-talk] [basex-announce] Moving on to Java 8?

2017-09-27 Thread George Sofianos

I think it's great that BaseX is moving to Java 8. However I don't have 
a dependency on Java 7 so my opinion might be biased :) Hopefully more 
people agree with that.


George


On 09/27/2017 02:19 PM, Christian Grün wrote:

Dear all,

Java 9 has just been released. As you may know, the BaseX code base is
still compatible with Java 7, because version 7 is still used in the
wild (we even got various user complaints when we upgraded from Java 6
just a few years ago).

There are several reasons why we will move along to Java 8 in the near future:

* It is more and more unsafe to use Java 7, because Oracle has stopped
support two years ago.

* From the developer point of view, there are only advantages when
working with more recent versions of a software: newer language
features and standard libraries can be used, the code base can be
reduced, etc.

* We would like to switch to the newest version of Jetty, which requires Java 8.

Before we approach further, we are interested in hearing your
reactions: Do you still work with Java 7? Would some of you require
longtime support for Java 7?

Thanks in advance and all the best,
Christian

Re: [basex-talk] Embedded BaseX

2017-07-14 Thread George Sofianos


Hi Carl,

Sorry I forgot to answer; it's always nice to see more opinions and 
ideas. We already have a system working as a monolithic application 
deployed in an application server, but I was creating a new context each 
time just to be sure, and now that I'm moving this to a separate service 
it was the perfect time to investigate how to make it better :)


I've also worked with the client / server approach and the dockerized 
version of BaseX, and while both are easy to work with, they don't cover 
all our needs. I'm hoping to implement an error resilient messaging 
queue or use reactive streams and make sure network issues won't affect 
the XQuery scripts execution.


George


On 07/11/2017 03:11 PM, Bondeson, Carl wrote:

I am using BaseX in a multithreaded environment, which is highly 
transactional. I ended up using a singleton class as the controller and all 
other threads execute queries against this singleton class. I have successfully 
implemented an ELR (Electronic Laboratory Reporting) messaging system which is 
in production. This system runs under JBoss so you should not have any issues 
using something like Wildfly  as an application server.

Carl R Bondeson
IT Analyst 3
Information Technology
Connecticut Department of Public Health
410 Capitol Ave
Hartford, CT 06134
Phone: 860-509-7434
carl.bonde...@ct.gov



-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Tuesday, July 11, 2017 8:02 AM
To: George Sofianos <gsf.gre...@gmail.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Embedded BaseX

Hi George,

It’s recommendable indeed to only create one instance of the Context class. 
Context instances are lightweight, but operations like transactions are 
centrally controlled by this class. QueryProcessor are usually created anew for 
each query evaluation.

Number of jobs… Just start in the Context class and follow the JobPool 
reference. Via the PARALLEL option [1], you can set a maximum limit of parallel 
database transactions. If you want to enforce that there are never more than 
transactions running, you can set FAIRLOCK [2] to true.

I’ve just updated the description of the PARALLEL option to indicate what has 
changed since the introduction of the FAIRLOCK option.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Options#PARALLEL
[2] http://docs.basex.org/wiki/Options#FAIRLOCK
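
Besides the Java route through the Context class, the job list can also be inspected from XQuery via the BaseX Jobs Module. This is only a sketch, assuming a BaseX version that ships this module and that queries run in the same Context as the jobs of interest.

```xquery
(: number of jobs currently known to this BaseX instance :)
count(jobs:list()),

(: one element per job with further details, such as its state :)
jobs:list-details()

(: a single job could then be cancelled with jobs:stop($id),
   where $id is one of the ids returned by jobs:list() :)
```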



On Tue, Jul 11, 2017 at 11:36 AM, George Sofianos <gsf.gre...@gmail.com> wrote:

Hello,

I'm building a web service that along with some other things (XML
Validation, Saxon XQuery, etc) includes a BaseX processor for running
XQuery scripts. I'm wondering what is the best way to use an embedded
BaseX processor. Can some objects be shared (e.g Context,
QueryProcessor), or are they very lightweight and there is no need to share / 
reuse ?
Can I get the number of the jobs running and the job status like in
the client / server mode? My intention is to make sure I can limit the
amount of the jobs running in parallel, and have more control over the
execution (maybe cancel jobs if necessary)

Thanks,

George

Re: [basex-talk] Embedded BaseX

2017-07-11 Thread George Sofianos


This is exactly what I was looking for, thanks!

George

On 07/11/2017 03:02 PM, Christian Grün wrote:

Hi George,

It’s recommendable indeed to only create one instance of the Context
class. Context instances are lightweight, but operations like
transactions are centrally controlled by this class. QueryProcessor
are usually created anew for each query evaluation.

Number of jobs… Just start in the Context class and follow the JobPool
reference. Via the PARALLEL option [1], you can set a maximum limit of
parallel database transactions. If you want to enforce that there are
never more than transactions running, you can set FAIRLOCK [2] to
true.

I’ve just updated the description of the PARALLEL option to indicate
what has changed since the introduction of the FAIRLOCK option.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Options#PARALLEL
[2] http://docs.basex.org/wiki/Options#FAIRLOCK



On Tue, Jul 11, 2017 at 11:36 AM, George Sofianos <gsf.gre...@gmail.com> wrote:

Hello,

I'm building a web service that along with some other things (XML
Validation, Saxon XQuery, etc) includes a BaseX processor for running XQuery
scripts. I'm wondering what is the best way to use an embedded BaseX
processor. Can some objects be shared (e.g Context, QueryProcessor), or are
they very lightweight and there is no need to share / reuse ?
Can I get the number of the jobs running and the job status like in the
client / server mode? My intention is to make sure I can limit the amount of
the jobs running in parallel, and have more control over the execution
(maybe cancel jobs if necessary)

Thanks,

George
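
Editorial note: Christian's answer covers the Java side (one shared Context, one QueryProcessor per query, the JobPool for job information). The same questions (how many jobs are registered, and what state they are in) can also be asked from XQuery. The sketch below assumes the BaseX Jobs Module with the "jobs:" prefix as shipped with BaseX 8.5 to 9.x; module prefix, function names and attribute names may differ in other versions, so treat it as an illustration rather than a reference.

```xquery
(: Sketch only: assumes the BaseX Jobs Module ("jobs:" prefix, BaseX 8.5 to 9.x). :)
let $jobs := jobs:list-details()
return (
  'registered jobs: ' || count($jobs),
  (: one line per job: id, type, state and duration, as reported by the module :)
  for $job in $jobs
  return string-join(($job/@id, $job/@type, $job/@state, $job/@duration), ' | ')
)

(: A single job could then be cancelled with jobs:stop('job1'),
   where 'job1' is a hypothetical id taken from the listing above. :)
```

Limiting the number of parallel jobs is still done through the PARALLEL and FAIRLOCK options mentioned in the reply; the query above only reports on what is currently registered.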

[basex-talk] Embedded BaseX

2017-07-11 Thread George Sofianos


Hello,

I'm building a web service that along with some other things (XML 
Validation, Saxon XQuery, etc) includes a BaseX processor for running 
XQuery scripts. I'm wondering what is the best way to use an embedded 
BaseX processor. Can some objects be shared (e.g Context, 
QueryProcessor), or are they very lightweight and there is no need to 
share / reuse ?
Can I get the number of the jobs running and the job status like in the 
client / server mode? My intention is to make sure I can limit the 
amount of the jobs running in parallel, and have more control over the 
execution (maybe cancel jobs if necessary)


Thanks,

George

Re: [basex-talk] querypath alternative

2017-06-13 Thread George Sofianos

Sorry for the delay, I want to make sure first there is nothing wrong 
with my system. I noticed these scripts also fail on a local server 
(8.4.3) with file not found error (while the file exists), but they run 
fine on a BaseX GUI. I will reply again when I find out what's wrong - 
or not.


George


On 06/12/2017 08:09 PM, Christian Grün wrote:

Unless I'm doing
something very wrong, I get this error from my library module:

lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.

Hi George, do you have a mini example that I can test out-of-the-box?
Thanks in advance.

Re: [basex-talk] querypath alternative

2017-06-12 Thread George Sofianos


Hi,

I'm trying to move to a more recent version of BaseX server (I'm stuck with 
8.4.4 now), and I'm trying to use base-uri to import my modules (BaseX 
latest snapshot). However, I can't seem to make this work. While 
declaring the base-uri works for the XQuery scripts that run in the BaseX 
GUI, basexserver and basexhttp give errors. Unless I'm 
doing something very wrong, I get this error from my library module:


lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.

Also, I'm declaring the base-uri in the main module only, like: declare 
base-uri "file:///home/user/directory/"; Without the final slash it gets 
resolved to the parent directory; I guess that's intentional? I only 
found a reference in this closed issue: 
https://github.com/BaseXdb/basex/issues/1454


The alternative would be to bring QUERYPATH back, somehow :) I will then 
be able to upgrade without compatibility issues.


Thanks,

George

On 10/28/2016 02:06 PM, Christian Grün wrote:

Well, this will be difficult… We had to do numerous rewritings, and
the QUERYPATH option was kind of hacky (seen from today’s
perspective). We may be able to add something similar for specific use
cases like yours, but I can’t promise anything yet.
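
Editorial note: the trailing-slash behaviour George describes matches ordinary URI reference resolution, where a relative reference replaces everything after the last "/" of the base URI. A minimal sketch with the standard fn:resolve-uri function follows; the module path "lib/common.xqm" is a made-up example and not taken from the thread.

```xquery
(: A relative reference replaces the last path segment of the base URI. :)
let $module := "lib/common.xqm"  (: hypothetical module path :)
return (
  (: base ends with "/": the module is looked up inside .../directory/ :)
  resolve-uri($module, "file:///home/user/directory/"),
  (: no trailing "/": "directory" is treated as a file name and replaced,
     so the module is looked up in /home/user/ instead :)
  resolve-uri($module, "file:///home/user/directory")
)
```

The first result points into /home/user/directory/, the second into /home/user/, which is the "previous directory" effect mentioned in the message.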

Re: [basex-talk] Combining modules into a single file

2017-06-02 Thread George Sofianos


Hi Christian,

Unfortunately I haven't used RESTXQ yet, so I'm mostly talking about 
arbitrary XQuery projects that can be executed using the Java API for 
example. I've made an Ant script that takes all XQuery in an envelope 
(each one has its own namespace) and bundles them into a single XQuery 
file, but I had some issues with it, mostly because Ant is not very easy 
to use.


But in any case it would be helpful if somehow this can be done within 
BaseX, like a step before compiling, so it will only bundle files that 
are importing each other, and not everything in an envelope, then give 
an option to export that bundle.


George


On 06/02/2017 01:20 PM, Christian Grün wrote:

Hi George,

Bundling modules into a single file would surely be helpful. For
example, I would like to distribute the DBA as zipped file in future.
Currently, there is no such solution available, mostly because we did
not have time to think about all conceptual details. Suggestions are
welcome. Do you mostly think about RESTXQ applications, files in
repositories, or arbitrary XQuery projects that should also be
runnable outside the web context?

Christian


On Wed, May 31, 2017 at 11:13 PM, George Sofianos <gsf.gre...@gmail.com> wrote:

Hi, and thanks for your awesome engine again!

I wonder if there is an easy and complete way of combining xquery files into
a single file. To give an example, I have about 16-17 xquery files,  some
are only 50-100 lines but some can be up to 1500 lines.

Because of multiple reasons, (legacy web application behaviour that makes it
hard to deploy updates, network file system that doesn't always work), it is
preferable to have one single file to execute on the XQuery engine. I know
it can be done with ant for example, but it's not very easy to cover all
cases, so maybe someone has a better idea :) I don't mind implementing
something in a different language, as long as I know it will be doable.

Thanks,

George

[basex-talk] Combining modules into a single file

2017-05-31 Thread George Sofianos


Hi, and thanks for your awesome engine again!

I wonder if there is an easy and complete way of combining xquery files 
into a single file. To give an example, I have about 16-17 xquery 
files,  some are only 50-100 lines but some can be up to 1500 lines.


Because of multiple reasons, (legacy web application behaviour that 
makes it hard to deploy updates, network file system that doesn't always 
work), it is preferable to have one single file to execute on the XQuery 
engine. I know it can be done with ant for example, but it's not very 
easy to cover all cases, so maybe someone has a better idea :) I don't 
mind implementing something in a different language, as long as I know 
it will be doable.


Thanks,

George

[basex-talk] Very slow execution time on complex script

2017-03-08 Thread George Sofianos

I'm having a very difficult issue to resolve. I have an XQuery file with 
6325 lines that does very complex calculations / validations / etc. When 
I'm running this script on an XML file of about 60MB, it takes about 2 
hours to finish. I'm trying to find ways to debug this, and change the 
code where necessary so it will run faster and on larger files. I 
noticed that when I'm using TAILCALLS = -1 I need more than 4 MB stack 
size. (I've increased it to 100MB since it's just one thread anyway)


I'm trying to find out what I can improve in the code, but I can't 
understand how yet. I've used a profiler (yourkit) to see if I can get 
more information, but I'm not very experienced with profilers and I 
don't think any information from the profiler can help me fix the 
script. Compilation takes about 16 seconds with INLINELIMIT = 0 and 
MAINMEM = true, so the issue is in execution.


I'm wondering if multiple function calls that are not tail calls can 
create this issue, or maybe I need to change tail calls to tail 
recursion calls, and keep a TAILCALLS = 256. Some hints on how to 
improve performance here would be very welcome.


p.s I realized BaseX could have a very neat feature for the GUI, to have 
the option to only compile or compile + run an XQuery file. (if it 
doesn't already exist)


On 12/21/2016 03:56 PM, Christian Grün wrote:

I'm curious, what value do you recommend here? I've been using it for BaseX
with -Xss4m for a long time, but I'm sure that is too much.

Hm, good question ;) I think that the Java default setting (…which
also depends on your system configuration) is usually the best
tradeoff. In our own apps, we usually rewrite our XQuery code such
that there is no need for this flag (mostly because of convenience, to
ensure that it runs out-of-the-box when changing the system). If you
don’t experience any bottlenecks with 4m that you don’t encounter with
a smaller value, it’s probably a good choice.

Thanks,
Christian

Re: [basex-talk] Integrating IntelliJ IDEA and BaseX

2017-02-28 Thread George Sofianos


Hi,

as an IntelliJ IDEA user I'm only using IDEA and the XQuery plugin to write 
the XQuery scripts, then execute them in the BaseX GUI while navigating the 
same directory. I'm using the plugin that is mentioned in the wiki page 
for syntax highlighting etc. We are using the BaseX processor as a web 
service, so on my local environment I only use the BaseX GUI. Most of the 
time you need the BaseX GUI anyway, in order to run some quick XPath 
expressions or retrieve parts of data from an XML file, so I don't 
bother setting up a server / client mode. I also have to change the 
inspections when setting up the plugin, because of a bug currently 
present that uses 100% CPU when inspecting unused variables in large 
XQuery files.


One thing that would be interesting, is to have debugging support in 
BaseX. I think recently Grzegorz added debugging support for Saxon.


George

On 02/28/2017 11:18 PM, Bridger Dyson-Smith wrote:

Hi all,

I'm hoping to get some feedback (and potential collaborators) on a 
wiki page I created[1] that attempts to help a user get started with 
IDEA, XQuery, and BaseX.


However!, I'm feeling a bit uncertain about my limited experience with 
workflows, tooling, and IDEA set up, and so I'm hoping that those of 
you on this list who use BaseX and IDEA might be willing to pass along 
feedback about the linked documentation.


Particularly, how are you interacting with BaseX both as a processor 
and as client/server? How are you starting and stopping your server? 
What pieces of IDEA are you leveraging to make working with XQuery and 
BaseX a(n even) better experience?


Thank you in advance for your time and trouble.
Best,
Bridger

[1] http://docs.basex.org/wiki/Integrating_IntelliJ_IDEA

[basex-talk] format-number rounding

2017-01-13 Thread George Sofianos

Hi there, I was wondering if there is a similar function to 
format-number, but without rounding, so I don't have to create a custom 
one that involves string manipulation.


For example I have two values:
let $x := 15.224134
let $y := 15.2249348734

The following command will create different output for the two values 
(result for $x will be 15.224, result for $y will be 15.225)


format-number(xs:decimal($x), "0.000")

Thanks,

George
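
Editorial note: one way to get the unrounded output without string manipulation is to truncate the value numerically before formatting it. The sketch below uses only standard XQuery 3.0 functions; the helper name local:truncate is an assumption made for this illustration and is not a built-in function.

```xquery
(: Hypothetical helper: cuts $value off after $digits fraction digits, no rounding. :)
declare function local:truncate(
  $value  as xs:decimal,
  $digits as xs:integer
) as xs:decimal {
  (: 10^$digits as an exact decimal, e.g. 1000 for three digits :)
  let $factor := xs:decimal('1' || string-join((1 to $digits) ! '0'))
  let $scaled := $value * $factor
  return (
    (: truncate towards zero so that negative values are handled as well :)
    if ($value ge 0) then floor($scaled) else ceiling($scaled)
  ) div $factor
};

for $n in (15.224134, 15.2249348734)
return format-number(local:truncate($n, 3), '0.000')
```

With the two values from the message, both results are 15.224, because the digits after the third decimal place are dropped before format-number sees them.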

[basex-talk] Schema Validation issues

2016-12-30 Thread George Sofianos

Because of issues with schema validation (can't properly use doc() in 
the validate function - see 
https://github.com/BaseXdb/basex/issues/1324), I'm not using BaseX to do 
schema validation in my workflow.


But I needed it today to make some result comparisons with my own 
implementation of XML Validator, and I noticed that using the beta 
Xerces version which is documented in the wiki can make it crash, so I 
thought I should report it here.


Other than that, I noticed on some XML files that it only reports fatal 
errors, and hides any other non-fatal error. I think it has something to 
do with the process method but I'm not sure.


Error:
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.6 beta baadd29
Java: Oracle Corporation, 1.8.0_112
OS: Linux, amd64
Stack Trace: 
java.lang.ArrayIndexOutOfBoundsException: 27
at org.apache.xerces.impl.xs.XSConstraints.overlapUPA(Unknown Source)
at org.apache.xerces.impl.xs.XSConstraints.overlapUPA(Unknown Source)
at org.apache.xerces.impl.xs.models.XSDFACM.checkUniqueParticleAttribution(Unknown Source)
at org.apache.xerces.impl.xs.XSConstraints.fullSchemaChecking(Unknown Source)
at org.apache.xerces.impl.xs.XMLSchemaValidator.handleEndElement(Unknown Source)
at org.apache.xerces.impl.xs.XMLSchemaValidator.endElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.jaxp.validation.StreamValidatorHelper.validate(Unknown Source)
at org.apache.xerces.jaxp.validation.ValidatorImpl.validate(Unknown Source)
at javax.xml.validation.Validator.validate(Validator.java:124)
at org.basex.query.func.validate.ValidateXsd$1.process(ValidateXsd.java:97)
at org.basex.query.func.validate.ValidateFn.process(ValidateFn.java:120)
at org.basex.query.func.validate.ValidateXsd.errors(ValidateXsd.java:55)
at org.basex.query.func.validate.ValidateFn.report(ValidateFn.java:88)
at org.basex.query.func.validate.ValidateXsdReport.value(ValidateXsdReport.java:15)
at org.basex.query.func.validate.ValidateFn.iter(ValidateFn.java:53)
at org.basex.query.scope.MainModule.iter(MainModule.java:118)
at org.basex.query.QueryContext.iter(QueryContext.java:331)
at org.basex.query.QueryContext.cache(QueryContext.java:622)
at org.basex.query.QueryProcessor.cache(QueryProcessor.java:116)
at org.basex.core.cmd.AQuery.query(AQuery.java:87)
at org.basex.core.cmd.XQuery.run(XQuery.java:22)
at org.basex.core.Command.run(Command.java:255)
at org.basex.core.Command.execute(Command.java:93)
at org.basex.gui.GUI.exec(GUI.java:479)
at org.basex.gui.GUI.access$3(GUI.java:433)
at org.basex.gui.GUI$7.run(GUI.java:421)
Query:
validate:xsd-report(root())
Query plan:

Re: [basex-talk] TagSoup and html5 support

2016-12-21 Thread George Sofianos


Interesting. Is it possible to use it for converting HTML to XML?

I'm not really sure about that. It looks like it parses HTML into a DOM 
document object so I'm not sure if this can work with BaseX.

I see. So it probably sends request headers like "Accept-Encoding:
x-compress; x-zip" to the server and unzips the result, is this right?

Yes, it sends the request with Accept-Encoding for gzip, retrieves the 
gzipped response, and then it unzips the content into a stream.

I don’t know much about HTTP caching so far, though.

HttpClient has support for some caching libraries, which means it can 
download the XML files into a custom disk storage, then just check if 
they have changed in every document request. In case the file hasn't 
changed on the server that supports HTTP caching, a 304 response is 
returned to the client, so it doesn't need to download the file a second 
time.

Re: [basex-talk] TagSoup and html5 support

2016-12-21 Thread George Sofianos


Would it also help us converting HTML5, or it is a general suggestion? ;)

Unfortunately no, it was a general suggestion :(
In our projects though, we are using https://jsoup.org/ and it works 
well, also very easy to use. I still prefer XPath over the CSS selectors.

Out of interest: Where would this come into play? When using
http:send-request, or also at other places?

I'm talking about calls that happen using XQuery 
doc("http://randomhost.rn/random.xml"). I'm not sure if they request 
gzipped files. I think I tested it once and they didn't.
For example, fetching a 233MB XML file with gzip compression 
only needs to transfer 27.8MB (this is a random file, the compression may 
vary for different XML files). We are working with files that can be 
over 1GB, so it can make a difference in bandwidth and execution 
(compilation) time.

Re: [basex-talk] TagSoup and html5 support

2016-12-21 Thread George Sofianos


If we find working and light-weight alternatives, we could replace the
original distribution of TagSoup with the new solution. Suggestions
are welcome.

Speaking about suggestions, how do you feel about adding Apache 
HttpClient to BaseX? It can help with requesting gzipped XML files 
(which makes a huge difference for large XML files), and could possibly use 
the HTTP cache mechanism.


Regards,
George

Re: [basex-talk] Tail Recursion Error on startup

2016-12-21 Thread George Sofianos




I know some of you are waiting for BaseX 8.6, and I promised to make
it happen until end of this year. I am afraid we won’t make it in
time, but you can definitely expect the new version in January!

Thanks again for all the work.


The problem itself is a generic one and not limited to BaseX or
XQuery. Did you try to rewrite your function calls to tail calls [1]?
Another alternative is to increase the stack size of Java (via the
-Xss flag).

I'm curious, what value do you recommend here? I've been running 
BaseX with -Xss4m for a long time, but I'm sure that is too much.
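
Editorial note: the rewrite to tail calls that Christian suggests usually means moving the pending work into an accumulator argument, so that the recursive call is the last thing the function evaluates. The functions below are made-up illustrations (not code from the thread), written in standard XQuery 3.0; whether a given processor actually optimizes the tail call depends on the implementation and its settings.

```xquery
(: Not a tail call: the addition still has to run after the recursive call returns,
   so every level of recursion keeps a stack frame alive. :)
declare function local:sum-to($n as xs:integer) as xs:integer {
  if ($n le 0) then 0
  else $n + local:sum-to($n - 1)
};

(: Tail call: the running total travels in $acc and the recursive call is
   the final expression, which makes it eligible for tail-call optimization. :)
declare function local:sum-to($n as xs:integer, $acc as xs:integer) as xs:integer {
  if ($n le 0) then $acc
  else local:sum-to($n - 1, $acc + $n)
};

(: Two ways to add up 1 to 100000 without deep non-tail recursion. :)
local:sum-to(100000, 0),
fold-left(1 to 100000, 0, function($acc, $v) { $acc + $v })
```

Both expressions compute the sum of 1 to 100000, which is 5000050000. The fold-left variant avoids user-level recursion altogether and is often the simpler option when the recursion only walks over a sequence.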

Re: [basex-talk] xquery for position

2016-12-01 Thread George Sofianos

Hi again. I actually now think there isn't anything wrong with the 
query, or basex, but the position I'm looking for is actually the same 
for every loop. Because it involved thousands of elements it was 
difficult to verify.


Thanks for your time,

George


On 12/01/2016 04:30 PM, Dirk Kirsten wrote:

Hi George,

you can't disable parts of the optimizer. I guess to be really able to
help you we will need to see the relevant part of the query and best an
SSCCE.

However, if the problem exists in Saxon and BaseX it is quite probably a
problem with your query instead of the optimizers fault. I would be
rather surprised to see that Saxon and BaseX have exactly the same bug.

Cheers
Dirk

[basex-talk] xquery for position

2016-12-01 Thread George Sofianos


Hi,

I'm having some issues with finding the position of a value in a for 
loop. My query is complex so I can't write it in an example here (at 
least for the moment). I think the XQuery optimizer messes it up.


The result I get from the for loop is something like this for every 
return entry:


2113 - 11953 - 8760

where 2113 is the stuck $pos, 11953 is the count $count, and 8760 is the 
number of values in the sequence I'm looping on.


Any ideas? Can I disable part of the optimizer and see if that is the 
problem?


p.s1 I'm already using declare option db:inlinelimit '0'; because I've 
noticed some time ago it helps with compiling the XQuery files (without 
this, I have cases of very slow or freezing compilations).


p.s2 I've also had this issue with Saxon involving $pos.
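
Editorial note: since the original query is not shown, the following is only a generic illustration of how a positional variable and a count clause differ in an XQuery 3.0 FLWOR expression. The sequence and variable names are invented for the example.

```xquery
(: "at $pos" is the index within the binding sequence and is fixed before
   any filtering; "count $count" numbers the tuples that survive the where clause. :)
for $value at $pos in ('a', 'b', 'c', 'd')
where $value != 'b'
count $count
return $pos || ' - ' || $count || ' - ' || $value
```

This returns "1 - 1 - a", "3 - 2 - c" and "4 - 3 - d": $pos keeps the original position, while $count only advances for tuples that pass the filter.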

Re: [basex-talk] xquery result limit

2016-11-12 Thread George Sofianos

These tips are great. I've been working with XQuery for over a year and I'm 
learning things every day. It would also be nice to have a wiki page 
with performance tips, as these things are hard to find ;)

Have a nice weekend,
George

As I overlooked in your last example, you did not use 'order by'.
Without sorting, only the number of requested results will be created,
no matter if you use the GUI or work on command-line. The most
prominent example of this is the following query (which would be
extremely slow and memory consuming otherwise):

   (1 to 1000)[1]

If you use 'order by', it’s always recommendable to only return the
minimum set of required information, and create the full result in
the subsequent step:

   for $result in (
     for $y in ...lots of stuff...
     order by ...
     return $y
   )[position() = 1 to 100]
   return { $result }

One more trick: You can move your future result in a function and
evaluate it afterwards:

   for $func in (
     for $i in 1 to 10
     order by $i descending
     return function() { { $i } }
   )[position() = 1 to 5]
   return $func()

Hope this helps,
Christian
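
Editorial note: the element constructors in the quoted snippets appear to have been lost in the archive, so here is a small self-contained variant of the same pattern. The element name "hit" is a placeholder chosen for this sketch and is not taken from the original message.

```xquery
for $result in (
  for $i in 1 to 1000
  order by $i descending
  return $i                    (: inner loop returns only the cheap sort result :)
)[position() = 1 to 5]
return <hit>{ $result }</hit>  (: the costly construction runs for five items only :)
```

This yields five elements, from <hit>1000</hit> down to <hit>996</hit>. Writing the filter as subsequence(..., 1, 5) instead of the positional predicate expresses the same limit.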

Re: [basex-talk] xquery result limit

2016-11-11 Thread George Sofianos




No problem! I am just asking because large results in the query will
first be cached before they are displayed in the GUI. On command-line,
single items will be iteratively output as soon as possible. As a
consequence, outputting your 900,000 rows shouldn’t cause additional
overhead on command-line, but it will increase memory consumption in
the GUI.

Hope this helps,
C.

Thanks, that explains the memory consumption and the delay (about 8 
seconds) while outputting to the GUI window.
So if I get it right, when I use [position() = 1 to 100], are only the first 
100 results calculated? Or are all 900,000 rows calculated, and I 
get the first 100 results? (imagine it is a complex query)


(for $x in $xml//something-complex[complex-xpath]
let $y := another-complex-function()
where (another-complex-comparison)
return

{$y}
)[position() = 1 to 100]

Re: [basex-talk] xquery result limit

2016-11-11 Thread George Sofianos




Do you run the query in the GUI or on command-line?

For even better performance, I recommend you to have a look at the
following HOF function:

http://docs.basex.org/wiki/Hof_Module#hof:top-k-by

I'm testing the scripts in the GUI; I don't really use the command line. I also 
run them on a basexhttp instance.
I will check it out, however I like to keep the scripts as close to the 
XQuery spec as possible.
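
Editorial note: for comparison, the sketch below puts the BaseX-specific function from the linked wiki page next to a spec-only formulation. It assumes the signature hof:top-k-by($seq, $sort-key, $k) as documented for BaseX 8.x and 9.x; the Hof Module is a BaseX extension and may not be available in other processors or versions.

```xquery
(: BaseX extension: the 5 items with the greatest sort key. :)
hof:top-k-by(1 to 1000, function($n) { $n }, 5),

(: Standard XQuery formulation of the same request. :)
(for $n in 1 to 1000 order by $n descending return $n)[position() le 5]
```

Both expressions ask for the five largest numbers of the input. The second one runs unchanged in any XQuery 3.0 processor, which fits the preference for staying close to the specification.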

[basex-talk] xquery result limit

2016-11-11 Thread George Sofianos

Hi, I've found this very good answer to limiting results in xquery. 
http://stackoverflow.com/a/8900472/1951487
I like that it works, but I was wondering if you can explain what 
happens in the background?


Thanks,

George

Re: [basex-talk] broken pipe in parallel queries

2016-11-01 Thread George Sofianos


It could also be a network issue so I'm still investigating.

I haven’t experienced exceptions of this kind so far, but if it turns
out that BaseX is the black sheep, feel free to create an MVCE for
us.

In case someone else has something similar, the problem was a timeout 
configuration of the load balancer. Thanks for pointing me in the right 
direction.

[basex-talk] broken pipe in parallel queries

2016-10-28 Thread George Sofianos

I'm having some issues with running some queries in parallel. If I run 
the script on one file, I get the expected result (the execution time is 
about 40 seconds for this specific script). If I run the same script on 
many XML files at the same time, I get a broken pipe error. I'm using a 
Java BasexClient to execute the queries, and I'm trying to execute the 
script on about 50 files at the same time (the PARALLEL configuration in 
.basex is the default). Could this be related to BaseX? It could 
also be a network issue so I'm still investigating.


10/28/2016 4:10:21 PM   Suppressed: java.net.SocketException: Broken pipe
10/28/2016 4:10:21 PM   at java.net.SocketOutputStream.socketWrite0(Native Method)
10/28/2016 4:10:21 PM   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
10/28/2016 4:10:21 PM   at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
10/28/2016 4:10:21 PM   at org.basex.io.out.BufferOutput.flush(BufferOutput.java:60)
10/28/2016 4:10:21 PM   at org.basex.io.out.BufferOutput.write(BufferOutput.java:54)
10/28/2016 4:10:21 PM   at org.basex.io.out.PrintOutput.write(PrintOutput.java:66)
10/28/2016 4:10:21 PM   at org.basex.io.out.ServerOutput.write(ServerOutput.java:31)

10/28/2016 4:10:21 PM   at java.io.OutputStream.write(OutputStream.java:116)
10/28/2016 4:10:21 PM   at org.basex.io.out.BufferOutput.flush(BufferOutput.java:60)
10/28/2016 4:10:21 PM   at org.basex.io.out.PrintOutput.flush(PrintOutput.java:141)
10/28/2016 4:10:21 PM   at org.basex.io.serial.OutputSerializer.close(OutputSerializer.java:82)
10/28/2016 4:10:21 PM   at org.basex.server.ServerQuery.execute(ServerQuery.java:143)

10/28/2016 4:10:21 PM   ... 2 more

Re: [basex-talk] querypath alternative

2016-10-28 Thread George Sofianos


You’ll indeed need to remove the location specifier in your main
module; after that, it should work (see our documentation for more
details).
I understand. However, this won't do for my use case. Let's say 
hypothetically I have about 500 modules, most of them library and some 
of them main modules. All of them are in the same directory, and some of 
them share the same namespace. Most of the library modules also import 
other library modules, so I will have to manually remove all location 
specifiers from every file. Also currently, most of the same scripts can 
also work with Saxon without any changes, but if I remove the specifiers 
this will probably make Saxon stop working. So I guess I will have to 
use base-uri for now.

Re: [basex-talk] querypath alternative

2016-10-28 Thread George Sofianos




Could you give me some details what went wrong?


Well even if I copy the modules in the repo directory, the main module 
still has a relative path in the import declaration, so basex is 
searching for the modules in "/srv/" directory in the docker container, 
or in the /bin directory if I run the basexserver from the terminal. If 
I remove the 'at "file.xq"' it is trying to do something different, but 
I still get this error: [XQST0059] Module not found: "namespace" both on 
docker container and on normal basex server.

Re: [basex-talk] standalone vs GUI character parsing

2016-10-27 Thread George Sofianos


I pass this on to the Docker aficionados on the list…

Christian


Thanks and sorry for responding on a month old post about the xml 
parser, I just noticed my email filters were not working.


About the QUERYPATH, I think the issue isn't specifically about Docker. 
Maybe I'm missing something, but how can a basexclient execute an XQUERY 
command such as import module namespace test = "test" at "test.xq" if there 
isn't a querypath to define the directory for the modules? I'm trying this on a 
local server instance and it searches for the test.xq in the BaseX bin 
directory. I hope there is an alternative way to declare the path, 
because otherwise I won't be able to use BaseX any more from my Java 
application, using the BasexClient query method.


Specifically about Docker, the older images can't run because of the .m2 
permissions, and the latest one is missing QUERYPATH.

Re: [basex-talk] standalone vs GUI character parsing

2016-10-27 Thread George Sofianos

What about characters that are outside the UTF-8 scope? I think that still 
makes the internal parser fail. I thought that was intended behaviour 
so I never mentioned it.


On 09/30/2016 03:10 PM, Christian Grün wrote:

By default, XML documents with invalid characters should be rejected;
but if you turn on the internal parser in the parsing tab of the
Database Creation dialog, all invalid characters will be replaced with
FFFD. Maybe that’s what you have done?

I also noticed that the QUERYPATH has been removed from latest builds, 
how can I set the Docker image to find xq modules? I was using the 
QUERYPATH to map them.

[basex-talk] discard-document equivalent

2016-07-22 Thread George Sofianos

Hi again, I wonder if there is an equivalent command to discard-document 
which is available in Saxon PE? I have a list of XML documents available 
in a remote repository, and I make the same checks in every XML in a for 
loop.


Example:

let $list := ("http://url1", "http://url2", "http://url3", "http://url4")

for $file in $list

return local:runChecks(doc($file))

If the documents are large, they will keep filling the memory. I just 
ran a script that uses 42 files of about 50 MB each, and that consumed over 
2GB of memory. The checks are quite complex so I don't think that the 
fetch module would work here, if my understanding is correct about what 
it does. For more information about how discard-document works, see my 
question/answer on SO: 
http://stackoverflow.com/questions/34514859/using-discard-document-with-saxon-and-xquery/34516641#34516641


Thanks!
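
Editorial note: one candidate worth testing for this situation is fetch:xml from the BaseX Fetch Module. According to the module documentation, each call returns a new document node that does not have to stay in memory until the end of the query, unlike documents opened with fn:doc. The sketch below is an assumption-laden illustration: the body of local:runChecks is a stand-in for the real checks, the URLs are the placeholders from the message, and actual memory use should be measured rather than taken from this note.

```xquery
(: Hypothetical stand-in for the real, more complex checks. :)
declare function local:runChecks($doc as document-node()) as xs:integer {
  count($doc//*)
};

(: Placeholder URLs, as in the message above. :)
let $list := ("http://url1", "http://url2", "http://url3", "http://url4")
for $file in $list
(: fetch:xml parses the document anew on every call instead of caching it like doc() :)
return local:runChecks(fetch:xml($file))
```

Because the function receives a complete document node, the checks themselves can stay as they are; only the way the document is opened changes.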

[basex-talk] Maximum number of hits

2016-07-21 Thread George Sofianos

I have changed Maximum number of hits to All, but still, the BaseX result 
gets chopped sometimes. Is this a bug or an intended behaviour? My only 
problem is that I run XQuery scripts that need 200+ seconds to finish, 
so whenever this happens when I click save, they will have to run again 
in the background. I know there is an alternative way to run them, (for 
example using file:write) but I can't really use it at the moment.


My result is just 1 large  element, and the output method is HTML.

Thanks, George.

Re: [basex-talk] Schema validation

2016-07-19 Thread George Sofianos

Indeed, looking at the code, it seems it's already using it. I wonder what 
creates that delay though. I will have to investigate it a bit, probably 
by debugging BaseX, unless someone already knows. Could a SAXSource vs 
StreamSource be the issue? Or does that not affect performance? If my 
question is stupid, just ignore ;)


Thanks, George


On 7/19/2016 4:40 PM, Andy Bunce wrote:

Looks like it wants to use it [1]. You could try running below in the GUI:

Q{java:org.basex.util.Reflect}find("org.apache.xerces.jaxp.validation.XMLSchemaFactory")

/Andy

[1] 
https://github.com/BaseXdb/basex/blob/b8c1ae7738664aa3912ade783b8a01a0a2285d25/basex-core/src/main/java/org/basex/query/func/validate/ValidateXsd.java#L65

Re: [basex-talk] Schema validation

2016-07-19 Thread George Sofianos

Thanks, it looks like it's in the classpath. But is it actually used? I 
can't be sure. I have seen some strange things happening with Xerces 
versions in the past with Saxon.


Anyway, it would be great if BaseX could have a feature to change the 
validation options. Should I open a BaseX ticket about it? Or is there 
already a way to set these?


https://xerces.apache.org/xerces2-j/features.html


On 7/19/2016 3:05 PM, Andy Bunce wrote:

Hi George,

Just on point #1
I think BaseX does not install Xerces. Entering the line below in the 
GUI will tell you the version from the JDK


Q{java:com.sun.org.apache.xerces.internal.impl.Version}getVersion()

For me this returns:
Xerces-J 2.7.1

If you have manually added Xerces to the classpath, then you can get 
the version by:

Q{java:org.apache.xerces.impl.Version}getVersion()

/Andy

[basex-talk] Schema validation

2016-07-19 Thread George Sofianos

Hi, I wonder what is the status of schema validation in BaseX? I have a 
Java web service that is used to validate some schemas, which is using 
xerces2 to validate XML files. I want to transfer some of this work to 
my XQuery scripts in BaseX, so I can minimize the bandwidth on large 
files (the XML files are being retrieved from a remote repository). I 
did try to put xercesImpl.jar in the lib directory, and the validation does 
run, but I'm not sure about these two things:


1) Is the new version of xerces being used, or is the Java default one 
being used? Maybe it's possible to add a function to return the library 
being used?


2) My Java service (which is using xerces2) is running the validation in 
about 5 seconds, the same validation takes 32 seconds in BaseX 8.5.1



Any ideas, tips etc are welcome

Thanks, George.

[basex-talk] KILL session

2016-05-06 Thread George Sofianos

I'm trying to create an environment where I can have multiple BaseX 
Servers, and I can run 2-3 complex queries on them at a time. However, I 
want to have the ability to stop a running XQuery script for any reason. 
I assumed that's what would happen if I KILL the session using a BaseX 
client. However when I try to KILL a session I have the following issues:


1) BaseX client freezes and I have to CTRL-C to stop it
2) The session I try to kill is removed from the list, but the process 
continues to run. I have to force kill -9 the basex server for it to stop.


Should I report this as a bug or is there another way to kill the 
running scripts? Thanks

57 matches
