Re: [basex-talk] executing xquery on basex

2016-10-16 Thread Christian Grün
Hi Genneva,

Looks weird indeed; we haven’t encountered this before. It seems as if
the client string is modified before it is written to the server. Do
you use a single BaseX client instance in parallel? Do you have
parallel calls at all? And, the usual question: can you provide us
with a minimized code example?
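
If parallel threads do share one BaseXClient instance, giving each thread
its own client usually avoids this kind of stream corruption. A minimal
sketch (hypothetical; it uses the bundled BaseXClient example class, and
host and credentials are placeholders):

    import org.basex.examples.api.BaseXClient;

    public final class PerThreadClient {
      public static void main(String[] args) throws Exception {
        // Each thread opens and closes its own connection, so no two threads
        // ever write to the same socket at the same time.
        Runnable task = () -> {
          try {
            BaseXClient client = new BaseXClient("localhost", 1984, "admin", "admin");
            try {
              System.out.println(client.execute("XQUERY fn:current-dateTime()"));
            } finally {
              client.close();
            }
          } catch(Exception ex) {
            ex.printStackTrace();
          }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
      }
    }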

Christian



> We have seen a weird issue with BaseX, but only when executing XQuery via the Java
> client (BaseXClient.java from
> https://github.com/BaseXdb/basex/blob/master/basex-examples/src/main/java/org/basex/examples/api/BaseXClient.java),
> and I was wondering if you have any tips.
>
> The issue happens when executing an XQuery that calls the function
> fn:current-dateTime().
>
> We have logged the queries that we send prior to calling the execute
> method in BaseXClient.java, so we know that the XQuery strings are constructed
> correctly. However, we have been receiving inconsistent results when executing
> this XQuery. Below are some examples from the BaseX logs.
>
> 05:42:40.907  172.16.0.43:51776  admin  REQUEST  XQUERY fn:current-dateTime()  0.11 ms
> 05:42:40.908  172.16.0.43:51776  admin  OK  Query executed in 0.18 ms.  0.77 ms
> 05:42:44.787  172.16.0.35:56245  admin  REQUEST  ̺°Ì±ïERY fn:current-dateTime()  1.3 ms
> 05:42:44.787  172.16.0.35:56245  admin  ERROR  Stopped at , 1/6: Unknown command: ̺°Ì±ïERY. Did you mean 'ALTER'?  0.03 ms
>
>
>
> In this BaseX log, as you can see, the first execution of the query
> "XQUERY fn:current-dateTime()" was successful, but somehow the second time
> the query string was altered (the garbled REQUEST above) and BaseX was not
> able to recognize it as a command. We have not been able to reproduce this
> issue consistently.
>
> We have cross-checked the logs in our application, so we know that the query
> was constructed identically prior to each call to BaseX.
>
> Any thoughts would be greatly appreciated. Thanks.
>
> Regards,
> -Genneva
>
>
>
>


Re: [basex-talk] Large CSV File (~1M rows)

2016-10-16 Thread Shaun Flynn
That worked! Thank you.

On 16 Oct 2016 10:13, "Shaun Flynn"  wrote:

> Thank you -- I will give it a go and let you know.
>
> SCF xx
>
> On 16 Oct 2016 10:02, "Christian Grün"  wrote:
>
>> > The parsing runs out of memory. I also tried doubling the memory to
>> > 1024MB, and still the same issue.
>>
>> Memory consumption should be much lower if you create a database from
>> your input (e.g. via the GUI).
>>
>> Does this help?
>> Christian
>>
>>
>>
>> >
>> > Kind regards
>> >
>> >
>> > On 16 Oct 2016 09:15, "Christian Grün" 
>> wrote:
>> >
>> > Hi Shaun,
>> >
>> >> I do this through the MS DOS prompt, but -Xmx does not appear to be a
>> >> parameter any more,
>> >
>> > If you work with the ZIP distributions of BaseX, you can adjust the
>> > memory setting in the start scripts (in the bin directory). Otherwise,
>> > you’ll need to pass it to java, not to BaseX itself.
>> >
>> >> Is there a better method for parsing large CSV files? I then want to add
>> >> the resulting file tlk.xml to a new database.
>> >
>> > Did you check if it’s the CSV parsing or the fn:put call that causes the
>> > error?
>> > Christian
>> >
>> > [1] http://docs.basex.org/wiki/Database_Module#db:add
>> >
>> >
>>
>


Re: [basex-talk] Large CSV File (~1M rows)

2016-10-16 Thread Shaun Flynn
Thank you -- I will give it a go and let you know.

SCF xx

On 16 Oct 2016 10:02, "Christian Grün"  wrote:

> > The parsing runs out of memory. I also tried doubling the memory to 1024MB,
> > and still the same issue.
>
> Memory consumption should be much lower if you create a database from
> your input (e.g. via the GUI).
>
> Does this help?
> Christian
>
>
>
> >
> > Kind regards
> >
> >
> > On 16 Oct 2016 09:15, "Christian Grün" 
> wrote:
> >
> > Hi Shaun,
> >
> >> I do this through the MS DOS prompt, but -Xmx does not appear to be a
> >> parameter any more,
> >
> > If you work with the ZIP distributions of BaseX, you can adjust the
> > memory setting in the start scripts (in the bin directory). Otherwise,
> > you’ll need to pass it to java, not to BaseX itself.
> >
> >> Is there a better method for parsing large CSV files? I then want to add
> >> the
> >> resulting file tlk.xml to a new database.
> >
> > Did you check if it’s the CSV parsing or the fn:put call that causes the
> > error?
> > Christian
> >
> > [1] http://docs.basex.org/wiki/Database_Module#db:add
> >
> >
>


Re: [basex-talk] Large CSV File (~1M rows)

2016-10-16 Thread Christian Grün
> The parsing runs out of memory. I also tried doubling the memory to 1024MB, and
> still the same issue.

Memory consumption should be much lower if you create a database from
your input (e.g. via the GUI).
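
Via a client connection it would look roughly like this (a sketch only;
database name, file path, host and credentials are placeholders):

    import org.basex.examples.api.BaseXClient;

    public final class CreateFromCsv {
      public static void main(String[] args) throws Exception {
        BaseXClient client = new BaseXClient("localhost", 1984, "admin", "admin");
        try {
          client.execute("SET PARSER csv");                // parse the input as CSV
          client.execute("SET CSVPARSER header=true");     // first row holds the column names
          client.execute("CREATE DB tlk C:/data/tlk.csv"); // stream the file into a new database
        } finally {
          client.close();
        }
      }
    }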

Does this help?
Christian



>
> Kind regards
>
>
> On 16 Oct 2016 09:15, "Christian Grün"  wrote:
>
> Hi Shaun,
>
>> I do this through the MS DOS prompt, but -Xmx does not appear to be a
>> parameter any more,
>
> If you work with the ZIP distributions of BaseX, you can adjust the
> memory setting in the start scripts (in the bin directory). Otherwise,
> you’ll need to pass it to java, not to BaseX itself.
>
>> Is there a better method for parsing large CSV files? I then want to add
>> the
>> resulting file tlk.xml to a new database.
>
> Did you check if it’s the CSV parsing or the fn:put call that causes the
> error?
> Christian
>
> [1] http://docs.basex.org/wiki/Database_Module#db:add
>
>


Re: [basex-talk] Large CSV File (~1M rows)

2016-10-16 Thread Shaun Flynn
Hello there,

The parsing runs out of memory. I also tried doubling the memory to 1024MB,
and still the same issue.

Kind regards

On 16 Oct 2016 09:15, "Christian Grün"  wrote:

Hi Shaun,

> I do this through the MS DOS prompt, but -Xmx does not appear to be a
> parameter any more,

If you work with the ZIP distributions of BaseX, you can adjust the
memory setting in the start scripts (in the bin directory). Otherwise,
you’ll need to pass it to java, not to BaseX itself.

> Is there a better method for parsing large CSV files? I then want to add the
> resulting file tlk.xml to a new database.

Did you check if it’s the CSV parsing or the fn:put call that causes the
error?
Christian

[1] http://docs.basex.org/wiki/Database_Module#db:add


Re: [basex-talk] Creating more than a million databases per session: Out Of Memory

2016-10-16 Thread Christian Grün
Hi Bram,

I second Marco's advice to find a good compromise between single
databases and single documents.

Regarding the OOM, the stack trace could possibly be helpful for
judging what might go wrong in your setup.
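
For the chunked approach Marco describes below, a rough sketch
(hypothetical; it uses the bundled BaseXClient class, and host, credentials
and the database naming are placeholders):

    import java.util.List;
    import org.basex.examples.api.BaseXClient;

    public final class ChunkedCreate {
      static final int CHUNK = 1000;

      static void createAll(List<String> files) throws Exception {
        BaseXClient client = null;
        try {
          for(int i = 0; i < files.size(); i++) {
            // Start a fresh session for every chunk of CREATE DB commands.
            if(i % CHUNK == 0) {
              if(client != null) client.close();
              client = new BaseXClient("localhost", 1984, "admin", "admin");
            }
            String name = "db" + i; // derive the database name as needed
            client.execute("CREATE DB " + name + " " + files.get(i));
            client.execute("CLOSE");
          }
        } finally {
          if(client != null) client.close();
        }
      }
    }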

Cheers
Christian


On Sat, Oct 15, 2016 at 4:19 PM, Marco Lettere  wrote:
> Hi Bram,
> not being much into the issue of creating databases at this scale, I'm not
> sure whether the OOM problems you are facing are related to BaseX or to the
> JVM, actually.
> Anyway, something rather simple you could try is to behave "in between".
> Instead of opening a single session for all create statements altogether, or
> one session for each and every create, you could split your create
> statements into chunks of 100/1000 or the like and distribute them over
> subsequent (or maybe even parallel?) sessions.
> Regards,
> Marco.
>
>
> On 15/10/2016 10:48, Bram Vanroy | KU Leuven wrote:
>
> Hi all
>
>
>
> I’ve talked before about how we restructured our data to drastically improve
> search times on a 500 million token corpus. [1] Now, after some minor
> improvements, I am trying to import the generated XML files into BaseX. The
> result would be 100,000s to millions of BaseX databases, as we expect. When
> doing the import, though, I am running into OOM errors. We set our memory
> limit to 512MB. The thing is that this seems incredibly odd to me: because
> we are creating so many different databases, which are all really small as a
> consequence, I would not expect BaseX to need to store much in memory. After
> each database is created, the garbage collector can come along and remove
> everything that was needed for the previously generated database.
>
>
>
> A solution, I suppose, would be to close and reopen the BaseX session on each
> creation, but I’m afraid that (on such a huge scale) the impact on speed
> would be too large. Here is how it is set up now, in pseudocode:
>
>
>
> 
>
>
>
> $session = Session->new(host, port, user, pw);
>
> # @allFiles is at least 100,000 items
> for $file (@allFiles) {
>     $database_name = $file . "name";
>     $session->execute("CREATE DB $database_name file ");
>     $session->execute("CLOSE");
> }
>
> $session->close();
>
>
>
> 
>
>
>
> So all databases are created on the same session, which I believe causes the
> issue. But why? What is still required in memory after ->execute("CLOSE")?
> Are the indices for the generated databases stored in memory? If so, can we
> force them to be written to disk?
>
>
>
> Any thoughts on this are appreciated. Some enlightenment on what is stored in
> a Session’s memory would be useful as well. Increasing the memory limit should
> be a last resort.
>
>
>
>
>
> Thank you in advance!
>
>
>
> Bram
>
>
>
>
>
> [1]:
> http://www.lrec-conf.org/proceedings/lrec2014/workshops/LREC2014Workshop-CMLC2%20Proceedings-rev2.pdf#page=20
>
>
>
>


Re: [basex-talk] Making connections to a database

2016-10-16 Thread Christian Grün
Hi Thomas,

> Am I doing this correctly, or should the application server connect then
> immediately disconnect from BaseX to serve each individual user query
> (in the same way that PHP works with MySQL)?

In general, it should be possible to send multiple commands and
queries via an open connection. As I don’t know what exactly you are
doing, feel free to provide me with a minimized example.
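
For reference, reusing one open connection for several requests could look
like this (a sketch; host and credentials are placeholders):

    import org.basex.examples.api.BaseXClient;

    public final class ReuseConnection {
      public static void main(String[] args) throws Exception {
        BaseXClient client = new BaseXClient("localhost", 1984, "admin", "admin");
        try {
          // Several independent requests over the same open connection.
          System.out.println(client.execute("XQUERY fn:current-dateTime()"));
          System.out.println(client.execute("LIST"));
        } finally {
          client.close();
        }
      }
    }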

Best,
Christian


Re: [basex-talk] (no subject)

2016-10-16 Thread Christian Grün
Hi Shaun,

> I do this through the MS DOS prompt, but -Xmx does not appear to be a
> parameter any more,

If you work with the ZIP distributions of BaseX, you can adjust the
memory setting in the start scripts (in the bin directory). Otherwise,
you’ll need to pass it to java, not to BaseX itself.
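
For example, when starting the standalone console mode from the JAR
directly, the flag goes on the java call (heap size and paths are just
examples):

    java -Xmx2g -cp BaseX.jar org.basex.BaseX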

> Is there a better method for parsing large CSV files? I then want to add the
> resulting file tlk.xml to a new database.

Did you check if it’s the CSV parsing or the fn:put call that causes the error?
Christian

[1] http://docs.basex.org/wiki/Database_Module#db:add