Re: [basex-talk] HTTPServer + gzip compression

2019-10-01 Thread Stefan Koch
Hi Christian,

Thx for your reply. I'm using the embedded Jetty via the basexhttp
service. Not using RESTXQ - basic REST is what we need for this project.
Good idea to ask the Jetty devs - I'll try that.

Kind regards,
Stefan

-----Original Message-----
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: Tuesday, October 1, 2019 14:47
To: Stefan Koch
Cc: BaseX
Subject: Re: [basex-talk] HTTPServer + gzip compression

Hi Stefan,

Do you work with an embedded Jetty instance (i.e., did you run
basexhttp), or do you use BaseX as a servlet? In the latter case, this
may need to be tackled by the Jetty developers. Did you address this
on their mailing list?

Another alternative could be to include GZIP support in RESTXQ, and
send gzipped responses whenever the client sends a corresponding
Accept-Encoding header. I’ll have some more thoughts on that.

Best,
Christian


Re: [basex-talk] basex OOM on 30GB database upon running /dba/db-optimize/

2019-10-01 Thread Christian Grün
Hi first name,

If you optimize your database, the indexes will be rebuilt. In this
step, the builder tries to guess how much free memory is still
available. If memory is exhausted, parts of the index will be split
(i.e., partially written to disk) and merged in a final step.
However, you can circumvent the heuristics by manually assigning a
static split value; see [1] for more information. If you use the DBA,
you’ll need to assign this value to your .basex or the web.xml file
[2]. In order to find the best value for your setup, it may be easier
to play around with the BaseX GUI.
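
For example, a minimal sketch of a .basex entry (the value is only an
arbitrary starting point to experiment with, not a tuned recommendation):

  # write partial index data to disk after a fixed number of index
  # operations instead of relying on the memory heuristics
  SPLITSIZE = 1000000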

As you have already seen in our statistics, an XML document has
various properties that may represent a limit for a single database.
Accordingly, these properties make it difficult for the system to
predict when memory will be exhausted during an import or index
rebuild.

In general, you’ll get best performance (and your memory consumption
will be lower) if you create your database and specify the data to be
imported in a single run. This is currently not possible via the DBA;
use the GUI (Create Database) or console mode (CREATE DB command)
instead.
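
In console mode, for example (database name and input path are
placeholders):

  SET SPLITSIZE 1000000
  CREATE DB mydb /path/to/xml/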

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Options#SPLITSIZE
[2] http://docs.basex.org/wiki/Configuration



On Mon, Sep 30, 2019 at 7:09 AM first name last name
 wrote:
>
> Hi,
>
> Let's say there's a 30GB dataset [3] containing most threads/posts from [1].
> After importing all of it, when I try to run /dba/db-optimize/ on it (which 
> must have some corresponding command) I get the OOM error in the stacktrace 
> attached. I am using -Xmx2g so BaseX is limited to 2GB of memory (the machine 
> I'm running this on doesn't have a lot of memory).
> I was looking at [2] for some estimates of peak memory usage for this 
> "db-optimize" operation, but couldn't find any.
> Actually, it would be nice to know peak memory usage because, for any
> database (including BaseX), a common task is server sizing: knowing what
> kind of server would be needed.
> In this case, it seems like 2GB of memory is enough to import 340k
> documents, weighing in at 30GB total, but it's not enough to run
> "db-optimize".
> Is there any info about peak memory usage on [2]? And are there guidelines
> for large-scale collection imports like the one I'm trying to do?
>
> Thanks,
> Stefan
>
> [1] https://www.linuxquestions.org/
> [2] http://docs.basex.org/wiki/Statistics
> [3] https://drive.google.com/open?id=1lTEGA4JqlhVf1JsMQbloNGC-tfNkeQt2


Re: [basex-talk] "Out of Memory" when inserting data from one DB to another

2019-10-01 Thread Christian Grün
Hi Michael,

Your query looks pretty straightforward. As you have already guessed, it’s
simply the large number of inserted nodes that causes the memory error.

Is there any chance to assign more memory to your BaseX Java process? If
not, you may need to write an intermediate XML document with the desired
structure to disk and reimport this file in a second step. You could also
call your function multiple times, and insert only parts of your source data
in a single run.
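
A minimal sketch of that batching idea (untested; it reuses the database
and element names from your query below, and $start/$size are hypothetical
external variables you would increase from run to run):

  (: inserts one batch of records; repeat with $start = 1, 100001, ... :)
  declare variable $start external := 1;
  declare variable $size external := 100000;

  let $mainRecs := db:open('db-to-insert-data')/collection/record
  for $infoRec in db:open('db-with-data')/collection/record
      [position() = $start to $start + $size - 1]
  let $mainRec := $mainRecs[id = $infoRec/id]
  return insert node $infoRec/*[not(name() = 'id')] into $mainRec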

Hope this helps,
Christian


On Fri, Sep 27, 2019 at 12:05 PM BIRKNER Michael 
wrote:

> Hi to all,
>
> I get an "Out of Memory" error (using the BaseX GUI on Ubuntu Linux) when
> I try to insert quite a lot of data into a BaseX database. The use case: I
> have a database (size is about 2600 MB, 13718400 nodes) with information in
> <info> elements that should be added to <record> elements in another
> database. The <record>s have a 1-to-1 connection identified by an ID that
> is available in both databases.
>
> An example (simplified) of the DB with the information I want to add to
> the other DB:
>
> <collection>
>   <record>
>     <id>1</id>
>     <info>Some data</info>
>     <info>More data</info>
>     <info>More data</info>
>     ...
>   </record>
>   <record>
>     <id>2</id>
>     <info>Some data</info>
>     <info>More data</info>
>     <info>More data</info>
>     ...
>   </record>
>   <record>
>     <id>3</id>
>     <info>Some data</info>
>     <info>More data</info>
>     <info>More data</info>
>     ...
>   </record>
>   ... many many more <record>s
> </collection>
>
> Here is an example (simplified) of the DB to which the above <info>
> elements should be added:
>
> <collection>
>   <record>
>     <id>1</id>
>     <main>Main data</main>
>     <main>More main data</main>
>     <main>More main data</main>
>     ...
>   </record>
>   <record>
>     <id>2</id>
>     <main>Main data</main>
>     <main>More main data</main>
>     <main>More main data</main>
>     ...
>   </record>
>   <record>
>     <id>3</id>
>     <main>Main data</main>
>     <main>More main data</main>
>     <main>More main data</main>
>     ...
>   </record>
>   ... many many more <record>s
> </collection>
>
> This is the XQuery I use to insert the given <info> elements from the
> <record> elements in one database into the corresponding <record> in the
> other database. It results in an "Out of Memory" error:
>
> let $infoRecs := db:open('db-with-data')/collection/record
> let $mainRecs := db:open('db-to-insert-data')/collection/record
> for $infoRec in $infoRecs
>   let $id := data($infoRec/id)
>   let $mainRec := $mainRecs[id=$id]
>   let $dataToInsert := $infoRec/*[not(name()='id')]
>   return insert node ($dataToInsert) into $mainRec
>
> I assume that the error is a result of the large amount of data that is
> processed. My question is whether there is a strategy for working with
> such an amount of data without getting an "Out of Memory" error.
>
> Thanks very much to everyone in advance for any hint and advice. If you
> need more information about DB setup or options just let me know.
>
> Best regards,
> Michael
>
>
>


Re: [basex-talk] HTTPServer + gzip compression

2019-10-01 Thread Christian Grün
Hi Stefan,

Do you work with an embedded Jetty instance (i.e., did you run
basexhttp), or do you use BaseX as a servlet? In the latter case, this
may need to be tackled by the Jetty developers. Did you address this
on their mailing list?
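
For completeness, here is a sketch of the jetty.xml GzipHandler setup that
the Jetty documentation describes (assuming the bundled Jetty is 9.3 or
later; the minGzipSize value and MIME types are only illustrative):

  <Configure id="Server" class="org.eclipse.jetty.server.Server">
    <Call name="insertHandler">
      <Arg>
        <New class="org.eclipse.jetty.server.handler.gzip.GzipHandler">
          <Set name="minGzipSize">1024</Set>
          <Set name="includedMimeTypes">
            <Array type="String">
              <Item>application/xml</Item>
              <Item>application/json</Item>
            </Array>
          </Set>
        </New>
      </Arg>
    </Call>
  </Configure>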

Another alternative could be to include GZIP support in RESTXQ, and
send gzipped responses whenever the client sends a corresponding
Accept-Encoding header. I’ll have some more thoughts on that.
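
For reference, this is the standard negotiation such support would follow
(the /rest/mydb path is just a placeholder; 8984 is the default HTTP port):

  GET /rest/mydb HTTP/1.1
  Host: localhost:8984
  Accept-Encoding: gzip

  HTTP/1.1 200 OK
  Content-Type: application/xml
  Content-Encoding: gzip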

Best,
Christian




On Tue, Sep 24, 2019 at 8:09 PM Stefan Koch  wrote:
>
>  Hi,
> I'm struggling to get gzip compression working.
> I'm using the REST module.
> http://docs.basex.org/wiki/REST
> Version: BaseX 9.2.2
> Tried to add the handler in jetty.xml as described here:
> https://www.eclipse.org/jetty/documentation/current/gzip-filter.html
> But it didn't work. Tried the gzip filter via web.xml - but it is
> deprecated since Jetty 9.3.
> Gzip handler is the correct way to do it.
> Searched the mailing list, similar problem was reported here:
> https://mailman.uni-konstanz.de/pipermail/basex-talk/2019-February/014160.html
> Unfortunately no solution :(
> Not much experience with embedded Jetty - but adding gzip compression is
> straightforward in Tomcat or Apache - no clue why it doesn't
> work. Shouldn't be that hard - and it's kinda a default feature.
> Did anybody get it to work? Any tips?
> Thanks
> Stefan Koch
>
> B Solutions
> Dipl.-Kfm. Rudolf Markus Petri
> Lietzenburger Str. 77
> 10719 Berlin
> Tel: 0049 30 8867 6099
> Fax: 0049 30 8867 6159
> Mail: k...@buit-solutions.com
> Web: www.buit-solutions.com