[basex-talk] Creating more than a million databases per session: Out Of Memory

2016-10-15 Thread Bram Vanroy | KU Leuven
Hi all

 

I've talked before about how we restructured our data to drastically improve
search times on a 500-million-token corpus. [1] Now, after some minor
improvements, I am trying to import the generated XML files into BaseX. The
result would be hundreds of thousands to millions of BaseX databases, as
expected. When doing the import, though, I am running into OOM errors with
our memory limit set to 512 MB. This seems incredibly odd to me: because we
are creating so many different databases, each of which is consequently very
small, I would not expect BaseX to need to keep much in memory. After each
database is created, the garbage collector should be able to come along and
remove everything that was needed for the previously generated database.

 

A solution, I suppose, would be to close and open the BaseX session on each
creation, but I'm afraid that (on such a huge scale) the impact on speed
would be too large. This is how it is set up now, in pseudo code:

use BaseXClient;   # Perl client shipped with BaseX

my $session = Session->new($host, $port, $user, $pw);

# @allFiles is at least 100,000 items
for my $file (@allFiles) {
    my $database_name = $file . "name";                   # unique name per file
    $session->execute("CREATE DB $database_name $file");  # create DB from the file
    $session->execute("CLOSE");
}

$session->close();

So all databases are created on the same session, which I believe causes the
issue. But why? What is still required in memory after ->execute("CLOSE")?
Are the indices for the generated databases kept in memory? If so, can we
force them to be written to disk?
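
For completeness, the reconnect workaround I mentioned would look roughly
like this. It is only a sketch; the batch size of 10,000 is an arbitrary
compromise between reconnecting on every creation and never reconnecting:

use BaseXClient;   # Perl client shipped with BaseX

my $count   = 0;
my $session = Session->new($host, $port, $user, $pw);

for my $file (@allFiles) {
    my $database_name = $file . "name";                   # unique name per file
    $session->execute("CREATE DB $database_name $file");
    $session->execute("CLOSE");

    # Drop and reopen the session periodically, so the server can release
    # whatever it is still holding on to for this session.
    if (++$count % 10_000 == 0) {
        $session->close();
        $session = Session->new($host, $port, $user, $pw);
    }
}

$session->close();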

 

ANY thoughts on this are appreciated. Some insight into what exactly is kept
in a Session's memory would be useful as well. Increasing the memory limit
should be a last resort.

 

 

Thank you in advance!

 

Bram

 

 

[1]:
http://www.lrec-conf.org/proceedings/lrec2014/workshops/LREC2014Workshop-CMLC2%20Proceedings-rev2.pdf#page=20

 



[basex-talk] (no subject)

2016-10-15 Thread Shaun Flynn
Hello there,

I am trying to parse a large CSV file into XML with the following XQuery:

let $file := fetch:text("D:\BPLAN\tlk.txt")
let $convert := csv:parse($file, map { 'header': true(), 'separator': 'tab' })
return fn:put(
  <csv>{
    for $row in $convert/csv/record
    return <record>{$row/*}</record>
  }</csv>,
  "D:\BPLAN\tlk.xml"
)

Using the GUI, it runs out of memory: when I click on the bottom right-hand
corner (where the memory usage is shown), it says to increase the memory and
restart using -Xmx.

I do this through the MS-DOS prompt, but -Xmx does not appear to be a
parameter any more.
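
If I understand it correctly, -Xmx is an option for the Java VM rather than
for BaseX itself, so presumably it has to be passed to Java, either via the
BASEX_JVM environment variable (which the start scripts should pick up) or
by launching the GUI through Java directly. Roughly, with paths and jar name
depending on the installation:

set BASEX_JVM=-Xmx2g
basexgui.bat

or, starting the GUI via Java directly:

java -Xmx2g -cp BaseX.jar org.basex.BaseXGUI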

Is there a better method for parsing large CSV files? I then want to add
the resulting file tlk.xml to a new database.
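
One alternative, assuming I read the CSV parser options correctly, would be
to let CREATE DB parse the text file directly and skip the intermediate
tlk.xml altogether, along these lines:

SET PARSER csv
SET CSVPARSER header=true,separator=tab
CREATE DB tlk D:\BPLAN\tlk.txt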

Kindest Regards
Shaun Connelly-Flynn


[basex-talk] Making connections to a database

2016-10-15 Thread Thomas Daly
Hello,

I'm developing an application server with node.js. When the server starts,
it creates a single connection to a BaseX database, which is then used for
all subsequent user queries.

I notice that when I connect separately to the BaseX database with the 
BaseXClient command line tool, the application server connection dies 
with a 'broken pipe' error.

Am I doing this correctly, or should the application server connect then 
immediately disconnect from BaseX to serve each individual user query 
(in the same way that PHP works with MySQL)?

Thanks

Thomas