Re: Slow TDB writing

Zak Mc Kracken Fri, 05 Jan 2018 04:05:49 -0800

Hi Andy, thanks for this.

I found a way to use the TDBLoader programmatically:
https://github.com/marco-brandizi/rdfutils/blob/master/rdfutils-jena/src/main/java/info/marcobrandizi/rdfutils/jena/elt/TDBLoadingHandler.java#L44


I use this class in a way similar to this:
https://github.com/marco-brandizi/rdfutils/blob/master/rdfutils-jena/src/test/java/info/marcobrandizi/rdfutils/jena/elt/RDFImporterTest.java#L95
https://github.com/marco-brandizi/rdfutils/blob/master/rdfutils-jena/src/main/java/info/marcobrandizi/rdfutils/jena/elt/RDFImporter.java

In more detail, I have an application that creates Jena (default) Model instances by converting non-RDF data, then sends those models to the TDBLoadingHandler above, which saves the data into a common TDB. As you can see, I've now serialised the writes and I'm fine with that (multi-threaded writing doesn't seem to be supported in TDB).
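For readers who want a concrete picture of "serialised writes", here is a minimal sketch that uses only the JDK and is not the code in the linked TDBLoadingHandler. Producer threads hand chunks to a bounded queue and a single writer thread consumes them. The class name SingleWriterQueue and the sink callback are hypothetical; the sink stands in for whatever actually stores a chunk (for example, adding a Model to the TDB).

```java
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Funnels chunks produced by many threads through exactly one writer thread. */
final class SingleWriterQueue<T> {
    // Bounded, so producers block instead of piling up chunks in memory.
    private final BlockingQueue<Optional<T>> queue;
    private final Thread writer;
    private volatile RuntimeException failure;

    /** sink is a hypothetical stand-in for the code that stores one chunk. */
    SingleWriterQueue(int capacity, Consumer<T> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.writer = new Thread(() -> drain(sink), "single-writer");
        this.writer.start();
    }

    private void drain(Consumer<T> sink) {
        try {
            while (true) {
                Optional<T> item = queue.take();
                if (!item.isPresent()) return;   // empty Optional marks the end
                if (failure != null) continue;   // after a failure, discard so producers never block forever
                try {
                    sink.accept(item.get());
                } catch (RuntimeException e) {
                    failure = e;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Called by any producer thread; blocks while the queue is full. */
    void submit(T chunk) {
        put(Optional.of(chunk));
    }

    /** Call once, after all producers are done; waits for the writer to finish. */
    void finish() {
        put(Optional.empty());
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the writer", e);
        }
        if (failure != null) throw new IllegalStateException("writer failed", failure);
    }

    private void put(Optional<T> item) {
        try {
            queue.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing", e);
        }
    }
}
```

The queue capacity is the knob that limits how many chunks wait in memory at once; with large chunks a small capacity is probably the safer choice.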

Following your comments, I've also tried both the version above in TDBLoadingHandler, which uses the bulk loader, and the one that uses the transactional approach (the commented-out code in TDBLoadingHandler). Initially I was sending Model instances of 10-30k triples and, in that case, the transactional way seemed to be quite a bit slower. Now I'm working with 500k-triple chunks and the speeds in the two cases are indeed pretty similar, but I get memory overflow errors when I try to use the transactional version (while the bulk loading works fine). So I think I will keep the bulk loader method for the time being, at least if my code above is correct enough.
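The chunk-size trade-off described above (small chunks pay the per-transaction overhead many times, very large ones strain memory) can be illustrated with a small JDK-only sketch. It is an assumption-laden illustration, not the code from the linked repository: BatchingBuffer and the committer callback are hypothetical names, and the committer stands in for "open one write transaction, add the batch, commit". The class is not thread-safe and is meant to be driven by whichever single thread does the writing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects items and hands them over in batches of at most maxBatchSize. */
final class BatchingBuffer<T> {
    private final int maxBatchSize;
    // Hypothetical callback: expected to store one batch in a single transaction.
    private final Consumer<List<T>> committer;
    private List<T> pending = new ArrayList<>();

    BatchingBuffer(int maxBatchSize, Consumer<List<T>> committer) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.committer = committer;
    }

    /** Adds one item and commits automatically once the batch is full. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Commits whatever is pending; call once more at the end of the export. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>();   // start a fresh list before handing the old one over
        committer.accept(batch);
    }
}
```

The right value for maxBatchSize depends on heap size and data shape, so it would need to be found by measurement rather than taken from this sketch.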


Marco.


On 03/01/2018 10:20, Andy Seaborne wrote:

Marco,

There are some things that are not clear:

* 2G of Turtle - how many triples?
* where does the multithreading come into it?
* what's 'model'?
* what's the setup? hardware? jena version? java heap size?
Bulk loading is faster on an empty database, otherwise it isn't doing anything special. Transactions are safer.
One single write transaction for all the data to be added would be the right way to do it. A transaction has some overhead, so very small transactions incur costs, but with a reasonable size, x10 is not likely.
    Andy

On 02/01/18 22:13, Zak Mc Kracken wrote:
Hi all again,
I'm trying to ask again about the problem below. It surprises me that it is so slow (like 2k triples/s, when the tdbloader tool is 10x faster). I've tried to use org.apache.jena.tdb.TDBLoader, but it isn't clear to me how to obtain a DatasetGraphTDB or a GraphNonTxnTDB from the path of a TDB location.
Thanks in advance for any help,
Marco

On 23/12/2017 13:09, Zak Mc Kracken wrote:
Hi all,
I have an application where exporting threads produce Model instances of a pre-configured size, then I want to write those models into a TDB.
For the moment, I'm using this (I believe, rather canonical) code for the writing (dataSet is shared between threads):
        this.dataSet.begin ( ReadWrite.WRITE );
        try {
            Model dmodel = this.dataSet.getDefaultModel ();
            dmodel.add ( model );
            this.dataSet.commit ();
        }
        finally {
            this.dataSet.end ();
        }
It's extremely slow. When tested with about 2G of Turtle data, it is still running after hours. The same data exported to a .ttl and then loaded with tdbloader takes a couple of minutes. Am I doing something wrong? Is the transactional approach inherently slower? Should I call the TDBLoader instead (the one used by the command line tool)?
Note that I'm OK with forcing everything onto one thread only (or serialising a couple of threads, as the code above seems to enforce); the application is going to be used in different export use cases, and in some of them it will be truly parallel (e.g., saving data to different files).
Thanks in advance for any help.

Marco.
