Re: RE: About fuseki2 load performance by java API

Laura Morales Thu, 18 Jul 2019 05:50:08 -0700

I had a similar problem when trying to load Wikidata on my laptop (8GB RAM,
i7 CPU, 750GB HDD). It started fine but slowed to a crawl after about 100
million triples. I don't think CPU or RAM is the problem; it probably has to
do with disk queues or caches or something like that. IIRC, when Andy tried to
load the same dataset on his PC with a 1TB SSD and 16GB RAM, he didn't have
those problems. Bottom line: try an SSD/NVMe instead of an HDD.


Besides, it would be nice to have a better (parallelized) way of loading huge
datasets (trillions of triples).



> Sent: Thursday, July 18, 2019 at 2:08 PM
> From: "Scarlet Remilia" <knight.lost.parad...@outlook.com>
> To: "users@jena.apache.org" <users@jena.apache.org>
> Subject: RE: About fuseki2 load performance by java API
>
> Thank you for the reply!
>
> The server storage is local HDD in RAID 10.
>
> The CPU is 4x 14 cores with 28 threads, but only one core is used during the
> load.
>
> The Fuseki2 JVM is tuned with -Xmx50g -Xms50g, and TDB2 is also tuned by
> adjusting the cache size.
>
> I monitored disk I/O with iostat, but the disk does not seem to be heavily
> utilized. I also see the memory usage of Fuseki2 increase after every
> 3 million triples loaded.
> 
> Fuseki2 is set up as a standalone server with the command below:
> 
> 
> 
> ./fuseki-server --tdb2 --loc=./tdb2dataset --port 2222 --update /fuseki2
> 
> 
> 
> Thank you very much!
> 
> 
> ________________________________
> From: Andy Seaborne <a...@apache.org>
> Sent: Thursday, July 18, 2019 6:41:56 PM
> To: users@jena.apache.org
> Subject: Re: About fuseki2 load performance by java API
> 
> That's quite slow. I get maybe 50-70K triples per second for a 100m load
> via the Fuseki UI.
> 
> The fastest way is to use the bulk loader directly to set up the
> database, then add it to Fuseki.
> 
> The hardware of the server makes a big difference. What's the server
> setup? Disk/SSD? Local or remote storage?
> 
>      Andy
> 
> You don't need the begin/commit in the client - the transaction is in
> the backend server.
> 
> On 18/07/2019 09:02, Scarlet Remilia wrote:
> > Hello everyone,
> > I want to load a hundred million triples into a TDB2-backed Fuseki2 via the
> > Java API.
> > I used the code below:
> >
> > Model model = ModelFactory.createDefaultModel();
> > model.add(model.asStatement(triple));
> > RDFConnectionRemoteBuilder builder = RDFConnectionFuseki.create()
> >         .destination(FusekiURL);
> > RDFConnection conn = builder.build();
> > conn.begin(ReadWrite.WRITE);
> > try {
> >     conn.load(model);
> >     conn.commit();
> > } finally {
> >     conn.end();
> > }
> >
> > The code actually works, but the performance is not good enough.
> >
> > [2019-07-18 23:29:25] Fuseki     INFO  [46] POST http://192.168.204.244:2222/fuseki2?default
> > [2019-07-18 23:30:45] Fuseki     INFO  [15] Body: Content-Length=-1, Content-Type=application/rdf+thrift, Charset=null => RDF-THRIFT : Count=3257309 Triples=3257309 Quads=0
> > [2019-07-18 23:31:12] Fuseki     INFO  [15] 200 OK (3,302.546 s)
> >
> > Every 3 million triples take 3,302.546 seconds, and there are 300 million
> > triples in total in the queue… (A single in-memory Model cannot hold that
> > many triples…)
> >
> > Is there a faster way to load them?
> >
> > Thanks!
> >
>
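
To put the numbers quoted in this thread side by side, here is a minimal Java
sketch of the arithmetic. It takes the triple count and elapsed time from the
Fuseki log above and, as an assumption, reads the "50-70K triples" figure as a
per-second rate. The class and method names (LoadEstimate, rate, hours) are
made up for illustration.

```java
public class LoadEstimate {
    /** Observed throughput in triples per second. */
    static double rate(long triples, double seconds) {
        return triples / seconds;
    }

    /** Estimated hours needed to load `total` triples at `triplesPerSecond`. */
    static double hours(long total, double triplesPerSecond) {
        return total / triplesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        // Figures from the quoted Fuseki log: 3,257,309 triples in 3,302.546 s.
        double observed = rate(3_257_309L, 3302.546);
        long total = 300_000_000L; // queue size mentioned in the original post

        System.out.printf("observed: %.0f triples/s%n", observed);
        System.out.printf("300M at observed rate: %.1f hours%n", hours(total, observed));
        // Assumption: 50,000 triples/s, the low end of the "50-70K" figure.
        System.out.printf("300M at 50K triples/s: %.1f hours%n", hours(total, 50_000));
    }
}
```

At the logged rate (a little under 1,000 triples per second) the full queue
would take about three and a half days, against under two hours at the assumed
50K triples per second, so the gap is well over an order of magnitude.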

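Since a single in-memory Model cannot hold the whole dataset, the upload has
to be split into batches on the client side. Below is a minimal sketch of that
batching step in plain Java, with no Jena dependency. Batcher, forEachBatch
and the sink callback are hypothetical names; the callback stands in for
whatever builds a Model from one batch and calls conn.load(model).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class Batcher {
    /**
     * Drains `source` in chunks of at most `batchSize` items and hands each
     * chunk to `sink`. Returns the number of batches handed over.
     */
    static <T> int forEachBatch(Iterator<T> source, int batchSize,
                                Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>();
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);          // hypothetical upload of a full batch
                batch = new ArrayList<>();   // fresh list; the sink may keep the old one
                batches++;
            }
        }
        if (!batch.isEmpty()) {              // last, possibly smaller, batch
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Placeholder strings instead of real triples, to keep the sketch self-contained.
        List<String> triples = Arrays.asList("t1", "t2", "t3", "t4", "t5");
        int n = forEachBatch(triples.iterator(), 2, b -> System.out.println(b));
        System.out.println(n + " batches");
    }
}
```

With five items and a batch size of two, the sink is called three times. As
noted in the reply quoted above, the upload inside the callback would not need
its own begin/commit, because the transaction runs in the backend server.
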