Re: About fuseki2 load performance by java API

ajs6f Thu, 18 Jul 2019 05:46:14 -0700

I want to emphasize what Andy said first: 

> The fastest way is to use the bulk loader directly to setup the database, 
> then add it to Fuseki.



This will be much faster, and it removes any need for you to write efficient 
loading code yourself. If you can find a workflow that does this, I suspect it 
will be the best immediate choice.
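
To put rough numbers on the difference, here is a small Java sketch (plain JDK, 
no Jena calls). It uses the figures quoted in this thread and assumes that 
Andy's "50-70K" means triples per second; the class and method names are made 
up for illustration.

```java
public class LoadEstimate {

    /** Throughput in triples per second for one observed load. */
    static double triplesPerSecond(long triples, double seconds) {
        return triples / seconds;
    }

    /** Hours needed to load totalTriples at a steady rate. */
    static double hoursToLoad(long totalTriples, double triplesPerSecond) {
        return totalTriples / triplesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        // Reported in the original post: 3,257,309 triples in 3,302.546 s.
        double observedRate = triplesPerSecond(3_257_309L, 3302.546);

        // Reported in the original post: about 300 million triples queued.
        long totalTriples = 300_000_000L;

        // Assumption: low end of Andy's "50-70K", read as triples per second.
        double uiRate = 50_000.0;

        System.out.printf("Observed rate: %.0f triples/s%n", observedRate);
        System.out.printf("Hours at observed rate: %.1f%n",
                hoursToLoad(totalTriples, observedRate));
        System.out.printf("Hours at assumed UI rate: %.1f%n",
                hoursToLoad(totalTriples, uiRate));
    }
}
```

On those assumptions, the current route needs several days for 300 million 
triples, against under two hours at the rate Andy sees through the UI.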

ajs6f

> On Jul 18, 2019, at 8:08 AM, Scarlet Remilia 
> <[email protected]> wrote:
> 
> Thank you for the reply!
> 
> 
> 
> The server storage is local HDD in RAID 10.
> 
> The CPUs are 4x 14 cores with 28 threads, but only one core is used during 
> the load.
> 
> The fuseki2 JVM is tuned with -Xmx50G -Xms50G, and TDB2 is also tuned by 
> adjusting the cache size.
> 
> I watched disk IO with iostat, but the disk does not seem to be heavily 
> utilized. I also observed that the memory usage of fuseki2 increases after 
> each load of 3 million triples.
> 
> Fuseki2 is set up as a standalone server with the command below:
> 
> 
> 
> ./fuseki-server --tdb2 --loc=./tdb2dataset --port 2222 --update /fuseki2
> 
> 
> 
> Thank you very much!
> 
> ________________________________
> From: Andy Seaborne <[email protected]>
> Sent: Thursday, July 18, 2019 6:41:56 PM
> To: [email protected]
> Subject: Re: About fuseki2 load performance by java API
> 
> That's quite slow. I get maybe 50-70K triples per second for a 100m load via
> the Fuseki UI.
> 
> The fastest way is to use the bulk loader directly to setup the
> database, then add it to Fuseki.
> 
> The hardware of the server makes a big difference. What's the server
> setup? Disk/SSD? Local or remote storage?
> 
>     Andy
> 
> You don't need the begin/commit in the client - the transaction is in
> the backend server.
> 
> On 18/07/2019 09:02, Scarlet Remilia wrote:
>> Hello everyone,
>> I want to load a hundred million triples into a TDB2-backed fuseki2 using 
>> the Java API.
>> I used the code below:
>> 
>> Model model = ModelFactory.createDefaultModel();
>> model.add(model.asStatement(triple));
>> RDFConnectionRemoteBuilder builder = RDFConnectionFuseki.create()
>>         .destination(FusekiURL);
>> RDFConnection conn = builder.build();
>> conn.begin(ReadWrite.WRITE);
>> try {
>>     conn.load(model);
>>     conn.commit();
>> } finally {
>>     conn.end();
>> }
>> 
>> The code actually works, but the performance is not good enough.
>> 
>> [2019-07-18 23:29:25] Fuseki     INFO  [46] POST 
>> http://192.168.204.244:2222/fuseki2?default
>> [2019-07-18 23:30:45] Fuseki     INFO  [15] Body: Content-Length=-1, 
>> Content-Type=application/rdf+thrift, Charset=null => RDF-THRIFT : 
>> Count=3257309 Triples=3257309 Quads=0
>> [2019-07-18 23:31:12] Fuseki     INFO  [15] 200 OK (3,302.546 s)
>> 
>> Every 3 million triples takes 3,302.546 seconds, and there are 300 million 
>> triples in total in the queue. (A single in-memory Model cannot hold that 
>> many triples.)
>> 
>> Is there a faster way to load them?
>> 
>> Thanks!
