RE: About fuseki2 load performance by java API

Scarlet Remilia Sun, 21 Jul 2019 07:30:47 -0700

Thank you very much.

I've created a new workflow that generates RDF files, and I'll try tdb2.tdbloader.
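A workflow like this might stream the generated triples straight to a file instead of collecting them in one in-memory Model. Below is a minimal sketch. It assumes the generator emits each triple as three plain IRIs. The class and method names (`NTriplesFileWriter`, `writeIriTriple`) and the file name are hypothetical. The formatter does no escaping, so it only suits IRIs that are already valid. For real data (literals, blank nodes, unusual characters), a proper RDF writer from Jena is the safer choice.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: streams IRI-only triples to a gzip-compressed N-Triples file. */
public class NTriplesFileWriter implements AutoCloseable {
    private final BufferedWriter out;
    private long count = 0;

    public NTriplesFileWriter(Path file) throws IOException {
        // N-Triples is UTF-8 text; gzip keeps a very large dump smaller on disk.
        this.out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8));
    }

    /** Writes one "<s> <p> <o> ." line. No escaping: IRIs must already be valid. */
    public void writeIriTriple(String s, String p, String o) throws IOException {
        out.write("<" + s + "> <" + p + "> <" + o + "> .\n");
        count++;
    }

    public long count() {
        return count;
    }

    @Override
    public void close() throws IOException {
        out.close(); // also finishes the gzip stream
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("data-0001.nt.gz"); // placeholder file name
        try (NTriplesFileWriter w = new NTriplesFileWriter(file)) {
            for (int i = 0; i < 3; i++) {
                w.writeIriTriple("http://example.org/s" + i,
                                 "http://example.org/p",
                                 "http://example.org/o" + i);
            }
            System.out.println("Wrote " + w.count() + " triples to " + file);
        }
    }
}
```

With this approach, memory use stays flat no matter how many triples are generated. The resulting files can then be handed to the bulk loader.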




Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10



________________________________
From: Andy Seaborne <[email protected]>
Sent: Friday, July 19, 2019 1:41:34 AM
To: [email protected] <[email protected]>
Subject: Re: About fuseki2 load performance by java API



On 18/07/2019 13:08, Scarlet Remilia wrote:
> Thank you for the reply!
>
>
>
> The server storage is local HDD in RAID 10.
>
> The server has 4 CPUs with 14 cores / 28 threads each, but only one core is used during the load.
>
> The Fuseki2 JVM is tuned with -Xmx50G -Xms50G, and the TDB2 cache sizes
> have also been tuned.
>
> I monitored disk I/O with iostat, and the disk does not seem to be heavily
> utilized. I also observed that Fuseki2's memory usage increases after each
> batch of 3 million triples is loaded.

If you mean I/O bandwidth, then yes, it will not be high, because the access
pattern becomes random I/O and the effects Laura describes happen.

Memory will increase because Java does not do a GC unless it needs to.

tdb2.tdbloader will do better than the Fuseki UI even on an HDD,
but for larger datasets an SSD is preferred.
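If the load is one step in a larger Java pipeline, the bulk loader can be launched from the same program. This is a minimal sketch under a few assumptions:

- The Jena command-line scripts are installed and `tdb2.tdbloader` is on the `PATH`. The command is assumed to run as-is, as on Linux or macOS.
- The Fuseki server using the same database directory is stopped first, because a TDB2 database should not be opened by two JVMs at once.

The class name `BulkLoad` and the input file names are placeholders. The command shape `tdb2.tdbloader --loc=DIR FILE...` is the loader's normal usage.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BulkLoad {
    /** Builds the command line: tdb2.tdbloader --loc=DIR file1 file2 ... */
    static List<String> loaderCommand(String location, List<String> files) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tdb2.tdbloader");
        cmd.add("--loc=" + location);
        cmd.addAll(files);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder inputs: the database directory Fuseki was started with, plus generated files.
        List<String> cmd = loaderCommand("./tdb2dataset",
                List.of("data-0001.nt.gz", "data-0002.nt.gz"));
        Process p = new ProcessBuilder(cmd)
                .inheritIO() // show the loader's progress output on this console
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IllegalStateException("tdb2.tdbloader exited with status " + exit);
        }
        // Afterwards, start Fuseki again on the loaded database, e.g.
        // ./fuseki-server --tdb2 --loc=./tdb2dataset --update /fuseki2
    }
}
```

Running `tdb2.tdbloader` directly from the shell is equivalent. The wrapper only saves a manual step between generating the files and restarting Fuseki.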

>
> Fuseki2 is setup as a standalone server by the command below:
>
>
>
> ./fuseki-server --tdb2 --loc=./tdb2dataset --port 2222 --update /fuseki2
>
>
>
> Thank you very much!
>
> ________________________________
> From: Andy Seaborne <[email protected]>
> Sent: Thursday, July 18, 2019 6:41:56 PM
> To: [email protected]
> Subject: Re: About fuseki2 load performance by java API
>
> That's quite slow. I get maybe 50-70K triples/s for a 100M-triple load via
> the Fuseki UI.
>
> The fastest way is to use the bulk loader directly to set up the
> database, then add it to Fuseki.
>
> The hardware of the server makes a big difference. What's the server
> setup? Disk/SSD? Local or remote storage?
>
>       Andy
>
> You don't need the begin/commit in the client - the transaction is in
> the backend server.
>
> On 18/07/2019 09:02, Scarlet Remilia wrote:
>> Hello everyone,
>> I want to load a hundred million triples into a TDB2-backed Fuseki2 via the
>> Java API.
>> I used the code below:
>>
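>> import org.apache.jena.query.ReadWrite;
>> import org.apache.jena.rdf.model.Model;
>> import org.apache.jena.rdf.model.ModelFactory;
>> import org.apache.jena.rdfconnection.RDFConnection;
>> import org.apache.jena.rdfconnection.RDFConnectionFuseki;
>> import org.apache.jena.rdfconnection.RDFConnectionRemoteBuilder;
>>
>> // Each batch is collected in memory first (triple and FusekiURL are defined elsewhere)
>> Model model = ModelFactory.createDefaultModel();
>> model.add(model.asStatement(triple));
>>
>> RDFConnectionRemoteBuilder builder = RDFConnectionFuseki.create()
>>         .destination(FusekiURL);
>> // try-with-resources closes the connection afterwards
>> try (RDFConnection conn = builder.build()) {
>>     conn.begin(ReadWrite.WRITE);
>>     try {
>>         conn.load(model); // sends the whole model to the default graph
>>         conn.commit();
>>     } finally {
>>         conn.end();
>>     }
>> }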
>>
>> The code works, but the performance is not good enough.
>>
>> [2019-07-18 23:29:25] Fuseki     INFO  [46] POST http://192.168.204.244:2222/fuseki2?default
>> [2019-07-18 23:30:45] Fuseki     INFO  [15] Body: Content-Length=-1, Content-Type=application/rdf+thrift, Charset=null => RDF-THRIFT : Count=3257309 Triples=3257309 Quads=0
>> [2019-07-18 23:31:12] Fuseki     INFO  [15] 200 OK (3,302.546 s)
>>
>> Every 3 million triples take 3,302.546 seconds, and there are 300 million
>> triples in the queue in total… (a single in-memory Model cannot hold that
>> many triples…)
>>
>> Is there a better way to load them faster?
>>
>> Thanks!
>>
>
