My team has a large knowledge graph that we want to serve via a SPARQL
endpoint. We are looking into using Apache Jena Fuseki for this. I have
some questions and was hoping someone here could guide me.

Right now, I'm working on a dataset of 175 million triples, which
translates to a TDB2 database of around 250 GB when loaded with
tdb2.tdbloader.
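For reference, the offline load is a single tdb2.tdbloader invocation; the paths below are placeholders, not our actual ones:

```shell
# Bulk-load the dump into a fresh TDB2 database directory (paths are
# hypothetical); tdb2.tdbloader ships with the Apache Jena distribution.
tdb2.tdbloader --loc /data/kg-tdb2 knowledge-graph.nt.gz
```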

The entire knowledge base is regenerated once a day, and by our rough
count approximately 14 million triples (1.6 GB uncompressed, ~8% of the
total) change every day, counting both additions and deletions.

What is the best way to update a live Fuseki dataset when you have to
change such a large number of triples?

We have tried doing something like this:

> curl -X POST -d @update.txt --header "Content-type:
> application/sparql-update" -v http://localhost:9999/my/update
Where the update.txt file looks something like:

> DELETE DATA {
>   <sub1> <pred1> <obj1> .
>   <sub2> <pred2> <obj2> .
>   ...
> };
> INSERT DATA {
>   <sub1> <pred1> <obj11> .
>   <sub2> <pred2> <obj22> .
>   ...
> }
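For context, this is roughly how the update body gets assembled from a diff of the old and new dumps. A minimal sketch (the helper and its inputs are hypothetical; `removed` and `added` are lists of already-formatted N-Triples lines):

```python
# Sketch: build a combined SPARQL Update body from a triple diff.
# `removed` and `added` are hypothetical lists of pre-formatted
# N-Triples lines such as "<sub1> <pred1> <obj1> .".

def build_update(removed, added):
    parts = []
    if removed:
        parts.append("DELETE DATA {\n" + "\n".join(removed) + "\n};")
    if added:
        parts.append("INSERT DATA {\n" + "\n".join(added) + "\n}")
    return "\n".join(parts)

update = build_update(
    ["<sub1> <pred1> <obj1> ."],
    ["<sub1> <pred1> <obj11> ."],
)
print(update)
```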


This takes around 15-20 minutes on a fairly powerful machine. I have some
questions regarding this approach:

   - Does making a curl request like this wrap the entire call in a
   single transaction?
   - Is there a size limit on how big a request I can make?
   - My understanding is that the Fuseki server has to receive the full
   request body on its side before applying the changes — is that
   correct? Also, will it affect any read requests running in parallel?
   - Is there a better way to update the database?
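On the last point, one variant we are considering is splitting the diff into several smaller updates so each POST is a shorter transaction. A rough sketch (the batch size is an arbitrary placeholder, and the function names are ours, not Jena's):

```python
# Sketch: split a large DELETE/INSERT diff into several smaller
# SPARQL Update bodies, one per batch. Batch size is a placeholder.

def batched_updates(removed, added, batch_size=100_000):
    updates = []
    for i in range(0, max(len(removed), len(added)), batch_size):
        dels = removed[i:i + batch_size]
        ins = added[i:i + batch_size]
        body = []
        if dels:
            body.append("DELETE DATA {\n" + "\n".join(dels) + "\n};")
        if ins:
            body.append("INSERT DATA {\n" + "\n".join(ins) + "\n}")
        updates.append("\n".join(body))
    return updates

# Each element would then be POSTed to the /update endpoint with the
# same curl command (or an HTTP client) as a separate request.
batches = batched_updates(
    ["<s> <p> <o%d> ." % n for n in range(5)],
    ["<s> <p> <n%d> ." % n for n in range(5)],
    batch_size=2,
)
print(len(batches))  # 5 pairs in batches of 2 -> 3 requests
```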

Thanks for your help.

Regards
Amit
