My team has a large knowledge graph that we want to serve via a SPARQL endpoint. We are looking into using Apache Fuseki for this. I have some questions and was hoping someone here could guide me.
Right now I'm working on a dataset of about 175 million triples, which translates to roughly 250 GB of TDB2 storage when loaded with tdb2.tdbloader. The entire knowledge base is regenerated once a day, and by our rough count approximately 14 million triples (1.6 GB uncompressed, ~8% of the data) change every day, counting both additions and deletions.

What is the best way to update a live Fuseki dataset when you have to apply that many changes? We have tried something like this:

    curl -X POST -d @update.txt \
         --header "Content-type: application/sparql-update" \
         -v http://localhost:9999/my/update

where update.txt looks something like:

    DELETE DATA {
      <sub1> <pred1> <obj1> .
      <sub2> <pred2> <obj2> .
      ...
    };
    INSERT DATA {
      <sub1> <pred1> <obj11> .
      <sub2> <pred2> <obj22> .
      ...
    }

It takes around 15-20 minutes on our beefy machine. I have some questions about this approach:

- Does making a curl request like this wrap the entire call in a single transaction?
- Is there a size limit on how big a request I can make?
- My understanding is that the Fuseki server has to receive the full file on its side before it can apply the changes. Is that correct? Also, will it affect any read requests running in parallel?
- Is there a better way to update the database?

Thanks for your help.

Regards,
Amit
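In case it helps to illustrate one alternative: a minimal sketch of splitting the daily diff into several smaller SPARQL Update requests, so that each POST (and the transaction behind it) stays small. The endpoint URL, file contents, and function names here are hypothetical placeholders, not the author's setup; note also that batching gives up atomicity, since each batch commits as its own transaction.

```python
# Sketch: split a large set of triple changes into smaller SPARQL Update
# batches so each POST to Fuseki stays small. ENDPOINT and the sample
# diff lines below are hypothetical placeholders.
import urllib.request

ENDPOINT = "http://localhost:9999/my/update"  # hypothetical endpoint


def make_batches(deletes, inserts, batch_size=100_000):
    """Yield SPARQL Update strings, each covering at most batch_size
    deleted triples and batch_size inserted triples.

    deletes/inserts are lists of N-Triples lines such as
    '<sub1> <pred1> <obj1> .'
    """
    for i in range(0, max(len(deletes), len(inserts)), batch_size):
        parts = []
        d = deletes[i:i + batch_size]
        n = inserts[i:i + batch_size]
        if d:
            parts.append("DELETE DATA {\n" + "\n".join(d) + "\n}")
        if n:
            parts.append("INSERT DATA {\n" + "\n".join(n) + "\n}")
        yield " ;\n".join(parts)


def post_update(update_text):
    """POST one SPARQL Update request, mirroring the curl call above."""
    req = urllib.request.Request(
        ENDPOINT,
        data=update_text.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder diff; in practice these would come from comparing
    # yesterday's and today's generated datasets.
    deletes = ["<sub1> <pred1> <obj1> ."]
    inserts = ["<sub1> <pred1> <obj11> ."]
    for update in make_batches(deletes, inserts, batch_size=50_000):
        post_update(update)
```

Whether smaller batches actually help depends on how Fuseki handles transactions and parallel readers, which is exactly what the questions above are asking, so treat this only as an experiment to try.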
