On 07/12/2018 01:03, Lee, Seokju | Daniel | TPDD wrote:
Greetings, I am using Apache Jena 3.7.0 and have run into the following issue; I would like to know how to solve it.

Background:
* We created our own SPARQL endpoint using Apache Jena.
* Sometimes we need to clear the data store and restore it from a new TTL file.
* For performance, and for reasons of our own, we are using a RAM disk for TDB instead of an SSD.
* We thought we had enough memory for TDB.

Issue:
* Our application went down because the RAM disk was full.
* Before that, we had repeatedly restored from new TTL files.
* The TTL data is about 1.5 million triples and the file size is around 250 MB.
* The RAM disk size is 4 GB (the first time we restored, the RAM disk used less than 1 GB).
Have you considered using an in-memory Jena dataset?

    <#dataset> rdf:type ja:MemoryDataset ;
        ja:data "data.trig" ;
        .
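For illustration, a minimal sketch of creating the same kind of transactional in-memory dataset programmatically rather than via an assembler file (assuming the Jena 3.x API; the file name "data.trig" is taken from the config above):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.riot.RDFDataMgr;

public class InMemDatasetExample {
    public static void main(String[] args) {
        // Transactional in-memory dataset, the same machinery as ja:MemoryDataset.
        Dataset dataset = DatasetFactory.createTxnMem();

        // Load the initial data inside a write transaction.
        dataset.begin(ReadWrite.WRITE);
        try {
            RDFDataMgr.read(dataset, "data.trig");
            dataset.commit();
        } finally {
            dataset.end();
        }
    }
}
```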
Investigating:
* I think nodes.dat holds the real data, and it looks like SPO.dat, POS.dat and OSP.dat did not remove the old data that I deleted in my application.

Question:
* Is there any way to keep the TDB files the same size as the real data?
* We are doing the removal with "Model.removeAll()" and "TDB.sync()".
TDB.sync() is not necessary when using transactions, and not using transactions is not a good idea for a SPARQL endpoint. TDB.sync() is legacy, for older single-threaded applications.
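A minimal sketch of doing the removal inside a write transaction instead of calling TDB.sync() (assuming a TDB1-backed dataset; the database path is illustrative):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.tdb.TDBFactory;

public class TransactionalRemove {
    public static void main(String[] args) {
        // Illustrative location of the TDB database on the RAM disk.
        Dataset dataset = TDBFactory.createDataset("/mnt/ramdisk/tdb");

        dataset.begin(ReadWrite.WRITE);
        try {
            // Remove all statements from the default model; the commit makes
            // the change durable, so no TDB.sync() is needed.
            dataset.getDefaultModel().removeAll();
            dataset.commit();
        } finally {
            dataset.end();
        }
    }
}
```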
The in-memory dataset described above is fully transactional (serializable isolation); it uses the heap for storage, so it only takes the space it needs, and deleted data gets garbage collected.
(TDB2 has a compaction operation, but it does mean there are times when there are two copies of the database on disk.)
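For reference, a sketch of the TDB2 compaction operation mentioned above (assuming a recent Jena release with TDB2; the database path is illustrative — note that compaction needs enough free space for the temporary second copy):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.tdb2.DatabaseMgr;
import org.apache.jena.tdb2.TDB2Factory;

public class CompactExample {
    public static void main(String[] args) {
        // Illustrative location of the TDB2 database.
        Dataset dataset = TDB2Factory.connectDataset("/mnt/ramdisk/tdb2");

        // Compaction rewrites the live data into a new generation and switches
        // over; while it runs, both copies of the database exist on disk.
        DatabaseMgr.compact(dataset.asDatasetGraph());
    }
}
```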
Andy
Thanks Daniel