On 07/12/2018 01:03, Lee, Seokju | Daniel | TPDD wrote:
> Greetings,
>
> I am using Apache Jena 3.7.0 and have encountered the following issue; I
> would like to know how to solve it.
>
> Background:
> * We created our own SPARQL endpoint using Apache Jena.
> * Sometimes we need to clear the data store and restore it from a new TTL file.
> * For performance, we are using a RAM disk for TDB instead of an SSD, for
>   reasons of our own.
> * We thought we had enough memory for TDB.
>
> Issue:
> * Our application went down because the RAM disk was full.
> * At the time, we were repeatedly restoring from new TTL files.
> * The TTL data is about 1.5 million triples and around 250 MB in size.
> * The RAM disk size is 4 GB (the first time we restored, the RAM disk used
>   under 1 GB).
Have you considered using an in-memory Jena graph?

  <#dataset> rdf:type ja:MemoryDataset ;
      ja:data "data.trig" ;
      .
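For reference, a rough Java equivalent of that assembler (a sketch, assuming
only the standard Jena API; the "data.trig" file name is taken from the
snippet above):

  import org.apache.jena.query.Dataset;
  import org.apache.jena.query.DatasetFactory;
  import org.apache.jena.query.ReadWrite;
  import org.apache.jena.riot.RDFDataMgr;

  public class InMemExample {
      public static void main(String[] args) {
          // Fully transactional, heap-backed dataset; the counterpart of
          // ja:MemoryDataset in the assembler above.
          Dataset dataset = DatasetFactory.createTxnMem();

          dataset.begin(ReadWrite.WRITE);
          try {
              // Load the initial data inside a write transaction.
              RDFDataMgr.read(dataset, "data.trig");
              dataset.commit();
          } finally {
              dataset.end();
          }
      }
  }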
> Investigating:
> * I think nodes.dat holds the real data, and it looks like SPO.dat, POS.dat,
>   and OSP.dat did not remove the old data that I deleted in my application.
>
> Question:
> * Is there any way to keep the TDB files the same size as the real data?
ja:MemoryDataset
> * We are removing the data with "Model.removeAll()" and "TDB.sync()".
TDB.sync() is not necessary when using transactions, and not using
transactions is not a good idea for a SPARQL endpoint. TDB.sync is
legacy, for older single-threaded applications.
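As a minimal sketch of the transactional style (assuming TDB1; the database
location "/mnt/ramdisk/tdb" and the file name "new-data.ttl" are illustrative,
not from this thread), the clear-and-reload step could look like:

  import org.apache.jena.query.Dataset;
  import org.apache.jena.query.ReadWrite;
  import org.apache.jena.riot.RDFDataMgr;
  import org.apache.jena.tdb.TDBFactory;

  public class ReloadExample {
      public static void main(String[] args) {
          // Illustrative location; use your own RAM disk path.
          Dataset dataset = TDBFactory.createDataset("/mnt/ramdisk/tdb");

          dataset.begin(ReadWrite.WRITE);
          try {
              // Clear the default graph and reload. No TDB.sync() needed;
              // the commit makes the changes durable.
              dataset.getDefaultModel().removeAll();
              RDFDataMgr.read(dataset, "new-data.ttl");
              dataset.commit();
          } finally {
              dataset.end();
          }
      }
  }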
The in-memory dataset described above is fully transactional
(serializable isolation), uses the heap for storage so it only uses
what is needed, and deleted data gets garbage collected.
(TDB2 has a compaction operation, but it does mean there are times when
there are two copies of the database.)
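If you do move to TDB2, compaction can be triggered from Java in recent Jena
versions; a sketch, assuming the org.apache.jena.tdb2.DatabaseMgr API and an
illustrative "DB2" database location:

  import org.apache.jena.query.Dataset;
  import org.apache.jena.tdb2.DatabaseMgr;
  import org.apache.jena.tdb2.TDB2Factory;

  public class CompactExample {
      public static void main(String[] args) {
          // Illustrative location for a TDB2 database directory.
          Dataset dataset = TDB2Factory.connectDataset("DB2");

          // Compaction rewrites the database into a new generation, so both
          // the old and new copies exist on disk while it runs.
          DatabaseMgr.compact(dataset.asDatasetGraph());
      }
  }

There is also a command-line tool, tdb2.tdbcompact, that performs the same
operation on a database directory.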
Andy
> Thanks,
> Daniel