On 07/12/2018 01:03, Lee, Seokju | Daniel | TPDD wrote:
Greetings,

I am using Apache Jena 3.7.0 and have run into the following issue; I would 
like to know how to solve it.

Background:

   *   We built our own SPARQL endpoint using Apache Jena.
   *   Sometimes we need to clear the data store and restore it from a new TTL file.
   *   For performance, we use a RAM disk for TDB instead of an SSD, for 
reasons of our own.
   *   We thought we had enough memory for TDB.

Issue

   *   Our application just went down because the RAM disk was full.
   *   By that point, we had repeatedly restored from new TTL files.
   *   The TTL file contains about 1.5 million triples and is around 250 MB.
   *   The RAM disk is 4 GB (the first time we restored, it used under 
1 GB).

Have you considered using an in-memory Jena graph?

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:  <http://jena.hpl.hp.com/2005/11/Assembler#> .

<#dataset> rdf:type ja:MemoryDataset;
   ja:data "data.trig";
.
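
A minimal sketch of loading that dataset from an assembler file in Java 
("config.ttl" is a hypothetical file holding the description above):

import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;

// Assemble the dataset from the single dataset description in the file.
Dataset ds = DatasetFactory.assemble("config.ttl");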


Investigating

   *   I think nodes.dat holds the real data, and SPO.dat, POS.dat, and OSP.dat 
look as if they did not release the old data that I removed in my application.

Question

   *   Is there any way to keep the size of the TDB files in line with the actual data?

ja:MemoryDataset (see the suggestion above).

   *   For removal, we are using "Model.removeAll()" and "TDB.sync()".

TDB.sync() is not necessary when using transactions, and not using transactions is not a good idea for a SPARQL endpoint. TDB.sync() is legacy, for older single-threaded applications.
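
As a minimal sketch, the removal done inside a write transaction (the 
database location is an assumption):

import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.tdb.TDBFactory;

Dataset ds = TDBFactory.createDataset("/mnt/ramdisk/tdb"); // location assumed
ds.begin(ReadWrite.WRITE);
try {
    ds.getDefaultModel().removeAll(); // no TDB.sync() needed inside a transaction
    ds.commit();
} finally {
    ds.end();
}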

The in-memory dataset described above is fully transactional (serializable isolation). It uses the heap for storage, so it only uses what is needed, and deleted data gets garbage-collected.
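
The same kind of dataset can also be created programmatically; a sketch 
that loads a TriG file into it (the file name is illustrative):

import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.riot.RDFDataMgr;

Dataset ds = DatasetFactory.createTxnMem(); // transactional in-memory dataset
ds.begin(ReadWrite.WRITE);
try {
    RDFDataMgr.read(ds, "data.trig"); // illustrative file name
    ds.commit();
} finally {
    ds.end();
}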

(TDB2 has a compaction operation, but it does mean there are times when two copies of the database exist at once.)
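
If you move to TDB2, a sketch of triggering compaction (the location is an 
assumption; check that your Jena version provides DatabaseMgr.compact):

import org.apache.jena.query.Dataset;
import org.apache.jena.tdb2.DatabaseMgr;
import org.apache.jena.tdb2.TDB2Factory;

Dataset ds = TDB2Factory.connectDataset("/mnt/ramdisk/tdb2"); // location assumed
DatabaseMgr.compact(ds.asDatasetGraph()); // temporarily needs room for a second copy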

    Andy


Thanks
Daniel
