Hi Andy,

Thanks for the reply.

> The in-memory dataset described above is fully transactional

Interesting - I didn't know that it is different from TDB. I have only
ever used it for test purposes because I thought it was the same thing
as TDB.

I have another question: how can you keep this persistent? In-memory
means that if the application crashes, for whatever reason, the data
would be lost. Am I right?

(Just for your understanding: the ramdisk is only for the dev and
staging environments, for functional tests, not for production. For
production we will use SSD.)
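Also, just to check that I understood your point about transactions
replacing TDB.sync(): is the sketch below roughly what you mean? (It is
only a rough sketch; "data.ttl" and the class name are placeholders for
our real dump file and code.)

import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.riot.RDFDataMgr;

public class ReloadSketch {
    public static void main(String[] args) {
        // Transactional in-memory dataset, stored on the heap.
        Dataset dataset = DatasetFactory.createTxnMem();
        reload(dataset, "data.ttl");   // initial load at startup
        // ... and again whenever a new dump file arrives.
    }

    // Clear the default graph and restore it from a TTL file, all inside
    // one write transaction, so no TDB.sync() is needed and readers never
    // see a half-loaded dataset.
    static void reload(Dataset dataset, String ttlFile) {
        dataset.begin(ReadWrite.WRITE);
        try {
            dataset.getDefaultModel().removeAll();
            RDFDataMgr.read(dataset.getDefaultModel(), ttlFile);
            dataset.commit();
        } finally {
            dataset.end();
        }
    }
}

If that is right, then I guess persistence is just a matter of re-running
the load at startup, which seems to be what ja:data does in your
assembler example?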
Thanks,
Daniel

-----Original Message-----
From: Andy Seaborne <[email protected]>
Sent: Friday, December 7, 2018 8:09 PM
To: [email protected]
Subject: Re: Is there any way to keep same size between real data and TDB

On 07/12/2018 01:03, Lee, Seokju | Daniel | TPDD wrote:
> Greetings,
>
> I am using Apache Jena 3.7.0 and have encountered the following issue,
> so I would like to know how to solve it.
>
> Background:
>
>   * We created our own SPARQL endpoint using Apache Jena.
>   * Sometimes we need to clear the data store and restore it from a
>     new TTL file.
>   * For performance, we are using a ramdisk for TDB instead of an SSD,
>     for our own reasons.
>   * We thought we had enough memory for TDB.
>
> Issue
>
>   * Our application went down because the ramdisk was full.
>   * At the time, we had repeatedly restored from new TTL files.
>   * The TTL file contains about 1.5 million triples and is around 250 MB.
>   * The ramdisk size is 4 GB (the first time we restored, the ramdisk
>     used under 1 GB).

Have you considered using an in-memory Jena graph?

<#dataset> rdf:type ja:MemoryDataset ;
    ja:data "data.trig" ;
    .

>
> Investigating
>
>   * I think nodes.dat holds the real data, and it looks like SPO.dat,
>     POS.dat and OSP.dat did not remove the old data that I deleted in
>     my application.
>
> Question
>
>   * Is there any way to keep the same size between the real data and TDB?

ja:MemoryDataset

>   * We are using "Model.removeAll()" and "TDB.sync()" for removal.

TDB.sync() is not necessary when using transactions. And not using
transactions is not a good idea for a SPARQL endpoint. TDB.sync() is
legacy, for older single-threaded applications.

The in-memory dataset described above is fully transactional
(serializable isolation), uses the heap for storage so it only uses what
is needed, and deleted data gets garbage collected.

(TDB2 has a compaction operation, but that does mean there are times
when there are two copies of the database.)

    Andy

>
> Thanks
> Daniel
>
