On 17/12/2018 09:11, Jovanovska Sashka wrote:
Dear all,
We are a group of developers from Macedonia who are currently working
with Apache Jena. We are trying to use Jena in our project; after a lot
of research we decided that this may be the best approach for us. The
requirement for choosing a suitable database was to store data about
objects with their classes and attributes. The database needs to offer
the highest (fastest) performance for RDF data, so that objects can be
retrieved in the shortest time possible. Our analysis showed that Jena
TDB is the right approach for storing an RDF data model: as a triple
store it is well suited to RDF structures (subject, predicate, object).
We concluded that Jena TDB is capable of persisting all objects
according to the profile-defined classes.
1. We have three environments. Their configurations are:
1.1 CPU: 4 cores @ 2.2 GHz
1.2 RAM: 16 GB
1.3 Disk: 100 GB
2.1 CPU: 16 cores @ 2.0 GHz
2.2 RAM: 96 GB
2.3 Disk: 500 GB
3.1 CPU: 4 cores @ 3.2 GHz
3.2 RAM: 16 GB
3.3 Disk: 320 GB SSD
SSD is better, especially when adding data in many separate transactions.
Which setup was used to produce the figures?
We are using Jena version 3.6.
The current Jena release is 3.9.0.
Are you using Fuseki or running TDB in your application?
2. We are working on a system that will contain data for objects with a
unique object ID, the mRID (Master Resource ID). The main goal of the
system is the exchange of those objects between multiple systems.
Importing a model from a file is done with model.write, which creates a
dataset. We also create single objects with INSERT. Attached are
examples of our model, together with its namespaces.
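A minimal sketch of reading model data into Jena (the namespace and data are hypothetical; for a file on disk you would use RDFDataMgr.loadModel("model.rdf")):

```java
import java.io.StringReader;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

public class LoadExample {
    public static void main(String[] args) {
        // Hypothetical CIM-style object carrying an mRID attribute.
        String turtle = "@prefix ex: <http://example.org/cim#> . "
                      + "ex:obj1 ex:mRID \"1234\" .";
        Model model = ModelFactory.createDefaultModel();
        // Parse the Turtle data into the model; for a file on disk use
        // RDFDataMgr.loadModel("model.rdf") instead.
        RDFDataMgr.read(model, new StringReader(turtle), null, Lang.TURTLE);
        System.out.println(model.size());   // one triple
    }
}
```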
3. Our objectives in terms of speed are for create and get to take less
than 30 ms, and to fully load the database with 100M objects (~500M
triples). We also need to export the database with the correct tag
names from the objects' namespaces (currently we get all of them as
rdf:Description tags).
rdf:Description is not RDF data - it is part of the RDF/XML format. You
can write out the database in RDF/XML (use RDFFormat.RDFXML_PLAIN, not
the default "pretty" format) if you want, but other formats are more
efficient and more readable, and carry the same RDF data.
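For example (a sketch; the namespace and data are hypothetical), selecting the plain RDF/XML writer, and Turtle as an alternative:

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;

public class ExportExample {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        String ns = "http://example.org/cim#";   // hypothetical namespace
        model.setNsPrefix("ex", ns);
        model.createResource(ns + "obj1")
             .addProperty(model.createProperty(ns, "mRID"), "1234");

        // Plain RDF/XML writer: faster than the default "pretty" writer.
        RDFDataMgr.write(System.out, model, RDFFormat.RDFXML_PLAIN);

        // Turtle: more compact, more readable, and the same RDF data.
        RDFDataMgr.write(System.out, model, RDFFormat.TURTLE);
    }
}
```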
4. Currently we are seeing 200 ms or more to create one object, and we
think we have reached the load limit of the database, which currently
holds 80 million objects.
An object is on average 5 triples.
How are you using dataset transactions in your application? (Are you
using them at all?)
It is better to add objects in batches - many objects in a single
transaction. At the end of a transaction there is significant overhead,
such as safely writing the journal to disk and then, at some point,
updating the main database. The default (TDB run inside the
application) buffers about 10 transactions, but adding in units of 50
triples will still carry high overhead, especially on a rotating disk.
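A sketch of batched insertion into an embedded TDB1 dataset (the database location, namespace, and batch size are hypothetical; tune the batch size for your workload):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.tdb.TDBFactory;

public class BatchInsert {
    static final int BATCH_SIZE = 10_000;   // hypothetical; tune for your workload

    public static void main(String[] args) {
        Dataset dataset = TDBFactory.createDataset("DB");  // on-disk TDB1 location
        String ns = "http://example.org/cim#";             // hypothetical namespace
        dataset.begin(ReadWrite.WRITE);
        try {
            Model m = dataset.getDefaultModel();
            // Many objects in one transaction: the journal is flushed to
            // disk once per commit, not once per object.
            for (int i = 0; i < BATCH_SIZE; i++) {
                m.createResource(ns + "obj" + i)
                 .addProperty(m.createProperty(ns, "mRID"), "id-" + i);
            }
            dataset.commit();
        } finally {
            dataset.end();
        }
    }
}
```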
The bulk loader is faster. For TDB1 it only works on an empty database;
for TDB2 it can bulk-update an existing database.
One approach is to write all the triples to a file and, for the
majority of your data, load that file once with the bulk loader.
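From the command line that looks like this (a sketch; the database locations and the data file name are hypothetical):

```shell
# TDB1: bulk load into an empty database directory
tdbloader --loc=DB data.nt

# TDB2: bulk load, including into an existing database
tdb2.tdbloader --loc=DB2 data.nt
```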
Andy
But during development we ran into some problems. Our goal was to fill
the database with 100M objects, and in doing so we encountered many
issues. Nothing works as it should, and the import process gets slower
and slower as the database grows, even though it has not yet reached
the highest limit.
Would you be able to answer a few questions of ours?
1. Is it possible to optimize the application (and the database as
well) so that it works faster and more reliably?
2. What is the correct way to use namespace prefixes so that data is
exported correctly?
3. Would it be possible to get your help in the form of a workshop or
training?
I would like to thank you in advance for your help.
Kind Regards,
Sashka