Hello,

I am currently working on a project that loads a full Freebase dump into
a triple store.

The whole Freebase dump is currently around 2 billion triples (260 GB of
uncompressed data).

We chose to investigate Apache Jena TDB as the first candidate for this.

I run Jena on a virtual machine running Red Hat Linux, with an 8-core CPU,
64 GB of RAM and a 1.2 TB hard drive.

Which data loader would you recommend here? (Are the tdbloader3 and
tdbloader4 loaders even worth considering?) My first test, loading 2.5% of
Freebase into Jena with tdbloader2, took 3.48 hours, which is not very
promising even if the import time scales linearly.
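
For scale, here is a rough linear extrapolation of that test to the full
dump, as a minimal Python sketch. It assumes the load time grows linearly
with the number of triples (probably optimistic for the index phase) and
reads the test figures as 2.5% and 3.48 decimal hours; the function name is
only illustrative.

```python
# Back-of-the-envelope estimate only. Assumption: load time scales linearly
# with the number of triples, which is likely optimistic for index building.
def estimate_full_load_hours(sample_hours: float, sample_fraction: float) -> float:
    """Scale the time measured on a sample up to the full dataset."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    return sample_hours / sample_fraction


# Figures from the tdbloader2 test: 2.5% of the dump in 3.48 hours.
full_hours = estimate_full_load_hours(sample_hours=3.48, sample_fraction=0.025)
print(f"{full_hours:.1f} hours, about {full_hours / 24:.1f} days")
```

Under those assumptions the full load would take close to six days.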

Is there a way to parallelize the import (run several loader instances at
the same time against one Jena instance)?

Is there a way to tune the loader so that the data load is faster? (I did
not find any information on that.)

I do not understand the idea of Jena indexing. The second phase of the
load, the one that is actually time-consuming, is the index phase. Is this
indexing required at all for querying with SPARQL, or is it a 'full-text
search' type of indexing? I am wondering whether I could skip this phase
entirely.

I am basically trying to think how I can make the import faster.

And the last question:

Would you recommend running the import with a compressed or an
uncompressed file as the input file?

Regards,

Ewa
