Apache Jena tdbloader performance and limits

Wolfgang Fahl Tue, 19 May 2020 22:57:19 -0700

Dear Apache Jena users,

Some two years ago Laura Morlaes and Dick Murray had an exchange on this
list about how to influence the performance of
tdbloader. The issue is of interest to me again because
I am trying to load some 15 billion triples from a
copy of Wikidata. At
http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData I have
documented what I am trying to accomplish,
and a few days ago I posted on Stack Overflow
https://stackoverflow.com/questions/61813248/jena-tdbloader2-performance-and-limits
with the following three questions:


*What is proven to speed up the import without investing in extra
hardware?*

e.g. splitting the files, changing VM arguments, running multiple
processes ...

*What explains the decreasing speed at higher numbers of triples and how
can this be avoided?*

*What successful multi-billion triple imports for Jena do you know of and
what were the circumstances for these?*

There have been some 50 views on the question so far and some comments, but
there is no real hint yet on what could improve things.

The Java VM crashes that happened with different Java
environments on the Mac OS X machine are especially disappointing: even at a
slow speed the import would have finished after a while, but with a
crash it's a never-ending story.

I am curious to learn what your experience and advice is.

Yours

  Wolfgang

-- 


Wolfgang Fahl
Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
Tel. +49 2154 811-480, Fax +49 2154 811-481
Web: http://www.bitplan.de
