Dear Apache Jena users,

Some 2 years ago Laura Morlaes and Dick Murray had an exchange on this list about how to influence the performance of tdbloader. The issue is of interest to me again now that I am trying to load some 15 billion triples from a copy of Wikidata. At http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData I have documented what I am trying to accomplish, and a few days ago I posted a question on Stack Overflow, https://stackoverflow.com/questions/61813248/jena-tdbloader2-performance-and-limits, with the following three questions:
*What is proven to speed up the import without investing in extra hardware?* E.g. splitting the files, changing VM arguments, running multiple processes ...
*What explains the decreasing speed at higher numbers of triples, and how can this be avoided?*
*What successful multi-billion-triple imports with Jena do you know of, and under what circumstances were they done?*

So far the question has had some 50 views and a few comments, but no real hint yet on what could improve things. The Java VM crashes that occurred with different Java environments on the Mac OS X machine are especially disappointing: even at a slow speed the import would eventually have finished, but with a crash it's a never-ending story.

I am curious to learn what your experience and advice is.

Yours
Wolfgang

-- 
Wolfgang Fahl
Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
Tel. +49 2154 811-480, Fax +49 2154 811-481
Web: http://www.bitplan.de
