Since it's N-Triples, and so one triple per line, why not use unix utilities (e.g.
'split') to divide it into lots of smaller chunks and do a series of tdbloader
loads?  It should be fairly straightforward to script in bash or another scripting
language of your choice.  That should have a lower memory requirement and so
avoid the massive slowdown.  Or am I missing something?
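
Something along these lines is roughly what I have in mind (untested, and assuming
the data is in data.nt and the store is built in ./DB -- the file names and the
chunk size are just placeholders, adjust to taste):

    # split the N-Triples file into ~10M-line chunks (one triple per line)
    split -l 10000000 data.nt chunk_

    # load each chunk into the same TDB location, one tdbloader run at a time
    for f in chunk_*; do
        tdbloader --loc=DB "$f"
    done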

Bill

On 26 Feb 2013, at 18:23, Aaron Coburn <[email protected]> wrote:

> I recently had a need to load ~225M triples into a TDB triplestore, and when 
> allocating only ~12GB to the triple loader, I experienced the very same 
> slowdowns you described. As an alternative, I reserved an on-demand, 
> high-memory (~60GB) instance in the public cloud, and the processing 
> completed in only a few hours. I then moved the files onto my local 
> server and proceeded from there.
> 
> Aaron Coburn
> 
> 
> On Feb 25, 2013, at 1:25 PM, Andy Seaborne <[email protected]> wrote:
> 
>> On 25/02/13 20:07, Joshua Greben wrote:
>>> Hello All,
>>> 
>>> I am new to this list and to Jena and was wondering if anyone could
>>> offer advice for loading a large triplestore.
>>> 
>>> I am trying to load 670M N-Triples into a store using tdbloader on a
>>> single machine with 64-bit hardware and 8GB of memory. However, I am
>>> running into a massive slowdown. When the load starts the tdbloader
>>> is processing around 30K tps but by the time it has loaded 130M
>>> triples it can essentially no longer load any more and slows down to
>>> 2300 tps. At that point I have to kill the process because it will
>>> basically never finish.
>>> 
>>> Is 8GB of memory enough or is there a more efficient way to load this
>>> data? I am trying to load the data into a single DB location. Should
>>> I be splitting up the triples and loading them into different DBs?
>>> 
>>> Advice from anyone who has experience successfully loading a large
>>> triplestore is much appreciated.
>> 
>> Only 8GB is pushing it somewhat for 670M triples.  It will finish; it will 
>> take a very long time.  Faster loads have been reported using a larger 
>> machine (e.g. Freebase in 8 hours on an IBM Power7 with 48GB RAM).
>> 
>> tdbloader2 (Linux only) may get you there a bit quicker, but really you need 
>> a bigger machine.
>> 
>> Once built, you can copy the dataset as files to other machines.
>> 
>>      Andy
>> 
>>> 
>>> Thanks!
>>> 
>>> - Josh
>>> 
>>> 
>>> 
>>> Joshua Greben
>>> Library Systems Programmer & Analyst
>>> Stanford University Libraries
>>> (650) 714-1937
>>> [email protected]
>>> 
>>> 
>>> 
>> 
> 
