On 26/03/14 21:15, Martino Buffolino wrote:
I tried exactly this using an Amazon EC2 instance. Here is some advice that
I previously received:
"By the way, loading is much faster on Amazon if you use one of the
instances with a large SSD.
In fact, running plan "tdbloader" and using an SSD machine maybe the best
way. tdbloader does not have large intermediate files."
Hopefully this helps :)
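For reference, the kind of run described above might look like the sketch
below. The mount point, database directory, and dump filename are only
placeholders; the main assumption is that the TDB location sits on the
instance's SSD.

    # Hypothetical paths: /mnt/ssd stands for the instance-store SSD.
    mkdir -p /mnt/ssd/tdb
    tdbloader --loc=/mnt/ssd/tdb /mnt/ssd/freebase-dump.nt.gz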
On Wed, Mar 26, 2014 at 4:58 PM, Agustin Barto <[email protected]> wrote:
I changed the max heap size in the tdbloader script to 8g, but it
didn't improve the load rate.
Heap size is not relevant here - the file cache lives outside the heap, in
memory-mapped files.
Agustin
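As a side note, a sketch of how the heap is usually raised without editing
the script: it is assumed here that the stock Jena command scripts pick up
JVM options from the JVM_ARGS environment variable. As pointed out above,
though, the file cache is held in memory-mapped files outside the heap, so a
larger -Xmx by itself does little for load speed.

    # Assumption: the standard tdbloader script honours JVM_ARGS.
    export JVM_ARGS="-Xmx8g"   # larger heap for the loader JVM
    tdbloader --loc=/mnt/ssd/tdb /mnt/ssd/freebase-dump.nt.gz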
On Wed, Mar 26, 2014 at 5:33 PM, Olivier Rossel
<[email protected]> wrote:
Buy a lot of RAM!!
Sent from my iPad
On 26 March 2014 at 19:37, Agustin Barto <[email protected]> wrote:
I'm trying to load a Freebase dump into a TDB store and I noticed that
it is awfully slow. I tried using tdbloader2, which starts really fast
(at around 300k triples per second), but after a while the rate drops
to under 1,000 triples per second.
I also tried splitting the dump into several files so I could load it in
chunks. The first file (of around 200M triples) was processed with no
problems, but when I try to process the next one using tdbloader, it
rarely goes over 500 triples per second.
Splitting will not help, and it will actually make things worse.
Bulk loading assumes the DB is empty - if it is found not to be, the loader
does not currently do anything smart.
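To illustrate that point, a sketch of the single-run approach: start from a
fresh, empty location and pass every chunk to one loader invocation instead
of re-running the loader against an already-populated database. Paths and
filenames are placeholders.

    # One bulk-load run over all chunks, against an empty database directory.
    rm -rf /mnt/ssd/tdb && mkdir -p /mnt/ssd/tdb
    tdbloader --loc=/mnt/ssd/tdb /mnt/ssd/freebase-part-*.nt.gz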
I suppose there might be an I/O bottleneck somewhere, but I was
wondering if anybody has suggestions on how to do this properly.
Thanks in advance,
Agustin