Re: Slow bulk loading of freebase dump using tdbloader

Martino Buffolino Wed, 26 Mar 2014 14:16:38 -0700

I tried exactly this using an Amazon EC2 instance. Here is some advice that
I previously received:


"By the way, loading is much faster on Amazon if you use one of the
instances with a large SSD.

In fact, running plain "tdbloader" and using an SSD machine may be the best
way.  tdbloader does not have large intermediate files."

Hopefully this helps :)
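
In case it is useful, here is a minimal Python sketch of the file-splitting
step Agustin describes below. It assumes the dump is uncompressed and holds
one triple per line (as in N-Triples), so it can be cut at any line boundary.
The function name split_lines, the "chunk" prefix, and the .nt extension are
my own placeholders, not anything from Jena or the Freebase tooling.

```python
from pathlib import Path


def split_lines(src, out_dir, lines_per_chunk, prefix="chunk"):
    """Split a line-oriented file into numbered chunk files.

    Assumes one triple per line, so any line boundary is a safe cut point.
    Returns the chunk paths in the order they were written.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    out = None
    try:
        # newline="" keeps the original line endings untouched.
        with open(src, "r", encoding="utf-8", newline="") as fh:
            for n, line in enumerate(fh):
                # Start a new chunk file every lines_per_chunk lines.
                if n % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    path = out_dir / f"{prefix}-{len(paths):05d}.nt"
                    out = open(path, "w", encoding="utf-8", newline="")
                    paths.append(path)
                # Stream line by line so memory use stays flat.
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

Each resulting chunk can then be passed to the loader one at a time. Note
that splitting only makes the input easier to handle; it does not by itself
remove the slowdown Agustin saw on the second file.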



On Wed, Mar 26, 2014 at 4:58 PM, Agustin Barto <[email protected]> wrote:

> I changed the max heap size on the tdbloader script to 8g, but it
> didn't improve.
>
> Agustin
>
> On Wed, Mar 26, 2014 at 5:33 PM, Olivier Rossel
> <[email protected]> wrote:
> > Buy a lot of RAM!!
> >
> > Sent from my iPad
> >
> > On 26 March 2014 at 19:37, Agustin Barto <[email protected]> wrote:
> >
> >> I'm trying to load a freebase dump into a tdb store and I noticed that
> >> it is awfully slow. I tried using tdbloader2 which starts really fast
> >> (at around 300k triples per second) but after a while the rate drops
> >> to under 1000 triples per second.
> >>
> >> I tried splitting the dump into several files so I can load them in
> >> chunks and the first file (of around 200M triples) is processed with
> >> no problems, but when we try to process the next one using tdbloader,
> >> it rarely goes over 500 triples per second.
> >>
> >> I suppose there might be an I/O bottleneck somewhere, but I was
> >> wondering if anybody has suggestions on how to do this properly.
> >>
> >> Thanks in advance,
> >> Agustin
>
