I tried exactly this using an Amazon EC2 instance. Here is some advice that I previously received:
"By the way, loading is much faster on Amazon if you use one of the instances with a large SSD. In fact, running plain "tdbloader" and using an SSD machine may be the best way. tdbloader does not have large intermediate files."

Hopefully this helps :)

On Wed, Mar 26, 2014 at 4:58 PM, Agustin Barto <[email protected]> wrote:
> I changed the max heap size on the tdbloader script to 8g, but it
> didn't improve.
>
> Agustin
>
> On Wed, Mar 26, 2014 at 5:33 PM, Olivier Rossel
> <[email protected]> wrote:
> > Buy a lot of RAM !!
> >
> > Sent from my iPad
> >
> > On 26 March 2014 at 19:37, Agustin Barto <[email protected]> wrote:
> >
> >> I'm trying to load a freebase dump into a tdb store and I noticed that
> >> it is awfully slow. I tried using tdbloader2 which starts really fast
> >> (at around 300k triples per second) but after a while the rate drops
> >> to under 1000 triples per second.
> >>
> >> I tried splitting the dump into several files so I can load them in
> >> chunks and the first file (of around 200M triples) is processed with
> >> no problems, but when we try to process the next one using tdbloader,
> >> it rarely goes over 500 triples per second.
> >>
> >> I suppose there might be an I/O bottleneck somewhere, but I was
> >> wondering if anybody has suggestions on how to do this properly.
> >>
> >> Thanks in advance,
> >> Agustin
