On 26/03/14 21:15, Martino Buffolino wrote:
I tried exactly this using an Amazon EC2 instance. Here is some advice that
I previously received:
"By the way, loading is much faster on Amazon if you use one of the
instances with a large SSD.
In fact, running plan "tdbloader" and using an SSD machine maybe the best
way. tdbloader does not have large intermediate files."
Hopefully this helps :)
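For reference, the kind of run described above might look like the sketch
below. The mount point, database directory, and dump filename are only
placeholders; the main assumption is that the TDB location sits on the
instance's SSD.

    # Hypothetical paths: /mnt/ssd stands for the instance-store SSD.
    mkdir -p /mnt/ssd/tdb
    tdbloader --loc=/mnt/ssd/tdb /mnt/ssd/freebase-dump.nt.gz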
On Wed, Mar 26, 2014 at 4:58 PM, Agustin Barto <[email protected]> wrote:
I changed the max heap size in the tdbloader script to 8g, but it
didn't improve the load rate.
Heap size is not relevant here - the file cache lives outside the heap, in
memory-mapped files.
Agustin
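As a side note, a sketch of how the heap is usually raised without editing
the script: it is assumed here that the stock Jena command scripts pick up
JVM options from the JVM_ARGS environment variable. As pointed out above,
though, the file cache is held in memory-mapped files outside the heap, so a
larger -Xmx by itself does little for load speed.

    # Assumption: the standard tdbloader script honours JVM_ARGS.
    export JVM_ARGS="-Xmx8g"   # larger heap for the loader JVM
    tdbloader --loc=/mnt/ssd/tdb /mnt/ssd/freebase-dump.nt.gz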
On Wed, Mar 26, 2014 at 5:33 PM, Olivier Rossel
<[email protected]> wrote:
Buy a lot of RAM!!
Sent from my iPad
On 26 March 2014 at 19:37, Agustin Barto <[email protected]> wrote:
I'm trying to load a Freebase dump into a TDB store and I noticed that
it is awfully slow. I tried using tdbloader2, which starts really fast
(at around 300k triples per second), but after a while the rate drops
to under 1,000 triples per second.
I also tried splitting the dump into several files so I could load it in
chunks. The first file (of around 200M triples) was processed with no
problems, but when I try to process the next one using tdbloader, it
rarely goes over 500 triples per second.
Splitting will not help, and it will actually make things worse.
Bulk loading assumes the DB is empty - if it is found not to be, the loader
does not currently do anything smart.
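To illustrate that point, a sketch of the single-run approach: start from a
fresh, empty location and pass every chunk to one loader invocation instead
of re-running the loader against an already-populated database. Paths and
filenames are placeholders.

    # One bulk-load run over all chunks, against an empty database directory.
    rm -rf /mnt/ssd/tdb && mkdir -p /mnt/ssd/tdb
    tdbloader --loc=/mnt/ssd/tdb /mnt/ssd/freebase-part-*.nt.gz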
I suppose there might be an I/O bottleneck somewhere, but I was
wondering if anybody has suggestions on how to do this properly.
Thanks in advance,
Agustin