> tdb2.tdbloader --loader=parallel
>
> but it still becomes random IO (moves disk heads)
>
> I haven't tried it extensively on an HDD - I'd be interested in hearing
> what happens.


Oh nice! I completely missed that option. I tried it with a 67GB .nt file from 
LinkedGeoData on the same 750GB HDD, but the end result does not seem very 
different. It's hard to compare precisely with my earlier attempt to load 
Wikidata, because no progress is reported here; I don't see any "X triples 
loaded (Y per second)" kind of message. Anyway, it starts at full speed: CPU 
boiling, the HDD cooking my wrist from beneath the plastic case, and the fans 
almost generating enough lift to take off. Then it gradually slows down. I 
stopped it after 1 hour; at that point I was seeing less than 10% CPU usage, 
~90% iowait, and TDB2 files totalling ~15GB.


> The proper solution is either to do caching+write ordering


What does this mean in practice? Can I change my input data (e.g. sorting the 
triples) so that tdb2.tdbloader can overcome the HDD bottleneck?
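In case it helps the discussion: since N-Triples is line-oriented (one triple 
per line), a pre-sort is trivial to try. Here's a minimal sketch with GNU sort 
(the file names are just placeholders); a byte-wise lexical sort groups triples 
by subject, though whether that actually turns TDB2's index writes into more 
sequential IO on an HDD is exactly my question above.

```shell
# Tiny stand-in for a real .nt dump (hypothetical data, for illustration).
printf '<urn:b> <urn:p> "2" .\n<urn:a> <urn:p> "1" .\n' > sample.nt

# LC_ALL=C forces fast byte-order comparison; -S gives sort a large
# in-memory buffer so it spills to temp files less often.
LC_ALL=C sort -S 512M sample.nt > sample.sorted.nt

head -n 1 sample.sorted.nt
```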
