On 19/07/2019 08:09, Laura Morales wrote:
tdb2.tdbloader --loader=parallel

but it still becomes random IO (moves disk heads)

I haven't tried it extensively on an HDD - I'd be interested in hearing
what happens.


oh nice! I completely missed it. I've tried it with a 67GB .nt file from LinkedGeoData on 
the same 750GB HDD, but the end result does not seem very different. It's difficult to 
compare exactly with my earlier attempt to load Wikidata, because I don't see any progress 
being reported here, that is, no "X triples loaded (Y per second)" 
kind of message. Anyway, it starts at full speed: CPU boiling, the HDD cooking my wrist 
through the plastic case, and the fans almost generating enough lift to take off. Then 
it gradually slows down. I stopped it after 1 hour; at that point I was seeing less than 
10% CPU usage, ~90% iowait, and TDB2 files of about 15GB total.


The proper solution is either to do caching+write ordering


What does this mean in practice? Can I change my input data (e.g. by sorting 
the triples) so that tdb2.tdbloader can overcome the bottleneck with HDDs?

No, it is not to do with the data. What's needed is internal changes, which is something tdbloader2 (for TDB1) tends to do better: it doesn't do a massive amount of random-pattern I/O (it reads about as much as it writes). Random I/O ends up bad for HDDs because the physical head has to move too much.
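The "caching + write ordering" idea mentioned above can be sketched in a few lines. This is a toy illustration, not Jena code: random-offset page writes are cached in memory and then flushed sorted by file offset, so the disk head sweeps mostly in one direction instead of seeking back and forth.

```python
import io

def flush_ordered(pending, f):
    """Flush buffered (offset -> bytes) writes in ascending offset order."""
    for offset, data in sorted(pending.items()):
        f.seek(offset)
        f.write(data)
    pending.clear()

# Stand-in for a database file; a real loader would use an on-disk file.
buf = io.BytesIO(bytearray(32))

pending = {}  # write cache: offset -> bytes, filled in arrival (random) order
for off, data in [(24, b"dd"), (0, b"aa"), (8, b"bb")]:
    pending[off] = data

flush_ordered(pending, buf)  # one ordered, mostly-sequential flush
```

The same principle is why an external-sort-based loader copes better on HDDs: it converts scattered index updates into large sequential runs.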

    Andy
