On 19/07/2019 08:09, Laura Morales wrote:
>>>> tdb2.tdbloader --loader=parallel
>>>
>>> but it still becomes random IO (moves disk heads)
>>
>> I haven't tried it extensively on an HDD - I'd be interested in hearing
>> what happens.
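[For readers following along: a parallel bulk load looks roughly like the sketch below. The dataset directory and input file name are placeholders, and `--loader=parallel` assumes a Jena release recent enough to offer the loader-selection option.]

```shell
# Hypothetical invocation - adjust the dataset location and input file.
# --loader=parallel selects the multi-threaded TDB2 bulk loader.
tdb2.tdbloader --loc /data/tdb2-dataset --loader=parallel data.nt
```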
> oh nice! I completely missed it. I've tried it with a 67GB .nt file from
> LinkedGeoData on the same 750GB HDD, but the end result does not seem very
> different. It's difficult to compare exactly with my earlier attempt to load
> Wikidata, because I don't see any progress being reported here - no
> "X triples loaded (Y per second)" kind of message. Anyway, it starts at full
> speed: CPU boiling, the HDD cooking my wrist through the plastic case, and
> the fans almost generating enough lift to take off. Then it gradually slows
> down. I stopped it after 1 hour; at that point I was seeing less than
> 10% CPU usage, 90% iowait, and TDB2 files of ~15GB in size.
>> The proper solution is either to do caching+write ordering
>
> What does this mean in practice? Can I change my input data (e.g. sorting
> the triples) so that tdb2.tdbloader can overcome the bottleneck with HDDs?
No, it is not to do with the data - what's needed are internal changes,
which is something tdbloader2 (for TDB1) handles better: it doesn't do a
massive amount of random-pattern I/O (it does about as much reading as
writing). Random I/O ends up bad for HDDs because the physical head has
to move too much.
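[The seek penalty Andy describes can be reproduced outside of TDB: writing the same blocks at shuffled offsets forces a head movement per write on a spinning disk, while in-order writes let the head stay put. A minimal sketch - file names and sizes are arbitrary, and on an SSD or a cached filesystem the gap largely disappears:]

```python
import os
import random
import time

BLOCK = 4096          # one 4 KiB block per write
COUNT = 2048          # 8 MiB of data in total

def write_blocks(path, offsets):
    """Write COUNT blocks at the given byte offsets and return wall time."""
    data = os.urandom(BLOCK)
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.truncate(BLOCK * COUNT)          # pre-size the file
        for off in offsets:
            f.seek(off)
            f.write(data)
        f.flush()
        os.fsync(f.fileno())               # force the data out to the device
    return time.perf_counter() - start

sequential = [i * BLOCK for i in range(COUNT)]
scattered = sequential[:]
random.shuffle(scattered)                  # same blocks, random order

t_seq = write_blocks("seq.bin", sequential)
t_rnd = write_blocks("rnd.bin", scattered)
print(f"sequential: {t_seq:.3f}s  random-order: {t_rnd:.3f}s")

os.remove("seq.bin")
os.remove("rnd.bin")
```

[On an HDD the random-order run is typically several times slower; a loader's index updates behave like the scattered case once the working set no longer fits in the OS page cache, which matches the iowait Laura observed.]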
Andy