On 12/12/17 21:06, Laura Morales wrote:
> 2) from my tests, tdbloader2 starts by parsing triples rather quickly (130K
> TPS) but then it quickly slows down *a lot* over time,

That's memory.

When the node table index exceeds RAM, updating slows down: checking whether a node has been seen before used to be a RAM access, and it now requires disk I/O.
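
As a rough illustration of that effect (a toy simulation, not TDB's actual code), the sketch below looks up hashed node keys through an LRU cache of index pages. The page size, the MD5 stand-in for the hash, the example.org node names, and the function names are all assumptions made up for the example.

```python
import hashlib
from collections import OrderedDict

PAGE_ENTRIES = 100  # hypothetical number of index entries per page


def page_of(node: str, num_pages: int) -> int:
    """Map a node to an index page via a large hash (MD5 as a stand-in)."""
    digest = hashlib.md5(node.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_pages


def hit_rate(num_nodes: int, cache_pages: int) -> float:
    """Fraction of lookups served from an LRU cache holding cache_pages pages."""
    num_pages = max(1, num_nodes // PAGE_ENTRIES)
    cache = OrderedDict()  # page number -> True, ordered by recency of use
    hits = 0
    for i in range(num_nodes):
        page = page_of(f"http://example.org/node/{i}", num_pages)
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            cache[page] = True  # a miss stands in for a disk read
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / num_nodes


if __name__ == "__main__":
    print("index fits in cache:", hit_rate(20000, 200))
    print("index 10x the cache:", hit_rate(20000, 20))
```

When every page fits in the cache, nearly all lookups are hits. Once the index is several times larger than the cache, most lookups miss, and each miss corresponds to a disk access in the real system.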

Creating the node table index may be amenable to the same approach as index building, though the details would need working out.

> And I'm not convinced it's a problem of disk cache either, because I tried to
> flush it several times

That does not help: it's a read workload.

(It is a memory-mapped file.)

> (1MB/s writes!!!)

Presumably because random-pattern writes are occurring as pages are flushed. The entries are keyed by a large hash, so they land on pages in a random pattern.
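
A small sketch of why hash-keyed entries produce scattered writes, under made-up assumptions (100 entries per page, one million slots, MD5 as a stand-in hash, example.org node names; none of this is TDB's real layout). It counts how many distinct pages a batch of 1000 new entries would dirty when slots are sequential versus hash-derived.

```python
import hashlib

ENTRIES_PER_PAGE = 100   # hypothetical page capacity
TOTAL_SLOTS = 1_000_000  # hypothetical index size in entries


def hashed_slot(node: str, total_slots: int) -> int:
    """Place a node at a slot chosen by a large hash (MD5 as a stand-in)."""
    digest = hashlib.md5(node.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % total_slots


def dirty_pages(slots) -> set:
    """Return the set of page numbers touched by writing the given slots."""
    return {slot // ENTRIES_PER_PAGE for slot in slots}


batch = [f"http://example.org/node/{i}" for i in range(1000)]

# Sequential keys fill pages one after another.
sequential = dirty_pages(range(len(batch)))

# Hash-derived keys spread the same number of entries over the whole index.
hashed = dirty_pages(hashed_slot(node, TOTAL_SLOTS) for node in batch)

if __name__ == "__main__":
    print("pages dirtied, sequential keys:", len(sequential))
    print("pages dirtied, hashed keys:   ", len(hashed))
```

The sequential batch dirties 10 pages, while the hashed batch dirties close to one page per entry, so flushing it means many small writes at unrelated file offsets.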

    Andy
