On 12/12/17 21:06, Laura Morales wrote:
> 2) from my tests, tdbloader2 starts by parsing triples rather quickly (130K
> TPS) but then it quickly slows down *a lot* over time,
That's memory.
When the node table index exceeds RAM, updating slows down: the check
for whether a node has been seen before, which used to be a RAM access,
now causes disk I/O.
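A toy model of that effect follows. It is only a sketch: it assumes the node index is a flat array of fixed-size pages, that each node's page is chosen by hashing its string, and that the OS page cache behaves like a simple LRU of `cache_pages` pages. The names `page_for` and `miss_rate` are hypothetical and are not TDB code.

```python
"""Toy model (hypothetical, not TDB code): hash-keyed node lookups through an LRU page cache.

Assumptions: the index is a flat array of pages, a node's page is picked by
hashing its string, and the page cache is a simple LRU of `cache_pages` pages.
"""
import hashlib
from collections import OrderedDict


def page_for(node: str, num_pages: int) -> int:
    # Hash the node string; the digest picks a page roughly uniformly.
    digest = hashlib.md5(node.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_pages


def miss_rate(num_nodes: int, nodes_per_page: int, cache_pages: int) -> float:
    """Fraction of 'seen this node before?' lookups that must go to disk."""
    num_pages = max(1, num_nodes // nodes_per_page)
    cache = OrderedDict()  # page number -> None, kept in LRU order
    misses = 0
    for i in range(num_nodes):
        page = page_for(f"node-{i}", num_pages)
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
        else:
            misses += 1  # miss: simulated disk read
            cache[page] = None
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used page
    return misses / num_nodes


if __name__ == "__main__":
    # Index of 200 pages fits in a 1,000-page cache: only cold misses.
    print(miss_rate(20_000, 100, 1_000))
    # Index of 2,000 pages against a 200-page cache: most lookups miss.
    print(miss_rate(200_000, 100, 200))
```

In this model, once the index is ten times the cache, a uniformly hashed lookup hits only about one time in ten, so almost every "seen before?" check turns into a read.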
Creating the node table index may be amenable to the same approach as
index building, subject to the details.
> And I'm not convinced it's a problem of disk cache either, because I tried to
> flush it several times
That does not help: it's a read workload.
(It is a memory-mapped file.)
> (1MB/s writes!!!)
Presumably that is because random-pattern writes occur as dirty pages are
flushed. The entries are keyed by a large hash, so consecutive inserts
land in unrelated places in the file, and the writes follow a random pattern.
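The sketch below illustrates why hash keying gives a random write pattern, and how sorting first (assuming that is roughly what "the same approach as index building" would mean) makes writes to each page contiguous. It is a hypothetical model: entries go to a page chosen by their hash, and a "page switch" between consecutive inserts stands in for a random I/O when dirty pages are flushed. `page_of`, `page_switches` and `compare` are illustrative names only.

```python
"""Hypothetical model: hash-keyed inserts scatter writes; sorting first groups them.

Assumptions: each entry lives on the page chosen by its hash, and a change of
page between consecutive inserts is counted as one random-access write.
"""
import hashlib


def page_of(key: str, num_pages: int) -> int:
    # A large hash of the key decides which page the entry lands on.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_pages


def page_switches(keys: list[str], num_pages: int) -> int:
    # Count how often the next insert goes to a different page than the last.
    switches = 0
    prev = None
    for key in keys:
        page = page_of(key, num_pages)
        if prev is not None and page != prev:
            switches += 1
        prev = page
    return switches


def compare(n: int = 10_000, num_pages: int = 100) -> tuple[int, int]:
    keys = [f"<http://example.org/node/{i}>" for i in range(n)]
    # Arrival order: consecutive keys hash to unrelated pages.
    arrival = page_switches(keys, num_pages)
    # Sort by target page first, as a sort-then-build pass would:
    # every page is then written in one contiguous run.
    ordered = sorted(keys, key=lambda k: page_of(k, num_pages))
    return arrival, page_switches(ordered, num_pages)


if __name__ == "__main__":
    print(compare())
```

In arrival order nearly every insert touches a different page, while the sorted order switches page at most `num_pages - 1` times.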
Andy