Re: Very slow tdbloader2 insertion

Andy Seaborne Tue, 18 Apr 2017 01:12:46 -0700


On 17/04/17 23:07, Laura Morales wrote:

tdbloader2 builds b+trees from bottom to top, given sorted input. As
such blocks are streamed to disk which is disk-efficient.

It is a series of java programs scripted together by a shell script.

tdbloader is pure java. It builds the b+trees by inserting, which for
some indexes is not optimal because it causes random inserts leading to
random I/O, which is bad for disk performance.
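
To make the bottom-up idea concrete, here is a toy sketch in Python. It is
not TDB's code: the function name build_bottom_up and the fanout parameter
are invented for illustration, and a real B+tree also has to handle block
layout and minimum node fill. It only shows that, given sorted input, every
node can be produced once, in order, without revisiting earlier nodes.

```python
def build_bottom_up(sorted_keys, fanout=4):
    """Build the levels of a toy B+tree from already-sorted keys.

    Returns a list of levels, leaves first. Each level is a list of nodes,
    and each node is a list of keys. Hypothetical helper, not a Jena API.
    """
    if not sorted_keys:
        return []

    # Leaf level: cut the sorted stream into consecutive full nodes.
    # Each leaf is finished before the next one starts, so it could be
    # written to disk sequentially.
    level = [sorted_keys[i:i + fanout]
             for i in range(0, len(sorted_keys), fanout)]
    levels = [level]

    # Parent levels: take the first key of every child as its separator,
    # then pack the separators into nodes the same way.
    while len(level) > 1:
        separators = [node[0] for node in level]
        level = [separators[i:i + fanout]
                 for i in range(0, len(separators), fanout)]
        levels.append(level)

    return levels


if __name__ == "__main__":
    for depth, nodes in enumerate(build_bottom_up(list(range(16)))):
        print(depth, nodes)
```

Building by insertion, by contrast, has to find and modify whichever node
each incoming key belongs to, which is where the random access comes from
when the input is not in index order.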

Andy



But why is tdbloader better for smaller datasets, whereas tdbloader2 is
better for very large datasets ("100M+ triples")? Wouldn't the approach of
tdbloader2 be superior in all cases?


Try them both and see!

tdbloader2 has high overhead.

On small datasets (fewer than 100M triples), an index fits in the OS disk
cache so tdbloader I/O is effectively "in-memory" and the randomness is not
a problem. When it spills, it slows down quite markedly.

tdbloader2 is a slower algorithm but does not produce this "fall-off
effect" on index writing.
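
The fall-off can be illustrated with a toy model, again in Python and again
only a sketch under stated assumptions: the OS disk cache is modelled as a
simple LRU set of blocks, every cache miss counts as one disk access, and
all names (count_misses, random_accesses, sequential_accesses) are made up
for this example. It ignores the fixed overhead of tdbloader2 and says
nothing about real timings.

```python
import random
from collections import OrderedDict


def count_misses(block_accesses, cache_blocks):
    """Count accesses that miss an LRU cache holding cache_blocks blocks."""
    cache = OrderedDict()
    misses = 0
    for block in block_accesses:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1                    # would have to touch the disk
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return misses


def random_accesses(n_inserts, n_blocks, seed=0):
    """Inserting in non-index order: each insert hits an arbitrary block."""
    rng = random.Random(seed)
    return [rng.randrange(n_blocks) for _ in range(n_inserts)]


def sequential_accesses(n_inserts, n_blocks):
    """Sorted input: blocks are filled one after another, never revisited."""
    return [i * n_blocks // n_inserts for i in range(n_inserts)]


if __name__ == "__main__":
    cache = 100
    print("random, index fits in cache:",
          count_misses(random_accesses(10_000, 100), cache))
    print("random, index 10x the cache:",
          count_misses(random_accesses(10_000, 1_000), cache))
    print("sequential, index 10x the cache:",
          count_misses(sequential_accesses(10_000, 1_000), cache))
```

In this model the random pattern costs almost nothing while the index fits
in the cache and then misses on most accesses once it does not, whereas the
sequential pattern touches each block exactly once whatever the cache size.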


    Andy
