On 24/07/12 12:24, Michael Brunnbauer wrote:
Hello Andy,
On Tue, Jul 24, 2012 at 01:13:59PM +0200, Michael Brunnbauer wrote:
BTW: Here is some output from tdbloader2 for this TDB, which shows that
the tdbloader2 data-phase runtime becomes quite non-linear for very big datasets.
I called tdbloader2 with JVM_ARGS="-Xmx32768M -server" and it did not seem to
run into memory problems.
I should be more specific here: whenever I watched it after 10^9 quads, it was
doing disk I/O (I think mostly writes, probably to node2id.dat and nodes.dat).
Would it be possible to generate node2id.dat and nodes.dat without random
access?
(see also tdbloader4)
Yes - it looks like it is the node table, part of which is a B+Tree mapping a
hash (128 bits) to a NodeId. This is used to check whether a node has already
been encountered. There is a cache - maybe it needs to be greatly increased in
size, or a more explicit in-memory structure should front the node table during
bulk loading. At query time, this isn't such an important lookup.
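For illustration, here is a minimal sketch (Java) of the kind of in-memory structure that could front the node table during bulk loading. The NodeTable interface, its method names, and the string-keyed map are assumptions made for the sketch, not Jena's actual internals:

import java.util.HashMap;
import java.util.Map;

class BulkLoadNodeCache {

    // Hypothetical view of the on-disk node table (node2id.dat / nodes.dat).
    interface NodeTable {
        long lookup(String node);   // NodeId, or -1 if the node is not stored yet
        long store(String node);    // append the node, return its new NodeId
    }

    private final NodeTable diskTable;
    // In-memory front: node value -> NodeId. A real loader would bound this,
    // or key it on the 128-bit hash rather than the full node value.
    private final Map<String, Long> front = new HashMap<>();

    BulkLoadNodeCache(NodeTable diskTable) {
        this.diskTable = diskTable;
    }

    // Resolve a node to its NodeId, touching the on-disk B+Tree only on a miss.
    long getOrAllocate(String node) {
        Long id = front.get(node);
        if (id != null) return id;            // hit: no disk access
        long diskId = diskTable.lookup(node);
        if (diskId == -1) diskId = diskTable.store(node);
        front.put(node, diskId);
        return diskId;
    }
}

With heavy node reuse, most lookups during a load would be answered from the map, avoiding the random B+Tree reads that seem to dominate the data phase; memory consumption is the obvious trade-off.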
How big are the node* files (node2id.dat, node2id.idn, nodes.dat) in the
resulting database in this case?
Andy
Regards,
Michael Brunnbauer