Hello Andy,

[tdbloader2 performance for 1B+ triples]

On Mon, Jul 30, 2012 at 05:06:55PM +0100, Andy Seaborne wrote:
> >>How big are the node* files (node2id.dat, .idn, nodes.dat) in the
> >>resulting database in this case?
> >
> >node2id.dat 9470738432 bytes
> 
> 9,470,738,432 => 9G
> 
> >node2id.idn 50331648 bytes
> 
> 50,331,648 => 50M
> 
> Much less than RAM size.
> 
> >nodes.dat 20182577027 bytes
> 
> This file is written sequentially and isn't read during loading so 
> should not be an issue.
> 
> In 64 bit mode, the B+Tree node2id is a memory mapped file and the OS 
> takes care of paging+caching the data.
> 
> I think that use of
> 
>   JVM_ARGS="-Xmx32768M -server"
> 
> is in fact making things worse: the heap grows to 32G, reducing the 
> space available to the OS for mmap files.  So it is squeezing out the OS 
> managed mmap files and the result is that there is little real RAM 
> devoted to caching the node table.
> 
> 2G heap should be enough IIRC (caveat long literals).

The -Xmx32768M is there for a reason: I've had out-of-memory errors even with
much higher values in earlier Jena versions. I tried JVM_ARGS="-Xmx2048M"
with tdbloader2 from apache-jena-2.7.3 and the error came after about 55 million
triples:

INFO  Add: 55,300,000 Data (Batch: 281 / Avg: 13,794)
INFO  Add: 55,350,000 Data (Batch: 227 / Avg: 13,088)
INFO  Add: 55,400,000 Data (Batch: 192 / Avg: 12,342)
INFO  Add: 55,450,000 Data (Batch: 134 / Avg: 11,406)
INFO  Add: 55,500,000 Data (Batch: 98 / Avg: 10,335)
INFO    Elapsed: 5,369.59 seconds [2012/08/09 17:45:44 CEST]
INFO  Add: 55,550,000 Data (Batch: 52 / Avg: 8,785)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
        at java.lang.StringBuilder.append(StringBuilder.java:119)
        at com.hp.hpl.jena.tdb.lib.NodeLib.hash(NodeLib.java:160)
        at com.hp.hpl.jena.tdb.lib.NodeLib.setHash(NodeLib.java:116)
        at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.accessIndex(NodeTableNative.java:124)
        at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._idForNode(NodeTableNative.java:117)
        at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getAllocateNodeId(NodeTableNative.java:83)
        at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._idForNode(NodeTableCache.java:123)
        at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getAllocateNodeId(NodeTableCache.java:83)
        at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getAllocateNodeId(NodeTableWrapper.java:43)
        at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getAllocateNodeId(NodeTableInline.java:51)
        at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder$NodeTableBuilder.send(CmdNodeTableBuilder.java:223)
        at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder$NodeTableBuilder.send(CmdNodeTableBuilder.java:190)
        at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:71)
        at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
        at org.openjena.riot.RiotLoader.readQuads(RiotLoader.java:206)
        at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:168)
        at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
        at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:79)

Any idea what a good value for -Xmx would be for 1B+ triples?

I will try with 16384 now.
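
To put the heap-versus-mmap trade-off into rough numbers, here is a minimal
sketch. The file sizes are the ones listed above; the 32 GiB of RAM is purely
an assumption for illustration (the machine's actual RAM is not stated in this
mail), and the helper names are hypothetical. It ignores JVM non-heap overhead,
other processes and the other index files, so treat it as an upper bound on
what the OS could devote to caching node2id.

```python
# Rough memory budget: RAM not claimed by the JVM heap is what the OS can
# use to cache memory-mapped files such as node2id.dat.
NODE2ID_DAT = 9_470_738_432   # bytes, from the listing above
NODE2ID_IDN = 50_331_648      # bytes, from the listing above
MIB = 1024 ** 2
GIB = 1024 ** 3

ASSUMED_RAM = 32 * GIB        # ASSUMPTION: real RAM size is not given here


def cache_headroom(ram_bytes, xmx_mib):
    """Bytes left for the OS page cache if the heap grows to its -Xmx limit."""
    return ram_bytes - xmx_mib * MIB


def node_index_fits(ram_bytes, xmx_mib):
    """True if the node2id B+Tree files could sit entirely in the page cache."""
    return cache_headroom(ram_bytes, xmx_mib) >= NODE2ID_DAT + NODE2ID_IDN


if __name__ == "__main__":
    # The three -Xmx values mentioned in this thread, in MiB.
    for xmx in (2048, 16384, 32768):
        left = cache_headroom(ASSUMED_RAM, xmx) / GIB
        fits = node_index_fits(ASSUMED_RAM, xmx)
        print(f"-Xmx{xmx}M: {left:.1f} GiB left for mmap, node2id fits: {fits}")
```

Under that assumed RAM size, a heap that really grows to 32768M leaves nothing
for the mmap'ed node table, while 16384M would still leave room for it.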

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail [email protected]
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
