Re: Memory errors when using tdbloader2

Andy Seaborne Thu, 23 Apr 2015 14:15:12 -0700

Hi there,

It's hard to be sure - what does the load log file say before the exception occurs?


On 23/04/15 20:53, Daniel Hernández wrote:

Hello,

I'm trying to load two files into a tdb with the command below:

bin/tdbloader2 --loc=tdb-03 d3.nt dc.nt


Do these files have a lot of literals? A lot of large literals?

More below:

The files d3.nt and dc.nt have 114,176,368 and 175,984,917 triples,
respectively. The server where I'm running the command has 32GB of
RAM and enough disk space. I'm using Jena 2.13.0 and the java version
that comes with Debian:

java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~deb7u1)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)

However I got the error when processing the triples:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
         at java.util.LinkedHashMap.createEntry(LinkedHashMap.java:442)
         at java.util.HashMap.addEntry(HashMap.java:884)
         at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
         at java.util.HashMap.put(HashMap.java:505)
         at org.apache.jena.atlas.lib.cache.CacheLRU.put(CacheLRU.java:59)
         at com.hp.hpl.jena.tdb.store.nodetable.NodeTableCache.cacheUpdate(NodeTableCache.java:200)
         at com.hp.hpl.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:127)
         at com.hp.hpl.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:85)
         at com.hp.hpl.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:55)
         at com.hp.hpl.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
         at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorNodeId.convert(StatsCollectorNodeId.java:51)
         at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorBase.results(StatsCollectorBase.java:54)
         at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorNodeId.results(StatsCollectorNodeId.java:30)
         at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:172)
         at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
         at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
         at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
         at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:80)


I have increased the memory available to Java by setting the line below in
the bin/tdbloader2worker file.

JVM_ARGS=${JVM_ARGS:--Xmx20000M}

JVM_ARGS is set further out in tdbloader2 as well, so this change has no effect (JVM_ARGS is already set, so ${:-} returns the existing value). It's merely a fallback at that point.
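
To illustrate the point about ${:-}, here is a minimal shell sketch of how the expansion behaves when the variable is unset versus already set. The value -Xmx1200M is only a hypothetical stand-in for whatever the outer script assigns, and the names unset_case and set_case are made up for this example.

```shell
# Case 1: JVM_ARGS is unset, so the default after ':-' is used.
unset JVM_ARGS
unset_case=${JVM_ARGS:--Xmx20000M}
echo "unset:       $unset_case"

# Case 2: JVM_ARGS already has a value (hypothetical here, standing in
# for a value assigned earlier by an outer script), so the default
# after ':-' is ignored and the existing value is kept.
JVM_ARGS=-Xmx1200M
set_case=${JVM_ARGS:--Xmx20000M}
echo "already set: $set_case"
```

In the second case the edited default never takes effect, which is why changing it in the worker script makes no difference.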


The right idiom is to set it in the shell environment that calls tdbloader2,

e.g.

export JVM_ARGS=-Xmx5000M
tdbloader2 ...

or
env JVM_ARGS=-Xmx5000M tdbloader2 ...

Don't set it too large.  Much of the bulk space is not in the Java heap.

        Andy


After that I run the tdbloader2 again and I got the following error
message:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
         at java.util.HashMap.resize(HashMap.java:580)
         at java.util.HashMap.addEntry(HashMap.java:879)
         at java.util.HashMap.put(HashMap.java:505)
         at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorNodeId.convert(StatsCollectorNodeId.java:52)
         at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorBase.results(StatsCollectorBase.java:54)
         at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorNodeId.results(StatsCollectorNodeId.java:30)
         at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:172)
         at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
         at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
         at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
         at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:80)


I do not have much experience with the java management of memory. I
guess that there is a configuration that would be better when working
with the Jena tdbloader in this scenario. Is there?

Thanks in advance!
Daniel
