Hi there,
It's hard to be sure - what does the load log file say before the
exception occurs?
On 23/04/15 20:53, Daniel Hernández wrote:
Hello,
I'm trying to load two files into a tdb with the command below:
bin/tdbloader2 --loc=tdb-03 d3.nt dc.nt
Do these files have a lot of literals? A lot of large literals?
More below:
The files d3.nt and dc.nt have 114,176,368 and 175,984,917 triples,
respectively. The server where I'm running the command has 32GB of
RAM and enough disk space. I'm using Jena 2.13.0 and the Java version
that comes with Debian:
java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~deb7u1)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
However, I got the following error when processing the triples:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.LinkedHashMap.createEntry(LinkedHashMap.java:442)
at java.util.HashMap.addEntry(HashMap.java:884)
at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
at java.util.HashMap.put(HashMap.java:505)
at org.apache.jena.atlas.lib.cache.CacheLRU.put(CacheLRU.java:59)
at com.hp.hpl.jena.tdb.store.nodetable.NodeTableCache.cacheUpdate(NodeTableCache.java:200)
at com.hp.hpl.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:127)
at com.hp.hpl.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:85)
at com.hp.hpl.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:55)
at com.hp.hpl.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorNodeId.convert(StatsCollectorNodeId.java:51)
at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorBase.results(StatsCollectorBase.java:54)
at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorNodeId.results(StatsCollectorNodeId.java:30)
at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:172)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:80)
I have increased the memory available to Java by setting the line below in
the bin/tdbloader2worker file.
JVM_ARGS=${JVM_ARGS:--Xmx20000M}
JVM_ARGS is set further out in tdbloader2 as well, so this change has
no effect (JVM_ARGS is already set, so ${:-} returns the existing value).
It's merely a fallback at that point.
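To illustrate why the edit has no effect, here is a minimal sketch of how the
${VAR:-default} expansion behaves. The function name effective_jvm_args and
the value -Xmx1200M are made up for the demonstration; they are not taken
from the Jena scripts.

```shell
#!/bin/sh
# Hypothetical helper that mimics the fallback line in the worker script.
effective_jvm_args() {
    # Expands to $JVM_ARGS when it is set and non-empty,
    # and to -Xmx20000M only otherwise.
    echo "${JVM_ARGS:--Xmx20000M}"
}

# Case 1: the caller set nothing, so the fallback is used.
unset JVM_ARGS
unset_result=$(effective_jvm_args)

# Case 2: an outer script already set a value (arbitrary stand-in here),
# so the fallback is ignored.
JVM_ARGS="-Xmx1200M"
preset_result=$(effective_jvm_args)

echo "unset:  $unset_result"
echo "preset: $preset_result"
```

In the second case the 20000M default is never consulted, which matches the
situation described above.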
The right idiom is to set it in the shell environment that calls tdbloader2,
e.g.
export JVM_ARGS=-Xmx5000M
tdbloader2 ...
or
env JVM_ARGS=-Xmx5000M tdbloader2 ...
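The two forms differ only in scope. A small sketch, using a throwaway
sh -c command as a stand-in for tdbloader2 (the show_args variable is made up
for the demonstration):

```shell
#!/bin/sh
# Hypothetical stand-in for a script that reads JVM_ARGS from its environment.
show_args='echo "${JVM_ARGS:-(unset)}"'

unset JVM_ARGS

# "env VAR=value command" sets the variable for that one command only.
with_env=$(env JVM_ARGS=-Xmx5000M sh -c "$show_args")
after_env=$(sh -c "$show_args")

# "export" keeps it set for every later command started from this shell.
export JVM_ARGS=-Xmx5000M
with_export=$(sh -c "$show_args")

echo "with env:    $with_env"
echo "after env:   $after_env"
echo "with export: $with_export"
```

Either way, the child process sees the value before any fallback inside the
scripts is evaluated.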
Don't set it too large. Much of the bulk space is not in the Java heap.
Andy
After that I ran tdbloader2 again and got the following error
message:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:580)
at java.util.HashMap.addEntry(HashMap.java:879)
at java.util.HashMap.put(HashMap.java:505)
at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorNodeId.convert(StatsCollectorNodeId.java:52)
at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorBase.results(StatsCollectorBase.java:54)
at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorNodeId.results(StatsCollectorNodeId.java:30)
at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:172)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:80)
I do not have much experience with Java memory management. I
guess that there is a configuration that would work better with
the Jena tdbloader in this scenario. Is there?
Thanks in advance!
Daniel