Re: Memory errors when using tdbloader2

Daniel Hernández Fri, 24 Apr 2015 09:08:43 -0700

On 2015-04-23 18:12, Andy Seaborne wrote:

Hi there,


It's hard to be sure - what does the load log file say before the
exception occurs?


It was loading data when the error occurred. I tried again with
export JVM_ARGS=-Xmx10000M set before running the load, and I got this
error:

INFO  Add: 289,750,000 Data (Batch: 120,481 / Avg: 67,007)
INFO  Add: 289,800,000 Data (Batch: 117,647 / Avg: 67,012)
INFO  Add: 289,850,000 Data (Batch: 155,279 / Avg: 67,018)
INFO  Add: 289,900,000 Data (Batch: 151,515 / Avg: 67,025)
INFO  Add: 289,950,000 Data (Batch: 156,250 / Avg: 67,031)
INFO  Add: 290,000,000 Data (Batch: 155,279 / Avg: 67,038)
INFO    Elapsed: 4,325.89 seconds [2015/04/24 12:23:55 UTC]
INFO  Add: 290,050,000 Data (Batch: 162,866 / Avg: 67,045)
INFO  Add: 290,100,000 Data (Batch: 50,968 / Avg: 67,041)
INFO  Add: 290,150,000 Data (Batch: 160,771 / Avg: 67,048)

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

        at java.util.LinkedHashMap.createEntry(LinkedHashMap.java:442)
        at java.util.HashMap.addEntry(HashMap.java:884)
        at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
        at java.util.HashMap.put(HashMap.java:505)
        at org.apache.jena.atlas.lib.cache.CacheLRU.put(CacheLRU.java:59)

        at com.hp.hpl.jena.tdb.store.nodetable.NodeTableCache.cacheUpdate(NodeTableCache.java:200)
        at com.hp.hpl.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:127)
        at com.hp.hpl.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:85)
        at com.hp.hpl.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:55)
        at com.hp.hpl.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
        at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorNodeId.convert(StatsCollectorNodeId.java:51)
        at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorBase.results(StatsCollectorBase.java:54)
        at com.hp.hpl.jena.tdb.solver.stats.StatsCollectorNodeId.results(StatsCollectorNodeId.java:30)
        at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:172)

        at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)

        at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:80

On 23/04/15 20:53, Daniel Hernández wrote:

Hello,

I'm trying to load two files into a tdb with the command below:

bin/tdbloader2 --loc=tdb-03 d3.nt dc.nt


Do these files have a lot of literals? A lot of large literals?


I think that there is no problem with the literals, because I have
loaded the same data with another schema without problems. I guess
that the problem could be having many different predicates. The first
file has 50 million different predicates.

I have increased the memory used by Java by setting the line below in
the bin/tdbloader2worker file.

JVM_ARGS=${JVM_ARGS:--Xmx20000M}


JVM_ARGS is set further out in tdbloader2 as well, and so this change
has no effect (JVM_ARGS is already set, so ${:-} returns the existing
value). It is merely a fallback at that point.
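
To illustrate the point about ${:-}, here is a minimal sketch of how that expansion behaves. The function name resolve_jvm_args and the value -Xmx1200M are hypothetical, chosen only for the demonstration; they are not taken from the tdbloader2 scripts.

```shell
#!/bin/sh
# Hypothetical helper that mimics the fallback line edited in the worker script.
resolve_jvm_args() {
    # ${VAR:-default} expands to $VAR when it is set and non-empty,
    # otherwise to the text after ":-".
    echo "${JVM_ARGS:--Xmx20000M}"
}

# Case 1: JVM_ARGS is unset, so the fallback value is used.
unset JVM_ARGS
unset_result=$(resolve_jvm_args)

# Case 2: an outer script has already set JVM_ARGS (illustrative value),
# so the fallback is ignored and the existing value wins.
JVM_ARGS=-Xmx1200M
set_result=$(resolve_jvm_args)

echo "JVM_ARGS unset -> $unset_result"
echo "JVM_ARGS set   -> $set_result"
```

In the second case the edited default never takes effect, which matches the behaviour described above: the fallback only applies when nothing earlier in the call chain has assigned JVM_ARGS.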

The right idiom is to set in the shell environment calling tdbloader2

e.g.

export JVM_ARGS=-Xmx5000M
tdbloader2 ...

or
env JVM_ARGS=-Xmx5000M tdbloader2 ...

Don't set it too large. Much of the bulk space is not in the Java heap.

I used 10GB for the heap the last time, so there are 20GB extra to be used.

However, I got the error above.

Thanks,
Daniel
