at com.hp.hpl.jena.sparql.resultset.TextOutput.format(TextOutput.java:65)
at com.hp.hpl.jena.query.ResultSetFormatter.out(ResultSetFormatter.java:135)
at com.hp.hpl.jena.sparql.util.QueryExecUtils.outputResultSet(QueryExecUtils.java:157)
at com.hp.hpl.jena.sparql.util.QueryExecUtils.doSelectQuery(QueryExecUtils.java:199)
at com.hp.hpl.jena.sparql.util.QueryExecUtils.executeQuery(QueryExecUtils.java:75)
Looks like you are trying to output as formatted text.
The text format aligns column widths, so it needs to scan the entire
result set to find the column widths, then go back and actually write the rows.
It takes a copy of the whole result set to do that.
You can use a streaming format like JSON, TSV, CSV (the last two can be
thought of as unformatted text).
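
To illustrate the difference, here is a minimal sketch in plain Java. It is not Jena code: the class and method names are hypothetical, and each row is simplified to a single string. The aligned writer has to copy every row before it can write the first one, while the streaming writer holds only one row at a time. On the command line, the output format is typically selected with the `--results` option (for example `--results=TSV`); check `tdbquery --help` for your version.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of Jena.
class RowOutputSketch {

    // Streaming (TSV/CSV-like): each row is written as soon as it is read,
    // so memory use does not grow with the number of rows.
    static void writeStreaming(Iterator<String> rows, PrintWriter out) {
        while (rows.hasNext()) {
            out.print(rows.next());
            out.print('\n');
        }
        out.flush();
    }

    // Aligned text: the column width is known only after every row has been seen,
    // so all rows are copied into memory (pass 1) and written afterwards (pass 2).
    static void writeAligned(Iterator<String> rows, PrintWriter out) {
        List<String> copy = new ArrayList<>();
        int width = 0;
        while (rows.hasNext()) {
            String row = rows.next();
            copy.add(row);
            width = Math.max(width, row.length());
        }
        for (String row : copy) {
            out.print("| " + pad(row, width) + " |\n");
        }
        out.flush();
    }

    // Right-pads s with spaces up to the given width.
    static String pad(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("<http://a.example/1>", "<http://a.example/longer/2>");
        PrintWriter out = new PrintWriter(System.out);
        writeStreaming(urls.iterator(), out);
        writeAligned(urls.iterator(), out);
    }
}
```

With millions of result rows, the `copy` list in `writeAligned` is what exhausts the heap; `writeStreaming` never needs it.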
Andy
On 28/03/14 15:10, Timothy Lebo wrote:
Thanks, David.
Bumping it from 1 GB to 4 GB handled it to produce:
38 MB of gzipped dbpedia URLs,
8 MB of gzipped freebase URLs, and
7 MB of gzipped reference.data.gov.uk URLs.
(the only three “big” domains)
I’ll put the streaming question on hold until I run out of memory :-)
Regards,
Tim
On Mar 28, 2014, at 10:44 AM, David Jordan <[email protected]> wrote:
The first question to answer is how much memory you have allocated to the Java
heap. You can control this. The default JVM heap size will very likely be too
small.
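
As a rough sketch of how to check this, the snippet below prints the heap ceiling of the JVM it runs in. The class name is hypothetical and is not part of Jena. The ceiling itself is set with the standard JVM option `-Xmx` (for example `-Xmx4g`); how that option reaches the JVM that runs tdbquery depends on the launch script in use.

```java
// Hypothetical helper class; not part of Jena.
class HeapCheck {

    // Maximum amount of memory the JVM will attempt to use for the heap, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it once with and once without `-Xmx4g` shows whether the setting took effect.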
-----Original Message-----
From: Timothy Lebo [mailto:[email protected]]
Sent: Friday, March 28, 2014 10:41 AM
To: [email protected]
Subject: OutOfMemoryError with tdbquery
Jena,
I have a TDB with 4.2 billion triples that I created with tdbloader.
It's taken from the 2012 Billion Triples Challenge.
I assert three triples for each URL they retrieved ("context"), e.g. for the
URL http://www.hyphen.info/rdf/30.xml:
<http://www.hyphen.info/rdf/30.xml> <http://purl.org/twc/vocab/between-the-edges/root> <http://www.hyphen.info> .
<http://www.hyphen.info> <http://purl.org/twc/vocab/between-the-edges/pld> <http://hyphen.info> .
<http://hyphen.info> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/twc/vocab/between-the-edges/PayLevelDomain> .
When I submit the following query with tdbquery:
select ?url where { ?url <http://purl.org/twc/vocab/between-the-edges/root> <http://dbpedia.org> . }
The following Exception is thrown.
I'm assuming that Jena is trying to build up all of the results before
reporting them.
Is there a way to just get "the stream" to avoid the memory issue?
Thanks,
Tim Lebo
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.hp.hpl.jena.tdb.base.record.RecordFactory.create(RecordFactory.java:87)
at com.hp.hpl.jena.tdb.base.record.RecordFactory.buildFrom(RecordFactory.java:122)
at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer._get(RecordBuffer.java:107)
at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.get(RecordBuffer.java:53)
at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.hasNext(RecordRangeIterator.java:130)
at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.hasNext(DatasetControlMRSW.java:119)
at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
at org.openjena.atlas.iterator.Iter$3.hasNext(Iter.java:181)
at org.openjena.atlas.iterator.Iter.hasNext(Iter.java:825)
at org.openjena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:58)
at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:72)
at com.hp.hpl.jena.sparql.resultset.ResultSetMem.<init>(ResultSetMem.java:95)
at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:147)
at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:130)
at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:118)
at com.hp.hpl.jena.sparql.resultset.TextOutput.format(TextOutput.java:65)
at com.hp.hpl.jena.query.ResultSetFormatter.out(ResultSetFormatter.java:135)
at com.hp.hpl.jena.sparql.util.QueryExecUtils.outputResultSet(QueryExecUtils.java:157)
at com.hp.hpl.jena.sparql.util.QueryExecUtils.doSelectQuery(QueryExecUtils.java:199)
at com.hp.hpl.jena.sparql.util.QueryExecUtils.executeQuery(QueryExecUtils.java:75)
at arq.query.queryExec(query.java:186)
at arq.query.exec(query.java:145)