Re: OutOfMemoryError with tdbquery

Andy Seaborne Fri, 28 Mar 2014 09:54:52 -0700

        at com.hp.hpl.jena.sparql.resultset.TextOutput.format(TextOutput.java:65)
        at com.hp.hpl.jena.query.ResultSetFormatter.out(ResultSetFormatter.java:135)
        at com.hp.hpl.jena.sparql.util.QueryExecUtils.outputResultSet(QueryExecUtils.java:157)
        at com.hp.hpl.jena.sparql.util.QueryExecUtils.doSelectQuery(QueryExecUtils.java:199)
        at com.hp.hpl.jena.sparql.util.QueryExecUtils.executeQuery(QueryExecUtils.java:75)


Looks like you are trying to output as formatted text.

The text format aligns column widths, so it needs to scan the entire result set to find the column widths, then go back and actually write the rows.


It takes a copy of the whole results to do that.

You can use a streaming format like JSON, TSV, or CSV (the last two can be thought of as unformatted text).
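
To make the difference concrete, here is a minimal sketch in plain Java. It is an illustration only, not Jena's code: the class OutputSketch and its two methods are hypothetical names, and plain strings stand in for query solutions. The aligned writer cannot print its first row until it has seen the last one, while the streaming writer holds one row at a time.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustration only: hypothetical class, not part of Jena.
class OutputSketch {

    // Aligned text: the column width is only known after the last row,
    // so every row is buffered in memory before anything is written.
    static void writeAligned(Iterator<String> rows, PrintStream out) {
        List<String> buffered = new ArrayList<>();
        int width = 1; // minimum width keeps the format string valid
        while (rows.hasNext()) {
            String row = rows.next();
            buffered.add(row);
            width = Math.max(width, row.length());
        }
        for (String row : buffered) {
            out.println(String.format("| %-" + width + "s |", row));
        }
    }

    // Unformatted output: each row is written as soon as it is read,
    // so memory use does not grow with the number of results.
    static void writeStreaming(Iterator<String> rows, PrintStream out) {
        while (rows.hasNext()) {
            out.println(rows.next());
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "<http://dbpedia.org/a>",
                "<http://dbpedia.org/resource/b>");
        writeAligned(rows.iterator(), System.out);
        writeStreaming(rows.iterator(), System.out);
    }
}
```

With a very large result set, the buffered list in writeAligned is what fills the heap; writeStreaming has no equivalent structure.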


        Andy

On 28/03/14 15:10, Timothy Lebo wrote:

Thanks, David.

Bumping it from 1 GB to 4 GB handled it to produce:

38 MB of gzipped dbpedia URLs,
8 MB of gzipped freebase URLs, and
7 MB of gzipped reference.data.gov.uk URLs.
(the only three “big” domains)

I’ll put the streaming question on hold until I run out of memory :-)

Regards,
Tim

On Mar 28, 2014, at 10:44 AM, David Jordan <[email protected]> wrote:

The first question to answer is how much memory you have allocated to the Java 
heap. You can control this. The default JVM heap size will very likely be too 
small.
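
One way to see what the JVM was actually given is to ask it. The following is a small sketch using only the standard library; the class name HeapCheck is a hypothetical choice, and it reports the limit for whichever JVM runs it, not for a tdbquery process started separately.

```java
// Hypothetical helper: prints the maximum heap available to this JVM.
class HeapCheck {

    // Runtime.maxMemory() returns the heap ceiling in bytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as `java -Xmx4g HeapCheck` should report a value close to 4096 MiB; without `-Xmx` it shows the default the JVM chose for the machine.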

-----Original Message-----
From: Timothy Lebo [mailto:[email protected]]
Sent: Friday, March 28, 2014 10:41 AM
To: [email protected]
Subject: OutOfMemoryError with tdbquery

Jena,

I have a TDB with 4.2 billion triples that I created with tdbloader.
It's taken from the 2012 Billion Triples Challenge.
I assert three triples for each URL they retrieved ("context"), e.g. for the 
URL http://www.hyphen.info/rdf/30.xml:

<http://www.hyphen.info/rdf/30.xml> <http://purl.org/twc/vocab/between-the-edges/root> <http://www.hyphen.info> .
<http://www.hyphen.info> <http://purl.org/twc/vocab/between-the-edges/pld> <http://hyphen.info> .
<http://hyphen.info> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/twc/vocab/between-the-edges/PayLevelDomain> .


When I submit the following query with tdbquery:

select ?url where { ?url <http://purl.org/twc/vocab/between-the-edges/root> <http://dbpedia.org> . }

The following Exception is thrown.

I'm assuming that Jena is trying to build up all of the results before 
reporting them.
Is there a way to just get "the stream" to avoid the memory issue?

Thanks,
Tim Lebo

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at com.hp.hpl.jena.tdb.base.record.RecordFactory.create(RecordFactory.java:87)
        at com.hp.hpl.jena.tdb.base.record.RecordFactory.buildFrom(RecordFactory.java:122)
        at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer._get(RecordBuffer.java:107)
        at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.get(RecordBuffer.java:53)
        at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.hasNext(RecordRangeIterator.java:130)
        at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
        at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.hasNext(DatasetControlMRSW.java:119)
        at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
        at org.openjena.atlas.iterator.Iter$3.hasNext(Iter.java:181)
        at org.openjena.atlas.iterator.Iter.hasNext(Iter.java:825)
        at org.openjena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:58)
        at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:72)
        at com.hp.hpl.jena.sparql.resultset.ResultSetMem.<init>(ResultSetMem.java:95)
        at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:147)
        at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:130)
        at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:118)
        at com.hp.hpl.jena.sparql.resultset.TextOutput.format(TextOutput.java:65)
        at com.hp.hpl.jena.query.ResultSetFormatter.out(ResultSetFormatter.java:135)
        at com.hp.hpl.jena.sparql.util.QueryExecUtils.outputResultSet(QueryExecUtils.java:157)
        at com.hp.hpl.jena.sparql.util.QueryExecUtils.doSelectQuery(QueryExecUtils.java:199)
        at com.hp.hpl.jena.sparql.util.QueryExecUtils.executeQuery(QueryExecUtils.java:75)
        at arq.query.queryExec(query.java:186)
        at arq.query.exec(query.java:145)
