Thanks, David.

Bumping the Java heap from 1 GB to 4 GB let the query run to completion. It produced:

38 MB of gzipped dbpedia URLs, 
8 MB of gzipped freebase URLs, and 
7 MB of gzipped reference.data.gov.uk URLs.
(the only three “big” domains)
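
For anyone repeating this, one way to confirm that a heap change like the 1 GB to 4 GB bump above actually reached the JVM is to print the limit the runtime reports. This is only a minimal sketch: the class name HeapCheck is made up for illustration, and it assumes the Jena command-line scripts take their heap setting from the JVM_ARGS environment variable (for example JVM_ARGS=-Xmx4G), which is worth checking against the script in your own install.

```java
public class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with the same -Xmx value you intend to give the query tool,
        // e.g. "java -Xmx4G HeapCheck".
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The reported value is usually a little below the -Xmx figure, because the JVM reserves part of the heap for its own bookkeeping.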

I’ll put the streaming question on hold until I run out of memory :-)
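
On the streaming question itself, the stack trace quoted below shows the rows being collected in ResultSetMem, called from TextOutput.write, so the buffering appears to come from the text table formatter rather than from the query engine, presumably because a padded table needs the widest value before it can print the first row. The sketch below is plain Java, not Jena code, and only illustrates that difference; the class and method names and the example.org URLs are made up. If the ARQ commands in your version accept a line-oriented output format (a --results option is an assumption to verify against the command's usage text), that would be the analogue of writeLines here.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StreamVsBuffer {
    /** Buffering: keep every row to find the widest one, then print a padded table. */
    static int writeTable(Iterator<String> rows, PrintWriter out) {
        List<String> all = new ArrayList<>();
        int width = 1;
        while (rows.hasNext()) {
            String row = rows.next();
            all.add(row);                       // memory grows with the result size
            width = Math.max(width, row.length());
        }
        for (String row : all) {
            out.printf("| %-" + width + "s |%n", row);
        }
        return all.size();
    }

    /** Streaming: write each row as soon as it is produced; nothing is retained. */
    static int writeLines(Iterator<String> rows, PrintWriter out) {
        int count = 0;
        while (rows.hasNext()) {
            out.println(rows.next());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up rows standing in for query results.
        List<String> urls = Arrays.asList("http://example.org/a", "http://example.org/longer/b");
        PrintWriter out = new PrintWriter(System.out, true);
        writeTable(urls.iterator(), out);
        writeLines(urls.iterator(), out);
    }
}
```

With billions of triples behind the iterator, only the second method keeps memory use independent of the number of results.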

Regards,
Tim

On Mar 28, 2014, at 10:44 AM, David Jordan <[email protected]> wrote:

> The first question to answer is how much memory you have allocated to the 
> Java heap. You can control this. The default JVM heap size will very likely 
> be too small.
> 
> -----Original Message-----
> From: Timothy Lebo [mailto:[email protected]] 
> Sent: Friday, March 28, 2014 10:41 AM
> To: [email protected]
> Subject: OutOfMemoryError with tdbquery
> 
> Jena,
> 
> I have a TDB with 4.2 billion triples that I created with tdbloader.
> It's taken from the 2012 Billion Triples Challenge.
> I assert three triples for each URL they retrieved ("context"), e.g. for the 
> URL http://www.hyphen.info/rdf/30.xml:
> 
> <http://www.hyphen.info/rdf/30.xml> <http://purl.org/twc/vocab/between-the-edges/root> <http://www.hyphen.info> .
> <http://www.hyphen.info> <http://purl.org/twc/vocab/between-the-edges/pld> <http://hyphen.info> .
> <http://hyphen.info> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/twc/vocab/between-the-edges/PayLevelDomain> .
> 
> 
> When I submit the following query with tdbquery:
> 
> select ?url where { ?url <http://purl.org/twc/vocab/between-the-edges/root> <http://dbpedia.org> . }
> 
> The following Exception is thrown.
> 
> I'm assuming that Jena is trying to build up all of the results before 
> reporting them.
> Is there a way to just get "the stream" to avoid the memory issue?
> 
> Thanks,
> Tim Lebo
> 
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>       at com.hp.hpl.jena.tdb.base.record.RecordFactory.create(RecordFactory.java:87)
>       at com.hp.hpl.jena.tdb.base.record.RecordFactory.buildFrom(RecordFactory.java:122)
>       at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer._get(RecordBuffer.java:107)
>       at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.get(RecordBuffer.java:53)
>       at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.hasNext(RecordRangeIterator.java:130)
>       at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
>       at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.hasNext(DatasetControlMRSW.java:119)
>       at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
>       at org.openjena.atlas.iterator.Iter$3.hasNext(Iter.java:181)
>       at org.openjena.atlas.iterator.Iter.hasNext(Iter.java:825)
>       at org.openjena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:58)
>       at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
>       at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
>       at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
>       at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
>       at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
>       at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
>       at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
>       at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
>       at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
>       at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:72)
>       at com.hp.hpl.jena.sparql.resultset.ResultSetMem.<init>(ResultSetMem.java:95)
>       at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:147)
>       at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:130)
>       at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:118)
>       at com.hp.hpl.jena.sparql.resultset.TextOutput.format(TextOutput.java:65)
>       at com.hp.hpl.jena.query.ResultSetFormatter.out(ResultSetFormatter.java:135)
>       at com.hp.hpl.jena.sparql.util.QueryExecUtils.outputResultSet(QueryExecUtils.java:157)
>       at com.hp.hpl.jena.sparql.util.QueryExecUtils.doSelectQuery(QueryExecUtils.java:199)
>       at com.hp.hpl.jena.sparql.util.QueryExecUtils.executeQuery(QueryExecUtils.java:75)
>       at arq.query.queryExec(query.java:186)
>       at arq.query.exec(query.java:145)
> 
> 
