Thanks, David. Bumping the heap from 1 GB to 4 GB handled it; the query now produces:
38 MB of gzipped dbpedia URLs,
8 MB of gzipped freebase URLs, and
7 MB of gzipped reference.data.gov.uk URLs.
(the only three “big” domains)

I’ll put the streaming question on hold until I run out of memory :-)

Regards,
Tim

On Mar 28, 2014, at 10:44 AM, David Jordan <[email protected]> wrote:

> The first question to answer is how much memory have you allocated in the
> Java heap. You can control this. The default JVM heap size will very likely
> be too small.
>
> -----Original Message-----
> From: Timothy Lebo [mailto:[email protected]]
> Sent: Friday, March 28, 2014 10:41 AM
> To: [email protected]
> Subject: OutOfMemoryError with tdbquery
>
> Jena,
>
> I have a TDB with 4.2 billion triples that I created with tdbloader.
> It's taken from the 2012 Billion Triples Challenge.
> I assert three triples for each URL they retrieved ("context"), e.g. for the
> URL http://www.hyphen.info/rdf/30.xml:
>
> <http://www.hyphen.info/rdf/30.xml>
> <http://purl.org/twc/vocab/between-the-edges/root> <http://www.hyphen.info> .
> <http://www.hyphen.info> <http://purl.org/twc/vocab/between-the-edges/pld>
> <http://hyphen.info> .
> <http://hyphen.info> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://purl.org/twc/vocab/between-the-edges/PayLevelDomain> .
>
>
> When I submit the following query with tdbquery:
>
> select ?url where{?url <http://purl.org/twc/vocab/between-the-edges/root>
> <http://dbpedia.org>.}
>
> The following Exception is thrown.
>
> I'm assuming that Jena is trying to build up all of the results before
> reporting them.
> Is there a way to just get "the stream" to avoid the memory issue?
>
> Thanks,
> Tim Lebo
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at com.hp.hpl.jena.tdb.base.record.RecordFactory.create(RecordFactory.java:87)
>     at com.hp.hpl.jena.tdb.base.record.RecordFactory.buildFrom(RecordFactory.java:122)
>     at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer._get(RecordBuffer.java:107)
>     at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.get(RecordBuffer.java:53)
>     at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.hasNext(RecordRangeIterator.java:130)
>     at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
>     at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.hasNext(DatasetControlMRSW.java:119)
>     at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
>     at org.openjena.atlas.iterator.Iter$3.hasNext(Iter.java:181)
>     at org.openjena.atlas.iterator.Iter.hasNext(Iter.java:825)
>     at org.openjena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:58)
>     at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
>     at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
>     at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
>     at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
>     at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
>     at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
>     at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
>     at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
>     at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
>     at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:72)
>     at com.hp.hpl.jena.sparql.resultset.ResultSetMem.<init>(ResultSetMem.java:95)
>     at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:147)
>     at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:130)
>     at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:118)
>     at com.hp.hpl.jena.sparql.resultset.TextOutput.format(TextOutput.java:65)
>     at com.hp.hpl.jena.query.ResultSetFormatter.out(ResultSetFormatter.java:135)
>     at com.hp.hpl.jena.sparql.util.QueryExecUtils.outputResultSet(QueryExecUtils.java:157)
>     at com.hp.hpl.jena.sparql.util.QueryExecUtils.doSelectQuery(QueryExecUtils.java:199)
>     at com.hp.hpl.jena.sparql.util.QueryExecUtils.executeQuery(QueryExecUtils.java:75)
>     at arq.query.queryExec(query.java:186)
>     at arq.query.exec(query.java:145)
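
A note on the trace above: the frames from TextOutput.write down to ResultSetMem.<init> suggest that the text formatter was copying the result stream into memory before printing, which fits the guess in the original question. The following is only a rough sketch of why an aligned table tends to need that buffering while line-per-row output does not. It is not Jena's code; the class StreamingVsBuffered and the methods writeStreaming and writeAligned are hypothetical names made up for illustration, and plain strings stand in for query solutions.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustration only: hypothetical names, not part of Jena.
public class StreamingVsBuffered {

    // Line-oriented output: each row is written as soon as it is read,
    // so nothing is retained and memory use does not grow with row count.
    static long writeStreaming(Iterator<String> rows, PrintWriter out) {
        long count = 0;
        while (rows.hasNext()) {
            out.print(rows.next());
            out.print('\n');
            count++;
        }
        return count;
    }

    // Aligned table output: the widest value must be known before the first
    // row can be padded, so every row is held in a list first.
    static long writeAligned(Iterator<String> rows, PrintWriter out) {
        List<String> buffered = new ArrayList<>();
        int width = 0;
        while (rows.hasNext()) {
            String row = rows.next();
            buffered.add(row);
            width = Math.max(width, row.length());
        }
        for (String row : buffered) {
            out.print("| ");
            out.print(row);
            for (int i = row.length(); i < width; i++) {
                out.print(' ');
            }
            out.print(" |\n");
        }
        return buffered.size();
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for ?url bindings.
        List<String> urls = Arrays.asList(
                "<http://dbpedia.org/a>",
                "<http://dbpedia.org/resource/b>");
        PrintWriter out = new PrintWriter(System.out);
        writeStreaming(urls.iterator(), out);
        writeAligned(urls.iterator(), out);
        out.flush();
    }
}
```

Under that reading, raising the heap (as was done here) makes room for the buffered copy, while a row-at-a-time output format, where one is available, would avoid holding the rows at all.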
