Sorry, I got a little sidetracked over the last week. The error below occurs when
indexing a model of size 19,169,727.
*Stack Trace:*
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuilder.append(StringBuilder.java:119)
    at org.apache.jena.larq.LARQ.hash(LARQ.java:266)
    at org.apache.jena.larq.LARQ.unindex(LARQ.java:132)
    at org.apache.jena.larq.IndexBuilderNode.unindex(IndexBuilderNode.java:116)
    at org.apache.jena.larq.IndexBuilderNode.index(IndexBuilderNode.java:83)
    at org.apache.jena.larq.IndexBuilderLiteral.indexStatement(IndexBuilderLiteral.java:88)
    at org.apache.jena.larq.IndexBuilderModel.indexStatements(IndexBuilderModel.java:84)
    at RDFIndexer.main(RDFIndexer.java:53)
Line 53 of RDFIndexer.java is: larqBuilder.indexStatements(model.listStatements());
*Running:*
Mac OS X 10.8.3
2.66 GHz Intel Core i7 (64-bit)
8 GB RAM
JVM arg: -Xmx2048m
*Code:*
IndexBuilderString larqBuilder = new IndexBuilderString();
Dataset dataset = TDBFactory.createDataset(dir);
Model model = dataset.getDefaultModel();
larqBuilder.indexStatements(model.listStatements());
Please let me know if you need any other information. Thanks
On Mon, Mar 18, 2013 at 12:10 PM, Andy Seaborne <[email protected]> wrote:
> On 18/03/13 01:50, Martino Buffolino wrote:
>
>> Thanks for the response Andy.
>>
>> So I guess the overall picture is that I have a TDB dataset stored on
>> disk, and I would like to query it using a Lucene text match like the
>> following:
>>
>> PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
>> SELECT ?doc {
>>   ?lit pf:textMatch '+text' .
>>   ?doc ?p ?lit
>> }
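(As an aside, a minimal sketch of how I run that query from Java, assuming a
default LARQ index has already been registered via LARQ.setDefaultIndex, and
with 'model' being the model from the code above:)

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;

String queryString =
    "PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#> " +
    "SELECT ?doc { ?lit pf:textMatch '+text' . ?doc ?p ?lit }";
Query query = QueryFactory.create(queryString);
QueryExecution qexec = QueryExecutionFactory.create(query, model);
try {
    // Print the matching documents to stdout
    ResultSetFormatter.out(qexec.execSelect());
} finally {
    qexec.close();
}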
>>
>> If I index by partitions of the dataset, can I store that to disk so I
>> don't have to repeat the process again?
>>
>
> Yes - the lucene index should be on disk.
>
> (/me still not clear where it runs out of memory - what's your system?
> 32-bit? What's the stack trace?)
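(For what it's worth, here's the sketch I have in mind for reusing the on-disk
index in a later run. This assumes LARQ 1.0 on Lucene 3.x, and that
"lucene-index" is the directory the IndexBuilderString wrote to:)

import java.io.File;

import org.apache.jena.larq.IndexLARQ;
import org.apache.jena.larq.LARQ;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

// Open the existing Lucene index and register it as the default LARQ index
IndexReader indexReader = IndexReader.open(FSDirectory.open(new File("lucene-index")));
IndexLARQ index = new IndexLARQ(indexReader);
LARQ.setDefaultIndex(index);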
>
>
>>
>> On Sun, Mar 17, 2013 at 8:57 AM, Andy Seaborne <[email protected]> wrote:
>>
>>> On 17/03/13 00:45, Martino Buffolino wrote:
>>>
>>>> Hi,
>>>>
>>>> I built a large dataset using tdbloader and now I would like to query
>>>> it using a Lucene index. I've tried indexing with
>>>> larqBuilder.indexStatements(model.listStatements()); which led to an
>>>> out of memory exception.
>>>>
>>>>
>>> Could you give some more details?
>>>
>>> It might be that it is using up RAM for something, but it might also be
>>> that the model has many large text literals which, combined with all the
>>> other uses of heap, are causing the problem, rather than LARQ per se.
>>>
>>>> Is there another approach to do this?
>>>
>>> If it's a large database, then doing it in sections is a possibility.
>>>
>>> What might work (given I'm not sure where it is running out of memory)
>>> is to:
>>>
>>> Get an iterator, e.g. model.listStatements(), then index some selection
>>> of it (e.g. 1,000 items), then close and reopen the index, then index
>>> another 1,000 items from the iterator.
>>>
>>> Andy
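
For reference, here is a rough sketch of how I plan to try that batch-and-reopen
approach. The class name, BATCH_SIZE, and the "lucene-index"/"DB" paths are my
own placeholders, and I'm assuming that reopening IndexBuilderString on the same
directory appends to the existing index rather than recreating it:

import java.io.File;

import org.apache.jena.larq.IndexBuilderString;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.tdb.TDBFactory;

public class BatchedRDFIndexer {
    static final int BATCH_SIZE = 1000;

    public static void main(String[] args) {
        File indexDir = new File("lucene-index");   // on-disk Lucene index
        Dataset dataset = TDBFactory.createDataset("DB");
        Model model = dataset.getDefaultModel();

        IndexBuilderString larqBuilder = new IndexBuilderString(indexDir);
        StmtIterator it = model.listStatements();
        long count = 0;
        try {
            while (it.hasNext()) {
                larqBuilder.indexStatement(it.nextStatement());
                // Close and reopen the index every BATCH_SIZE statements,
                // as suggested, instead of indexing the whole model at once
                if (++count % BATCH_SIZE == 0) {
                    larqBuilder.closeWriter();
                    larqBuilder = new IndexBuilderString(indexDir);
                }
            }
        } finally {
            it.close();
            larqBuilder.closeWriter();
        }
    }
}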
>>>
>>>> Thanks