Sorry, I got a little sidetracked over the last week. The error below occurs when
indexing a model of size 19,169,727.
*Stack Trace:*
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuilder.append(StringBuilder.java:119)
    at org.apache.jena.larq.LARQ.hash(LARQ.java:266)
    at org.apache.jena.larq.LARQ.unindex(LARQ.java:132)
    at org.apache.jena.larq.IndexBuilderNode.unindex(IndexBuilderNode.java:116)
    at org.apache.jena.larq.IndexBuilderNode.index(IndexBuilderNode.java:83)
    at org.apache.jena.larq.IndexBuilderLiteral.indexStatement(IndexBuilderLiteral.java:88)
    at org.apache.jena.larq.IndexBuilderModel.indexStatements(IndexBuilderModel.java:84)
    at RDFIndexer.main(RDFIndexer.java:53)
Line 53 of RDFIndexer.java is: larqBuilder.indexStatements(model.listStatements());
*Running:*
Mac OS X 10.8.3
2.66 GHz Intel Core i7 (64-bit)
8 GB RAM
JVM arg: -Xmx2048m
*Code:*
IndexBuilderString larqBuilder = new IndexBuilderString();
Dataset dataset = TDBFactory.createDataset(dir);
Model model = dataset.getDefaultModel();
larqBuilder.indexStatements(model.listStatements());
Please let me know if you need any other information. Thanks
On Mon, Mar 18, 2013 at 12:10 PM, Andy Seaborne <[email protected]> wrote:
> On 18/03/13 01:50, Martino Buffolino wrote:
>
>> Thanks for the response Andy.
>>
>> So I guess the overall picture is that I have a TDB dataset stored on
>> disk, and I would like to query it using a Lucene text match like the
>> following:
>>
>> PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
>> SELECT ?doc {
>>   ?lit pf:textMatch '+text' .
>>   ?doc ?p ?lit
>> }
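(As an aside, a minimal sketch of how I run that query from Java, assuming a
default LARQ index has already been registered via LARQ.setDefaultIndex, and
with 'model' being the model from the code above:)

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;

String queryString =
    "PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#> " +
    "SELECT ?doc { ?lit pf:textMatch '+text' . ?doc ?p ?lit }";
Query query = QueryFactory.create(queryString);
QueryExecution qexec = QueryExecutionFactory.create(query, model);
try {
    // Print the matching documents to stdout
    ResultSetFormatter.out(qexec.execSelect());
} finally {
    qexec.close();
}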
>>
>> If I index by partitions of the dataset, can I store that to disk so I
>> don't have to repeat the process again?
>>
>
> Yes - the lucene index should be on disk.
>
> (/me still not clear where it runs out of memory - what's your system?
> 32-bit? What's the stack trace?)
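(For what it's worth, here's the sketch I have in mind for reusing the on-disk
index in a later run. This assumes LARQ 1.0 on Lucene 3.x, and that
"lucene-index" is the directory the IndexBuilderString wrote to:)

import java.io.File;

import org.apache.jena.larq.IndexLARQ;
import org.apache.jena.larq.LARQ;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

// Open the existing Lucene index and register it as the default LARQ index
IndexReader indexReader = IndexReader.open(FSDirectory.open(new File("lucene-index")));
IndexLARQ index = new IndexLARQ(indexReader);
LARQ.setDefaultIndex(index);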
>
>
>>
>> On Sun, Mar 17, 2013 at 8:57 AM, Andy Seaborne <[email protected]> wrote:
>>
>>> On 17/03/13 00:45, Martino Buffolino wrote:
>>>
>>>> Hi,
>>>>
>>>> I built a large dataset using tdbloader and now I would like to query
>>>> it using a Lucene index. I've tried indexing with
>>>> larqBuilder.indexStatements(model.listStatements()); which led to an
>>>> out of memory exception.
>>>>
>>>>
>>> Could you give some more details?
>>>
>>> It might be that it is using up RAM for something, but it might also be
>>> that the model has many large text literals which, combined with all the
>>> other uses of heap, are causing the problem, rather than LARQ per se.
>>>
>>>> Is there another approach to do this?
>>>
>>> If it's a large database, then doing it in sections is a possibility.
>>>
>>> What might work (given I'm not sure where it is running out of memory)
>>> is to:
>>>
>>> Get an iterator, e.g. model.listStatements(), then index some selection
>>> of it (e.g. 1,000 items), then close and reopen the index, then index
>>> another 1,000 items from the iterator.
>>>
>>> Andy
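
For reference, here is a rough sketch of how I plan to try that batch-and-reopen
approach. The class name, BATCH_SIZE, and the "lucene-index"/"DB" paths are my
own placeholders, and I'm assuming that reopening IndexBuilderString on the same
directory appends to the existing index rather than recreating it:

import java.io.File;

import org.apache.jena.larq.IndexBuilderString;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.tdb.TDBFactory;

public class BatchedRDFIndexer {
    static final int BATCH_SIZE = 1000;

    public static void main(String[] args) {
        File indexDir = new File("lucene-index");   // on-disk Lucene index
        Dataset dataset = TDBFactory.createDataset("DB");
        Model model = dataset.getDefaultModel();

        IndexBuilderString larqBuilder = new IndexBuilderString(indexDir);
        StmtIterator it = model.listStatements();
        long count = 0;
        try {
            while (it.hasNext()) {
                larqBuilder.indexStatement(it.nextStatement());
                // Close and reopen the index every BATCH_SIZE statements,
                // as suggested, instead of indexing the whole model at once
                if (++count % BATCH_SIZE == 0) {
                    larqBuilder.closeWriter();
                    larqBuilder = new IndexBuilderString(indexDir);
                }
            }
        } finally {
            it.close();
            larqBuilder.closeWriter();
        }
    }
}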
>>>
>>>> Thanks