On 18/03/13 01:50, Martino Buffolino wrote:
Thanks for the response Andy.
So I guess the overall picture would be that I have a TDB dataset stored on
disk and I would like to query it using a Lucene text match like the
following:
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
SELECT ?doc {
  ?lit pf:textMatch '+text' .
  ?doc ?p ?lit
}
If I index the dataset in partitions, can I store the result to disk so I
don't have to repeat the process?
Yes - the Lucene index should be on disk.
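For example, here is an untested sketch of reopening a stored index in a
later run, assuming the standalone LARQ module (org.apache.jena.larq) and
its Lucene 3.x dependency; the "lucene-index" directory name is
illustrative:

import java.io.File;

import org.apache.jena.larq.IndexLARQ;
import org.apache.jena.larq.LARQ;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class ReopenIndex {
    public static void main(String[] args) throws Exception {
        // Reopen the Lucene index directory built in an earlier run -
        // no re-indexing needed.
        IndexReader reader =
            IndexReader.open(FSDirectory.open(new File("lucene-index")));
        IndexLARQ index = new IndexLARQ(reader);

        // Register it so pf:textMatch in later queries uses this index.
        LARQ.setDefaultIndex(index);
    }
}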
(/me still not clear where it runs out of memory - what's your system?
32-bit? What's the stack trace?)
On Sun, Mar 17, 2013 at 8:57 AM, Andy Seaborne <[email protected]> wrote:
On 17/03/13 00:45, Martino Buffolino wrote:
Hi,
I built a large dataset using tdbloader and now I would like to query it
using a Lucene index. I've tried indexing with
larqBuilder.indexStatements(model.listStatements()); which led to an
out-of-memory exception.
Could you give some more details?
It might be that LARQ is using up RAM for something, but it might also be
that the model has many large text literals which, combined with all the
other uses of the heap, are causing the problem, rather than LARQ per se.
Is there another approach to do this?
If it's a large database, then doing it in sections is a possibility.
What might work (given I'm not sure where it is running out of memory) is
to:
Get an iterator, e.g. model.listStatements(), then index a selection of it
(e.g. 1,000 items), then close and reopen the index, then index another
1,000 items from the iterator, and so on.
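An untested sketch of that batching loop, assuming the standalone LARQ
module (org.apache.jena.larq) with TDB; the batch size, dataset location,
and index directory are illustrative, and it's worth verifying that
IndexBuilderString appends to an existing index directory rather than
recreating it:

import java.io.File;

import org.apache.jena.larq.IndexBuilderString;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.tdb.TDBFactory;

public class BatchIndexer {
    public static void main(String[] args) {
        Dataset dataset = TDBFactory.createDataset("DB");
        StmtIterator iter = dataset.getDefaultModel().listStatements();

        File indexDir = new File("lucene-index");
        final int BATCH = 1000;

        while (iter.hasNext()) {
            // Open (or reopen) the on-disk index for this batch.
            IndexBuilderString builder = new IndexBuilderString(indexDir);
            for (int i = 0; i < BATCH && iter.hasNext(); i++)
                builder.indexStatement(iter.nextStatement());
            // Closing the writer between batches releases the memory
            // Lucene holds for buffered documents.
            builder.closeWriter();
        }
        iter.close();
    }
}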
Andy
Thanks