Thanks for the response, Andy. So the overall picture is that I have a TDB dataset stored on disk and I would like to query it using a Lucene text match like the following:
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
SELECT ?doc
{ ?lit pf:textMatch '+text' .
  ?doc ?p ?lit }

If I index by partitions of the dataset, can I store the index to disk so I don't have to repeat the process? (I've sketched what I mean at the bottom of this mail.)

On Sun, Mar 17, 2013 at 8:57 AM, Andy Seaborne <[email protected]> wrote:

> On 17/03/13 00:45, Martino Buffolino wrote:
>
>> Hi,
>>
>> I built a large dataset using tdbloader and now I would like to query
>> it by using a Lucene index. I've tried to index by using
>> larqBuilder.indexStatements(model.listStatements()); which led to an
>> out of memory exception.
>
> Could you give some more details?
>
> It might be that it is using up RAM for something, but it might also be
> because the model has many large text literals which, combined with all
> the other uses of heap, is causing the problem, rather than LARQ per se.
>
>> Is there another approach to do this?
>
> If it's a large database, then doing it in sections is a possibility.
>
> What might work (given I'm not sure where it is running out of memory)
> is to:
>
> Get an iterator, e.g. model.listStatements(), then index some selection
> of it (e.g. 1,000 items), then close and re-open the index, then index
> another 1,000 items from the iterator.
>
> Andy
>
>> Thanks
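P.S. To check I've understood your suggestion, here is a rough, untested sketch of the batched indexing I have in mind. The TDB location, the index directory, and the batch size of 1,000 are placeholders, and I'm assuming that constructing a new IndexBuilderString on the same directory re-opens the existing on-disk index rather than recreating it; please correct me if that assumption is wrong.

    import java.io.File;

    import org.apache.jena.larq.IndexBuilderString;

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.StmtIterator;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class BatchedIndexer {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("/data/tdb"); // placeholder path
            Model model = dataset.getDefaultModel();

            File indexDir = new File("/data/lucene-index");          // placeholder path
            IndexBuilderString larqBuilder = new IndexBuilderString(indexDir);

            StmtIterator iter = model.listStatements();
            long count = 0;
            while (iter.hasNext()) {
                larqBuilder.indexStatement(iter.nextStatement());
                if (++count % 1000 == 0) {
                    // Close and re-open the index every 1,000 statements, as you
                    // suggested, to keep the index writer's heap usage bounded.
                    // (Assumes re-opening on the same directory appends.)
                    larqBuilder.closeWriter();
                    larqBuilder = new IndexBuilderString(indexDir);
                }
            }
            larqBuilder.closeWriter();
            iter.close();
            dataset.close();
        }
    }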

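And for the storing-to-disk part of my question: am I right that in a later run I can re-open the stored index and point LARQ at it, without repeating the indexing, along these lines? Again a sketch pieced together from the LARQ documentation, with the same placeholder paths.

    import java.io.File;

    import org.apache.jena.larq.IndexLARQ;
    import org.apache.jena.larq.LARQ;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.FSDirectory;

    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.ResultSetFormatter;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class QueryStoredIndex {
        public static void main(String[] args) throws Exception {
            Model model = TDBFactory.createDataset("/data/tdb").getDefaultModel();

            // Re-open the Lucene index built earlier, without re-indexing.
            IndexReader reader =
                IndexReader.open(FSDirectory.open(new File("/data/lucene-index")));
            IndexLARQ index = new IndexLARQ(reader);
            LARQ.setDefaultIndex(index);

            String queryString =
                "PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>\n" +
                "SELECT ?doc { ?lit pf:textMatch '+text' . ?doc ?p ?lit }";

            QueryExecution qexec = QueryExecutionFactory.create(queryString, model);
            try {
                ResultSetFormatter.out(System.out, qexec.execSelect());
            } finally {
                qexec.close();
            }
            index.close();
        }
    }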