Davide,

Are you running on a 32 bit JVM? Is that why you can't increase the heap size?

The internal constants for TDB are tuned for one database on a 32 bit JVM. If you have 64 bit hardware and a 64 bit JVM, then you can have several large datasets, but 32 bit is limiting because of the Java limit of about 1.5G maximum heap.
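
If you are not sure which case applies, a small check along these lines prints what the JVM itself reports. This is only a sketch: the class name JvmInfo is made up for illustration, and the "sun.arch.data.model" property is an assumption that holds for HotSpot-style JVMs but may be absent elsewhere, in which case "unknown" is printed.

```java
public class JvmInfo {
    /** Returns the JVM's reported data model ("32" or "64"), or "unknown" if the property is not set. */
    static String dataModel() {
        // "sun.arch.data.model" is vendor-specific; not every JVM defines it.
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel());
        System.out.println("os.arch:        " + System.getProperty("os.arch"));
        System.out.println("Max heap (MB):  " + maxHeapMb());
    }
}
```

Running it with the same -Xmx setting as the application shows the heap ceiling the application actually gets.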

A TDB to TDB addNamedModel does need to take a copy as it copies triples into the target database. This will expand the internal caches. I guess this is what is causing the OOME.

You can have a general Dataset (DatasetFactory.createMem()) into which you put TDB backed models. No copy occurs. But you are still exposed to exceeding the cache sizes.

The issue is the need to have two datasets open in so little memory. What's driving this need?

You may be better off creating a new TDB dataset from the first two (dump to N-Quads, concatenate the files, load into a new DB) and working with that. It should fit into a 32 bit JVM.
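
The concatenation step works because N-Quads is line based, so two dumps can simply be appended. A minimal sketch of that step in plain Java follows. It assumes the two dumps already exist (for example from TDB's tdbdump command) and that the combined file is loaded afterwards with the TDB bulk loader. The class name ConcatNQuads and the file names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConcatNQuads {
    /** Appends each input file to the output in order, streaming so heap use stays small. */
    static void concat(Path out, Path... inputs) throws IOException {
        try (OutputStream os = Files.newOutputStream(out)) {
            for (Path in : inputs) {
                Files.copy(in, os); // streams the bytes; the file is never held in memory
                os.write('\n');     // guards against a dump that lacks a final newline
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names for the two dumps and the combined output.
        concat(Paths.get("all.nq"), Paths.get("first.nq"), Paths.get("second.nq"));
    }
}
```

Since the files are streamed, this runs comfortably inside a 1GB heap even for dumps of several GB.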

        Andy

On 21/07/13 18:08, Davide Rossi wrote:
Hi Andy.
First of all, thank you for your answers. I'm new to Jena, so I'll try
to be as accurate as possible.

What kind of dataset are you using?   Some don't copy at this point, some
do.

I'm using a typical RDF dataset. I create it with
TDBFactory.createDataset(...). If I understood correctly, there are some kinds of
dataset that don't copy data into memory when I do
firstDataset.addNamedModel("second", secondModel). Is that correct? If so,
which kinds of dataset are these? Maybe I could try using them.

Thank you very much for your courtesy
Regards
Davide


2013/7/21 Andy Seaborne <[email protected]>

On 21/07/13 14:04, Davide Rossi wrote:

Hi everybody,
I have two large datasets, each about 2GB in size. Now, I
have to query both datasets in a single query because the information I
need to retrieve is split across them, so I have to navigate their
union. The problem is that I have only 1GB of memory on my JVM, so when I
try to do
firstDataset.addNamedModel("second", secondModel)


What kind of dataset are you using?   Some don't copy at this point, some
do.


  or
firstModel.add(secondModel)
I get an OutOfMemoryError. So, I would like to know whether it is possible to
solve this memory problem without storing all the information in only one
dataset (I must store the information in two datasets).

Thanks for your answers
regards
Davide


Do you mean in two graphs rather than two datasets?  An RDF dataset is
itself a collection of graphs.

If you are running out of memory, then you probably want to consider using
a database to store the data.

See TDB:

http://jena.apache.org/documentation/tdb/index.html

         Andy


