Jena TDB: Limitations of organizing large collections via named graphs

Fidan Limani Tue, 27 Oct 2020 01:27:02 -0700

I am currently working with a large collection of resources that need to be 
converted to RDF. The original collection consists of a set of files, each 
containing more than 4M resources on average. To preserve provenance, I 
thought it would be nice to organize the RDF collection in named graphs that 
have the same names as the original files.


However, once about half of the collection has been stored, the available 
memory no longer seems to be enough for the store operation in TDB, even on a 
powerful server. Consider the following statement:
     
>  dataset.getNamedModel(namedGraph).add(model);

It retrieves the current RDF Model of the named graph and adds another 
collection of triples to it. Once the store reaches a certain size, the 
operation "hangs" and then fails with a Java heap space error.
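One streaming-style alternative, offered only as a sketch, is to avoid holding a whole `model` in memory and instead write the converted data as N-Quads, where every statement carries its own graph name; the resulting file can then be handed to a bulk loader. The class `NQuadsTagger` and its methods are hypothetical, and the code assumes N-Triples input with one statement per line, no trailing comments after the final `.`, and a graph IRI that needs no escaping.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

final class NQuadsTagger {
    // Hypothetical helper: turns "<s> <p> <o> ." into "<s> <p> <o> <g> .".
    // Assumes a single, complete N-Triples statement with no trailing comment.
    static String toQuad(String triple, String graphIri) {
        String t = triple.trim();
        if (!t.endsWith(".")) {
            throw new IllegalArgumentException("Not an N-Triples statement: " + triple);
        }
        String body = t.substring(0, t.length() - 1).trim(); // drop the final '.'
        return body + " <" + graphIri + "> .";
    }

    // Reads one line at a time and writes it straight out, so memory use
    // stays flat no matter how many statements the input holds.
    // Blank lines and full-line comments are skipped. Returns the number of quads written.
    static long tag(BufferedReader in, Writer out, String graphIri) throws IOException {
        long count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue;
            }
            out.write(toQuad(t, graphIri));
            out.write('\n');
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String nt = "<http://ex.org/a> <http://ex.org/p> \"x\" .\n"
                  + "\n"
                  + "<http://ex.org/b> <http://ex.org/p> <http://ex.org/c> .\n";
        StringWriter out = new StringWriter();
        long n = tag(new BufferedReader(new StringReader(nt)), out, "http://ex.org/graph/file1");
        System.out.println(n + " quads");
        System.out.print(out);
    }
}
```

In practice the reader and writer would wrap files rather than strings; the point is only that no statement is kept after it has been written.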

So, finally, my question is: is there a more streaming-like way to store large 
collections in named graphs? My current workaround is to split the original 
collection into smaller, more manageable collections that the server can 
handle, and store those in named graphs.
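The splitting workaround can be sketched as a generic chunker that holds only one chunk in memory at a time. This is a library-free sketch: `ChunkedLoader` and `loadInChunks` are hypothetical names, and the `storeChunk` callback stands in for the actual TDB write, which is not implemented here and is assumed to run one write transaction per chunk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class ChunkedLoader {
    // Pulls items from the source and hands them to storeChunk in lists of at
    // most chunkSize, so only one chunk is held in memory at a time.
    // Returns the number of chunks passed to storeChunk.
    static <T> int loadInChunks(Iterator<T> source, int chunkSize, Consumer<List<T>> storeChunk) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<T> chunk = new ArrayList<>();
        int chunks = 0;
        while (source.hasNext()) {
            chunk.add(source.next());
            if (chunk.size() == chunkSize) {
                storeChunk.accept(chunk);
                chunks++;
                chunk = new ArrayList<>(); // start a fresh chunk; the old one can be collected
            }
        }
        if (!chunk.isEmpty()) { // flush the final, partially filled chunk
            storeChunk.accept(chunk);
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> triples = List.of("t1", "t2", "t3", "t4", "t5");
        int n = loadInChunks(triples.iterator(), 2, chunk -> {
            // Assumption, not shown: a real loader would build a Model from this chunk
            // and add it inside its own write transaction, e.g.
            // dataset.begin(ReadWrite.WRITE); dataset.getNamedModel(namedGraph).add(chunkModel);
            // dataset.commit(); dataset.end();
            System.out.println("storing " + chunk);
        });
        System.out.println(n + " chunks");
    }
}
```

Committing each chunk separately bounds how much work is pending at once; a suitable `chunkSize` depends on the available heap and would have to be found by experiment.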

