I am currently dealing with a large collection of resources that need to be
converted to RDF. The original collection consists of a set of files, each
containing more than 4 million resources on average. To keep track of
provenance, I thought it would be nice to organize the RDF collection into
named graphs that share the names of the original files.
However, once about half of the collection has been stored, memory no longer
seems to be sufficient for the store operation in TDB, even on a powerful
server. Consider the following statement:
> dataset.getNamedModel(namedGraph).add(model);
Here, we retrieve the current RDF Model of the named graph and add another
collection of triples to it. Once the store grows past a certain size, the
operation "hangs" and eventually fails with a heap space exception.
The question, then, is: is there a more streaming-like way to store large
collections in named graphs? My current workaround consists of splitting the
original collection into smaller, more manageable collections that the server
can handle, and storing each of them in the named graphs.
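The splitting workaround can be sketched in plain Java. This is a minimal, stdlib-only sketch, not Jena's API. The class `BatchedLoad` and its `load` method are hypothetical names, and the source iterator stands in for whatever produces the converted triples one at a time. The assumption is that each call to the flush action would run in its own TDB write transaction (for example, adding the batch to `dataset.getNamedModel(namedGraph)` and committing). That way, no single step has to hold an entire file's worth of triples in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: pulls items (e.g. converted triples) one at a time
 * from an iterator and hands them to a flush action in fixed-size batches,
 * so only one batch is buffered at a time.
 *
 * Assumption: in a Jena TDB setting, the flush action would begin a write
 * transaction, add the batch to dataset.getNamedModel(namedGraph), and
 * commit. That wiring is left to the caller and is not shown here.
 */
final class BatchedLoad {

    private BatchedLoad() {}

    /**
     * Drains the source in batches of at most batchSize items.
     *
     * @return the number of batches passed to flush
     */
    static <T> int load(Iterator<T> source, int batchSize, Consumer<List<T>> flush) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> buffer = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            buffer.add(source.next());
            if (buffer.size() == batchSize) {
                flush.accept(buffer);
                batches++;
                // Start a fresh buffer so the flushed batch can be
                // garbage-collected once the consumer is done with it.
                buffer = new ArrayList<>(batchSize);
            }
        }
        // Flush the final, possibly smaller, batch.
        if (!buffer.isEmpty()) {
            flush.accept(buffer);
            batches++;
        }
        return batches;
    }
}
```

For example, with a batch size of 4 and ten input items, `flush` is called three times, with batches of 4, 4 and 2 items. The batch size is the tuning knob: it should be small enough that one batch (and the transaction that stores it) fits comfortably in the heap.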