*tl;dr:* I had to extend DatasetImpl to bypass its internal caching so that I
could retrieve the same Model instances that I added to the Dataset. The only
performance cost I can see is that the Models wrapping the underlying Graphs
will no longer be garbage collected. Is this a correct assumption about the
effect of the change, and are my expectations of Dataset's behavior mistaken?
*background:* I extended the OntModel interface (and created a
corresponding extension to OntModelImpl) to facilitate some domain-specific
entity representations and actions. I wished to execute queries over sets of
these Models, including their sub-models and, potentially, any triples
inferred by an attached reasoner.
To allow this, I added instances of my new implementation to a Jena
in-memory Dataset.
So far, this has worked as I had expected (with fairly thorough integration
testing in place). Recently, however, I hit a failure once I had added a
sufficient number of models to the system: when I went to retrieve a named
model, I could no longer cast it back to its original type. The following
bit of code from DatasetImpl makes clear why:
    private Model graph2model(Graph graph)
    {
        // Called from readers -- outer synchronation needed.
        Model model = cache.get(graph) ;
        if ( model == null )
        {
            model = ModelFactory.createModelForGraph(graph) ;
            cache.put(graph, model) ;
        }
        return model ;
    }
Inside DatasetImpl, an LRU cache is used to keep track of recently requested
(or inserted) models. After the application has run for some time, the
original model is evicted from the cache, so graph2model() re-wraps the Graph
in a fresh, plain Model, and a subsequent request (and cast) results in a
ClassCastException.
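The mechanism can be reproduced without Jena at all. The following sketch
uses stand-in classes (Graph, Model, MyOntModel here are hypothetical
stand-ins, not Jena's actual types) and an access-ordered LinkedHashMap in
place of the internal LRU cache, but mirrors the graph2model() logic above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruEvictionDemo {
    // Stand-ins for Jena's Graph and Model, plus a custom Model subtype.
    static class Graph {}
    static class Model {
        final Graph graph;
        Model(Graph g) { graph = g; }
    }
    static class MyOntModel extends Model {
        MyOntModel(Graph g) { super(g); }
    }

    // An access-ordered LinkedHashMap bounded to `capacity`, standing in
    // for the LRU cache inside DatasetImpl.
    static Map<Graph, Model> lruCache(final int capacity) {
        return new LinkedHashMap<Graph, Model>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Graph, Model> eldest) {
                return size() > capacity;
            }
        };
    }

    // Mirrors graph2model(): on a cache miss the Graph is re-wrapped in a
    // *plain* Model, so the original subtype is lost.
    static Model graph2model(Map<Graph, Model> cache, Graph graph) {
        Model model = cache.get(graph);
        if (model == null) {
            model = new Model(graph);
            cache.put(graph, model);
        }
        return model;
    }

    public static void main(String[] args) {
        Map<Graph, Model> cache = lruCache(2);
        Graph g = new Graph();
        cache.put(g, new MyOntModel(g));  // analogous to adding a named model

        // While the wrapper is still cached, the subtype survives.
        System.out.println(graph2model(cache, g) instanceof MyOntModel); // true

        // Two more insertions evict g's wrapper from the capacity-2 cache...
        cache.put(new Graph(), new Model(new Graph()));
        cache.put(new Graph(), new Model(new Graph()));

        // ...and the next lookup re-wraps g in a plain Model: the cast fails.
        System.out.println(graph2model(cache, g) instanceof MyOntModel); // false
    }
}
```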
*workaround:* My preferred workaround would have been to extend DatasetImpl
and override the four methods that interact with the cache to use a plain
HashMap instead. As those methods are private, I am presently creating my
own implementation.
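For what it's worth, the shape of the replacement is trivial. A minimal
sketch of the idea (names hypothetical; the real thing would implement
Jena's Dataset interface and delegate everything else):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain HashMap never evicts, so getNamedModel() always
// returns the exact instance that was stored, and a downcast back to the
// original Model subtype is always safe.
public class TypedModelRegistry<M> {
    private final Map<String, M> models = new HashMap<>();

    public void addNamedModel(String uri, M model) {
        models.put(uri, model);
    }

    public M getNamedModel(String uri) {
        return models.get(uri);  // same instance every time; no LRU eviction
    }

    public boolean containsNamedModel(String uri) {
        return models.containsKey(uri);
    }
}
```

The point is only that identity (and therefore runtime type) is preserved
across retrievals, at the cost of holding a strong reference to every
wrapper for the lifetime of the registry.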
*question:* My workaround rests on several assumptions, and I wanted to be
certain that they make sense.
*1)* My use of Dataset is outside its intended use: multiple requests for
the same named model should not expect to receive the same implementation,
let alone the same instance, of the model they saw on a previous request.
(I.e., *this isn't a bug*.)
*2)* The caching inside DatasetImpl exists primarily to avoid holding strong
references to the Model wrappers, in order to facilitate Java garbage
collection (a Model may be garbage collected, but its backing Graph,
retained by the Dataset, will not be). Therefore, *overriding this behavior
will have no effect on applications which are already retaining references
to those Models*.
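Assumption (2) can be sanity-checked in isolation: once nothing holds a
strong reference to a wrapper, it becomes collectable while its backing
graph stays reachable. A stdlib-only sketch (Graph and Model are stand-in
classes, not Jena's):

```java
import java.lang.ref.WeakReference;

public class WrapperGcSketch {
    static class Graph {}
    static class Model {
        final Graph graph;
        Model(Graph g) { graph = g; }
    }

    public static void main(String[] args) {
        Graph graph = new Graph();         // retained, as the Dataset would
        Model wrapper = new Model(graph);  // handed out to the application

        WeakReference<Model> weak = new WeakReference<>(wrapper);
        wrapper = null;                    // application drops its reference

        System.gc();                       // request (not guarantee) a collection

        // The wrapper may now have been collected...
        System.out.println("wrapper collected: " + (weak.get() == null));
        // ...while the Graph remains strongly reachable via `graph`.
        System.out.println("graph class: " + graph.getClass().getSimpleName());
    }
}
```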
Thanks in advance,
Rob