My initial instinct for creating the new domain models was to use
composition (either through dynamic proxy objects, or by hand-writing the
entire interface). In the end, however, my design paralleled the one used
in Jena, for the same reasons that OntModelImpl doesn't extend InfModel by
composition. In fact, this problem exists as much for OntModels and
InfModels as it does for my implementation: Dataset only produces Model
implementations, thus breaking basic use cases outside of my domain as
well.
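For context, the composition approach I had considered would look roughly
like this. These are invented stand-in interfaces for illustration, not
Jena's real OntModel API:

```java
// Invented stand-in for Jena's OntModel interface; illustrative only.
interface OntModel {
    long size();
}

// The domain-specific extension point.
interface DomainModel extends OntModel {
    String domainInfo();
}

// Composition: a stateless wrapper delegating every OntModel call
// to the instance supplied at construction time.
class DomainModelImpl implements DomainModel {
    private final OntModel delegate;

    DomainModelImpl(OntModel delegate) { this.delegate = delegate; }

    @Override
    public long size() { return delegate.size(); } // pure delegation

    @Override
    public String domainInfo() {
        return "domain logic over " + delegate.size() + " statements";
    }
}
```

The appeal of this shape is that the wrapper is cheap to construct around
whatever Model an API hands back; the drawback, as discussed below, is the
same one that kept OntModelImpl from being built this way.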

I suggest that a new model should never be created unless the client is
requesting the creation of one that doesn't already exist in the dataset.
One reason for this rationale is that, if I add a complex model (such as an
InfModel) to the Dataset, the DatasetImpl will pass the model's backing
graph (a union of a graph dynamically produced by inference and a graph of
the contained triples) into the DatasetGraph. When the DatasetImpl then
creates a new model object to wrap that graph, it loses the representation
of sub-models, etc., that existed, while retaining those sub-graphs in the
data. DatasetImpl, as a consequence, does nothing except act as a thin
wrapper around calling sdg.addNamedGraph.
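The failure mode can be sketched with a self-contained analogy. The Graph,
Model, InfModel, and TinyDataset classes below are simplified stand-ins I'm
inventing for illustration, not Jena's actual classes; only the LRU-cache
shape mirrors graph2model:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Invented stand-ins for Jena types; illustrative only.
class Graph {}
class Model { final Graph graph; Model(Graph g) { graph = g; } }
class InfModel extends Model { InfModel(Graph g) { super(g); } }

// Mimics DatasetImpl: stores only the backing graphs, and re-wraps
// them on retrieval through a small access-ordered LRU cache.
class TinyDataset {
    private static final int CACHE_SIZE = 2;
    private final Map<String, Graph> graphs = new LinkedHashMap<>();
    private final Map<Graph, Model> cache =
        new LinkedHashMap<Graph, Model>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Graph, Model> e) {
                return size() > CACHE_SIZE;
            }
        };

    void addNamedModel(String name, Model m) {
        graphs.put(name, m.graph); // only the backing graph is kept for good
        cache.put(m.graph, m);
    }

    Model getNamedModel(String name) {
        Graph g = graphs.get(name);
        Model m = cache.get(g);
        if (m == null) {           // evicted: re-wrap as a plain Model
            m = new Model(g);
            cache.put(g, m);
        }
        return m;
    }
}

public class CacheEvictionDemo {
    public static void main(String[] args) {
        TinyDataset ds = new TinyDataset();
        ds.addNamedModel("inf", new InfModel(new Graph()));
        ds.addNamedModel("a", new Model(new Graph()));
        ds.addNamedModel("b", new Model(new Graph())); // "inf" is evicted
        System.out.println(ds.getNamedModel("inf") instanceof InfModel);
        // prints "false": the InfModel came back as a plain Model
    }
}
```

The data (the graph) survives, but the richer wrapper type is silently
replaced, which is exactly the ClassCastException scenario from my original
message.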

Furthermore, these models are lightweight, as you mentioned; preparing them
for garbage collection or, alternatively, creating them every time both
seem needlessly chatty for almost no gain and a provable loss. Meanwhile,
the API not only becomes more intuitive and useful for anyone working with
something more complicated than Model, but also remains consistent with its
treatment of the backing graphs.

By this rationale, I'd consider DatasetImpl creating models (in instances
where the client code is not requesting a newly created model) a bug. There
is no reason for an abstraction, whose only purpose is to relate a
collection of models to the collection of their backing graphs, to create
new models to then represent those graphs.
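A sketch of the behavior I'm suggesting, again using invented stand-in
classes rather than Jena's real types (this is not a patch against
DatasetImpl, just the shape of the idea): the dataset retains the Model
instances it was given, and creates one only when a requested name has
none.

```java
import java.util.HashMap;
import java.util.Map;

// Invented stand-ins for Jena types; illustrative only.
class Graph {}
class Model { final Graph graph; Model(Graph g) { graph = g; } }

// Retains the Model instances the client handed in; a model is
// created only when the client requests a name that has none yet.
class ModelPreservingDataset {
    private final Map<String, Model> models = new HashMap<>();

    void addNamedModel(String name, Model m) {
        models.put(name, m); // the instance itself survives, subtype and all
    }

    Model getNamedModel(String name) {
        // create only if one doesn't exist in the dataset already
        return models.computeIfAbsent(name, n -> new Model(new Graph()));
    }
}

public class ModelPreservingDemo {
    public static void main(String[] args) {
        ModelPreservingDataset ds = new ModelPreservingDataset();
        Model custom = new Model(new Graph()) { }; // pretend domain subtype
        ds.addNamedModel("x", custom);
        System.out.println(ds.getNamedModel("x") == custom);
        // prints "true": the exact instance comes back
    }
}
```

The trade-off is the one discussed above: the dataset now holds strong
references to its models, so they live as long as it does, which seems
acceptable given how lightweight they are.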



On Tue, Dec 4, 2012 at 4:15 PM, Stephen Allen <[email protected]> wrote:

> On Tue, Dec 4, 2012 at 3:24 PM, Rob Hall <[email protected]> wrote:
> > *tl;dr:* I had to extend DatasetImpl to bypass internal caching in
> > order to be able to retrieve the same models that I added to the
> > Dataset. I believe the only performance hit that I can see will be
> > that the Models which wrap the underlying graphs will not be garbage
> > collected. Is this a correct assumption regarding the effect of this
> > change, and are my expectations regarding the behavior of Dataset in
> > error?
> >
> > *background:* I extended the OntModel interface (and created a
> > corresponding extension to OntModelImpl) to facilitate some
> > domain-specific entity representations and actions. I wished to
> > execute queries over sets of these Models. These queries were to
> > include their sub-models, and potentially inferred triples from any
> > attached reasoner.
> >
>
> I think a better way to handle this is via composition rather than
> inheritance from the OntModelImpl class.  If your extended model class
> does not need direct access to the instance variables of OntModelImpl,
> then this way is a lot more flexible.  Especially if the
> implementation of OntModelImpl changes in the future.  Essentially,
> make your custom Model a stateless wrapper around another Model.
>
> Make your own interface that extends OntModel, but make your
> implementation class implement that new interface and delegate all
> calls to an OntModel that is passed into the constructor.  This way
> your class is very cheap to construct, and you can create a new one
> backed by any OntModel that you retrieve from a Dataset.
>
>
> > To allow this, I added instances of my new implementation to a Jena
> > in-memory Dataset.
> >
> > So far, this has worked as I had expected (with fairly thorough
> > integration testing taking place). Recently, I discovered a crash
> > whenever I had added a sufficient number of models to the system.
> > When I went to retrieve a named model, I could not cast it back to
> > its original type. The following bit of code from DatasetImpl makes
> > it clear why:
> >
> >     private Model graph2model(Graph graph)
> >     {
> >         // Called from readers -- outer synchronation needed.
> >         Model model = cache.get(graph) ;
> >         if ( model == null )
> >         {
> >             model = ModelFactory.createModelForGraph(graph) ;
> >             cache.put(graph, model) ;
> >         }
> >         return model ;
> >     }
> >
> > Inside DatasetImpl, an LRU cache is used to keep track of recently
> > requested (or inserted) models. After executing for some time, the
> > original model has been lost, and a subsequent request (and cast)
> > results in a ClassCastException.
> >
>
> I think that cache needs to be eliminated.  It is probably a relic
> from the days of expensive object creation.
>
>
> > *workaround:* My workaround would have been to extend DatasetImpl
> > and override the 4 methods that interact with the cache to, instead,
> > work with a standard HashMap. As those methods are private, I am
> > presently creating my own implementation.
> >
> > *question:* My workaround makes several assumptions, and I wanted to
> > be certain that they made sense.
> >
> > *1)* My use of Dataset is outside of its expected use. Multiple
> > requests to get the same model should not expect to retrieve the
> > same implementation, let alone the same instance, of the model that
> > they saw on a previous request. (I.e., *that this isn't a bug*.)
>
> Model is meant to be a stateless wrapper around an underlying Graph,
> and Dataset is similarly a wrapper around a DatasetGraph.  Therefore
> it is expected that you would get different instances of Model back
> from a Dataset than you added.
>
>
> > *2)* The caching that occurs within DatasetImpl primarily exists to
> > minimize held references to objects in order to facilitate Java
> > garbage collection (the Model may be garbage collected, but its
> > backing graph, retained by the Dataset, will not be). Therefore,
> > *overriding this behavior will have no effect on applications which
> > are already retaining references to those Models*.
>
> The caching that is there now probably isn't needed, and it should be
> returning a new Model instance every time.
>
> -Stephen
>



-- 
Robert Trevor Hall
Phone: (315) 719-5039
