On 23/08/13 01:03, David Jordan wrote:

> The default Jena reasoner performs inference on an entire graph. For a very
> large graph, this inferencing can be fairly expensive. I was asked today
> whether there is any way to do inferencing on just a small subset of a very
> large graph.
>
> I am wondering whether it would be feasible, and make sense, to create a new
> in-memory graph, essentially copy the relevant triples from the very large
> graph into it, and then perform inferencing just on that small graph. The
> purpose is to answer a query or question on a small subset of the graph
> without incurring the overhead of doing it for the entire graph.

It is certainly feasible.

For example, it is not that uncommon to take a graph containing the description of a few resources and compute the inference closure of that against the full ontologies. You can then either query over that closure, or add the closure into a larger model for later query.

This is one way to get incremental additions with inference to a large model.
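In Jena terms the pattern might look something like the following sketch. The choice of the RDFS reasoner, the empty placeholder models, and the catch-all query are illustrative assumptions, not a fixed recipe:

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.ReasonerRegistry;

public class FragmentClosure {
    public static void main(String[] args) {
        Model ontology = ModelFactory.createDefaultModel(); // load your ontology here
        Model fragment = ModelFactory.createDefaultModel(); // triples copied from the big graph
        Model bigModel = ModelFactory.createDefaultModel(); // stands in for the large stored model

        // Compute the inference closure of the fragment against the full ontology
        InfModel closure = ModelFactory.createInfModel(
                ReasonerRegistry.getRDFSReasoner(), ontology.union(fragment));

        // Either query over the closure directly ...
        try (QueryExecution qe = QueryExecutionFactory.create(
                "SELECT * WHERE { ?s ?p ?o }", closure)) {
            qe.execSelect().forEachRemaining(System.out::println);
        }

        // ... or materialise the closure into the larger model for later query
        bigModel.add(closure);
    }
}
```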

*However*, this can lead to incomplete inferences so whether the approach is viable for any given situation depends on the data, the ontology and the queries you need to be able to answer.

For example, simple RDFS inference can be handled this way, so long as all the RDFS assertions (class hierarchy, property definitions and hierarchy) are in the ontology you reason against. Suppose your fragment graph contains:

   :subject a :MyClass; :property :myvalue .

and in your ontology you have:

   :MyClass rdfs:subClassOf :Super .
   :property rdfs:range :R; rdfs:domain :D; rdfs:subPropertyOf :superp .

then you can make the local inferences quite happily, independent of what might be in the rest of the data:

   :subject a :MyClass, :Super, :D;
      :property :myvalue;
      :superp :myvalue .
   :myvalue a :R .
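Run through Jena's RDFS reasoner, the fragment plus that ontology yields exactly those extra triples. A minimal sketch (the `http://example.org/` namespace is made up for illustration):

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class RdfsFragmentExample {
    public static void main(String[] args) {
        String ns = "http://example.org/";

        // The ontology: class hierarchy plus property domain/range/hierarchy
        Model ontology = ModelFactory.createDefaultModel();
        Resource myClass = ontology.createResource(ns + "MyClass");
        Resource superC  = ontology.createResource(ns + "Super");
        Resource domainD = ontology.createResource(ns + "D");
        Resource rangeR  = ontology.createResource(ns + "R");
        Property property = ontology.createProperty(ns + "property");
        Property superp   = ontology.createProperty(ns + "superp");
        ontology.add(myClass, RDFS.subClassOf, superC);
        ontology.add(property, RDFS.range, rangeR);
        ontology.add(property, RDFS.domain, domainD);
        ontology.add(property, RDFS.subPropertyOf, superp);

        // The fragment copied out of the large graph
        Model fragment = ModelFactory.createDefaultModel();
        Resource subject = fragment.createResource(ns + "subject");
        Resource myvalue = fragment.createResource(ns + "myvalue");
        fragment.add(subject, RDF.type, myClass);
        fragment.add(subject, property, myvalue);

        InfModel inf = ModelFactory.createInfModel(
                ReasonerRegistry.getRDFSReasoner(), ontology.union(fragment));

        System.out.println(inf.contains(subject, RDF.type, superC)); // subclass inference
        System.out.println(inf.contains(subject, RDF.type, domainD)); // domain inference
        System.out.println(inf.contains(myvalue, RDF.type, rangeR));  // range inference
        System.out.println(inf.contains(subject, superp, myvalue));   // subproperty inference
        // all four print true
    }
}
```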

However, OWL can have longer-range effects, for example transitive properties and property chain axioms, and these can't be computed on local extracts. For example, if :p is a transitive property and your main data has:

  :a :p :b .
  :c :p :d .

and you then add a fragment:

  :b :p :c .

Then, if you could see the whole graph, you could infer:

  :a :p :c .
  :a :p :d .
  :b :p :d .

But you can't infer any of these just from the fragment and the ontology.
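You can see this concretely in Jena by reasoning over the whole graph versus over the fragment alone. A sketch (the OWL Micro reasoner is assumed to be sufficient here, since it covers transitive properties; the namespace is made up):

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.vocabulary.OWL;
import org.apache.jena.vocabulary.RDF;

public class TransitiveFragmentExample {
    public static void main(String[] args) {
        String ns = "http://example.org/";
        Model ontology = ModelFactory.createDefaultModel();
        Property p = ontology.createProperty(ns + "p");
        ontology.add(p, RDF.type, OWL.TransitiveProperty);

        Resource a = ontology.createResource(ns + "a");
        Resource b = ontology.createResource(ns + "b");
        Resource c = ontology.createResource(ns + "c");
        Resource d = ontology.createResource(ns + "d");

        // The main data already in the store
        Model mainData = ModelFactory.createDefaultModel();
        mainData.add(a, p, b);
        mainData.add(c, p, d);

        // The fragment being added
        Model fragment = ModelFactory.createDefaultModel();
        fragment.add(b, p, c);

        // Reasoning over the whole graph finds the long-range inference
        InfModel whole = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLMicroReasoner(),
                ontology.union(mainData).union(fragment));
        System.out.println(whole.contains(a, p, d)); // true

        // Reasoning over the fragment alone cannot
        InfModel frag = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLMicroReasoner(),
                ontology.union(fragment));
        System.out.println(frag.contains(a, p, d)); // false
    }
}
```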

All of RDFS and OWL are monotonic, which means that the inferences you can make over a fragment are always correct; they may just be incomplete.

If by "inference" you include non-monotonic rules or a closed world assumption then you can't take the fragment approach at all. You get incorrect inferences, not just incomplete ones.

> Is this a common practice? Best practice?

I would describe it as "there are circumstances where this can be useful and appropriate". Not best practice in general for the reasons outlined.

> Are there any recommended ways to efficiently implement the copy process from
> the stored graph into the in-memory graph?

Depends on what defines a fragment for you and how you are accessing your data.

If a fragment is a description of a few named resources, and you are accessing your data as a local model then you can use the Closure utility to extract the bNode closure of their description. If you have the same fragment definition but are accessing a remote endpoint then use SPARQL DESCRIBE.
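For example (a sketch; the endpoint URL and resource URI are placeholders, and the remote branch is guarded since it needs a live endpoint):

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.util.Closure;

public class ExtractFragment {
    public static void main(String[] args) {
        // Local model: extract the bNode closure of one resource's description
        Model data = ModelFactory.createDefaultModel(); // your large local model
        Resource r = data.getResource("http://example.org/subject");
        Model fragment = Closure.closure(r, false); // false: start from r even though it is named

        // Remote endpoint: the same idea via SPARQL DESCRIBE
        if (args.length > 0) { // only if an endpoint URL is supplied
            try (QueryExecution qe = QueryExecutionFactory.sparqlService(
                    args[0], "DESCRIBE <http://example.org/subject>")) {
                Model described = qe.execDescribe();
                System.out.println(described.size());
            }
        }
    }
}
```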

If your graph is split into small connected components and you want to extract a complete connected component then see ResourceUtils.reachableClosure - but it is pretty rare for an RDF graph to be of the right shape for that.
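A minimal sketch of that case (URIs are placeholders):

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.util.ResourceUtils;

public class ReachableComponent {
    public static void main(String[] args) {
        String ns = "http://example.org/";
        Model data = ModelFactory.createDefaultModel(); // your large model
        Property p = data.createProperty(ns + "p");
        Resource a = data.createResource(ns + "a");
        Resource b = data.createResource(ns + "b");
        data.add(a, p, b);
        data.add(b, p, data.createResource(ns + "c"));
        // a separate component, not reachable from :a
        data.add(data.createResource(ns + "x"), p, data.createResource(ns + "y"));

        // Everything reachable by following statements out from :a
        Model component = ResourceUtils.reachableClosure(a);
        System.out.println(component.size()); // 2: the :x :p :y statement is excluded
    }
}
```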

Dave
