On 23/08/13 01:03, David Jordan wrote:
> The default Jena reasoner performs inference on an entire graph. For a very
> large graph, this inferencing can be fairly expensive. I was asked today
> whether there is any way to do inferencing on just a small subset of a very
> large graph.
> I am wondering whether it would be feasible and sensible to create a new
> in-memory graph, essentially copy the relevant triples from the very large
> graph into it, and then perform inferencing just on that small graph. The
> purpose is to answer a query or question on a small subset of the graph
> without incurring the overhead of doing it for the entire graph.
It is certainly feasible.
For example, it is not that uncommon to take a graph containing the
description of a few resources, compute the inference closure of that
against the full ontologies, and then either query over that closure or
add the closure into a larger model for later query.
This is one way to get incremental additions with inference to a large
model.
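As a sketch of that workflow, assuming a recent Apache Jena (the 2013-era com.hp.hpl.jena packages have since moved to org.apache.jena) and made-up URIs: copy the relevant triples into a small in-memory model with a CONSTRUCT query, then reason over just that fragment plus the ontology.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.ReasonerRegistry;

public class FragmentClosure {
    // Copy the triples about one subject out of a large model, then compute
    // the RDFS closure of that fragment against the ontology only.
    // (A real extractor might also follow blank nodes -- see the Closure
    // utility discussed later in the thread.)
    static InfModel closureFor(Model largeModel, Model ontology, String subjectUri) {
        // CONSTRUCT copies just the matching triples into a fresh small model.
        String q = "CONSTRUCT { <" + subjectUri + "> ?p ?o } " +
                   "WHERE { <" + subjectUri + "> ?p ?o }";
        Model fragment;
        try (QueryExecution qe = QueryExecutionFactory.create(q, largeModel)) {
            fragment = qe.execConstruct();
        }
        // Reason over fragment + ontology, not the whole graph.
        return ModelFactory.createInfModel(
                ReasonerRegistry.getRDFSReasoner(), ontology, fragment);
    }
}
```

The resulting InfModel can be queried directly, or its statements added back into the larger model for later query.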
*However*, this can lead to incomplete inferences, so whether the
approach is viable for any given situation depends on the data, the
ontology and the queries you need to be able to answer.
For example, simple RDFS inference can be handled this way, so long as
all the RDFS assertions (class hierarchy, property definitions and
hierarchy) are in the ontology you reason against. Say in your fragment
graph you have:
:subject a :MyClass; :property :myvalue .
and in your ontology you have:
:MyClass rdfs:subClassOf :Super .
:property rdfs:range :R; rdfs:domain :D; rdfs:subPropertyOf :superp .
then you can make the local inferences quite happily, independent of
what might be in the rest of the data:
:subject a :MyClass, :Super, :D;
    :property :myvalue;
    :superp :myvalue .
:myvalue a :R .
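Run through Jena's RDFS reasoner, the fragment-plus-ontology example above works out exactly as listed. A minimal sketch, assuming a recent Apache Jena and a made-up namespace:

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class LocalRdfsExample {
    static final String NS = "http://example.org/#";  // made-up namespace

    // Fragment + RDFS ontology, reasoned over in isolation.
    static InfModel localClosure() {
        // The ontology: class hierarchy and property definitions.
        Model ontology = ModelFactory.createDefaultModel();
        Resource myClass = ontology.createResource(NS + "MyClass");
        Property property = ontology.createProperty(NS + "property");
        ontology.add(myClass, RDFS.subClassOf, ontology.createResource(NS + "Super"));
        ontology.add(property, RDFS.range, ontology.createResource(NS + "R"));
        ontology.add(property, RDFS.domain, ontology.createResource(NS + "D"));
        ontology.add(property, RDFS.subPropertyOf, ontology.createProperty(NS + "superp"));

        // The fragment: just the two asserted triples about :subject.
        Model fragment = ModelFactory.createDefaultModel();
        Resource subject = fragment.createResource(NS + "subject");
        fragment.add(subject, RDF.type, myClass);
        fragment.add(subject, property, fragment.createResource(NS + "myvalue"));

        return ModelFactory.createInfModel(
                ReasonerRegistry.getRDFSReasoner(), ontology, fragment);
    }

    public static void main(String[] args) {
        InfModel inf = localClosure();
        Resource subject = inf.getResource(NS + "subject");
        // The local inferences from the text all hold:
        System.out.println(inf.contains(subject, RDF.type, inf.getResource(NS + "Super")));
        System.out.println(inf.contains(subject, RDF.type, inf.getResource(NS + "D")));
    }
}
```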
However, with OWL you can have longer-range effects, for example
transitive properties and property chain axioms. These can't be computed
on local extracts. For example, if :p is a transitive property and your
main data has:
:a :p :b .
:c :p :d .
Then you go to add a fragment:
:b :p :c .
Then if you can see the whole graph you could infer:
:a :p :c .
:a :p :d .
:b :p :d .
But you can't infer any of these just from the fragment and the ontology.
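To see the gap concretely, here is a sketch using Jena's OWL Micro reasoner (recent Apache Jena assumed, namespace made up): reasoning over the fragment alone cannot reach :a :p :d, while reasoning over the whole graph can.

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.vocabulary.OWL;
import org.apache.jena.vocabulary.RDF;

public class TransitiveGap {
    static final String NS = "http://example.org/#";  // made-up namespace

    // Does reasoning over `data` plus the transitivity axiom yield :a :p :d ?
    static boolean infersApd(Model data) {
        Model ontology = ModelFactory.createDefaultModel();
        Property p = ontology.createProperty(NS + "p");
        ontology.add(p, RDF.type, OWL.TransitiveProperty);
        InfModel inf = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLMicroReasoner(), ontology, data);
        return inf.contains(inf.getResource(NS + "a"), p, inf.getResource(NS + "d"));
    }

    // The main data already in the store: :a :p :b . :c :p :d .
    static Model mainData() {
        Model m = ModelFactory.createDefaultModel();
        Property p = m.createProperty(NS + "p");
        m.add(m.createResource(NS + "a"), p, m.createResource(NS + "b"));
        m.add(m.createResource(NS + "c"), p, m.createResource(NS + "d"));
        return m;
    }

    // The fragment being added: :b :p :c .
    static Model fragment() {
        Model m = ModelFactory.createDefaultModel();
        m.add(m.createResource(NS + "b"), m.createProperty(NS + "p"),
              m.createResource(NS + "c"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(infersApd(fragment()));                    // chain not visible locally
        System.out.println(infersApd(mainData().union(fragment())));  // a -> b -> c -> d visible
    }
}
```

The fragment-only result is still correct, just incomplete, which is the monotonicity point made next.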
All of RDFS and OWL are monotonic, which means that the inferences you
can make over a fragment are always correct; they may just be incomplete.
If by "inference" you include non-monotonic rules or a closed world
assumption then you can't take the fragment approach at all. You get
incorrect inferences, not just incomplete ones.
> Is this a common practice? Best practice?
I would describe it as "there are circumstances where this can be useful
and appropriate". Not best practice in general for the reasons outlined.
> Are there any recommended ways to efficiently implement the copy process from
> the stored graph into the in-memory graph?
That depends on what defines a fragment for you and how you are
accessing your data.
If a fragment is a description of a few named resources, and you are
accessing your data as a local model then you can use the Closure
utility to extract the bNode closure of their description. If you have
the same fragment definition but are accessing a remote endpoint then
use SPARQL DESCRIBE.
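A sketch of both routes, assuming a recent Apache Jena (the endpoint URL and resource URI are whatever your setup uses):

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.util.Closure;

public class FragmentExtraction {
    // Local model: the bNode closure of one resource's description.
    static Model localDescription(Model data, String uri) {
        // Closure.closure collects the statements about the resource,
        // following blank nodes out from it.
        return Closure.closure(data.getResource(uri), false);
    }

    // Remote endpoint: let the server build the description for you.
    static Model remoteDescription(String endpoint, String uri) {
        String q = "DESCRIBE <" + uri + ">";
        try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, q)) {
            return qe.execDescribe();
        }
    }
}
```

Note that exactly what DESCRIBE returns is implementation-defined, so check what your endpoint produces before relying on it.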
If your graph is split into small connected components and you want to
extract a complete connected component then see
ResourceUtils.reachableClosure - but it is pretty rare for an RDF graph
to be of the right shape for that.
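If your data does happen to have that shape, a sketch of using the utility (recent Apache Jena assumed):

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.util.ResourceUtils;

public class ComponentExtraction {
    // Extract everything reachable from a starting resource (following
    // statements in the subject-to-object direction) -- the whole connected
    // component when the graph is shaped that way.
    static Model component(Model data, String startUri) {
        return ResourceUtils.reachableClosure(data.getResource(startUri));
    }
}
```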
Dave