On 23/08/13 01:03, David Jordan wrote:
> The default Jena reasoner performs inference on an entire graph. For a very
> large graph, this inferencing can be fairly expensive. I was asked today
> whether there is any way to do inferencing on just a small subset of a very
> large graph.
> I am wondering whether it would be feasible and sensible to create a new
> in-memory graph, essentially copy the relevant triples from the very large
> graph into it, and then perform inferencing just on that small graph. The
> purpose is to answer a query or question on a small subset of the graph
> without incurring the overhead of doing it for the entire graph.
It is certainly feasible.
For example, it is not that uncommon to take a graph containing the
description of a few resources, compute the inference closure of that
against the full ontologies, and then either query over that closure or
add the closure into a larger model for later query.
This is one way to get incremental additions with inference to a large
model.
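As a sketch of that workflow, assuming a recent Apache Jena (the 2013-era com.hp.hpl.jena packages have since moved to org.apache.jena) and made-up URIs: copy the relevant triples into a small in-memory model with a CONSTRUCT query, then reason over just that fragment plus the ontology.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.ReasonerRegistry;

public class FragmentClosure {
    // Copy the triples about one subject out of a large model, then compute
    // the RDFS closure of that fragment against the ontology only.
    // (A real extractor might also follow blank nodes -- see the Closure
    // utility discussed later in the thread.)
    static InfModel closureFor(Model largeModel, Model ontology, String subjectUri) {
        // CONSTRUCT copies just the matching triples into a fresh small model.
        String q = "CONSTRUCT { <" + subjectUri + "> ?p ?o } " +
                   "WHERE { <" + subjectUri + "> ?p ?o }";
        Model fragment;
        try (QueryExecution qe = QueryExecutionFactory.create(q, largeModel)) {
            fragment = qe.execConstruct();
        }
        // Reason over fragment + ontology, not the whole graph.
        return ModelFactory.createInfModel(
                ReasonerRegistry.getRDFSReasoner(), ontology, fragment);
    }
}
```

The resulting InfModel can be queried directly, or its statements added back into the larger model for later query.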
*However*, this can lead to incomplete inferences, so whether the
approach is viable for any given situation depends on the data, the
ontology and the queries you need to be able to answer.
For example, simple RDFS inference can be handled this way, so long as
all the RDFS assertions (class hierarchy, property definitions and
hierarchy) are in the ontology you reason against. Say in your fragment
graph you have:
:subject a :MyClass; :property :myvalue .
and in your ontology you have:
:MyClass rdfs:subClassOf :Super .
:property rdfs:range :R; rdfs:domain :D; rdfs:subPropertyOf :superp .
then you can make the local inferences quite happily, independent of
what might be in the rest of the data:
:subject a :MyClass, :Super, :D;
    :property :myvalue;
    :superp :myvalue .
:myvalue a :R .
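Run through Jena's RDFS reasoner, the fragment-plus-ontology example above works out exactly as listed. A minimal sketch, assuming a recent Apache Jena and a made-up namespace:

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class LocalRdfsExample {
    static final String NS = "http://example.org/#";  // made-up namespace

    // Fragment + RDFS ontology, reasoned over in isolation.
    static InfModel localClosure() {
        // The ontology: class hierarchy and property definitions.
        Model ontology = ModelFactory.createDefaultModel();
        Resource myClass = ontology.createResource(NS + "MyClass");
        Property property = ontology.createProperty(NS + "property");
        ontology.add(myClass, RDFS.subClassOf, ontology.createResource(NS + "Super"));
        ontology.add(property, RDFS.range, ontology.createResource(NS + "R"));
        ontology.add(property, RDFS.domain, ontology.createResource(NS + "D"));
        ontology.add(property, RDFS.subPropertyOf, ontology.createProperty(NS + "superp"));

        // The fragment: just the two asserted triples about :subject.
        Model fragment = ModelFactory.createDefaultModel();
        Resource subject = fragment.createResource(NS + "subject");
        fragment.add(subject, RDF.type, myClass);
        fragment.add(subject, property, fragment.createResource(NS + "myvalue"));

        return ModelFactory.createInfModel(
                ReasonerRegistry.getRDFSReasoner(), ontology, fragment);
    }

    public static void main(String[] args) {
        InfModel inf = localClosure();
        Resource subject = inf.getResource(NS + "subject");
        // The local inferences from the text all hold:
        System.out.println(inf.contains(subject, RDF.type, inf.getResource(NS + "Super")));
        System.out.println(inf.contains(subject, RDF.type, inf.getResource(NS + "D")));
    }
}
```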
However, with OWL you can have longer-range effects, for example
transitive properties and property chain axioms. These can't be computed
on local extracts. For example, if :p is a transitive property and your
main data has:
:a :p :b .
:c :p :d .
Then you go to add a fragment:
:b :p :c .
Then if you can see the whole graph you could infer:
:a :p :c .
:a :p :d .
:b :p :d .
But you can't infer any of these just from the fragment and the ontology.
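To see the gap concretely, here is a sketch using Jena's OWL Micro reasoner (recent Apache Jena assumed, namespace made up): reasoning over the fragment alone cannot reach :a :p :d, while reasoning over the whole graph can.

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.vocabulary.OWL;
import org.apache.jena.vocabulary.RDF;

public class TransitiveGap {
    static final String NS = "http://example.org/#";  // made-up namespace

    // Does reasoning over `data` plus the transitivity axiom yield :a :p :d ?
    static boolean infersApd(Model data) {
        Model ontology = ModelFactory.createDefaultModel();
        Property p = ontology.createProperty(NS + "p");
        ontology.add(p, RDF.type, OWL.TransitiveProperty);
        InfModel inf = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLMicroReasoner(), ontology, data);
        return inf.contains(inf.getResource(NS + "a"), p, inf.getResource(NS + "d"));
    }

    // The main data already in the store: :a :p :b . :c :p :d .
    static Model mainData() {
        Model m = ModelFactory.createDefaultModel();
        Property p = m.createProperty(NS + "p");
        m.add(m.createResource(NS + "a"), p, m.createResource(NS + "b"));
        m.add(m.createResource(NS + "c"), p, m.createResource(NS + "d"));
        return m;
    }

    // The fragment being added: :b :p :c .
    static Model fragment() {
        Model m = ModelFactory.createDefaultModel();
        m.add(m.createResource(NS + "b"), m.createProperty(NS + "p"),
              m.createResource(NS + "c"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(infersApd(fragment()));                    // chain not visible locally
        System.out.println(infersApd(mainData().union(fragment())));  // a -> b -> c -> d visible
    }
}
```

The fragment-only result is still correct, just incomplete, which is the monotonicity point made next.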
All of RDFS and OWL are monotonic, which means that the inferences you
can make over a fragment are always correct; they may just be incomplete.
If by "inference" you include non-monotonic rules or a closed world
assumption then you can't take the fragment approach at all. You get
incorrect inferences, not just incomplete ones.
> Is this a common practice? Best practice?
I would describe it as "there are circumstances where this can be useful
and appropriate". Not best practice in general for the reasons outlined.
> Are there any recommended ways to efficiently implement the copy process from
> the stored graph into the in-memory graph?
That depends on what defines a fragment for you and how you are
accessing your data.
If a fragment is a description of a few named resources, and you are
accessing your data as a local model then you can use the Closure
utility to extract the bNode closure of their description. If you have
the same fragment definition but are accessing a remote endpoint then
use SPARQL DESCRIBE.
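A sketch of both routes, assuming a recent Apache Jena (the endpoint URL and resource URI are whatever your setup uses):

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.util.Closure;

public class FragmentExtraction {
    // Local model: the bNode closure of one resource's description.
    static Model localDescription(Model data, String uri) {
        // Closure.closure collects the statements about the resource,
        // following blank nodes out from it.
        return Closure.closure(data.getResource(uri), false);
    }

    // Remote endpoint: let the server build the description for you.
    static Model remoteDescription(String endpoint, String uri) {
        String q = "DESCRIBE <" + uri + ">";
        try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, q)) {
            return qe.execDescribe();
        }
    }
}
```

Note that exactly what DESCRIBE returns is implementation-defined, so check what your endpoint produces before relying on it.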
If your graph is split into small connected components and you want to
extract a complete connected component then see
ResourceUtils.reachableClosure - but it is pretty rare for an RDF graph
to be of the right shape for that.
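If your data does happen to have that shape, a sketch of using the utility (recent Apache Jena assumed):

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.util.ResourceUtils;

public class ComponentExtraction {
    // Extract everything reachable from a starting resource (following
    // statements in the subject-to-object direction) -- the whole connected
    // component when the graph is shaped that way.
    static Model component(Model data, String startUri) {
        return ResourceUtils.reachableClosure(data.getResource(startUri));
    }
}
```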
Dave