Hi David,
On 28/03/13 19:47, David Jordan wrote:
I would like to confirm my understanding about reasoning in Jena, as well as
ask whether the Pellet reasoner does things differently.
When I create an OntModel there is essentially no overhead; this is very fast.
With the very first use of the OntModel, it takes considerable time to produce
a response. Once I get that response, has ALL the reasoning been completed for
the entire OntModel? Subsequent calls are faster, suggesting that there is much
less work being done. Is ALL reasoning done with the first call, or is there
additional lookup/reasoning done with subsequent calls?
The latter.
The rule engines use a mix of forward and backward reasoning.
The forward reasoning can be run any time you like by manually calling
prepare() but if you issue a query to an inference model which is not
yet in a "prepared" state then the prepare() will be triggered
automatically. This forward phase just depends on the rules and the
data, not any query.
The results of forward reasoning (and indeed all the intermediate state)
are stored in memory.
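As a sketch of driving that forward phase explicitly: the namespace and class names below are made up for illustration, and a small RDFS-rule inference model stands in for whatever rule configuration you are using.

```java
import com.hp.hpl.jena.rdf.model.InfModel;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.vocabulary.RDFS;

public class PrepareDemo {
    public static void main(String[] args) {
        // Hypothetical tiny data set: RedGrape < Grape < Plant.
        String NS = "http://example.org/wine#";
        Model base = ModelFactory.createDefaultModel();
        Resource red = base.createResource(NS + "RedGrape");
        Resource grape = base.createResource(NS + "Grape");
        Resource plant = base.createResource(NS + "Plant");
        base.add(red, RDFS.subClassOf, grape);
        base.add(grape, RDFS.subClassOf, plant);

        InfModel inf = ModelFactory.createRDFSModel(base);
        // Run the forward phase now, at a time of our choosing, rather than
        // letting the first query trigger it implicitly.
        inf.prepare();
        // Queries after this point pay only the per-query backward cost.
        // RedGrape subClassOf Plant is inferred by subClassOf transitivity.
        System.out.println(inf.contains(red, RDFS.subClassOf, plant));
    }
}
```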
Then for each query a number of backward rules might match to generate
additional inferences. Sometimes queries can reuse partial goal results
from earlier queries (through a sort of caching mechanism called
tabling) but not always. Typically there will always be some
backward reasoning run for every query.
Does Pellet operate the same way, when it is being used with Jena? Does it do
all inferencing at once, or in a more lazy eval fashion?
You would have to ask on the Pellet list but I would expect that a
substantial part of the reasoning will be done in the equivalent of a
prepare phase, i.e. eager.
I am doing some benchmarking; there is one piece of code that is running orders
of magnitude slower than everything else. The code is as follows (this is
using SDB with the latest version of Postgres):
OntClass oclass = omodel.getOntClass(GRAPE);
ExtendedIterator<OntProperty> properties = oclass.listDeclaredProperties(true);
while (properties.hasNext()) {
    OntProperty property = properties.next();
    System.out.println(property.getLocalName());
}
I have run the command to create the indexes on the data. Is this expected to
be really slow?
Yes.
Running a reasoner over a database is always slow and does you no good at
all. The reasoners typically need to touch all the data and need to keep
intermediate results in memory so there is essentially no benefit to
having the base model in a DB, much better to load it into memory and
*then* do reasoning.
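The load-then-reason pattern can be sketched as below. An in-memory model stands in for the SDB-backed one (in a real setup that would come from SDB, e.g. via SDBFactory.connectDefaultModel); the namespace is made up for illustration.

```java
import com.hp.hpl.jena.rdf.model.InfModel;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.vocabulary.RDFS;

public class ReasonInMemory {
    public static void main(String[] args) {
        // Stand-in for the SDB-backed base model.
        Model dbModel = ModelFactory.createDefaultModel();
        dbModel.add(dbModel.createResource("http://example.org/wine#RedGrape"),
                    RDFS.subClassOf,
                    dbModel.createResource("http://example.org/wine#Grape"));

        // Copy the whole base model into memory first...
        Model memModel = ModelFactory.createDefaultModel();
        memModel.add(dbModel);

        // ...and only *then* attach the reasoner, so its many small lookups
        // hit memory rather than the database.
        InfModel inf = ModelFactory.createRDFSModel(memModel);
        inf.prepare();
        System.out.println("base size: " + memModel.size());
    }
}
```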
Compounded with this, SDB is slower than TDB in general, and Postgres is
not necessarily the fastest back-end for SDB.
Compounding this, listDeclaredProperties has to do a lot of walking over
classes and properties, asking about domains and ranges, and it does all
that using separate Jena API calls, which translates into a lot of small
queries. SDB (as the S suggests) has better performance trade-offs when
doing fewer, more substantial SPARQL queries rather than lots of little
API calls.
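For comparison, one substantial SPARQL query can replace many small API calls. The sketch below only finds properties declared via rdfs:domain, so it is a rough approximation of listDeclaredProperties rather than a drop-in replacement; the namespace and property names are made up.

```java
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.vocabulary.RDFS;

public class DeclaredPropsViaSparql {
    public static void main(String[] args) {
        String NS = "http://example.org/wine#";  // hypothetical namespace
        Model m = ModelFactory.createDefaultModel();
        Property skin = m.createProperty(NS + "skinColour");
        Resource grape = m.createResource(NS + "Grape");
        m.add(skin, RDFS.domain, grape);

        // One query over the store instead of many per-resource API calls.
        String q =
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "SELECT ?p WHERE { ?p rdfs:domain <" + NS + "Grape> }";
        QueryExecution qe = QueryExecutionFactory.create(q, m);
        try {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                System.out.println(rs.next().getResource("p").getLocalName());
            }
        } finally {
            qe.close();
        }
    }
}
```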
If you can arrange it, the best way to use reasoning in combination with
a database is to load all the data into memory (maybe on a temporary big
machine), compute all the inferences you are interested in, then store to a
database a merge of the data plus the inferences (probably as two
separate graphs in a union-default store). Then, in your application,
query this closure using a non-inference model.
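The compute-then-store step might look like the following sketch: materialise the closure in memory, then separate the inferred triples from the asserted data so the two can be stored as distinct graphs (the namespace and class names are made up for illustration).

```java
import com.hp.hpl.jena.rdf.model.InfModel;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.vocabulary.RDFS;

public class StoreClosure {
    public static void main(String[] args) {
        String NS = "http://example.org/wine#";  // hypothetical namespace
        Model data = ModelFactory.createDefaultModel();
        Resource red = data.createResource(NS + "RedGrape");
        Resource grape = data.createResource(NS + "Grape");
        Resource plant = data.createResource(NS + "Plant");
        data.add(red, RDFS.subClassOf, grape);
        data.add(grape, RDFS.subClassOf, plant);

        // Compute the closure in memory.
        InfModel inf = ModelFactory.createRDFSModel(data);
        inf.prepare();

        // Snapshot every triple (asserted + inferred) into a plain model.
        Model closure = ModelFactory.createDefaultModel();
        closure.add(inf);

        // The inferred remainder and 'data' would be stored as two named
        // graphs in a union-default store, then queried with a plain,
        // non-inference model.
        Model inferredOnly = closure.difference(data);
        System.out.println(inferredOnly.contains(red, RDFS.subClassOf, plant));
    }
}
```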
Of course, if your application updates the data and you need to see the
inferred consequences of that update, then this strategy doesn't work.
Dave