tl;dr I'm afraid there's no support for pre-materialized but
incrementally updated OWL reasoning over persistent models in Jena.
On 20/10/2020 02:00, Zalan Kemenczy wrote:
Hi there,
I've been experimenting with various reasoning profiles to get better
performance in my project, and I'm looking for guidance on how to push
materialization towards data creation time, rather than query time. I
believe this would suit my use-case: data mutations are infrequent (they do
happen), but queries need to be fast.
Some additional configuration context:
1. Loading some owl ontologies (~80k triples) into an in memory model
2. Bind it to an OWLMicro reasoner using bindSchema
3. Bind the reasoner to a TDB2 backed model
With just the ontologies loaded, even before I loaded any instance data,
this led to fairly slow queries (> 1 min). To confirm the issue was with
the inference layer, I serialized all the ground and inferred triples to a
second tdb2 backed dataset, with no inference layer, and my reference
queries were much faster (~100ms) and returned identical results.
Reading the docs, I gathered I could try to push materialization towards
data mutation time with a combination of forward reasoning and prepare. The
inference doc had this to say about the GenericRuleReasoner:
"When run in forward mode all rules are treated as forward even if they
were written in backward ("<-") syntax. This allows the same rule set to be
used in different modes to explore the performance tradeoffs."
If you just have a plain rule set with simple A -> B or B <- A rules
then you can indeed run the rules in either direction. However, if you
have a hybrid rule set which mixes directions, and in particular uses
forward rules to create instantiated backward rules, that won't work -
those intrinsically need the hybrid engine.
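
As a rough illustration of the "plain rule set" case, the sketch below runs one rule, written once in forward and once in backward syntax, through a GenericRuleReasoner in a chosen mode. The namespace, the properties p and q, and the class and method names are made up for the example; only the Jena calls themselves are real API.

```java
import java.util.List;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class RuleDirectionSketch {
    // Hypothetical namespace, used only for this illustration.
    static final String NS = "http://example.org/";

    // The same logical rule (p implies q), in forward and in backward syntax.
    static final String FORWARD_SYNTAX =
        "[r1: (?a <http://example.org/p> ?b) -> (?a <http://example.org/q> ?b)]";
    static final String BACKWARD_SYNTAX =
        "[r1: (?a <http://example.org/q> ?b) <- (?a <http://example.org/p> ?b)]";

    /** Returns true if (x q y) is derivable from (x p y) with the given rule text and mode. */
    static boolean infers(String ruleText, GenericRuleReasoner.RuleMode mode) {
        Model data = ModelFactory.createDefaultModel();
        Resource x = data.createResource(NS + "x");
        Resource y = data.createResource(NS + "y");
        data.add(x, data.createProperty(NS + "p"), y);

        List<Rule> rules = Rule.parseRules(ruleText);
        GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
        reasoner.setMode(mode);   // e.g. GenericRuleReasoner.FORWARD or BACKWARD

        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        return inf.contains(x, data.createProperty(NS + "q"), y);
    }
}
```

For a rule this simple, calling infers(BACKWARD_SYNTAX, GenericRuleReasoner.FORWARD) should behave the same as the forward-syntax version. A rule whose head is itself a rule has no such equivalent.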
However, I've run into a couple of issues:
1. If I setMode on the OWLMicroReasoner to FORWARD, I get the following
exception when I try to bind the reasoner to a graph:
org.apache.jena.reasoner.rulesys.BasicForwardRuleInfGraph cannot be cast to
org.apache.jena.reasoner.rulesys.FBRuleInfGraph
Due to the following line:
(https://github.com/apache/jena/blob/bfce1741cb12f9cf544235d32fba6598bc7341b5/jena-core/src/main/java/org/apache/jena/reasoner/rulesys/OWLMicroReasoner.java#L94)
Yes, the OWLMicroReasoner is intrinsically a hybrid ("FB") rule reasoner
and that can't be changed.
2. If I use a GenericRuleReasoner, loaded with OWLMicro rules set to
FORWARD mode, I can bind the reasoner, but then I get the following
execution error at query execution time:
Forward reasoner does not support hybrid rules - [ (?x owl:intersectionOf
?y) -> (?x rdf:type owl:Class) ]
Which I don't understand because that does not seem like a backward rule.
As noted above the OWLMicro rules are hybrid and can't be run via a pure
forward engine.
That particular example looks like a plain fact and forward mode should
support that, so this is possibly a bug. However, those cases are the least
of your worries; it's the true hybrid rules like:
[inverseOf2: (?P owl:inverseOf ?Q)
-> table(?P), table(?Q), [inverseOf2b: (?X ?P ?Y) <- (?Y ?Q ?X)] ]
that simply have no pure forward equivalent.
So to sum up, I have two questions:
1. What would be your recommended approach to pushing materialization to
data creation time?
I'm afraid there's no good support for this in Jena.
If the rate of data updates is very low compared to the rate of queries
then you could re-run the entire materialization from scratch each time
the data changes. Unsubtle and slow at materialization time but queries
would then be faster.
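
A minimal sketch of that "re-run from scratch" approach, assuming the schema and data fit in memory. The class and method names are invented for the example. The target is shown as a plain Model; for a TDB2-backed target the overwrite would need to run inside a write transaction (for example via Txn.executeWrite), which is omitted here.

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.ReasonerRegistry;

public class Rematerialize {
    /**
     * Recomputes the full OWLMicro closure of schema + data in memory and
     * overwrites the plain (inference-free) target model with the result.
     * Call this again after every data change; queries go to the target only.
     */
    static void rematerialize(Model schema, Model data, Model target) {
        Reasoner reasoner = ReasonerRegistry.getOWLMicroReasoner().bindSchema(schema);
        InfModel inf = ModelFactory.createInfModel(reasoner, data);

        // Copy every ground and inferred statement into a scratch model first,
        // so the slow reasoning step finishes before the target is touched.
        Model closure = ModelFactory.createDefaultModel();
        closure.add(inf);

        // Replace the previous materialization wholesale.
        target.removeAll();
        target.add(closure);
    }
}
```

Because the closure is rebuilt each time, removed triples take their stale inferences with them. The price is a full reasoning pass per update.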
If the rate of data updates is high but your data fits in memory then
use the in-memory reasoner and let its (limited) incremental reasoning
handle the changes.
But I'm afraid Jena has no support for incrementally updating inference
results when the data is beyond memory limits and persisted to e.g. TDB.
2. How would you create a forward rules reasoner that implements OWLMicro,
or the closest to it?
You would need to write a custom pure-forward ruleset to implement the
axioms you want, perhaps starting from etc/rdfs-noresource.rules and
adding the relevant OWL axioms. Depending on which axioms you want,
performance may or may not be problematic, and the pure forward engine
will still hold all its data in memory, so that won't scale any better.
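
To give a flavour of what such a ruleset might look like, here is a small sketch with a handful of hand-written pure-forward rules. This is not the OWLMicro ruleset and makes no claim to cover it; the rule names, the choice of axioms, and the class and method names are all assumptions for illustration. Note how inverseOf is written as a single forward rule over the instance data rather than as a forward rule that generates a backward rule.

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class ForwardOwlSubset {
    // Illustrative subset only. The rdf:, rdfs: and owl: prefixes are
    // predefined for the Jena rule parser.
    static final String RULES = String.join("\n",
        "[subClassTrans: (?a rdfs:subClassOf ?b), (?b rdfs:subClassOf ?c) -> (?a rdfs:subClassOf ?c)]",
        "[subClassInst: (?c rdfs:subClassOf ?d), (?x rdf:type ?c) -> (?x rdf:type ?d)]",
        "[inverseSym: (?p owl:inverseOf ?q) -> (?q owl:inverseOf ?p)]",
        "[inverseInst: (?p owl:inverseOf ?q), (?x ?p ?y), notLiteral(?y) -> (?y ?q ?x)]",
        "[symmetric: (?p rdf:type owl:SymmetricProperty), (?x ?p ?y), notLiteral(?y) -> (?y ?p ?x)]",
        "[transitive: (?p rdf:type owl:TransitiveProperty), (?x ?p ?y), (?y ?p ?z) -> (?x ?p ?z)]"
    );

    /** Binds the pure-forward rules to a model holding both schema and instance data. */
    static InfModel bind(Model schemaAndData) {
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(RULES));
        reasoner.setMode(GenericRuleReasoner.FORWARD);
        InfModel inf = ModelFactory.createInfModel(reasoner, schemaAndData);
        inf.prepare();   // run the forward rules now rather than on first query
        return inf;
    }
}
```

After bind, inf.getDeductionsModel() holds the inferred triples, which could be copied alongside the base data into an inference-free store.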
Dave