tl;dr I'm afraid there's no support for pre-materialized but incrementally updated OWL reasoning over persistent models in Jena.

On 20/10/2020 02:00, Zalan Kemenczy wrote:
Hi there,

I've been experimenting with various reasoning profiles to get better
performance in my project, and I'm looking for guidance on how to push
materialization towards data creation time, rather than query time. I
believe this would suit my use-case: data mutations are infrequent (they do
happen), but queries need to be fast.

Some additional configuration context:

1. Loading some owl ontologies (~80k triples) into an in memory model
2. Bind it to an OWLMicro reasoner using bindSchema
3. Bind the reasoner to a TDB2 backed model

With just the ontologies loaded, even before I loaded any instance data,
this led to fairly slow queries (> 1 min). To confirm the issue was with
the inference layer, I serialized all the ground and inferred triples to a
second tdb2 backed dataset, with no inference layer, and my reference
queries were much faster (~100ms) and returned identical results.

Reading the docs, I gathered I could try to push materialization towards
data mutation time with a combination of forward reasoning and prepare. The
inference doc had this to say about the GenericRuleReasoner:

"When run in forward mode all rules are treated as forward even if they
were written in backward ("<-") syntax. This allows the same rule set to be
used in different modes to explore the performance tradeoffs."

If you just have a plain rule set with simple A -> B or B <- A rules then you can indeed run the rules in either direction. However, if you have a hybrid rule set which mixes directions, and in particular uses forward rules to create instantiated backward rules, that won't work - those intrinsically need the hybrid engine.

However, I've run into a couple of issues:

1. If I setMode on the OWLMicroReasoner to FORWARD, I get the following
exception when I try to bind the reasoner to a graph:

org.apache.jena.reasoner.rulesys.BasicForwardRuleInfGraph cannot be cast to
org.apache.jena.reasoner.rulesys.FBRuleInfGraph

Due to the following line:
https://github.com/apache/jena/blob/bfce1741cb12f9cf544235d32fba6598bc7341b5/jena-core/src/main/java/org/apache/jena/reasoner/rulesys/OWLMicroReasoner.java#L94

Yes, the OWLMicroReasoner is intrinsically a hybrid ("FB") rule reasoner and that can't be changed.

2. If I use a GenericRuleReasoner, loaded with OWLMicro rules set to
FORWARD mode, I can bind the reasoner, but then I get the following
execution error at query execution time:

Forward reasoner does not support hybrid rules - [ (?x owl:intersectionOf
?y) -> (?x rdf:type owl:Class) ]

Which I don't understand because that does not seem like a backward rule.

As noted above, the OWLMicro rules are hybrid and can't be run via a pure forward engine.

That particular example looks like a plain fact and forward mode should support that, so this is possibly a bug. However, those cases are the least of your worries; the real problem is true hybrid rules like:

[inverseOf2: (?P owl:inverseOf ?Q)
    -> table(?P), table(?Q), [inverseOf2b: (?X ?P ?Y) <- (?Y ?Q ?X)] ]

that simply have no pure forward equivalent.

So to sum up, I have two questions:

1. What would be your recommended approach to pushing materialization to
data creation time?

I'm afraid there's no good support for this in Jena.

If the rate of data updates is very low compared to the rate of queries then you could re-run the entire materialization from scratch each time the data changes. Unsubtle and slow at materialization time but queries would then be faster.

If the rate of data updates is high but your data fits in memory, then use the in-memory reasoner and let its (limited) incremental reasoning handle the changes.

But I'm afraid Jena has no support for incrementally updating inference results when the data is beyond memory limits and persisted to e.g. TDB.
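
To make the first option concrete (re-running the entire materialization from scratch whenever the data changes), here is a minimal, library-independent sketch in plain Java. It is deliberately NOT Jena code: the class RematerializeSketch, its methods, the string-based triples and the single hard-coded forward rule are all hypothetical stand-ins chosen for illustration. In a real setup the "materialized" set would correspond to a second, inference-free dataset like the one described in the question, and the closure step would be whatever reasoner produces the inferred triples.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy sketch of "rebuild the closure on every mutation". Not the Jena API. */
class RematerializeSketch {
    static final String INVERSE_OF = "owl:inverseOf";

    /** Asserted (ground) triples: the only thing mutations touch. */
    private final Set<List<String>> ground = new HashSet<>();
    /** Ground plus inferred triples: the only thing queries read. */
    private Set<List<String>> materialized = new HashSet<>();

    /** A triple is just an immutable [subject, predicate, object] list. */
    static List<String> triple(String s, String p, String o) {
        return List.of(s, p, o);
    }

    /** Apply the mutation, then recompute everything from scratch. */
    void add(String s, String p, String o) {
        ground.add(triple(s, p, o));
        materialized = closure(ground);
    }

    /** Deletions are handled the same way, so no stale inferences survive. */
    void remove(String s, String p, String o) {
        ground.remove(triple(s, p, o));
        materialized = closure(ground);
    }

    /** Queries are plain lookups; no rule is evaluated here. */
    boolean contains(String s, String p, String o) {
        return materialized.contains(triple(s, p, o));
    }

    /**
     * Pure forward chaining to a fixpoint with one illustrative rule:
     *   (?P owl:inverseOf ?Q), (?X ?P ?Y) -> (?Y ?Q ?X)
     */
    static Set<List<String>> closure(Set<List<String>> input) {
        Set<List<String>> all = new HashSet<>(input);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over snapshots so that adding to 'all' is safe.
            for (List<String> inv : List.copyOf(all)) {
                if (!inv.get(1).equals(INVERSE_OF)) continue;
                for (List<String> t : List.copyOf(all)) {
                    if (t.get(1).equals(inv.get(0))) {
                        changed |= all.add(triple(t.get(2), inv.get(2), t.get(0)));
                    }
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        RematerializeSketch store = new RematerializeSketch();
        store.add("ex:hasParent", INVERSE_OF, "ex:hasChild");
        store.add("ex:alice", "ex:hasParent", "ex:bob");
        System.out.println(store.contains("ex:bob", "ex:hasChild", "ex:alice"));
        store.remove("ex:alice", "ex:hasParent", "ex:bob");
        System.out.println(store.contains("ex:bob", "ex:hasChild", "ex:alice"));
    }
}
```

The trade-off is the one described above: every add or remove pays for a full recomputation, but reads never run a rule, and deletions need no special handling because nothing inferred is carried over from the previous snapshot.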

2. How would you create a forward rules reasoner that implements OWLMicro, or
the closest to it?

You would need to write a custom pure-forward ruleset to implement the axioms you want, perhaps starting from etc/rdfs-noresource.rules and adding the relevant OWL axioms. Depending on which axioms you want, performance may or may not be problematic, and the pure forward engine will still hold all its data in memory, so that won't scale any better.

Dave
