Hi,
On 12/07/16 05:13, Niels Andersen wrote:
Dear Jena user community,
We are using the general purpose rule engine to be able to only specify the
rules that we need in our model. We have read and are familiar with
https://jena.apache.org/documentation/inference/
Is there a document that describes the guidelines for creating high performance
rules in Jena?
Not that I'm aware of.
Specifically we are interested in:
* What is it that makes inferencing costly? Is it the time it takes to
run the query (on a large model with millions of triples) or the time it takes
to generate new triples?
Rules are generally expensive because they can easily interact to create
a very large space of possible results; it's quite easy to get
exponential growth.
For forward rules you are materializing the whole set of results.
The forward engine works by keeping track of all the partially matched
rules, so that when a new triple is deduced you drop that triple into the
network and see which rules can now fire as a result. Thus you avoid
running the full rule antecedents as queries each time, but you have the
cost of keeping and searching a lot of state representing the partially
matched rules. So I would guess that cost dominates over the simple
generation of the result triples, and there isn't much querying, as
such, going on.
Whereas the backward rules are essentially exploring the search space,
and each step in that involves issuing new (triple pattern) queries and
keeping track of how far you are through the search space. So again
I expect the state tracking dominates over generating the final
triples, but now there are a lot of queries going on.
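To make the two modes concrete, here is the same (made-up) deduction
written first as a forward and then as a backward rule in the
GenericRuleReasoner rule syntax; the eg: prefix and the grandparent
example are purely illustrative:

```
@prefix eg: <http://example.org/>.

# Forward form: fires as soon as both parent triples are present,
# materializing the grandparent triple immediately.
[gpF: (?a eg:parent ?b) (?b eg:parent ?c) -> (?a eg:grandparent ?c)]

# Backward form: nothing is computed until a query asks for
# eg:grandparent; then the body patterns are run as subgoals.
[gpB: (?a eg:grandparent ?c) <- (?a eg:parent ?b) (?b eg:parent ?c)]
```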
o Examples of good and bad rules
I'm not sure that's answerable in the abstract, it depends on what your
rules are supposed to do.
o What type of performance should we expect?
* How to use backwards and hybrid rules (including specific examples).
We cannot seem to get the rule to trigger when querying through the Jena API.
If you want some non-trivial examples then look at the rule sets for
OWL/RDFS reasoning (in resources/etc/). For example rdfs.rules is a
brute force expression of (most of) RDFS and can be run in either
forward or backward mode, whereas rdfs-fb.rules is the same but using
hybrid rules to achieve a different performance trade-off.
[The real rules used are the rdfs-fb-tgc-*.rules, where "tgc" means that
they assume use of the transitive reasoner (transitive graph closure).]
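A hybrid rule in that style is a forward rule whose conclusion asserts a
backward rule, so the schema is processed once, eagerly, while instance
reasoning stays query-driven. A sketch along the lines of what
rdfs-fb.rules does for rdfs:domain (the rule name is illustrative):

```
# For each rdfs:domain declaration found in the data, install a
# backward rule that derives the rdf:type triple only when a query
# actually asks for it.
[rdfs2-hybrid: (?p rdfs:domain ?c) -> [(?x rdf:type ?c) <- (?x ?p ?y)]]
```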
* Is it possible to implement asynchronous rule execution so that the
reasoner does not hold up the triple store when the reasoner is triggered on
the next select statement?
For backward rules you could create an InfGraph for each query
(assuming each query arrives on a different thread), though that's not
helpful if you use any tabling.
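A rough Java sketch of that per-query approach, assuming each query
thread builds its own InfModel over the shared base data (the rule,
class name, and example URIs are all hypothetical):

```java
import java.util.List;
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.rulesys.*;

public class PerQueryInf {
    // Hypothetical backward rule; any non-recursive rule set works
    // the same way.
    static final List<Rule> RULES = Rule.parseRules(
        "[gp: (?a <http://example.org/grandparent> ?c) <- " +
        "     (?a <http://example.org/parent> ?b) " +
        "     (?b <http://example.org/parent> ?c)]");

    // Build a fresh InfModel over the shared base data for each query
    // (one per thread), so every query owns its backward-engine state.
    static InfModel freshInfModel(Model base) {
        GenericRuleReasoner reasoner = new GenericRuleReasoner(RULES);
        reasoner.setMode(GenericRuleReasoner.BACKWARD);
        return ModelFactory.createInfModel(reasoner, base);
    }

    public static void main(String[] args) {
        Model base = ModelFactory.createDefaultModel();
        Property parent = base.createProperty("http://example.org/parent");
        Property grandparent = base.createProperty("http://example.org/grandparent");
        Resource a = base.createResource("http://example.org/a");
        Resource b = base.createResource("http://example.org/b");
        Resource c = base.createResource("http://example.org/c");
        base.add(a, parent, b).add(b, parent, c);

        InfModel inf = freshInfModel(base); // one of these per query thread
        System.out.println(inf.contains(a, grandparent, c)); // derived: true
    }
}
```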
For forward rules the question doesn't make much sense: once the
forward engine has run, further queries won't trigger any more
reasoning until the data changes.
* What type of hardware configuration is recommended for fast reasoner
execution?
Fast (!) and lots of memory.
What can be done to increase the parallelism of the reasoner?
Rewrite it from scratch!
* How do we create a persistent inferred model that can still be
processed using the reasoner (i.e. persisted triples can be deleted by the
reasoner at a later stage)? Right now the reasoner runs every time we start the
system, it could take minutes to infer all the triples.
If your data doesn't change much and you are using forward inference,
then use the reasoner offline: run it once, in memory, take all the
results and put them in a graph in the persistent store.
If the data changes, schedule a new rebuild.
The current Jena rule engines are really not that well adapted to life
with a persistent store.
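That offline pattern can be sketched roughly like this (Java, the rule
and URIs are illustrative, and the "store" model is in-memory to keep
the sketch self-contained; with TDB you would add the deductions to the
persisted model instead):

```java
import java.util.List;
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.rulesys.*;

public class OfflineMaterialise {
    // Run the forward engine once, in memory, and hand back just the
    // deduced triples, ready to be added to a persistent graph.
    static Model materialise(Model data, List<Rule> rules) {
        GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
        reasoner.setMode(GenericRuleReasoner.FORWARD);
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        inf.prepare(); // force the rules to fire now rather than lazily
        return inf.getDeductionsModel();
    }

    public static void main(String[] args) {
        Model data = ModelFactory.createDefaultModel();
        Property parent = data.createProperty("http://example.org/parent");
        Property grandparent = data.createProperty("http://example.org/grandparent");
        Resource a = data.createResource("http://example.org/a");
        Resource b = data.createResource("http://example.org/b");
        Resource c = data.createResource("http://example.org/c");
        data.add(a, parent, b).add(b, parent, c);

        List<Rule> rules = Rule.parseRules(
            "[gp: (?x <http://example.org/parent> ?y) " +
            "     (?y <http://example.org/parent> ?z) " +
            "  -> (?x <http://example.org/grandparent> ?z)]");

        // Base data plus materialized deductions, stored together.
        Model store = ModelFactory.createDefaultModel();
        store.add(data).add(materialise(data, rules));
        System.out.println(store.contains(a, grandparent, c)); // true
        // On restart, query "store" directly; no reasoner is needed
        // until the data changes, then rerun materialise().
    }
}
```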
Dave