Hi,

On 12/07/16 05:13, Niels Andersen wrote:
Dear Jena user community,

We are using the general purpose rule engine to be able to only specify the 
rules that we need in our model. We have read and are familiar with 
https://jena.apache.org/documentation/inference/

Is there a document that describes the guidelines for creating high performance 
rules in Jena?

Not that I'm aware of.

Specifically we are interested in:

*         What is it that makes inferencing costly? Is it the time it takes to 
run the query (on a large model with millions of triples) or the time it takes 
to generate new triples?

Rules are generally expensive because they can easily interact to create a very large space of possible results; it's quite easy to get exponential growth.

For forward rules you are materializing the whole set of results.
The forward engine works by keeping track of all the partially matched rules, so that when a new triple is deduced you drop that triple into the network and see which rules can now fire as a result. Thus you avoid running the full rule antecedents as queries each time, but you have the cost of keeping and searching a lot of state representing the partially matched rules. So I would guess that cost dominates over the simple generation of the result triples, and there isn't much querying, as such, going on.
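To make the forward case concrete, here is a minimal sketch using the Jena 3 API (the rule, the example.org URIs, and the class name are invented for illustration). All deductions are materialized eagerly; the later contains() call is just a lookup:

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class ForwardRuleDemo {
    public static void main(String[] args) {
        // One forward rule: grandparentOf derived from two parentOf links.
        String rules =
            "[grand: (?a <http://example.org/parentOf> ?b) " +
            "        (?b <http://example.org/parentOf> ?c) " +
            "     -> (?a <http://example.org/grandparentOf> ?c)]";
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        reasoner.setMode(GenericRuleReasoner.FORWARD_RETE); // RETE-based forward engine

        Model data = ModelFactory.createDefaultModel();
        Property parentOf = data.createProperty("http://example.org/parentOf");
        Resource a = data.createResource("http://example.org/a");
        Resource b = data.createResource("http://example.org/b");
        Resource c = data.createResource("http://example.org/c");
        data.add(a, parentOf, b).add(b, parentOf, c);

        // All consequences are computed up front; queries then just look up triples.
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        Property grandparentOf = inf.createProperty("http://example.org/grandparentOf");
        System.out.println(inf.contains(a, grandparentOf, c));
    }
}
```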

Whereas the backward rules are essentially exploring the search space, and each step in that involves issuing new (triple pattern) queries and keeping track of how far you are through the search space. So again I expect the state tracking dominates over generating the final triples, but now there are a lot of queries going on.
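For comparison, the same kind of derivation written as pure backward rules (head <- body) looks like this (again a sketch; the rules and URIs are invented). Nothing is computed until the query at the end, at which point the engine chases subgoals:

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class BackwardRuleDemo {
    public static void main(String[] args) {
        // Backward rules: the head is proved on demand, each rule step
        // issuing new triple-pattern subgoals against the data.
        String rules =
            "[base: (?a <http://example.org/ancestorOf> ?b) <- " +
            "       (?a <http://example.org/parentOf> ?b)] " +
            "[rec:  (?a <http://example.org/ancestorOf> ?c) <- " +
            "       (?a <http://example.org/parentOf> ?b) " +
            "       (?b <http://example.org/ancestorOf> ?c)]";
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        reasoner.setMode(GenericRuleReasoner.BACKWARD); // pure backward (LP) engine

        Model data = ModelFactory.createDefaultModel();
        Property parentOf = data.createProperty("http://example.org/parentOf");
        Resource a = data.createResource("http://example.org/a");
        Resource b = data.createResource("http://example.org/b");
        Resource c = data.createResource("http://example.org/c");
        data.add(a, parentOf, b).add(b, parentOf, c);

        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        Property ancestorOf = inf.createProperty("http://example.org/ancestorOf");
        // The search for a proof happens here, at query time.
        System.out.println(inf.contains(a, ancestorOf, c));
    }
}
```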

o   Examples of good and bad rules

I'm not sure that's answerable in the abstract; it depends on what your rules are supposed to do.

o   What type of performance should we expect?

*         How to use backwards and hybrid rules (including specific examples). 
We cannot seem to get the rule to trigger when querying through the Jena API.

If you want some non-trivial examples then look at the rule sets for OWL/RDFS reasoning (in resources/etc/). For example, rdfs.rules is a brute-force expression of (most of) RDFS and can be run in either forward or backward mode, whereas rdfs-fb.rules is the same but uses hybrid rules to achieve a different performance trade-off.

[The real rules used are the rdfs-fb-tgc-*.rules, where "tgc" means that they assume use of the transitive reasoner (transitive graph closure).]
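The characteristic hybrid pattern in those files is a forward rule whose conclusion is itself a backward rule: when the forward rule fires, a specialised backward rule is instantiated and installed. Roughly along these lines (a fragment in Jena rules syntax, paraphrasing the subPropertyOf handling; check the actual files for the exact rules):

```
# Forward rule: for each subPropertyOf assertion found in the data,
# install a backward rule specialised to that property pair.
[subProp: (?p rdfs:subPropertyOf ?q), notEqual(?p, ?q)
       -> [ (?a ?q ?b) <- (?a ?p ?b) ] ]
```

Run with the reasoner in HYBRID mode, this does the cheap schema-level work eagerly while leaving the potentially large instance-level closure to be derived lazily at query time.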

*         Is it possible to implement asynchronous rule execution so that the 
reasoner does not hold up the triple store when the reasoner is triggered on 
the next select statement?

For backward rules you could create an InfGraph for each query (assuming each query arrives on a different thread), though that's not helpful if you use any tabling.
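A sketch of that per-query arrangement (the rule, URIs, and class name are invented; the base model is shared, while each thread builds and discards its own InfModel so per-query inference state is never shared):

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class PerQueryInfGraphDemo {
    public static void main(String[] args) throws InterruptedException {
        // A trivial backward rule deriving knowsSym from knows.
        String rules =
            "[r: (?a <http://example.org/knowsSym> ?b) <- " +
            "    (?b <http://example.org/knows> ?a)]";
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        reasoner.setMode(GenericRuleReasoner.BACKWARD);

        Model sharedBase = ModelFactory.createDefaultModel();
        sharedBase.add(sharedBase.createResource("http://example.org/a"),
                       sharedBase.createProperty("http://example.org/knows"),
                       sharedBase.createResource("http://example.org/b"));

        // Each query thread wraps the shared base in its own InfModel
        // and discards it when the query is done.
        Runnable queryTask = () -> {
            InfModel inf = ModelFactory.createInfModel(reasoner, sharedBase);
            boolean found = inf.contains(
                inf.createResource("http://example.org/b"),
                inf.createProperty("http://example.org/knowsSym"),
                inf.createResource("http://example.org/a"));
            System.out.println(found);
            inf.close(); // drop this query's inference state
        };
        Thread t1 = new Thread(queryTask);
        Thread t2 = new Thread(queryTask);
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```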

For forward rules the question doesn't make much sense: once the forward engine has run, further queries won't trigger any more reasoning until the data changes.

*         What type of hardware configuration is recommended for fast reasoner 
execution?

Fast (!) and lots of memory.

What can be done to increase the parallelism of the reasoner?

Rewrite it from scratch!

*         How do we create a persistent inferred model that can still be 
processed using the reasoner (i.e. persisted triples can be deleted by the 
reasoner at a later stage)? Right now the reasoner runs every time we start the 
system, it could take minutes to infer all the triples.

If your data doesn't change much and you are using forward inference then use the reasoner offline: run it once, in memory, take all the results, and put them in a graph in the persistent store.

If the data changes then schedule a new rebuild.
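The offline run-once-and-snapshot step might look like this (a sketch; the rule and URIs are invented, and the final persistence step into e.g. a TDB-backed model is only indicated in a comment):

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class MaterializeDemo {
    public static void main(String[] args) {
        // A toy rule: every p triple implies a q triple.
        String rules =
            "[copy: (?s <http://example.org/p> ?o) -> (?s <http://example.org/q> ?o)]";
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        reasoner.setMode(GenericRuleReasoner.FORWARD_RETE);

        Model data = ModelFactory.createDefaultModel();
        Resource s = data.createResource("http://example.org/s");
        Property p = data.createProperty("http://example.org/p");
        Resource o = data.createResource("http://example.org/o");
        data.add(s, p, o);

        // Run the forward engine once, in memory.
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        inf.prepare();

        // Snapshot base data plus deductions into a plain model; this is what
        // you would load into the persistent store instead of re-running the
        // reasoner at every startup.
        Model materialized = ModelFactory.createDefaultModel();
        materialized.add(inf);

        Property q = materialized.createProperty("http://example.org/q");
        System.out.println(materialized.contains(s, q, o)); // deduced triple survives
    }
}
```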

The current Jena rule engines are really not that well adapted to life with a persistent store.

Dave

