Hi David,

On 04/05/16 00:49, Martin, David wrote:
Hi all,

I'm building a system that instantiates small, in-memory rule reasoners, and 
uses each reasoner very briefly to get a small set of inferences. It does this 
repeatedly and frequently. In other words, it instantiates many different 
reasoners over time, and uses each one to handle a distinct problem.

Each reasoner involves a default model of only a few dozen triples, used with a 
GenericRuleReasoner that loads 50-100 forward rules, created in a very normal 
way:

    Model contextData = FileManager.get().loadModel(contextPath, "N3");
    List<Rule> rules = Rule.rulesFromURL("file:C:/test.rule");
    Reasoner reasoner = new GenericRuleReasoner(rules);
    reasoner.setParameter(ReasonerVocabulary.PROPruleMode, "forward");
    InfModel inf = ModelFactory.createInfModel(reasoner, contextData);

Typically, the reasoning will add roughly 10 triples and sometimes will delete 
a few triples. I use about 10 calls to inf.listStatements to retrieve the 
inferred triples, and then I'm done with that reasoner.

These reasoning problems are all independent of one another, so there is no 
opportunity for reuse of the contextData. However, the GenericRuleReasoner is reused.

Naturally, since we want to support many and frequent reasoning processes like 
the above, we are concerned about the overhead (speed and memory, with speed 
the more important of the two). I don't really have any idea how much 
bookkeeping and infrastructure is instantiated with the creation of a Model or 
InfModel.  I would appreciate comments or pointers that may shed light on this.

I've read through the various comments on performance in 
https://jena.apache.org/documentation/inference, but they don't address my 
situation. I'm concerned primarily about the overhead associated with *creating 
new models and reasoners*, since my system does that so often.

The system wasn't really designed with this kind of use case in mind so there aren't many hooks to control this.

Roughly, there are three chunks of work involved: parsing the rules, building the internal rule data structure, and running the rules. You want to try to save the first two.

Reusing the GenericRuleReasoner instance will mean you don't reparse the rules each time, so that's easy.

The internal data structure is actually part of the InfGraph (or rather the engine instance associated with the InfGraph). However, the reasoner implementation plays tricks by creating an internal dummy InfGraph which acts as a cache of the built engine. So I *think* just reusing the same GenericRuleReasoner instance is also enough to reuse the engine data structure.

The overhead of creating a wrapping InfGraph is small so creating a new InfGraph each time should be fine.
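
As a rough, untested sketch of that pattern: parse the rules and configure the reasoner once, then wrap each new problem in its own short-lived InfModel. The class name, the solve() helper and the one-line rule set are made up for illustration (the rule stands in for your 50-100 rules), and the imports assume the org.apache.jena packages of Jena 3.x or later.

```java
import java.util.List;

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class SharedReasonerSketch {
    // Hypothetical one-rule rule set, standing in for the real rule file.
    static final String RULES = "[r1: (?a <urn:ex:p> ?b) -> (?b <urn:ex:q> ?a)]";

    // Parsed and configured once, then shared by every problem.
    static final GenericRuleReasoner REASONER = buildReasoner();

    static GenericRuleReasoner buildReasoner() {
        List<Rule> rules = Rule.parseRules(RULES);
        GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
        reasoner.setMode(GenericRuleReasoner.FORWARD);
        return reasoner;
    }

    // One InfModel per problem; only the wrapping InfGraph is new each time.
    static List<Statement> solve(Model contextData) {
        InfModel inf = ModelFactory.createInfModel(REASONER, contextData);
        // For forward rules the deductions model holds just the inferred triples.
        return inf.getDeductionsModel().listStatements().toList();
    }

    public static void main(String[] args) {
        Model problem = ModelFactory.createDefaultModel();
        problem.add(problem.createResource("urn:ex:a"),
                    problem.createProperty("urn:ex:p"),
                    problem.createResource("urn:ex:b"));
        for (Statement s : solve(problem)) {
            System.out.println(s);
        }
    }
}
```

Whether the cached engine really is reused this way is the part I'm only fairly sure of, so it is worth timing solve() over a few thousand problems before relying on it.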

It would be possible to avoid that small overhead by keeping one instance of the contextData model with one InfModel built over the top. You would then inject new data "behind the reasoner's back" by clearing the contextData model, adding the new data to it, and calling inf.rebuild() to tell the reasoner what you've done. My guess is that the savings from doing that would be negligible, but the only way to be sure is to measure them.
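
A sketch of that alternative, again untested and with made-up names (RebuildSketch, solve(), the one-line rule set), assuming the org.apache.jena packages of Jena 3.x or later:

```java
import java.util.List;

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class RebuildSketch {
    // Hypothetical one-rule rule set, standing in for the real rule file.
    static final String RULES = "[r1: (?a <urn:ex:p> ?b) -> (?b <urn:ex:q> ?a)]";

    // One base model and one InfModel, kept for the lifetime of this object.
    private final Model contextData = ModelFactory.createDefaultModel();
    private final InfModel inf;

    public RebuildSketch() {
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(RULES));
        reasoner.setMode(GenericRuleReasoner.FORWARD);
        inf = ModelFactory.createInfModel(reasoner, contextData);
    }

    // Swap the data behind the reasoner's back, then ask it to reconsult.
    public List<Statement> solve(Model problem) {
        contextData.removeAll();
        contextData.add(problem);
        inf.rebuild();
        return inf.getDeductionsModel().listStatements().toList();
    }
}
```

Note that this version is tied to a single InfModel, so unlike creating a fresh InfModel per problem it cannot be used for several problems at once.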

If performance is an issue for you then the other thing to consider is the overhead of working with Models vs. Graphs. Internally Jena stores data as Triples in Graphs, both of which provide minimal interfaces. The more convenient interface of Statements and Models is implemented by creating wrapper objects. So the reasoners work at the Triple level, but if you use an InfModel and do listStatements, a new Statement object will be created for each Triple. In practice object creation (or rather, the associated GC overhead) in modern Java is so good that the cost of this is highly likely to be trivial compared to the cost of doing any reasoning in the first place. However, again this is something you could measure.
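
If you do want to try the Graph level, a minimal sketch looks like this. It queries the InfGraph underneath an InfModel with Graph.find, so you get Triples back and no Statement wrappers are created. The class and method names are made up for illustration, and the imports assume the org.apache.jena packages of Jena 3.x or later.

```java
import java.util.List;

import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.Triple;
import org.apache.jena.rdf.model.InfModel;

public class GraphLevelSketch {
    // Return every triple (asserted or inferred) that uses the given predicate.
    static List<Triple> findByPredicate(InfModel inf, String predicateUri) {
        Graph graph = inf.getGraph();              // the InfGraph behind the InfModel
        Node predicate = NodeFactory.createURI(predicateUri);
        // Node.ANY is a wildcard; toList() drains the iterator into a list.
        return graph.find(Node.ANY, predicate, Node.ANY).toList();
    }
}
```

This is a drop-in replacement for a listStatements call with a fixed predicate; the subject and object of each result are then available as Nodes through getSubject() and getObject().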

Dave
