Dave,
Thanks a bunch! Forward-only reasoning was indeed just what I needed.
// make forward-only reasoner, attach it to ontology data
Reasoner r = new RDFSForwardRuleReasoner(new RDFSRuleReasonerFactory());
InfModel fullModel = ModelFactory.createInfModel(r, ontoModel);
fullModel.prepare();
// add new data
fullModel.add( [model from data string] );
fullModel.prepare();
Model subModel =
ResourceUtils.reachableClosure(fullModel.getResource("some:id:uri"));
// create/execute query
QueryExecution qexec = QueryExecutionFactory.create(queryString, subModel);
ResultSet results = qexec.execSelect();
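[Editor's note] The win here comes from the forward-only setup materializing all consequences up front, so queries run with no rule firing. Outside Jena, the same idea can be shown with a toy forward-chainer over a subClassOf-style relation (pure stdlib Java; the class name and sample data are invented for illustration, not part of the thread's code):

```java
import java.util.*;

// Toy forward-chainer: materialize the transitive closure of a
// subClassOf-style relation once, so later "queries" are plain lookups.
public class ForwardClosure {

    // direct: direct superclass links, e.g. "Dog" -> {"Mammal"}
    public static Map<String, Set<String>> materialize(Map<String, Set<String>> direct) {
        // copy into mutable sets so the input is left untouched
        Map<String, Set<String>> closed = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : direct.entrySet())
            closed.put(e.getKey(), new HashSet<>(e.getValue()));
        // naive fixpoint loop: keep firing the transitivity rule
        // (A sub B, B sub C => A sub C) until nothing new is derived
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Set<String> supers : closed.values()) {
                for (String s : new ArrayList<>(supers)) {
                    Set<String> next = closed.get(s);
                    if (next != null && supers.addAll(next)) changed = true;
                }
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> direct = new HashMap<>();
        direct.put("Dog", new HashSet<>(Set.of("Mammal")));
        direct.put("Mammal", new HashSet<>(Set.of("Animal")));
        Map<String, Set<String>> closed = materialize(direct);
        // after materialization, "is Dog an Animal?" is a set lookup
        System.out.println(closed.get("Dog").contains("Animal")); // prints "true"
    }
}
```

All the rule-firing cost is paid once in materialize(); the query itself touches no rules, which is why the per-item numbers below drop so sharply.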
Rough timing:
--Forward reasoning method
add new data: ~25 ms
prepare(): < 1ms
<no copy>
extract closure: ~5 ms
create/execute query: 150 ms
total: ~180ms
I will also note that these numbers come from a development machine. The
target machine is a small Android device, where the speed-up was ~12x.
Thanks again!
Andy
On Thu, Jan 25, 2018 at 2:24 PM, Dave Reynolds <[email protected]>
wrote:
>
>
> On 25/01/18 17:02, Andrew Hunter wrote:
>
>> Dave,
>>
>> Some further notes on strategies I have tried and their outcomes:
>>
>> // 1. "Copy-then-extract" method
>> fullModel.add( [[model from data string]] );
>> fullModel.prepare();
>> Model copyModel = ModelFactory.createDefaultModel();
>> copyModel.add(fullModel);
>> Model subModel = ResourceUtils.reachableClosure(
>>         copyModel.getResource("some:root:data:id:uri"));
>> QueryExecution qe = QueryExecutionFactory.create(queryString, subModel);
>> ResultSet rs = qe.execSelect();
>>
>> // 2. "Direct-extract" method
>> fullModel.add( [[model from data string]] );
>> fullModel.prepare();
>> Model subModel = ResourceUtils.reachableClosure(
>>         fullModel.getResource("some:root:data:id:uri"));
>> QueryExecution qe = QueryExecutionFactory.create(queryString, subModel);
>> ResultSet rs = qe.execSelect();
>>
>> // 3. "OntModel-only" method
>> fullModel.add( [[model from data string]] );
>> fullModel.prepare();
>> QueryExecution qe = QueryExecutionFactory.create(queryString, fullModel);
>> ResultSet rs = qe.execSelect();
>>
>> Rough timing:
>>
>> 1. Copy-then-extract method
>> add new data: ~15 ms
>> prepare(): ~ 1ms
>> make copy: ~450 ms
>> extract closure: ~3 ms
>> create/execute query: 150 ms
>> total: ~600ms
>>
>> 2. Direct-extract method
>> add new data: ~15 ms
>> prepare(): ~ 1ms
>> <no copy made>
>> extract closure: ~4250 ms
>> create/execute query: 150 ms
>> total: ~4400 ms
>>
>> 3. OntModel-only method
>> add new data: ~15 ms
>> prepare(): ~ 1ms
>> <no closure or copy>
>> create/execute query: 1400 ms
>> total: ~1400 ms
>>
>> Clearly, method 1 is the best. It does seem likely that, as you say,
>> statement production is mostly done at query/copy/closure time.
>>
>
> It's a trade-off. The default rule sets are hybrid rules. The forward
> rules run (at prepare time) to generate some inferences and to generate
> ontology-specific backward rules. The backward rules are run in response to
> particular queries and by default don't cache (table) many results.
>
> Often the fraction of a model touched in queries is a lot less than the
> whole so that the hybrid approach is a reasonable trade-off.
>
> A whole-model copy starts with a simple goal (?s ?p ?o) so it has to find
> all results but in a simple context.
>
> Seems like the ReachableClosure code is making a lot of queries and some
> redundant work is being done by the backward rules trying to re-satisfy a
> lot of overlapping query goals. I'm not completely surprised there's a
> difference but a bit surprised it's so big. Suggests your reachable closure
> is actually a large fraction of the model. It may be the ReachableClosure
> code could be improved for this sort of use case, don't know.
>
>> This still begs the question: is there a way to perform ahead-of-time,
>> once-for-all inference just on the ontology, make increments as new data
>> comes in, and still query it as if it were a non-reasoning (but fully
>> populated) model?
>>
>
> As I say, that's what a forward only rule set should give you.
>
> Dave
>
>
>> On Thu, Jan 25, 2018 at 7:53 AM, Andrew Hunter <[email protected]>
>> wrote:
>>
>>> Dave, thanks a bunch for the notes. I thought, likely mistakenly, that
>>> the prepare() call would produce the inferred statements. As in the
>>> sketch, my code does indeed copy the entire model. It turns out I did
>>> try it in the other order as you suggest, and what I found is that it
>>> is *much* faster to prepare the fullModel, make a copy, and then
>>> extract the closure from the non-inferring copy. Making the closure
>>> directly from the inferring model is at least an order of magnitude
>>> slower. I will look into the forward-only rule set... Thanks! --Andy
>>>
>>> On Thu, Jan 25, 2018 at 3:52 AM, Dave Reynolds <[email protected]>
>>> wrote:
>>>
>>>> It sounds like you want the inferences for just part of a graph and
>>>> then be able to query over those. If you just want to query the base
>>>> model without inferred statements then, as Lorenz says, use
>>>> getBaseModel().
>>>>
>>>> The time spent in the copy is not the copying as such but the
>>>> inferences run to create the statements being copied; if those are
>>>> what you want, there's no getting away from it.
>>>>
>>>> However, your sketch code seems to copy the entire model and then
>>>> extract a closure which you'll query over. So all the work is done in
>>>> the copy of the entire model, and creating the submodel closure
>>>> doesn't seem very useful. Try it in the other order, i.e. run your
>>>> resource closure over the inference model, copying the results into
>>>> the sub model and avoiding the full copy.
>>>>
>>>> Your other alternative would be to use a different inference rule
>>>> set. Depending on what you want, you may be able to use a pure
>>>> forward rule set, in which case you wouldn't need the copy at all -
>>>> the results would all be materialized for you, ready to query
>>>> efficiently. Your sketch code shows only adds, not deletes; in that
>>>> case the forward inference would be incremental and would not have to
>>>> start over.
>>>>
>>>> Dave
>>>>
>>>>
>>>> On 25/01/18 05:28, Andrew Hunter wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have an OntModel using simple RDFS reasoning, but I have found
>>>>> that preparing the model and then copying it into a default,
>>>>> non-reasoning model is much faster for querying. My immediate
>>>>> question: is there an efficient way of copying an OntModel into a
>>>>> default, non-reasoning model without copying all statements?
>>>>>
>>>>> Some more details on my motivation, with a sketch of the code found
>>>>> below: My OntModel is held in memory and is originally constructed
>>>>> from a set of OWL files. I am matching a stream of data items, each
>>>>> with a small set of RDF triples attached, against a few
>>>>> long-standing queries. Conceptually, I wish to insert a data item
>>>>> into the OntModel and then execute the queries to see if that one
>>>>> item matches, then remove it from the model and wait for the next
>>>>> item. Since there is only one item of interest at a time, I also
>>>>> have the option of extracting a sub-model rooted at a resource
>>>>> identifying the item, and using this much smaller model to query.
>>>>>
>>>>> What I have found is that by far the fastest way to complete query
>>>>> execution is to perform all the reasoning on the ontology and data
>>>>> in the OntModel, then to make a copy of all its statements into a
>>>>> default model, and to query over the default model. Since I have
>>>>> several queries, I actually extract a relevant sub-model using
>>>>> reachableClosure from ResourceUtils and use this "sub" model. Both
>>>>> the querying and the closure extraction are dramatically faster with
>>>>> the default model copy than with the OntModel (as I'm sure you all
>>>>> know).
>>>>>
>>>>> However, making the copy itself now takes 95+% of the total item
>>>>> matching time. Hence my question: is there a faster way to make a
>>>>> copy, or to reference the statements in an OntModel as if they were
>>>>> part of a default model? Alternatively, is there a way of "turning
>>>>> off" the inference functions when querying (I see no API for this)?
>>>>> My limited understanding of the InfModel and InfGraph indicates the
>>>>> statements are actually held in multiple graphs (e.g., the base
>>>>> graph and the deductions graph), so it probably can't be as easy as
>>>>> "get a reference to the list of statements/triples".
>>>>>
>>>>> I also tried implementing ModelChangedListener and GraphListener to
>>>>> extract the statements being added by the reasoner and add those to
>>>>> the default model, but these seem to only call back when new data is
>>>>> added from outside. The idea was that I could add the large set of
>>>>> ontology-based inferences once and for all at startup and just make
>>>>> small deltas based on new inferences from the incoming data, but
>>>>> these small deltas were not reported to me.
>>>>>
>>>>> I would be most grateful for any ideas or pointers. In any case,
>>>>> best of luck and thanks!
>>>>>
>>>>> Andy
>>>>>
>>>>>
>>>>> A sketch of my current code:
>>>>>
>>>>> // Block executed once at startup
>>>>> OntModel fullModel = ModelFactory.createOntologyModel(
>>>>> OntModelSpec.OWL_LITE_MEM_RDFS_INF);
>>>>> // load ontology files into fullModel with something like
>>>>> for (each owl file) { fullModel.add( ... ); }
>>>>> // perform inference over ontology
>>>>> fullModel.prepare();
>>>>> // end startup block
>>>>>
>>>>> ...
>>>>>
>>>>> // new data arrives
>>>>> fullModel.add( [[model from data string]] );
>>>>> // perform inference on new data + ontology
>>>>> fullModel.prepare();
>>>>> // make a copy into a default model
>>>>> Model copyModel = ModelFactory.createDefaultModel();
>>>>> copyModel.add(fullModel);
>>>>> // extract relevant sub-model
>>>>> Model subModel = ResourceUtils.reachableClosure(
>>>>>         copyModel.getResource("some:root:data:id:uri"));
>>>>>
>>>>> // query uses relevant default model
>>>>> QueryExecution qe = QueryExecutionFactory.create(queryString, subModel);
>>>>>
>>>>> // execSelect, process results, cleanup, wait for new data, etc.
>>>>>
>>>>>
>>>>>
>>>
>>