Dave,
Some further notes on strategies I have tried and their outcomes:
// 1. "Copy-then-extract" method
fullModel.add( [[model from data string]] );
fullModel.prepare();
Model copyModel = ModelFactory.createDefaultModel();
copyModel.add(fullModel);
Model subModel = ResourceUtils.reachableClosure(copyModel.getResource("some:root:data:id:uri"));
QueryExecution qe = QueryExecutionFactory.create(queryString, subModel);
ResultSet rs = qe.execSelect();
// 2. "Direct-extract" method
fullModel.add( [[model from data string]] );
fullModel.prepare();
Model subModel = ResourceUtils.reachableClosure(fullModel.getResource("some:root:data:id:uri"));
QueryExecution qe = QueryExecutionFactory.create(queryString, subModel);
ResultSet rs = qe.execSelect();
// 3. "OntModel-only" method
fullModel.add( [[model from data string]] );
fullModel.prepare();
QueryExecution qe = QueryExecutionFactory.create(queryString, fullModel);
ResultSet rs = qe.execSelect();
Rough timing:

1. Copy-then-extract method
   add new data: ~15 ms
   prepare(): ~1 ms
   make copy: ~450 ms
   extract closure: ~3 ms
   create/execute query: ~150 ms
   total: ~600 ms

2. Direct-extract method
   add new data: ~15 ms
   prepare(): ~1 ms
   <no copy made>
   extract closure: ~4250 ms
   create/execute query: ~150 ms
   total: ~4400 ms

3. OntModel-only method
   add new data: ~15 ms
   prepare(): ~1 ms
   <no closure or copy>
   create/execute query: ~1400 ms
   total: ~1400 ms
Clearly, method 1 is the best. It does seem likely that, as you say,
statement production mostly happens at query/copy/closure time. This still
raises the question: is there a way to perform the inference over the
ontology once, ahead of time, extend it incrementally as new data comes
in, and yet query it as if it were a non-reasoning (but fully populated)
model?
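In case it's useful to anyone following along, here is a minimal sketch of the pure-forward-rule direction Dave suggests below. The single rule, the FORWARD_RETE mode choice, and the urn:ex: URIs are illustrative placeholders standing in for our actual ontology and rule set:

```java
import java.util.List;

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class ForwardOnlySketch {
    public static void main(String[] args) {
        // One forward rule standing in for whatever RDFS subset we actually need.
        List<Rule> rules = Rule.parseRules(
            "[subClass: (?a rdfs:subClassOf ?b), (?x rdf:type ?a) -> (?x rdf:type ?b)]");
        GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
        reasoner.setMode(GenericRuleReasoner.FORWARD_RETE); // incremental forward engine

        // Ontology loaded once at startup (a tiny in-memory stand-in here).
        Model ontology = ModelFactory.createDefaultModel();
        Resource sub = ontology.createResource("urn:ex:Sub");
        Resource sup = ontology.createResource("urn:ex:Super");
        ontology.add(sub, RDFS.subClassOf, sup);

        InfModel infModel = ModelFactory.createInfModel(reasoner, ontology);
        infModel.prepare(); // forward rules fire once over the ontology

        // A new data item arrives: the RETE engine extends the deductions
        // incrementally; no full re-preparation and no copy before querying.
        Resource item = infModel.createResource("urn:ex:item1");
        infModel.add(item, RDF.type, sub);

        System.out.println(infModel.contains(item, RDF.type, sup)); // prints true
    }
}
```

If this behaves the way the engine is documented to, each add() after prepare() only fires the rules touched by the new triples, so the inferred statements stay materialized and queryable without any copy step.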
Andy
On Thu, Jan 25, 2018 at 7:53 AM, Andrew Hunter <[email protected]>
wrote:
> Dave, thanks a bunch for the notes. I thought, likely mistakenly, that the
> prepare() call would produce the inferred statements. As in the sketch, my
> code does indeed copy the entire model. It turns out I did try it in the
> other order as you suggest, and what I found is that it is *much* faster to
> prepare the fullModel, make a copy, and then extract the closure from the
> non-inferring copy. Making the closure directly from the inferring model is
> at least an order of magnitude slower. I will look into the forward-only
> rule set... Thanks! --Andy
>
> On Thu, Jan 25, 2018 at 3:52 AM, Dave Reynolds <[email protected]>
> wrote:
>
>> It sounds like you want the inferences for just part of a graph and then
>> to be able to query over those. If you just want to query the base model
>> without inferred statements then, as Lorenz says, use getBaseModel.
>>
>> The time is not spent in the copy as such but in the inferences that
>> create the statements being copied; if those statements are what you want
>> then there's no getting away from it.
>>
>> However, your sketch code seems to copy the entire model and then extract
>> a closure which you'll query over. So all the work is done in the copy of
>> the entire model, and creating the submodel closure doesn't seem very
>> useful. Try it in the other order, i.e. run your resource closure over the
>> inference model, copying the results into the sub model, and avoid the
>> full copy.
>>
>> Your other alternative would be to use a different set of inference rules.
>> Depending on what you want, you may be able to use a pure forward rule set,
>> in which case you wouldn't need the copy at all - the results would all be
>> materialized for you, ready to query efficiently. Your sketch code shows
>> only adds, not deletes; in that case the forward inference would be
>> incremental and would not have to start over.
>>
>> Dave
>>
>>
>> On 25/01/18 05:28, Andrew Hunter wrote:
>>
>>> Hi,
>>>
>>> I have an OntModel using simple RDF reasoning, but I have found that
>>> preparing the model and then copying it into a default, non-reasoning
>>> model
>>> is much faster for querying. My immediate question: Is there an efficient
>>> way of copying an OntModel into a default, non-reasoning model without
>>> copying all statements?
>>>
>>> Some more details on my motivation, with a sketch of the code found
>>> below:
>>> My OntModel is held in memory and is originally constructed from a set of
>>> OWL files. I am matching a stream of data items, each with a small set of
>>> RDF triples attached, against a few long standing queries. Conceptually,
>>> I
>>> wish to insert a data item into the OntModel and then execute the queries
>>> to see if that one item matches, then remove it from the model and wait
>>> for
>>> the next item. Since there is only one item of interest at a time, I also
>>> have the option of extracting a sub-model rooted at a resource
>>> identifying
>>> the item, and using this much smaller model to query.
>>>
>>> What I have found is that by far the fastest way to complete query
>>> execution is to perform all the reasoning on the ontology and data in the
>>> OntModel, and then to make a copy of all its statements into a default
>>> model, and query over the default model. Since I have several queries, I
>>> actually extract a relevant sub-model using reachableClosure from
>>> ResourceUtils and use this "sub" model. Both the querying and the closure
>>> extraction are dramatically faster with the default model copy over the
>>> OntModel (as I'm sure you all know).
>>>
>>> However, making the copy itself now takes 95+% of the total item matching
>>> time. Hence my question: is there a faster way to make a copy, or
>>> reference
>>> the statements in an OntModel as if they were part of a default model?
>>> Alternatively, is there a way of "turning off" the inference functions
>>> when
>>> querying (I see no API for this)? My limited understanding of the
>>> InfModel
>>> and InfGraph indicates the statements are actually held in multiple
>>> graphs
>>> (e.g., the base graph and the deductions graph) so it probably can't be
>>> as
>>> easy as "get a reference to the list of statements/triples".
>>>
>>> I also tried implementing ModelChangedListener and GraphListener to
>>> extract
>>> the statements being added by the reasoner and add those to the default
>>> model, but these seem only to call back when new data is added from
>>> outside.
>>> The idea was that I could add the large set of ontology-based inference
>>> once for all at startup and just make small deltas based on new
>>> inferences
>>> from the incoming data, but these small deltas were not reported to me.
>>>
>>> I would be most grateful for any ideas or pointers. In any case, best of
>>> luck and thanks!
>>>
>>> Andy
>>>
>>>
>>> A sketch of my current code:
>>>
>>> // Block executed once at startup
>>> OntModel fullModel = ModelFactory.createOntologyModel(
>>> OntModelSpec.OWL_LITE_MEM_RDFS_INF);
>>> // load ontology files into fullModel with something like
>>> for (each owl file) { fullModel.add( ... ); }
>>> // perform inference over ontology
>>> fullModel.prepare();
>>> // end startup block
>>>
>>> ...
>>>
>>> // new data arrives
>>> fullModel.add( [[model from data string]] );
>>> // perform inference on new data + ontology
>>> fullModel.prepare();
>>> // make a copy into a default model
>>> Model copyModel = ModelFactory.createDefaultModel();
>>> copyModel.add(fullModel);
>>> // extract relevant sub-model
>>> Model subModel = ResourceUtils.reachableClosure(copyModel
>>> .getResource("some:root:data:id:uri"));
>>>
>>> // query uses relevant default model
>>> QueryExecution qe = QueryExecutionFactory.create(queryString, subModel);
>>>
>>> // execSelect, process results, cleanup, wait for new data, etc.
>>>
>>>
>