Hi Andrew,
Great, glad it's working for you. Hopefully it's giving you the data you
want as well :)
Dave
On 26/01/18 16:01, Andrew Hunter wrote:
Dave,
Thanks a bunch! Forward-only reasoning was indeed just what I needed.
// make forward-only reasoner, attach it to ontology data
Reasoner r = new RDFSForwardRuleReasoner(new RDFSRuleReasonerFactory());
InfModel fullModel = ModelFactory.createInfModel(r, ontoModel);
fullModel.prepare();
// add new data
fullModel.add( [model from data string] );
fullModel.prepare();
Model subModel =
ResourceUtils.reachableClosure(fullModel.getResource("some:id:uri"));
// create/execute query
QueryExecution qexec = QueryExecutionFactory.create(queryString, subModel);
ResultSet results = qexec.execSelect();
Rough timing:
--Forward reasoning method
add new data: ~25 ms
prepare(): < 1ms
<no copy>
extract closure: ~5 ms
create/execute query: 150 ms
total: ~180ms
I should also mention that these numbers come from a development machine.
On the target machine, a small Android device, the speed-up was ~12x.
Thanks again!
Andy
On Thu, Jan 25, 2018 at 2:24 PM, Dave Reynolds <[email protected]>
wrote:
On 25/01/18 17:02, Andrew Hunter wrote:
Dave,
Some further notes on strategies I have tried and their outcomes:
// 1. "Copy-then-extract" method
fullModel.add( [[model from data string]] );
fullModel.prepare();
Model copyModel = ModelFactory.createDefaultModel();
copyModel.add(fullModel);
Model subModel = ResourceUtils.reachableClosure(
        copyModel.getResource("some:root:data:id:uri"));
QueryExecution qe = QueryExecutionFactory.create(queryString, subModel);
ResultSet rs = qe.execSelect();
// 2. "Direct-extract" method
fullModel.add( [[model from data string]] );
fullModel.prepare();
Model subModel = ResourceUtils.reachableClosure(
        fullModel.getResource("some:root:data:id:uri"));
QueryExecution qe = QueryExecutionFactory.create(queryString, subModel);
ResultSet rs = qe.execSelect();
// 3. "OntModel-only" method
fullModel.add( [[model from data string]] );
fullModel.prepare();
QueryExecution qe = QueryExecutionFactory.create(queryString, fullModel);
ResultSet rs = qe.execSelect();
Rough timing:
1. Copy-then-extract method
add new data: ~15 ms
prepare(): ~ 1ms
make copy: ~450 ms
extract closure: ~3 ms
create/execute query: 150 ms
total: ~600ms
2. Direct-extract method
add new data: ~15 ms
prepare(): ~ 1ms
<no copy made>
extract closure: ~4250 ms
create/execute query: 150 ms
total: ~4400 ms
3. OntModel-only method
add new data: ~15 ms
prepare(): ~ 1ms
<no closure or copy>
create/execute query: 1400 ms
total: ~1400 ms
Clearly, method 1 is the best. It does seem likely that, as you say,
statement production is mostly done at query/copy/closure time.
It's a trade-off. The default rule sets are hybrid rules. The forward
rules run (at prepare time) to generate some inferences and to generate
ontology-specific backward rules. The backward rules are run in response to
particular queries and by default don't cache (table) many results.
Often the fraction of a model touched in queries is a lot less than the
whole so that the hybrid approach is a reasonable trade-off.
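For illustration, here is a minimal, self-contained sketch of the backward half of that trade-off using Jena's GenericRuleReasoner. The rule and the example URIs are invented; nothing is materialized up front, the rule only fires when a query asks for the derived triple, and tableAll() enables the goal tabling that, as noted above, the default rule sets largely leave off. Treat this as a sketch, not a recipe.

```java
import java.util.List;

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class BackwardTabling {
    public static void main(String[] args) {
        // One backward rule: ?o ex:q ?s holds whenever ?s ex:p ?o does.
        List<Rule> rules = Rule.parseRules(
            "[r1: (?o <http://example.org/q> ?s) <- (?s <http://example.org/p> ?o)]");

        GenericRuleReasoner r = new GenericRuleReasoner(rules);
        r.setMode(GenericRuleReasoner.BACKWARD);
        r.tableAll(); // memoize goals so repeated queries reuse earlier results

        Model base = ModelFactory.createDefaultModel();
        Resource a = base.createResource("http://example.org/a");
        Resource b = base.createResource("http://example.org/b");
        base.add(a, base.createProperty("http://example.org/p"), b);

        InfModel inf = ModelFactory.createInfModel(r, base);
        // The deduction is computed only now, in response to this query.
        System.out.println(inf.contains(b,
            inf.createProperty("http://example.org/q"), a));
    }
}
```

Running many overlapping queries against such a model repeats goal resolution unless results are tabled, which matches the timing gap seen above between the direct-extract and copy-then-extract methods.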
A whole-model copy starts with a simple goal (?s ?p ?o) so it has to find
all results but in a simple context.
Seems like the reachableClosure code is making a lot of queries, and some
redundant work is being done by the backward rules trying to re-satisfy a
lot of overlapping query goals. I'm not completely surprised there's a
difference, but I am a bit surprised it's so big; it suggests your reachable
closure is actually a large fraction of the model. It may be that the
reachableClosure code could be improved for this sort of use case; I don't know.
This still raises the question: is there a way to perform ahead-of-time,
once-for-all inference just on the ontology, make increments as new data
comes in, and still be able to query it as if it were a non-reasoning
(but fully populated) model?
As I say, that's what a forward only rule set should give you.
Dave
On Thu, Jan 25, 2018 at 7:53 AM, Andrew Hunter <[email protected]>
wrote:
Dave, thanks a bunch for the notes. I thought, likely mistakenly, that the
prepare() call would produce the inferred statements. As in the sketch, my
code does indeed copy the entire model. It turns out I did try it in the
other order as you suggest, and what I found is that it is *much* faster to
prepare the fullModel, make a copy, and then extract the closure from the
non-inferring copy. Making the closure directly from the inferring model is
at least an order of magnitude slower. I will look into the forward-only
rule set... Thanks! --Andy
On Thu, Jan 25, 2018 at 3:52 AM, Dave Reynolds <
[email protected]>
wrote:
It sounds like you want the inferences for just part of a graph and then to
be able to query over those. If you just want to query the base model
without inferred statements then, as Lorenz says, use getBaseModel.
The time spent doing the copy is not the copy as such but the inferences
that create the statements being copied; if those are what you want, then
there's no getting away from it.
However, your sketch code seems to copy the entire model and then extract
a closure which you'll query over. So all the work is done in the copy of
the entire model, and creating the submodel closure doesn't seem very
useful.
Try it in the other order, i.e. run your resource closure over the
inference model, copying the results into the sub-model and avoiding the
full copy.
Your other alternative would be to use alternative inference rules.
Depending on what you want, you may be able to use a pure forward rule set,
in which case you wouldn't need the copy at all - the results would all be
materialized for you, ready to query efficiently. Your sketch code shows
only adds, not deletes; in that case the forward inference would be
incremental and would not have to start over.
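This suggestion can be sketched with the forward-only RDFS reasoner used at the top of this thread. The tiny "ontology" and URIs below are invented for illustration; the point is that prepare() materializes the ontology inferences once, and later add() calls extend the deductions incrementally rather than starting over.

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.rulesys.RDFSForwardRuleReasoner;
import org.apache.jena.reasoner.rulesys.RDFSRuleReasonerFactory;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class IncrementalForward {
    public static void main(String[] args) {
        // Tiny stand-in ontology: ex:Dog rdfs:subClassOf ex:Animal.
        Model onto = ModelFactory.createDefaultModel();
        Resource dog = onto.createResource("http://example.org/Dog");
        Resource animal = onto.createResource("http://example.org/Animal");
        onto.add(dog, RDFS.subClassOf, animal);

        // Forward-only RDFS reasoner: all consequences are materialized
        // by the forward engine, so queries hit a fully populated graph.
        Reasoner fwd = new RDFSForwardRuleReasoner(new RDFSRuleReasonerFactory());
        InfModel inf = ModelFactory.createInfModel(fwd, onto);
        inf.prepare(); // ontology inferences computed once, up front

        // New data arrives: the forward engine extends the deductions
        // incrementally instead of re-running inference from scratch.
        Resource rex = inf.createResource("http://example.org/rex");
        inf.add(rex, RDF.type, dog);
        System.out.println(inf.contains(rex, RDF.type, animal));
    }
}
```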
Dave
On 25/01/18 05:28, Andrew Hunter wrote:
Hi,
I have an OntModel using simple RDF reasoning, but I have found that
preparing the model and then copying it into a default, non-reasoning model
is much faster for querying. My immediate question: is there an efficient
way of copying an OntModel into a default, non-reasoning model without
copying all statements?
Some more details on my motivation, with a sketch of the code found below:
My OntModel is held in memory and is originally constructed from a set of
OWL files. I am matching a stream of data items, each with a small set of
RDF triples attached, against a few long-standing queries. Conceptually, I
wish to insert a data item into the OntModel and then execute the queries
to see if that one item matches, then remove it from the model and wait for
the next item. Since there is only one item of interest at a time, I also
have the option of extracting a sub-model rooted at a resource identifying
the item, and using this much smaller model to query.
What I have found is that by far the fastest way to complete query
execution is to perform all the reasoning on the ontology and data in the
OntModel, then make a copy of all its statements into a default model, and
query over the default model. Since I have several queries, I actually
extract a relevant sub-model using reachableClosure from ResourceUtils and
use this "sub" model. Both the querying and the closure extraction are
dramatically faster with the default-model copy than with the OntModel (as
I'm sure you all know).
However, making the copy itself now takes 95+% of the total item-matching
time. Hence my question: is there a faster way to make a copy, or to
reference the statements in an OntModel as if they were part of a default
model? Alternatively, is there a way of "turning off" the inference
functions when querying (I see no API for this)? My limited understanding
of the InfModel and InfGraph indicates the statements are actually held in
multiple graphs (e.g., the base graph and the deductions graph), so it
probably can't be as easy as "get a reference to the list of
statements/triples".
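For what it's worth, InfModel does expose its component graphs via getRawModel() and getDeductionsModel(), so a plain model can be assembled from already-computed statements without firing rules during the copy. Whether the deductions graph is complete depends on the reasoner: a hybrid reasoner only holds what its forward rules produced. A minimal sketch, using an invented forward rule and URIs:

```java
import java.util.List;

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class BasePlusDeductions {
    public static void main(String[] args) {
        // One forward rule: ?s ex:p ?o entails ?o ex:q ?s.
        List<Rule> rules = Rule.parseRules(
            "[r1: (?s <http://example.org/p> ?o) -> (?o <http://example.org/q> ?s)]");
        GenericRuleReasoner r = new GenericRuleReasoner(rules);
        r.setMode(GenericRuleReasoner.FORWARD);

        Model base = ModelFactory.createDefaultModel();
        base.add(base.createResource("http://example.org/a"),
                 base.createProperty("http://example.org/p"),
                 base.createResource("http://example.org/b"));

        InfModel inf = ModelFactory.createInfModel(r, base);
        inf.prepare(); // forward rules fire here, filling the deductions graph

        // Assemble a plain model from the two component graphs; the copy
        // itself involves no further rule firing.
        Model plain = ModelFactory.createDefaultModel();
        plain.add(inf.getRawModel());
        plain.add(inf.getDeductionsModel());
        System.out.println(plain.size()); // base triple + deduced triple
    }
}
```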
I also tried implementing ModelChangedListener and GraphListener to extract
the statements being added by the reasoner and add those to the default
model, but these seem to only call back when new data is added from
outside. The idea was that I could add the large set of ontology-based
inferences once for all at startup and just make small deltas based on new
inferences from the incoming data, but these small deltas were not reported
to me.
I would be most grateful for any ideas or pointers. In any case, best of
luck and thanks!
Andy
A sketch of my current code:
// Block executed once at startup
OntModel fullModel = ModelFactory.createOntologyModel(
        OntModelSpec.OWL_LITE_MEM_RDFS_INF);
// load ontology files into fullModel with something like
for (each owl file) { fullModel.add( ... ); }
// perform inference over ontology
fullModel.prepare();
// end startup block
...
// new data arrives
fullModel.add( [[model from data string]] );
// perform inference on new data + ontology
fullModel.prepare();
// make a copy into a default model
Model copyModel = ModelFactory.createDefaultModel();
copyModel.add(fullModel);
// extract relevant sub-model
Model subModel = ResourceUtils.reachableClosure(
        copyModel.getResource("some:root:data:id:uri"));
// query uses relevant default model
QueryExecution qe = QueryExecutionFactory.create(queryString, subModel);
// execSelect, process results, cleanup, wait for new data, etc.