On 24/02/2020 13:55, Steve Vestal wrote:
Responses and questions inserted...
On 2/24/2020 3:02 AM, Dave Reynolds wrote:
On 23/02/2020 23:11, Steve Vestal wrote:
If I comment out the FILTER clause that prevents variable aliasing, the
query is processed almost immediately. The number of rows goes from 192
to 576, but it's fast.
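The guard has this general shape (variable names illustrative, not my
actual query):

    FILTER (?a != ?b && ?a != ?c && ?b != ?c)   # no two of these variables may bind the same node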
Interesting. That does suggest it might actually be SPARQL rather than
inference that's the bottleneck. The materialization experiment will
be a test of that.
Have you done that test?
I earlier iterated over statements. To make sure that I fully
materialize all possible entailments, do I need to query for ?s ?p ?o?
Any suggestions on the most efficient way to do this materialization?
As has been said several times now - copy the whole model to a plain
in-memory model and query that.
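An untested sketch of what I mean, where infModel and queryString stand
in for your existing inference model and query string:

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.*;

    // Copy every statement, asserted and inferred, into a plain in-memory model
    Model plain = ModelFactory.createDefaultModel();
    plain.add(infModel);   // pulls all entailments out of the inf model once
    // then run the query with no reasoner in the loop
    try (QueryExecution qe = QueryExecutionFactory.create(queryString, plain)) {
        ResultSetFormatter.out(qe.execSelect());
    }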
Querying all statements in the inf model will not, itself, fill in all
the different goal patterns for the backward chainer tables. Hence the
suggestion to materialize to a separate model, not just "warm up the
backward rule caches" in an inf model.
Though looking at your query, I wonder if you need inference at all -
we can't see your data to be sure since the list doesn't allow
attachments.
Have you tried without any inference? Do you know what inference you
are relying on?
I tried the following.
OntModelSpec.OWL_DL_MEM_RULE_INF
OntModelSpec.OWL_MEM_RULE_INF
OntModelSpec.OWL_LITE_MEM_TRANS_INF
OntModelSpec.OWL_LITE_MEM_RULES_INF
OntModelSpec.OWL_MEM_RDFS_INF
OntModelSpec.OWL_MEM_MICRO_RULE_INF
OntModelSpec.OWL_MEM
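Each was plugged in along these lines (the file name here is just a
placeholder for my actual data):

    import org.apache.jena.ontology.*;
    import org.apache.jena.rdf.model.ModelFactory;

    OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
    m.read("file:model.ttl");   // placeholder file name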
I do need some reasoning, minimally chasing through some shallow type
hierarchies and transitive properties/predicates/roles.
Sounds like the sort of inference that can be done with some custom
forward rules (which would eliminate all this backward chainer caching)
or some SPARQL CONSTRUCT queries.
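For example, assuming an invented property URI, two forward rules of
this shape would cover a transitive property and subclass-based type
propagation:

    import org.apache.jena.rdf.model.*;
    import org.apache.jena.reasoner.rulesys.*;

    String rules =
        "[trans: (?a <http://example.org/partOf> ?b) " +
        "        (?b <http://example.org/partOf> ?c) " +
        "        -> (?a <http://example.org/partOf> ?c)] " +
        "[types: (?x rdf:type ?c) (?c rdfs:subClassOf ?d) -> (?x rdf:type ?d)]";
    GenericRuleReasoner r = new GenericRuleReasoner(Rule.parseRules(rules));
    r.setMode(GenericRuleReasoner.FORWARD_RETE);   // run forward once, no backward chaining
    InfModel inf = ModelFactory.createInfModel(r, baseModel);

After the forward run completes, queries against inf see plain
materialized triples, with no backward chainer in the loop.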
What is the proper way to write a query when you
want a particular set of variables to have distinct solution values?
Not sure there is a better way in general. However, I wonder if you
can partition your query into subgroups, filter within the groups then
do a simpler join on the results. That might reduce the combinatorics.
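Sketching that idea with invented eg: predicates, the shape would be
something like:

    PREFIX eg: <http://example.org/#>
    SELECT ?a ?b ?x ?y WHERE {
      { SELECT ?a ?b WHERE {
          ?a eg:partOf ?p . ?b eg:partOf ?p .
          FILTER (?a != ?b)        # distinctness enforced within the small group
      } }
      { SELECT ?x ?y WHERE {
          ?x eg:connectsTo ?y .
          FILTER (?x != ?y)
      } }
      ?a eg:relatesTo ?x .         # cheap join across the pre-filtered groups
    }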
I had earlier thought briefly about a more general pre-fetch query that
would collect, into a separate and (hopefully) much smaller model, a set
of asserted triples guaranteed to include all triples of possible
interest, and then running my sequence of queries-with-reasoning on
that. Has this sort of thing been done successfully? What gave me pause
is that some triples derived from query results would need to be added
back into the original model, and I'm not sure how blank nodes would
play into that. In practice, though, pre-fetch models would likely not
be much smaller than this test case model.
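Roughly what I had in mind, with a made-up class as the selection
pattern and model standing in for the full inference model:

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.Model;

    // Hypothetical pre-fetch: pull a candidate subset into a smaller plain model
    String prefetch =
        "PREFIX eg: <http://example.org/#> " +
        "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o . ?s a eg:Component }";
    Model smaller = QueryExecutionFactory.create(prefetch, model).execConstruct();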
Sounds like a different notion from that which I was suggesting. I was
suggesting partitioning the query to reduce the combinatorics not
partitioning the data. May or may not be possible/relevant.
In one or two earlier postings, mention was made of Pellet as being more
efficient and complete in some cases. My impression is that a Pellet
reasoner is not bundled with Jena, and I would have to find and install
one myself (although the Protege wiki mentions one is available in
Jena). Is that correct? A general web search turned up a number of
sources, e.g., openpellet, mindswap, stardog. Does anyone have a
recommendation, and a link to a site with the master version that is
compatible with Jena 3 and has a reasonably clear and smooth install?
Are any of the other OWL reasoners out there packaged for use with
Jena?
Pellet is a full DL reasoner, and so is both complete and, for many
challenging cases, faster. You are correct that it is not part of Jena.
I believe OpenPellet is the right version to look at, but I've no direct
experience with it, and it's not something the Jena team can support.
That's the only third-party open source Jena reasoner I'm aware of.
However, your first job is to check if it really is the reasoner or the
query that's the bottleneck by doing the materialize-to-plain-model
test. You could probably even do that with test data that doesn't need
any inference.
Dave