Hi Andy!
Thanks a lot for looking into this and for your very clear explanation.
01.11.2016, 21:12, Andy Seaborne wrote:
> It has always been inside-out then optimized to use stream based index
> joins.
>
> However in this case "inside out" is confusing because the query has a
> double negation of FILTER NOT EXISTS.
Right. Let me explain why the query does this.
The query (or actually a longer variant thereof) is used in the Skosmos
application to generate a list of search results. At the time this query
is used the application already has a (possibly long) list of the SKOS
concept URIs it wants to display. This query is used to look up
additional information about those concepts.
Instead of performing a separate query for each concept URI, a single
query with a VALUES block containing up to 20 concept URIs is used. So
in this case, the VALUES block is used as a sort of for-each loop. A
single query for 20 concepts is faster than performing 20 queries for
individual concepts - at least it used to be.
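To illustrate the batching idea, here is a minimal sketch in Python. It is not the actual Skosmos code: the function names, the variable name ?uri and the example.org URIs are all made up for illustration. It only shows how a long list of concept URIs could be cut into groups of at most 20 and rendered as VALUES blocks.

```python
from typing import Iterable, Iterator, List

BATCH_SIZE = 20  # the batch size mentioned above


def batches(uris: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URIs."""
    for start in range(0, len(uris), size):
        yield uris[start:start + size]


def values_block(uris: Iterable[str], var: str = "uri") -> str:
    """Render a SPARQL VALUES block binding ?var to each URI in turn."""
    terms = " ".join(f"<{u}>" for u in uris)
    return f"VALUES ?{var} {{ {terms} }}"


# Hypothetical input: 45 concept URIs that need to be looked up.
concepts = [f"http://example.org/concept/{i}" for i in range(45)]

# One VALUES block (and hence one query) per batch instead of one per concept.
blocks = [values_block(batch) for batch in batches(concepts)]
```

Each string in blocks would then be placed into the query text, so the 45 concepts above cost three queries rather than 45.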
The reason for the double negation is this:
For each concept X, we want to display the SKOS Collections in which
concept X is a member, but only if the collection consists only of
siblings of X (i.e. sharing at least one broader concept with X).
In practice, this has to be turned around into a double negation: for
each collection, check that it doesn't include a concept that doesn't
share any parent with X.
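The same logic, written out over plain Python sets, may make the double negation easier to follow. This is only a sketch on made-up toy data (the concept and collection names are hypothetical), not a translation of the real query: a collection is kept for X when X is a member and there is no member that fails to share a broader concept with X.

```python
# Hypothetical toy data: concept -> its broader concepts.
broader = {
    "cat": {"mammal"},
    "dog": {"mammal", "pet"},
    "goldfish": {"fish", "pet"},
    "oak": {"tree"},
}

# Hypothetical toy data: collection -> its member concepts.
collections = {
    "carnivores": {"cat", "dog"},
    "pets": {"cat", "dog", "goldfish"},
    "misc": {"cat", "oak"},
}


def shares_parent(a: str, b: str) -> bool:
    """True if a and b have at least one broader concept in common."""
    return bool(broader.get(a, set()) & broader.get(b, set()))


def sibling_collections(x: str) -> list:
    """Collections containing x in which NO member lacks a shared parent with x."""
    return sorted(
        name
        for name, members in collections.items()
        if x in members
        # Outer negation: there must not exist a member ...
        and not any(
            # ... for which (inner negation) no shared parent exists.
            not shares_parent(x, member)
            for member in members
        )
    )


print(sibling_collections("cat"))
print(sibling_collections("dog"))
```

In this toy data "pets" is rejected for "cat" because "goldfish" shares no parent with "cat", but it is accepted for "dog" because "dog" shares "pet" with "goldfish" and "mammal" with "cat".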
It's possible that a MINUS expression could be used instead of FILTER
NOT EXISTS and perform better. I will have to test this. But other than
switching to MINUS, I can't think of any way to express this constraint
on collections without using some kind of double negation.
> At 3.1.1 (JENA-1171), EXISTS are analysed whereas previously they were
> skipped, which could lead to wrong answers.
>
> Osma - could you please try putting the VALUES in each arm of the UNION
> which gets you to something like the first example.
I will try this as well.
> It can be pushed in because:
>
>   join(A, union(B,C)) == union(join(A,B), join(A,C))
>
> Now, if A is a complex expression, that is a bad idea (probably).
> If A is a small VALUES block then it makes sense. It isn't done though.
Ok. So a potential future optimization perhaps.
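For my own understanding, the identity can be checked on small tables of bindings. The sketch below is a toy model in Python, not ARQ's implementation: solutions are plain dicts, join merges compatible solutions, union concatenates, and the tables A, B and C are invented, with A playing the role of a small VALUES block.

```python
def compatible(a: dict, b: dict) -> bool:
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)


def join(left: list, right: list) -> list:
    """Merge every compatible pair of solutions."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def union(left: list, right: list) -> list:
    """Bag union: simply concatenate the solutions."""
    return left + right


def as_bag(rows: list) -> list:
    """Order-independent form of a table, so two results can be compared."""
    return sorted(sorted(row.items()) for row in rows)


# Hypothetical data.
A = [{"x": 1}, {"x": 2}]  # stands in for a small VALUES block
B = [{"x": 1, "y": "b1"}, {"x": 3, "y": "b3"}]
C = [{"x": 2, "z": "c2"}, {"x": 1, "z": "c1"}]

lhs = join(A, union(B, C))            # VALUES joined with the whole UNION
rhs = union(join(A, B), join(A, C))   # VALUES pushed into each arm

assert as_bag(lhs) == as_bag(rhs)
```

Both sides give the same three solutions here. The rewritten form evaluates A twice, which fits the point that it only pays off when A is small.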
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi