Hi Andy!

Thanks a lot for looking into this and your very clear explanation.

01.11.2016, 21:12, Andy Seaborne wrote:
> It has always been inside-out then optimized to use stream based index
> joins.

> However in this case "inside out" is confusing because the query has a
> double negation of FILTER NOT EXISTS.

Right. Let me explain why the query does this.

The query (or actually a longer variant thereof) is used in the Skosmos application to generate a list of search results. At the time this query is used the application already has a (possibly long) list of the SKOS concept URIs it wants to display. This query is used to look up additional information about those concepts.

Instead of performing a separate query for each concept URI, a single query with a VALUES block containing up to 20 concept URIs is used. So in this case, the VALUES block serves as a sort of for-each loop. A single query for 20 concepts is faster than performing 20 queries for individual concepts - at least it used to be.
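
As a rough illustration of the batching idea described above, here is a minimal Python sketch. It is not the Skosmos code: the query body, the `build_lookup_query` and `chunked` helpers, and the example.org URIs are all hypothetical, and the URIs are assumed to be trusted (no escaping is done).

```python
from typing import Iterable, Iterator, List

BATCH_SIZE = 20  # up to 20 concept URIs per query, as described above


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_lookup_query(uris: Iterable[str]) -> str:
    """Build one SPARQL query that covers all given URIs via VALUES.

    The query body is a hypothetical stand-in, not the real Skosmos query.
    """
    values = " ".join(f"<{uri}>" for uri in uris)
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        f"  VALUES ?concept {{ {values} }}\n"
        "  ?concept skos:prefLabel ?label .\n"
        "}"
    )


# Hypothetical list of concept URIs the application already knows about
concept_uris = [f"http://example.org/concept/{i}" for i in range(45)]
queries = [build_lookup_query(batch) for batch in chunked(concept_uris, BATCH_SIZE)]
print(len(queries))  # 3 queries instead of 45
```

Each generated query binds ?concept to every URI in its batch, so the rest of the pattern is evaluated once per URI, much like the body of a loop.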

The reason for the double negation is this:

For each concept X, we want to display the SKOS Collections in which concept X is a member, but only if the collection consists only of siblings of X (i.e. sharing at least one broader concept with X). In practice, this has to be turned around into a double negation: for each collection, check that it doesn't include a concept that doesn't share any parent with X.
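
To make the double negation concrete, here is a small Python sketch over toy data. The `broader` and `collections` dictionaries and all names in them are made up for illustration; the logic mirrors the description above, not the exact SPARQL query.

```python
# Hypothetical toy data: concept -> set of its broader concepts
broader = {
    "cat": {"mammal"},
    "dog": {"mammal"},
    "trout": {"fish"},
}

# Hypothetical collections: collection -> set of member concepts
collections = {
    "pets": {"cat", "dog"},
    "animals": {"cat", "dog", "trout"},
}


def shares_parent(a, b):
    """True if a and b have at least one broader concept in common."""
    return bool(broader.get(a, set()) & broader.get(b, set()))


def sibling_collections(x):
    """Collections containing x in which no member lacks a shared parent with x."""
    result = []
    for name, members in sorted(collections.items()):
        if x not in members:
            continue
        # Inner negation: is there a member that does NOT share a parent with x?
        has_outsider = any(not shares_parent(x, m) for m in members)
        # Outer negation: keep the collection only if no such member exists
        if not has_outsider:
            result.append(name)
    return result


print(sibling_collections("cat"))  # ['pets']
```

In Python the same condition can be written positively as `all(shares_parent(x, m) for m in members)`; SPARQL has no direct "for all" construct, which is why the query ends up with nested FILTER NOT EXISTS.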

It's possible that a MINUS expression could be used instead of FILTER NOT EXISTS and perform better. I will have to test this. But other than switching to MINUS, I can't think of any way to express this constraint on collections without using some kind of double negation.

> At 3.1.1 (JENA-1171), EXISTS are analysed, whereas previously they were
> skipped, which could lead to wrong answers.

> Osma - could you please try putting the VALUES in each arm of the UNION
> which gets you to something like the first example.

I will try this as well.

> It can be pushed in because:
>
> join(A, union(B,C)) == union(join(A,B), join(A,C))
>
> Now if A is a complex expression, that is a bad idea (probably).
>
> If A is a small VALUES block then it makes sense. It isn't done though.

Ok. So a potential future optimization perhaps.
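
For what it's worth, the identity join(A, union(B,C)) == union(join(A,B), join(A,C)) can be checked on small in-memory tables. The Python sketch below models solutions as dicts and uses a naive nested-loop join; it is only an illustration of the algebra under those assumptions, not how ARQ evaluates queries, and the tables A, B and C are invented.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)


def join(left, right):
    """Naive nested-loop join over lists of solution dicts."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def union(left, right):
    """Multiset union: keep all solutions from both sides."""
    return left + right


def canon(rows):
    """Order-independent form of a solution multiset, for comparison."""
    return sorted(tuple(sorted(r.items())) for r in rows)


A = [{"x": 1}, {"x": 2}]                         # small VALUES-like table
B = [{"x": 1, "y": "b1"}, {"x": 3, "y": "b3"}]   # first UNION arm
C = [{"x": 2, "z": "c2"}, {"x": 2, "z": "c2b"}]  # second UNION arm

lhs = join(A, union(B, C))
rhs = union(join(A, B), join(A, C))
print(canon(lhs) == canon(rhs))  # True
```

The right-hand side evaluates A twice, which is cheap for a small VALUES table but would be wasteful if A were an expensive pattern.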

-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
