Hi,
Some further observations. The query I sent earlier was a minimal
example, and it could be fixed by simply moving the VALUES block.
But a slightly more realistic example (closer to the original query I'm
having problems with) involves a UNION and cannot be fixed so easily.
Placing the VALUES block first doesn't help:
--cut--
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT *
WHERE {
  VALUES ?uri { <http://www.yso.fi/onto/yso/p864> }
  { ?s ?p ?uri }
  UNION
  { ?uri ?p ?o
    OPTIONAL {
      ?x skos:member ?o .
      FILTER NOT EXISTS {
        ?x skos:member ?other .
        FILTER NOT EXISTS {
          ?other skos:broader ?uri
        }
      }
    }
  }
}
--cut--
Jena 3.1.0 tdbquery: 0.9 seconds
Jena 3.1.1-SNAPSHOT tdbquery: 12.8 seconds
I'm aware that in SPARQL, evaluation proceeds from the inside out and
Jena ARQ has moved more and more in this direction with recent releases,
which may also explain this change. But how should VALUES blocks be
placed for optimal query execution? It seems like a waste not to
propagate those fixed bindings into inner parts of the query, even
though that may violate the inside-out order. In the above query, I
don't know where to place the VALUES so that the binding for ?uri (in
effect, changing the variable to a constant) would be applied in all
parts of the query.
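One rewrite that might be worth trying is to repeat the VALUES block
inside each UNION branch, so that ?uri is bound within every group that
uses it. This is only a sketch. It should return the same results,
because joining each branch with the same single-row VALUES table is
equivalent to joining the whole UNION with it, but it is untested and
not timed against either version, so it may not restore the 3.1.0
behaviour:
--cut--
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT *
WHERE {
  # Branch 1: ?uri as object, bound locally
  { VALUES ?uri { <http://www.yso.fi/onto/yso/p864> }
    ?s ?p ?uri }
  UNION
  # Branch 2: ?uri as subject, bound locally before the OPTIONAL
  { VALUES ?uri { <http://www.yso.fi/onto/yso/p864> }
    ?uri ?p ?o
    OPTIONAL {
      ?x skos:member ?o .
      FILTER NOT EXISTS {
        ?x skos:member ?other .
        FILTER NOT EXISTS {
          ?other skos:broader ?uri
        }
      }
    }
  }
}
--cut--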
Placing the VALUES block at the bottom of the query (outside the WHERE
block) doesn't help either. In fact execution time increases to 17
seconds with 3.1.1-SNAPSHOT (but is unchanged with 3.1.0).
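For reference, the variant with the VALUES block at the bottom has this
shape. It is the UNION query from the start of this message with only
the VALUES clause moved after the closing brace of WHERE:
--cut--
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT *
WHERE {
  { ?s ?p ?uri }
  UNION
  { ?uri ?p ?o
    OPTIONAL {
      ?x skos:member ?o .
      FILTER NOT EXISTS {
        ?x skos:member ?other .
        FILTER NOT EXISTS {
          ?other skos:broader ?uri
        }
      }
    }
  }
}
# Trailing VALUES clause, applied to the result of the WHERE block
VALUES ?uri { <http://www.yso.fi/onto/yso/p864> }
--cut--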
I also tried --engine=ref, and it was extremely slow with 3.1.0 as
well. So in that sense nothing has changed; it looks like an
optimization has simply been dropped somewhere.
Should I report this as an issue? Or am I just doing something wrong?
-Osma
On 01/11/16 11:03, Osma Suominen wrote:
Hi,
I'm investigating a performance regression we're seeing with the current
Jena 3.1.1-SNAPSHOT compared to 3.1.0.
The data in graph <http://www.yso.fi/onto/yso/> is the YSO ontology,
available from http://api.finto.fi/download/yso/yso-skos.ttl
This query used to take about 0.2 seconds (with 3.1.0) and now takes
about 10 seconds:
--cut--
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT *
FROM NAMED <http://www.yso.fi/onto/yso/>
WHERE {
  ?uri ?p ?o .
  OPTIONAL {
    ?x skos:member ?o .
    FILTER NOT EXISTS {
      ?x skos:member ?other .
      FILTER NOT EXISTS {
        ?other skos:broader ?uri
      }
    }
  }
  VALUES ?uri { <http://www.yso.fi/onto/yso/p864> }
}
--cut--
If I move the VALUES block to the top of the query, right after WHERE,
then the query becomes fast again.
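That is, the fast variant is the same query with only the VALUES line
moved to the start of the WHERE block:
--cut--
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT *
FROM NAMED <http://www.yso.fi/onto/yso/>
WHERE {
  # VALUES first, so ?uri is bound before the triple pattern and OPTIONAL
  VALUES ?uri { <http://www.yso.fi/onto/yso/p864> }
  ?uri ?p ?o .
  OPTIONAL {
    ?x skos:member ?o .
    FILTER NOT EXISTS {
      ?x skos:member ?other .
      FILTER NOT EXISTS {
        ?other skos:broader ?uri
      }
    }
  }
}
--cut--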
Is the placement of the VALUES block supposed to affect query evaluation
order in this way? It appears to me that in the slow version, ?uri is
not bound inside the inner FILTER NOT EXISTS, which causes an explosion
of results internally.
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi