On 30/01/2020 21:18, Andy Seaborne wrote:
Let's stick to facts and public information.
Hmm, I am not aware I didn't stick to facts.
There is nothing new here. "TopbraidDataset.listNames is not working
for our platform".
The behavior of Jena has changed. With each low level change to ARQ
there is a chance that existing queries may break. Users like ourselves
need to write queries in the syntax that works best for the specific use
cases. If we can suddenly no longer rely on BIND being executed before
GRAPH (as it did all those years before), then please don't be surprised
that we'll not be thrilled about rewriting our queries. We have gone
through that several times in the past, and it doesn't improve
confidence in relying on SPARQL in general. One coping strategy for us
would be to reduce dependency on SPARQL and the GRAPH keyword in
general, but that's a long shot. What we really need is a more
imperative, predictable query language.
I understand that for some Datasets (e.g. single TDB with proper named
graphs) it is no problem to first iterate over all graphs as part of a
pattern. But I also don't think TopBraid's architecture is too different
from how other applications are working. So other users may run into the
same problem. Maybe a dataset could inform the optimizer about which
strategy to use?
The TopQuadrant codebase has a mechanism to adapt the ARQ engine. It
even has documentation in the codebase. There are examples in the git
history as well.
If there is a clean way to switch off certain optimizations or activate
other optimizations then I'd be happy to learn about them. Is this
documented anywhere? We cannot just switch off all optimizations, and in
this particular case the change was to a static function that cannot be
overloaded either.
On 29/01/2020 22:31, Holger Knublauch wrote:
Hi Andy,
thanks for the fast response.
On 30/01/2020 01:47, Andy Seaborne wrote:
> I am still trying to narrow down what has changed
It may be JENA-1813 - being about optimization, the details of the
real case probably matter. Let us know what you discover.
JENA-1813 fixed a specific problem caused by the BIND inside the
GRAPH ("AS ?dummy" - not the outer one). If the inside is a
different pattern, it seems to optimize properly, if I understand the
details. Maybe some small rewrite will work in the real case.
A small rewrite is not an option for us because too many queries are
affected, including those written by customers. It is a very common
pattern to first compute the GRAPH with a BIND and then walk into
that graph (only).
I tried this variation:
SELECT *
WHERE {
    BIND(<http://topbraid.org/teamwork#teamGraph>($this) AS ?teamGraph)
    GRAPH ?teamGraph {
        BIND (str(?teamGraph) AS ?dummy)
    }
}
and ?teamGraph is still unbound inside the GRAPH clause:
You don't need the GRAPH to call BIND (function(..) AS ?dummy) for a
proper function.
The key point is that the (extend) is at the top of the (graph)
evaluation (i.e. BIND is last) in GRAPH ... {}.
So put FILTER(true) in:
GRAPH ?teamGraph {
    BIND (42 AS ?dummy)
    FILTER(true)
}
but JENA-1815 may change that.
We have for now rolled back the Jena update and will have several months
before our next major release. Maybe JENA-1815 will have happened in the
meantime.
What will happen is that ARQ will get the right answers. Correct
before performant.
Using the graph name inside GRAPH introduces a different situation -
it's a different optimization issue from the first example query.
(join
  (extend ((?teamGraph (<http://topbraid.org/teamwork#teamGraph> ?this)))
    (table unit))
  (graph ?teamGraph
    (extend ((?dummy (str ?teamGraph)))
      (table unit))))
This doesn't look correct to me.
Looks correct to me.
Even if it were formally correct, it would be very inefficient
and thus render SPARQL's GRAPH keyword useless.
For the audience - ARQ is not executing in a quad-mode; iterating
graph names has side effects.
I am not sure what else to provide but I am afraid we cannot use this
version of Jena until the optimization is (back) in place.
Contributions welcome.
This code is too deep for me to contribute efficiently, especially as I
don't even know what your plans would be.
Holger
Thanks,
Holger
There is a follow-up (open) JENA-1815 as well.
Question: was this change intentional and is this behavior going to
stay in Jena? If it does stay, how can I switch it off?
The first algebra expression below is the translation of the query,
and it is what executes now that some optimization doesn't get applied.
It gives the right answers.
Andy
On 29/01/2020 05:08, Holger Knublauch wrote:
Hi,
on a branch of our product we have upgraded to Jena 3.14 (due to
the Thrift issue). Since then various SPARQL queries have stopped
working. Of particular concern appears to be the evaluation of
GRAPH ?graph clauses where ?graph is determined by a BIND. I am
still trying to narrow down what has changed but suspect it's due to
https://issues.apache.org/jira/browse/JENA-1813
Here is an example query - a simplified version derived from a
query in the product:
SELECT *
WHERE {
    BIND(<http://topbraid.org/teamwork#teamGraph>($this) AS ?teamGraph)
    GRAPH ?teamGraph {
        BIND (42 AS ?dummy)
    }
}
which is executed with $this prebound to a graph resource (e.g.
<urn:x-evn-master:geo>). In 3.14 this seems to produce the
following algebra:
(join
  (extend ((?teamGraph (<http://topbraid.org/teamwork#teamGraph> ?this)))
    (table unit))
  (graph ?teamGraph
    (extend ((?dummy 42))
      (table unit))))
while in 3.13.1 it produces
(sequence
  (extend ((?teamGraph (teamwork:teamGraph ?this)))
    (table unit))
  (graph ?teamGraph
    (extend ((?dummy 42))
      (table unit))))
As a result, the new version enters the GRAPH ?teamGraph with
unbound ?teamGraph and thus iterates over all graphs, which does not
work for our platform:
If my observations so far are correct then this change to Jena
would have quite deep consequences and break not just our own
queries but also those written by customers. We will likely have to
roll back to a previous Jena version for the upcoming release of
our product.
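
To illustrate the difference between the (join) and (sequence) forms
shown above, here is a minimal Python sketch. It is not ARQ code and
does not reproduce how ARQ actually executes algebra. It assumes a toy
Dataset class with a list_names method and a hypothetical stand-in for
teamwork:teamGraph that simply appends ":team" to its argument; none of
these names come from Jena or TopBraid. The sketch models (join) as
evaluating both sides independently and merging compatible solutions,
and (sequence) as feeding each left-hand solution into the right-hand
side.

```python
from typing import Dict, Iterator, List

Binding = Dict[str, str]


class Dataset:
    """Toy dataset that only tracks graph names (hypothetical, not Jena)."""

    def __init__(self, names: List[str]) -> None:
        self.names = list(names)
        self.list_names_calls = 0  # counts full enumerations of graph names

    def list_names(self) -> Iterator[str]:
        self.list_names_calls += 1
        return iter(self.names)


def team_graph(this: str) -> str:
    """Hypothetical stand-in for the teamwork:teamGraph function."""
    return this + ":team"


def eval_left(this: str) -> List[Binding]:
    # Models: (extend ((?teamGraph (teamGraph ?this))) (table unit))
    return [{"teamGraph": team_graph(this)}]


def eval_graph(ds: Dataset, binding: Binding) -> Iterator[Binding]:
    # Models: (graph ?teamGraph (extend ((?dummy 42)) (table unit)))
    if "teamGraph" in binding:
        names = iter([binding["teamGraph"]])  # bound: visit one graph only
    else:
        names = ds.list_names()  # unbound: enumerate every graph name
    for name in names:
        yield {**binding, "teamGraph": name, "dummy": "42"}


def compatible(a: Binding, b: Binding) -> bool:
    """Two solutions are compatible if they agree on all shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def eval_join(ds: Dataset, this: str) -> List[Binding]:
    # Both sides are evaluated independently, then merged.
    left = eval_left(this)
    right = list(eval_graph(ds, {}))
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def eval_sequence(ds: Dataset, this: str) -> List[Binding]:
    # Each left-hand solution is passed into the right-hand side.
    return [r for l in eval_left(this) for r in eval_graph(ds, l)]


if __name__ == "__main__":
    names = ["urn:x-evn-master:geo:team", "urn:x-evn-master:other:team"]
    for evaluate in (eval_join, eval_sequence):
        ds = Dataset(names)
        rows = evaluate(ds, "urn:x-evn-master:geo")
        print(evaluate.__name__, rows, "enumerations:", ds.list_names_calls)
```

In this model both forms return the same solutions, but only the join
form enumerates all graph names, which is the step that matters for a
dataset where listing graphs is expensive or has side effects.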
Question: was this change intentional and is this behavior going to
stay in Jena? If it does stay, how can I switch it off?
Thanks
Holger