Re: Different SPARQL execution order with recent Jena update (JENA-1813?)

Holger Knublauch Thu, 30 Jan 2020 16:53:18 -0800

On 30/01/2020 21:18, Andy Seaborne wrote:

Let's stick to facts and public information.

Hmm, I am not aware I didn't stick to facts.

There is nothing new here. "TopbraidDataset.listNames is not working for our platform".

The behavior of Jena has changed. With each low-level change to ARQ there is a chance that existing queries may break. Users like ourselves need to write queries in the syntax that works best for the specific use cases. If we can suddenly no longer rely on BIND being executed before GRAPH (as it was for all those years before), then please don't be surprised that we'll not be thrilled about rewriting our queries. We have gone through that several times in the past, and it doesn't improve confidence in relying on SPARQL in general. One coping strategy for us would be to reduce dependency on SPARQL and the GRAPH keyword in general, but that's a long shot. What we really need is a more imperative, predictable query language.

I understand that for some Datasets (e.g. a single TDB with proper named graphs) it is no problem to first iterate over all graphs as part of a pattern. But I also don't think TopBraid's architecture is too different from how other applications are working, so other users may run into the same problem. Maybe a dataset could inform the optimizer about which strategy to use?

The TopQuadrant codebase has a mechanism to adapt the ARQ engine. It has even got documentation in the codebase. There are examples in git history as well.

If there is a clean way to switch off certain optimizations or activate other optimizations, then I'd be happy to learn about them. Is this documented anywhere? We cannot just switch off all optimizations, and in this particular case the change was to a static function that cannot be overridden either.

On 29/01/2020 22:31, Holger Knublauch wrote:
Hi Andy,

thanks for the fast response.

On 30/01/2020 01:47, Andy Seaborne wrote:
> I am still trying to narrow down what has changed
It may be JENA-1813 - being about optimization, the details of the real case probably matter. Let us know what you discover.
JENA-1813 fixed a specific problem caused by the BIND inside the GRAPH ("AS ?dummy" - not the outer one). If the inside is a different pattern, it seems to optimize properly, if I understand the details. Maybe some small rewrite will work in the real case.
A small rewrite is not an option for us because too many queries are affected, including those written by customers. It is a very common pattern to first compute the GRAPH with a BIND and then walk into that graph (only).
I tried this variation:

SELECT *
WHERE {
BIND(<http://topbraid.org/teamwork#teamGraph>($this) AS ?teamGraph)
      GRAPH ?teamGraph {
          BIND (str(?teamGraph) AS ?dummy)
      }
}

and ?teamGraph is still unbound inside the GRAPH clause:
You don't need the GRAPH to call BIND (function(..) AS ?dummy) for a proper function.
The key point is that the (extend) is on top of the (graph) evaluation (i.e. BIND is last) in GRAPH ... {}.
So put FILTER(true) in:

GRAPH ?teamGraph {
   BIND (42 AS ?dummy)
   FILTER(true)
}

but JENA-1815 may change that.

We have for now rolled back the Jena update and will have several months before our next major release. Maybe JENA-1815 will have happened in the meantime.

What will happen is that ARQ will get the right answers. Correct before performant.
Using the graph name inside GRAPH introduces a different situation - it's a different optimization issue from the first example query.
(join
  (extend ((?teamGraph (<http://topbraid.org/teamwork#teamGraph> ?this)))
    (table unit))
  (graph ?teamGraph
    (extend ((?dummy (str ?teamGraph)))
      (table unit))))

This doesn't look correct to me.
Looks correct to me.
Even if it were formally correct, it would be very inefficient and thus render SPARQL's GRAPH keyword useless.
For the audience - ARQ is not executing in a quad-mode; iterating graph names has side effects.
I am not sure what else to provide, but I am afraid we cannot use this version of Jena until the optimization is (back) in place.
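To make the "correct but inefficient" disagreement concrete, here is a toy model in Python (not ARQ code) of the two algebra forms for the str(?teamGraph) variant. Everything in it is an assumption for illustration: the graph names, the hypothetical team_graph stand-in for teamwork:teamGraph, and the list-of-dicts evaluator. `eval_join` evaluates both sides independently and merges compatible rows; `eval_sequence` substitutes each left-hand row into the right-hand side, which is roughly how ARQ's (sequence) behaves.

```python
from typing import Dict, List, Tuple

Binding = Dict[str, str]

# Hypothetical named graphs in the dataset (illustrative only).
GRAPH_NAMES = [
    "urn:x-evn-master:geo",
    "urn:x-evn-master:geo.tch",
    "urn:x-evn-master:other",
]


def team_graph(this: str) -> str:
    """Hypothetical stand-in for teamwork:teamGraph (the real one is TopBraid's)."""
    return this + ".tch"


def extend_team_graph(this: str) -> List[Binding]:
    # (extend ((?teamGraph (teamwork:teamGraph ?this))) (table unit))
    return [{"teamGraph": team_graph(this)}]


def graph_extend(inp: Binding, opened: List[str]) -> List[Binding]:
    # (graph ?teamGraph (extend ((?dummy (str ?teamGraph))) (table unit)))
    if "teamGraph" in inp:
        # Graph variable already bound: look at that one graph only.
        names = [inp["teamGraph"]] if inp["teamGraph"] in GRAPH_NAMES else []
    else:
        # Graph variable unbound: every named graph has to be visited.
        names = list(GRAPH_NAMES)
    rows = []
    for name in names:
        opened.append(name)  # record the side effect of entering a graph
        # str() of an IRI is its text, so ?dummy gets the graph name.
        rows.append(dict(inp, teamGraph=name, dummy=name))
    return rows


def compatible(a: Binding, b: Binding) -> bool:
    # Two rows can be merged if they agree on every variable they share.
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def eval_join(this: str) -> Tuple[List[Binding], List[str]]:
    opened: List[str] = []
    left = extend_team_graph(this)
    right = graph_extend({}, opened)  # right side sees no bindings from the left
    rows = [dict(lrow, **rrow) for lrow in left for rrow in right if compatible(lrow, rrow)]
    return rows, opened


def eval_sequence(this: str) -> Tuple[List[Binding], List[str]]:
    opened: List[str] = []
    rows: List[Binding] = []
    for lrow in extend_team_graph(this):
        rows.extend(graph_extend(lrow, opened))  # left row substituted into the right
    return rows, opened


rows_j, opened_j = eval_join("urn:x-evn-master:geo")
rows_s, opened_s = eval_sequence("urn:x-evn-master:geo")
print(rows_j == rows_s)  # True: same answers
print(opened_j)          # all three graphs
print(opened_s)          # only urn:x-evn-master:geo.tch
```

In this model both plans return the same single row, which matches the point that the (join) form is correct. The difference is that the join plan enters every named graph, while the sequence plan enters only the one computed by the BIND; that is harmless where listing graphs is cheap, and costly or unsupported where entering a graph has side effects.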
Contributions welcome.

This code is too deep for me to contribute efficiently, especially as I don't even know what your plans would be.


Holger

Thanks,
Holger
There is a follow-up (open) JENA-1815 as well.
Question: was this change intentional and is this behavior going to stay in Jena? If it does stay, how can I switch it off?
The first algebra expression below is the translation of the query and is what is executing now; some optimization doesn't get applied. It gives the right answers.
    Andy

On 29/01/2020 05:08, Holger Knublauch wrote:
Hi,
on a branch of our product we have upgraded to Jena 3.14 (due to the Thrift issue). Since then various SPARQL queries have stopped working. Of particular concern appears to be the evaluation of GRAPH ?graph clauses where ?graph is determined by a BIND. I am still trying to narrow down what has changed but suspect it's due to
https://issues.apache.org/jira/browse/JENA-1813
Here is an example query - a simplified version derived from a query in the product:
SELECT *
WHERE {
BIND(<http://topbraid.org/teamwork#teamGraph>($this) AS ?teamGraph)
     GRAPH ?teamGraph {
         BIND (42 AS ?dummy)
     }
}
which is executed with $this prebound to a graph resource (e.g. <urn:x-evn-master:geo>). In 3.14 this seems to produce the following algebra:
(join
  (extend ((?teamGraph (<http://topbraid.org/teamwork#teamGraph> ?this)))
    (table unit))
  (graph ?teamGraph
    (extend ((?dummy 42))
      (table unit))))

while in 3.13.1 it produces

(sequence
   (extend ((?teamGraph (teamwork:teamGraph ?this)))
     (table unit))
   (graph ?teamGraph
     (extend ((?dummy 42))
       (table unit))))
As a result, the new version enters the GRAPH ?teamGraph with unbound ?teamGraph and thus iterates over all graphs, which is not working for our platform:
If my observations so far are correct, then this change to Jena would have quite deep consequences and break not just our own queries but also those written by customers. We will likely have to roll back to a previous Jena version for the upcoming release of our product.
Question: was this change intentional and is this behavior going to stay in Jena? If it does stay, how can I switch it off?
Thanks
Holger
