On 30/01/2020 21:18, Andy Seaborne wrote:

Let's stick to facts and public information.
Hmm, I am not aware I didn't stick to facts.

There is nothing new here. "TopbraidDataset.listNames is not working for our platform".

The behavior of Jena has changed. With each low level change to ARQ there is a chance that existing queries may break. Users like ourselves need to write queries in the syntax that works best for the specific use cases. If we can suddenly no longer rely on BIND being executed before GRAPH (as it did all those years before), then please don't be surprised that we'll not be thrilled about rewriting our queries. We have gone through that several times in the past, and it doesn't improve confidence in relying on SPARQL in general. One coping strategy for us would be to reduce dependency on SPARQL and the GRAPH keyword in general, but that's a long shot. What we really need is a more imperative, predictable query language.

I understand that for some Datasets (e.g. single TDB with proper named graphs) it is no problem to first iterate over all graphs as part of a pattern. But I also don't think TopBraid's architecture is too different from how other applications are working. So other users may run into the same problem. Maybe a dataset could inform the optimizer about which strategy to use?


The TopQuadrant codebase has a mechanism to adapt the ARQ engine. It is even documented in the codebase. There are examples in git history as well.
If there is a clean way to switch off certain optimizations or activate other optimizations then I'd be happy to learn about them. Is this documented anywhere? We cannot just switch off all optimizations, and in this particular case the change was to a static function that cannot be overloaded either.

On 29/01/2020 22:31, Holger Knublauch wrote:
Hi Andy,

thanks for the fast response.

On 30/01/2020 01:47, Andy Seaborne wrote:
> I am still trying to narrow down what has changed

It may be JENA-1813 - being about optimization, the details of the real case probably matter. Let us know what you discover.

JENA-1813 fixed a specific problem caused by the BIND inside the GRAPH ("AS ?dummy" - not the outer one). If the inside is a different pattern, it seems to optimize properly, if I understand the details. Maybe some small rewrite will work in the real case.

A small rewrite is not an option for us because too many queries are affected, including those written by customers. It is a very common pattern to first compute the GRAPH with a BIND and then walk into that graph (only).

I tried this variation:

SELECT *
WHERE {
      BIND(<http://topbraid.org/teamwork#teamGraph>($this) AS ?teamGraph)
      GRAPH ?teamGraph {
          BIND (str(?teamGraph) AS ?dummy)
      }
}

and ?teamGraph is still unbound inside the GRAPH clause:

You don't need the GRAPH to call BIND (function(..) AS ?dummy) for a proper function.

The key point is that the (extend) is at the top of the (graph) evaluation (i.e. BIND is last) in GRAPH ... {}.

So put FILTER(true) in:

GRAPH ?teamGraph {
   BIND (42 AS ?dummy)
   FILTER(true)
}

but JENA-1815 may change that.
We have for now rolled back the Jena update and will have several months before our next major release. Maybe JENA-1815 will have happened in the meantime.

What will happen is that ARQ will get the right answers.  Correct before performant.

Using the graph name inside GRAPH introduces a different situation - it's a different optimization issue from the first example query.


(join
   (extend ((?teamGraph (<http://topbraid.org/teamwork#teamGraph> ?this)))
     (table unit))
   (graph ?teamGraph
     (extend ((?dummy (str ?teamGraph)))
       (table unit))))

This doesn't look correct to me.

Looks correct to me.

Even if it formally would be correct, it would be very inefficient and thus render SPARQL's GRAPH keyword useless.

For the audience - ARQ is not executing in a quad-mode; iterating graph names has side effects.
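
To make the "side effects" point concrete, here is a toy Python model (not Jena or TopBraid code). It assumes a hypothetical dataset, LazyDataset, whose graphs can be opened by name but whose names cannot be enumerated. All class and function names are invented for illustration, and the graph URI is made up. With the graph name fed in from the left-hand side, only one graph is opened. With the graph name unbound, evaluation has to list the names first and fails.

```python
# Toy model only: every name here is hypothetical, nothing is Jena API.
class LazyDataset:
    """A dataset that can open a graph by name but cannot list its graph names."""

    def __init__(self):
        self.opened = []  # records which graphs were actually touched

    def list_names(self):
        raise NotImplementedError("graph names cannot be enumerated")

    def open_graph(self, name):
        self.opened.append(name)
        return name


def eval_graph(dataset, binding, var="teamGraph"):
    """GRAPH ?var { BIND(str(?var) AS ?dummy) } for one input binding."""
    # A bound graph variable restricts evaluation to that one graph;
    # an unbound one forces enumeration of all graph names.
    names = [binding[var]] if var in binding else dataset.list_names()
    return [{**binding, var: g, "dummy": str(dataset.open_graph(g))} for g in names]


def run_sequence(dataset, left_solutions):
    """(sequence L R): each left solution is passed into the GRAPH pattern."""
    return [s for b in left_solutions for s in eval_graph(dataset, b)]


def run_join(dataset, left_solutions):
    """(join L R): the GRAPH pattern is evaluated on its own, then merged."""
    right = eval_graph(dataset, {})  # graph name is unbound here
    return [{**a, **b} for a in left_solutions for b in right
            if all(a[k] == b[k] for k in a.keys() & b.keys())]


ds = LazyDataset()
left = [{"teamGraph": "urn:example:team"}]  # made-up graph name
seq_result = run_sequence(ds, left)
try:
    run_join(ds, left)
    join_error = None
except NotImplementedError as err:
    join_error = str(err)
```

In this sketch the sequence form opens exactly one graph, while the join form never gets as far as producing solutions. A real dataset may behave differently, e.g. list the names successfully but at a high cost.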

I am not sure what else to provide but I am afraid we cannot use this version of Jena until the optimization is (back) in place.

Contributions welcome.

This code is too deep for me to contribute efficiently, esp I don't even know what your plans would be.

Holger



Thanks,
Holger



There is a follow-up (open) JENA-1815 as well.

Question: was this change intentional and is this behavior going to stay in Jena? If it does stay, how can I switch it off?

The first algebra expression below is the translation of the query, and it is what is executing now that some optimization doesn't get applied. It gives the right answers.

    Andy

On 29/01/2020 05:08, Holger Knublauch wrote:
Hi,

on a branch of our product we have upgraded to Jena 3.14 (due to the Thrift issue). Since then various SPARQL queries have stopped working. Of particular concern appears to be the evaluation of GRAPH ?graph clauses where ?graph is determined by a BIND. I am still trying to narrow down what has changed but suspect it's due to

https://issues.apache.org/jira/browse/JENA-1813

Here is an example query - a simplified version derived from a query in the product:

SELECT *
WHERE {
     BIND(<http://topbraid.org/teamwork#teamGraph>($this) AS ?teamGraph)
     GRAPH ?teamGraph {
         BIND (42 AS ?dummy)
     }
}

which is executed with $this prebound to a graph resource (e.g. <urn:x-evn-master:geo>). In 3.14 this seems to produce the following algebra:

(join
   (extend ((?teamGraph (<http://topbraid.org/teamwork#teamGraph> ?this)))
     (table unit))
   (graph ?teamGraph
     (extend ((?dummy 42))
       (table unit))))

while in 3.13.1 it produces

(sequence
   (extend ((?teamGraph (teamwork:teamGraph ?this)))
     (table unit))
   (graph ?teamGraph
     (extend ((?dummy 42))
       (table unit))))

As a result, the new version enters the GRAPH ?teamGraph with unbound ?teamGraph and thus iterates over all graphs which is not working for our platform:

If my observations so far are correct then this change to Jena would have quite deep consequences and break not just our own queries but also those written by customers. We will likely have to roll back to a previous Jena version for the upcoming release of our product.
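
The difference between the two algebra forms above can be sketched with a small Python toy model. This is a simplified illustration, not ARQ's actual executor: the graph names, the `visited` list and all function names are assumptions made up for the example. It models (join) as evaluating both sides independently and merging compatible solutions, and (sequence) as feeding each left-hand solution into the right-hand side.

```python
# Toy model of (join) vs (sequence) around a GRAPH pattern.
# Not Jena code: all names and graph URIs here are hypothetical.

GRAPHS = ["urn:g:a", "urn:g:b", "urn:g:team"]  # named graphs in a toy dataset
visited = []                                    # graph names the evaluator entered


def left():
    """BIND(... AS ?teamGraph): one solution with ?teamGraph bound."""
    return [{"teamGraph": "urn:g:team"}]


def graph_pattern(binding):
    """GRAPH ?teamGraph { BIND(42 AS ?dummy) } for one input binding."""
    # If ?teamGraph is already bound, only that graph is entered;
    # otherwise every graph name in the dataset is tried.
    names = [binding["teamGraph"]] if "teamGraph" in binding else GRAPHS
    out = []
    for g in names:
        visited.append(g)
        out.append({**binding, "teamGraph": g, "dummy": 42})
    return out


def compatible(a, b):
    """Two solutions are compatible if they agree on all shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def eval_join():
    """(join L R): both sides evaluated independently, then merged."""
    lhs, rhs = left(), graph_pattern({})
    return [{**a, **b} for a in lhs for b in rhs if compatible(a, b)]


def eval_sequence():
    """(sequence L R): each solution of L is fed into R."""
    return [s for a in left() for s in graph_pattern(a)]


visited.clear()
seq = eval_sequence()
seq_visits = list(visited)

visited.clear()
joined = eval_join()
join_visits = list(visited)
```

In this model both strategies return the same single solution, but `seq_visits` contains only the team graph while `join_visits` contains every graph in the dataset, which is the behaviour described above.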

Question: was this change intentional and is this behavior going to stay in Jena? If it does stay, how can I switch it off?

Thanks
Holger

