Re: Change in execution order between Jena 2.7.2 and 2.7.3

Holger Knublauch Sat, 11 Aug 2012 18:47:31 -0700

Hi Andy,

Oh my, this is really a bigger issue than I thought. The following query pattern also no longer works:


SELECT *
WHERE {
    GRAPH <http://spinrdf.org/spin> {
        ?x rdfs:label ?label .
    }
    BIND (?label AS ?result) .
}

The above is again an artificial test case that makes no sense, but we and our customers have an unknown number of queries in production that use values from other { ... } blocks in BIND steps, often multiple BINDs where intermediate values are sliced and diced by succeeding BINDs. Here is a typical example (?graph is pre-bound from the outside):


SELECT ?result
WHERE {
    {
        BIND (xsd:string(?graph) AS ?str) .
        FILTER fn:starts-with(?str, "urn:x-evn-master:") .
    } .
    BIND (fn:substring(?str, 18) AS ?a) .
    BIND (spif:lastIndexOf(?a, ":") AS ?last) .
    BIND (IF(bound(?last), fn:substring(?a, 1, ?last), ?a) AS ?result) .
}

In another example from our queries

SELECT ?node ?label ?leaf ?icon ?movable
WHERE {
    {
        ?node skos:broader ?parent .
        BIND ((!swa:isReadOnlyTriple(?node, skos:broader, ?parent)) AS ?movable) .
    }
    UNION
    {
        ?parent skos:hasTopConcept ?node .
        BIND ((!swa:isReadOnlyTriple(?parent, skos:hasTopConcept, ?node)) AS ?movable) .
    } .
    BIND (NOT EXISTS  {
        ?child skos:broader ?node .
    } AS ?leaf) .
    BIND (ui:label(?node) AS ?label) .
    BIND ("evn-icon-concept" AS ?icon) .
}
ORDER BY (?label)

I have to move the computation of ?label into each branch of the UNION, and move the computation of ?leaf into the SELECT projection. The latter isn't a big problem except for readability, but the double appearance of ?label is really bad. The new query is


SELECT ?node ?label ((NOT EXISTS  {
    ?child skos:broader ?node .
}) AS ?leaf) ?icon ?movable
WHERE {
    {
        ?node skos:broader ?parent .
        BIND ((!swa:isReadOnlyTriple(?node, skos:broader, ?parent)) AS ?movable) .
        BIND (ui:label(?node) AS ?label) .
    }
    UNION
    {
        ?parent skos:hasTopConcept ?node .
        BIND ((!swa:isReadOnlyTriple(?parent, skos:hasTopConcept, ?node)) AS ?movable) .
        BIND (ui:label(?node) AS ?label) .
    } .
    BIND ("evn-icon-concept" AS ?icon) .
}
ORDER BY (?label)

Others are harder to refactor. For example I have to reformulate this query

SELECT ?actionName ?onSelect ?enabled ?group ?label ?iconClass
WHERE {
    GRAPH ui:unionGraph {
        {
            ?action a swa:ResourceAction .
            ?action rdfs:label ?label .
            ?action arg:condition ?condition .
            BIND (ui:encodeNode(?action) AS ?actionName) .
            BIND (spl:object(?action, arg:onSelect) AS ?onSelectRaw) .
            BIND (COALESCE(?onSelectRaw, IF(swa:hasOtherArgument(?action), CONCAT("swa.openHandlerDialog(\"", ui:escapeJSON(?label), "\", \"<", xsd:string(?action), ">\", \"", ui:escapeJSON(xsd:string(?resource)), "\")"), ?none)) AS ?onSelect) .
            BIND (COALESCE(spl:object(?action, arg:group), "") AS ?group) .
            BIND (spl:object(?action, arg:iconClass) AS ?iconClass) .
            FILTER (((!bound(?appName)) || (?appName = "")) || swa:actionHasAppName(?action, ?appName)) .
        } .
        BIND (spin:eval(?condition, arg:resource, ?resource) AS ?enabled) .
        FILTER bound(?enabled) .
    } .
}
ORDER BY (?group) (?label)

because the FILTER depends on the previous BIND, but the BIND can't use the values from the upper block. I really don't want the spin:eval to be called if the FILTER above it is false - it's an expensive operation. I guess it has to become


SELECT ?actionName ?onSelect ?enabled ?group ?label ?iconClass
WHERE {
    GRAPH ui:unionGraph {
        ?action a swa:ResourceAction .
        ?action rdfs:label ?label .
        ?action arg:condition ?condition .
        BIND (ui:encodeNode(?action) AS ?actionName) .
        BIND (spl:object(?action, arg:onSelect) AS ?onSelectRaw) .
        BIND (COALESCE(?onSelectRaw, IF(swa:hasOtherArgument(?action), CONCAT("swa.openHandlerDialog(\"", ui:escapeJSON(?label), "\", \"<", xsd:string(?action), ">\", \"", ui:escapeJSON(xsd:string(?resource)), "\")"), ?none)) AS ?onSelect) .
        BIND (COALESCE(spl:object(?action, arg:group), "") AS ?group) .
        BIND (spl:object(?action, arg:iconClass) AS ?iconClass) .
        BIND ((((!bound(?appName)) || (?appName = "")) || swa:actionHasAppName(?action, ?appName)) AS ?app) .
        BIND (IF(?app, spin:eval(?condition, arg:resource, ?resource), ?none) AS ?enabled) .
        FILTER bound(?enabled) .
    } .
}
ORDER BY (?group) (?label)

i.e. the trick is to replace the upper FILTER with the intermediate helper variable ?app, and use this to prevent the spin:eval call with an IF. This trick obviously doesn't scale if there is a chain of other BINDs.

While I don't understand all the technical details, I believe BIND has become unnecessarily limited and unintuitive with this spec. If your previous implementation (that you had for many years including LET) was indeed just a bug then it was a very useful bug. Whatever happened to the nice mantra that SPARQL is executed from the inside out, if it becomes impossible to use the produced values in BIND statements? It seems that the baby has been thrown out with the bath water here.

I believe TQ will need to raise this issue with the SPARQL 1.1 WG again, although it seems we are very late in the process.

BTW in the future it would be helpful to see such changes listed in the release notes.

And yes, optimizing the FILTER placement would be great and would remove some of the pain and allow query authors to improve query performance.


Thanks,
Holger


On 8/12/2012 1:19, Andy Seaborne wrote:

On 11/08/12 00:50, Holger Knublauch wrote:

On 8/10/2012 19:40, Andy Seaborne wrote:

On 10/08/12 02:12, Holger Knublauch wrote:

Andy,

we are evaluating the move to 2.7.3 and have been immediately hit by
what looks like a change of SPARQL semantics in ARQ. See the attached
Java test which returns "Test" in 272 but null in 273. The query is
really simple:

     SELECT *
     WHERE {
         {
             BIND ("Test" AS ?label) .
         } .
         BIND (?label AS ?result) .
     }

but ?label is no longer visible in the outer BIND. The same happens if
you replace the inner BIND with a BGP that binds ?label, but I wanted to
make the example model-independent.

So my obvious question: is this the intended behavior, why the change
etc?


2.7.3 is right - 2.7.2 is wrong (plain old bug, fixed due to having
to clarify scoping in the SPARQL spec, so I went back and checked ARQ).

>          {
>              BIND ("Test" AS ?label) .
>          } .
>          BIND (?label AS ?result) .

That's a join of the inner, first BIND and the outer BIND.

The outer BIND applies to the immediately preceding BGP. BIND binds
quite tightly (if you'll forgive the pun).

The preceding BGP is actually empty - it's between the "}" and
"BIND (?label AS ?result) ."

Think of it as :

    {
        { BIND ("Test" AS ?label) . }
        {} BIND (?label AS ?result) .
    }

technically, that's structurally different but it stresses the empty
part before second BIND.

The important factor is the scope of ?label.

The query joins "BIND ("Test" AS ?label)" and
"BIND (?label AS ?result)".  So it evals "BIND (?label AS ?result)"
not in the context of the "BIND ("Test" AS ?label)" i.e. the use of
?label in "BIND (?label AS ?result)" is unbound.
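
The join reading described above can be sketched with a toy model of the algebra. This is only an illustration in Python, not ARQ code: solutions are plain dicts, and the helper names (extend, join, EMPTY_GROUP) are made up for this sketch.

```python
# Toy model of the evaluation described above. Hypothetical helpers, not ARQ's API.

def extend(solutions, var, expr):
    """Model of BIND: add var to each solution when expr yields a value."""
    out = []
    for mu in solutions:
        value = expr(mu)  # None stands in for "unbound / error"
        out.append({**mu, var: value} if value is not None else dict(mu))
    return out

def join(left, right):
    """Model of join: merge every compatible pair of solutions."""
    out = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out

EMPTY_GROUP = [{}]  # the empty pattern {} has one solution with no bindings

# { { BIND("Test" AS ?label) } {} BIND(?label AS ?result) }
inner = extend(EMPTY_GROUP, "label", lambda mu: "Test")
outer = extend(EMPTY_GROUP, "result", lambda mu: mu.get("label"))
joined = join(inner, outer)

# { BIND("Test" AS ?label) BIND(?label AS ?result) } in one group, for contrast
single_group = extend(
    extend(EMPTY_GROUP, "label", lambda mu: "Test"),
    "result",
    lambda mu: mu.get("label"),
)

print(joined)        # ?result is missing: the outer BIND never saw ?label
print(single_group)  # both ?label and ?result are bound
```

In this model the outer BIND is evaluated against the empty group only, so ?label is not in scope when ?result is computed, and the join afterwards cannot add it back.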


Thanks Andy. I cannot claim that I understand this yet. Nor do I believe
many of our users will. Where does the "hidden {}" come from?

The pattern that I don't see how to solve with the new design is as
follows:

It's not a new design ... it's what the spec has said all along, although it was a bit of a mess. The descriptive section was clear; the formal section was open to "multiple interpretations" at best, including none :-( Any spec changes are to make it clear. Also, ARQ was just plain wrong and had a bug regardless of the spec.

     {
         ?x ex:prop ?value .
         FILTER (?value some condition) .
     }
     BIND (my:function(?value) AS ?result) .

I only want my:function to execute if the FILTER is passed. Therefore I
cannot simply write

     ?x ex:prop ?value .
     FILTER (?value some condition) .
     BIND (my:function(?value) AS ?result) .

because 2.7.2 moves the FILTER to the end and makes it effectively

     ?x ex:prop ?value .
     BIND (my:function(?value) AS ?result) .
     FILTER (?value some condition) .

I had introduced the inner { ... } block to ensure that the FILTER is
grouped together with the previous line. The mantra "SPARQL executes
from the inside out" was just easy enough to explain, but now inner
blocks seem to have become useless.

How would I have to rewrite the first query to make sure that the BIND
is only executed after the FILTER, but with ?value bound?

So this will do exactly what you want: the SELECT expression form.


SELECT ?value (my:function(?value) AS ?result)
{
   ?x ex:prop ?value .
   FILTER (?value some condition) .
}

It is regrettable that * isn't allowed in this position.

Then it really is like:

BIND (my:function(?value) AS ?result WHERE
      {
         ?x ex:prop ?value .
         FILTER (?value some condition) .
      })

The other way to approach it is:

  {
    ?x ex:prop ?value .
    BIND (my:function(?value) AS ?result) .
    FILTER (?value some condition) .
  }

Any function should really cope with anything passed to it - it can return an error (an exception) and ?result is not bound.

The optimizer can push the filter through the (extend) - the algebra operator for BIND - so the execution is more efficient.


BGP -> extend -> filter

becomes

BGP -> filter -> extend

It can do this because the extend variable ?result is not used in the filter. The code (TransformFilterPlacement) does not currently do this.
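
The reordering described above can be sketched with the same kind of toy model. Again this is Python for illustration only, not the TransformFilterPlacement code; the sample data, my_function, condition and can_push_filter are assumptions made up for the sketch.

```python
# Toy model of pushing a filter through extend. Hypothetical names, not ARQ's API.

calls = []  # records every invocation of the "expensive" function

def my_function(value):
    calls.append(value)
    return value * 2

def extend(solutions, var, expr):
    """Model of BIND: add var to each solution."""
    return [{**mu, var: expr(mu)} for mu in solutions]

def filter_(solutions, pred):
    """Model of FILTER: keep the solutions that satisfy pred."""
    return [mu for mu in solutions if pred(mu)]

def can_push_filter(filter_vars, extend_var):
    """The rewrite is only safe if the filter does not use the extend variable."""
    return extend_var not in filter_vars

# Stand-in for the BGP "?x ex:prop ?value" and for "?value some condition"
bgp = [{"x": "a", "value": 1}, {"x": "b", "value": 5}, {"x": "c", "value": 9}]
condition = lambda mu: mu["value"] > 4
bind_expr = lambda mu: my_function(mu["value"])

# BGP -> extend -> filter
calls.clear()
unoptimized = filter_(extend(bgp, "result", bind_expr), condition)
calls_unoptimized = len(calls)

# BGP -> filter -> extend
assert can_push_filter({"value"}, "result")
calls.clear()
optimized = extend(filter_(bgp, condition), "result", bind_expr)
calls_optimized = len(calls)

print(unoptimized == optimized)            # same solutions either way
print(calls_unoptimized, calls_optimized)  # fewer function calls after the push
```

Both orders produce the same solutions here; the difference is only how often the function runs, which is the point of the optimization.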

I'd file a JIRA for it but JIRA@ASF is undergoing maintenance at the moment. They are having to move it to a bigger machine due to too much load.


    Andy


Thanks
Holger
