Hi Steve,

> On 12 May 2020, at 22:19, Steve Vestal <[email protected]> wrote:
> 
> DuCharme's book says that the order of OPTIONALs matters, in the sense
> that if bindings have been done by an OPTIONAL, then later OPTIONALs
> will have no effect if any variables they might bind have already been
> bound (so order of OPTIONALs can affect query results). 
> 
> What prompted this query (no pun intended) is the discovery that putting
> a FILTER inside an OPTIONAL {} versus outside makes a difference, even
> though the FILTERs (which were some !sameTerm() conditions to avoid
> aliasing some variables) are the same.  If I have the pattern "OPTIONAL
> { <triples> FILTER <filter>} " versus "OPTIONAL {<triples>} FILTER
> <filter>", is the behavior that the first form is considered to fail and
> not bind its variables due to filtering within {}; while the second form
> would consider the OPTIONAL to have succeeded in binding the variables
> and cause subsequent OPTIONALs to be skipped?

Yes, that sounds about right.

SELECT query results are “tables” where each “row” is called a “solution” and 
each “column” a “variable”. When we say that a variable is bound to some value, 
that always means it is bound to a value *within a particular row* a.k.a. 
solution. The intermediate results at every step of SPARQL evaluation are also 
“tables” of this same form, and all SPARQL operators can be understood in those 
terms—they produce a table from a graph (in the case of triple patterns), or 
produce new tables from other tables. For example, the result of UNION is 
simply one table appended to another; the result of FILTER is a table with 
non-matching rows removed.
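
To make the "table" picture concrete, here is a minimal Python sketch. It is a toy model, not how any SPARQL engine is implemented: a solution is a dict from variable name to value, a table is a list of such dicts, and the names `union`, `filter_rows`, `people` and `robots` are made up for illustration.

```python
# Toy model: a solution ("row") is a dict from variable name to value;
# a result "table" is a list of such dicts.

def union(left, right):
    """UNION: one table appended to the other."""
    return left + right

def filter_rows(table, condition):
    """FILTER: keep only the rows for which the condition holds."""
    return [row for row in table if condition(row)]

# Hypothetical intermediate results of two graph patterns.
people = [{"person": "alice", "age": 31}, {"person": "bob", "age": 17}]
robots = [{"person": "r2d2"}]  # ?age is simply unbound in this row

combined = union(people, robots)
adults = filter_rows(people, lambda row: row["age"] >= 18)
```

Note that rows in one table need not all bind the same variables; an unbound variable is just a missing key.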

In SPARQL semantics, when curly braces are involved, evaluation is 
“inside-out”. Whatever is inside curly braces is completely evaluated 
independently from anything outside the braces, before being combined with 
other solutions from outside.

For OPTIONAL this means that the graph pattern inside { } is completely 
evaluated independently from what's before or after the OPTIONAL { }. FILTERs 
inside the { } will be evaluated as part of this. Once this is complete, the 
results will be combined in an OPTIONAL join (left join) with the solutions of 
the graph pattern that preceded the OPTIONAL.

> Do variables stay bound
> once bound, a FILTER just filters the final result, as if all FILTERs
> were applied at the end? 

Within a group (a { ... } section), one could say that variables in a solution 
stay bound once bound. But entire solutions can be removed by FILTERs. Within a 
group, the order of *some* constructs matters (OPTIONAL being one). Other 
constructs between these order-dependent ones can be rearranged without 
changing the result. So, if you had a graph pattern of this shape:

    { ... OPTIONAL {...} triples FILTER triples FILTER triples OPTIONAL {...} ... }

Then the order of the triples and FILTERs can be freely changed, but moving 
them before or after the OPTIONALs would potentially affect the result.

> But in the first pattern above, the filter
> nested inside the OPTIONAL{} is able to unbind anything that was bound
> inside that OPTIONAL?

“Unbinding” is not a particularly good way to think of it.

I would say: The FILTER inside the OPTIONAL { } removes certain solutions from 
the result of the { } group, and therefore those removed solutions are not 
taken into account when the group's result gets left-joined to whatever came 
before the OPTIONAL.
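
A rough Python sketch of the two forms from your question, under the same toy model (solutions as dicts, tables as lists). The helper names and the data are hypothetical, and `not_same` is only a stand-in for a `!sameTerm(?x, ?y)` filter. The sketch applies the inner FILTER before the left join, as described above, and only uses a filter over variables bound inside the group.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def left_join(left, right):
    """OPTIONAL: extend each left row with every compatible right row;
    keep the left row unchanged if no right row is compatible."""
    result = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(matches if matches else [l])
    return result

def filter_rows(table, condition):
    """FILTER: keep only the rows for which the condition holds."""
    return [row for row in table if condition(row)]

# Hypothetical data: solutions of the pattern before the OPTIONAL ...
before = [{"x": "a"}, {"x": "b"}]
# ... and solutions of the triples inside the OPTIONAL { } group.
inside = [{"x": "a", "y": "a"}, {"x": "b", "y": "c"}]

def not_same(row):
    """Stand-in for FILTER(!sameTerm(?x, ?y))."""
    return row["x"] != row["y"]

# OPTIONAL { <triples> FILTER <filter> }: filter the group, then left-join.
filter_inside = left_join(before, filter_rows(inside, not_same))

# OPTIONAL { <triples> } FILTER <filter>: left-join, then filter the result.
filter_outside = filter_rows(left_join(before, inside), not_same)
```

With the filter inside, the row for "a" survives with ?y left unbound, so a later OPTIONAL could still bind ?y. With the filter outside, the row for "a" is removed entirely.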

> Ignoring performance issues, are there any cases where the order of
> FILTER statements would affect the result of the query?  Or are
> OPTIONALs the only thing that have order-dependent semantics?

OPTIONAL, BIND and MINUS are order-dependent. FILTERs can be moved around 
between those, but moving them “over” an order-dependent construct potentially 
changes the result.
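
The order-dependence of OPTIONAL can be seen in the same toy Python model. This is only a sketch with made-up data: two OPTIONAL groups that both try to bind ?name, applied in either order.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def left_join(left, right):
    """OPTIONAL: extend each left row with every compatible right row;
    keep the left row unchanged if no right row is compatible."""
    result = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(matches if matches else [l])
    return result

# Hypothetical solutions of the pattern before the OPTIONALs.
before = [{"p": "alice"}]
# Hypothetical solutions of two OPTIONAL groups that both bind ?name.
nick = [{"p": "alice", "name": "Ali"}]
full = [{"p": "alice", "name": "Alice Smith"}]

# OPTIONAL { nick } OPTIONAL { full }
nick_first = left_join(left_join(before, nick), full)

# OPTIONAL { full } OPTIONAL { nick }
full_first = left_join(left_join(before, full), nick)
```

Whichever OPTIONAL runs first binds ?name; the second one then finds no compatible row and leaves the solution unchanged, so the two orders give different results.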

> Do nested braces have any impact on variable name visibility or
> semantics?

It's complicated.

Some occurrences of a variable potentially “bind” the variable, that is, they 
may cause a value to appear for that variable in a solution. This includes 
variables in triple patterns, in BIND (expr AS ?var), and in VALUES.

Other occurrences of a variable only “use” variables that have been previously 
bound. For example, FILTER can never cause new bindings.

“Bound” variables travel inside-out through curly braces, and top-down within a 
single level of nesting. “Used” variables don't travel.

Consider this:

   { a b OPTIONAL { c } d OPTIONAL { e } f }

A variable bound in c would be visible at d and f, but not at a, b, or e.

A sub-SELECT only binds the variables in its variable list; other variables 
that may occur inside it are not visible outside.
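
The sub-SELECT case can be sketched in the same toy Python model, treating the variable list as a projection applied before the join. The names `join`, `project`, `outer` and `inner` are hypothetical.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def join(left, right):
    """Plain join: merge every pair of compatible rows."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def project(table, variables):
    """Sub-SELECT variable list: keep only the listed variables in each row."""
    return [{v: row[v] for v in variables if v in row} for row in table]

# Hypothetical data: both sides happen to use a variable called ?tmp.
outer = [{"x": 1, "tmp": "outer"}]
inner = [{"x": 1, "tmp": "inner"}]

# { outer-pattern { SELECT ?x WHERE { inner-pattern } } }
with_projection = join(outer, project(inner, ["x"]))

# { outer-pattern { inner-pattern } }
without_projection = join(outer, inner)
```

Once ?tmp is projected away, the inner ?tmp cannot clash with the outer one, so the rows join on ?x alone. Without the projection, the two ?tmp values conflict and the join is empty.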

(One way to test all this is by inserting “BIND (1 AS ?var)” at various points 
in the query. This is a syntax error if ?var may already be bound at that place.)

> Are all variables appearing anywhere in a WHERE{} in the
> same global namespace, or are there cases where nested {} have some
> namespace semantics? 

All variables are in the same namespace. But queries can be written so that two 
occurrences of the same variable never interact. The question is what operators 
connect them.

Excellent questions by the way!

(What I describe above is one way of determining the correct result of a SPARQL 
query. There are other ways that produce the same correct results. The language 
in the spec is different. Implementations may do something different. What 
matters is that all get the same results.)

Richard
