Hi Steve,
> On 12 May 2020, at 22:19, Steve Vestal wrote:
>
> DuCharme's book says that the order of OPTIONALs matters, in the sense
> that if bindings have been done by an OPTIONAL, then later OPTIONALs
> will have no effect if any variables they might bind have already been
> bound (so order of OPTIONALs can affect query results).
>
> What prompted this query (no pun intended) is the discovery that putting
> a FILTER inside an OPTIONAL {} versus outside makes a difference, even
> though the FILTERs (which were some !sameTerm() conditions to avoid
> aliasing some variables) are the same. If I have the pattern "OPTIONAL
> { <triples> FILTER <filter>} " versus "OPTIONAL {<triples>} FILTER
> <filter>", is the behavior that the first form is considered to fail and
> not bind its variables due to filtering within {}; while the second form
> would consider the OPTIONAL to have succeeded in binding the variables
> and cause subsequent OPTIONALs to be skipped?
Yes, that sounds about right.
SELECT query results are “tables” where each “row” is called a “solution” and
each “column” a “variable”. When we say that a variable is bound to some value,
that always means it is bound to a value *within a particular row*, a.k.a.
solution. The intermediate results at every step of SPARQL evaluation are also
“tables” of this same form, and all SPARQL operators can be understood in those
terms—they produce a table from a graph (in the case of triple patterns), or
produce new tables from other tables. For example, the result of UNION is
simply one table appended to another; the result of FILTER is a table with
non-matching rows removed.
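To make the table view concrete, here is a rough sketch in Python. It is a toy model, not how any SPARQL engine is implemented: a table is a list of dicts, and the names union, filter_rows, people and robots are invented for illustration.

```python
# Toy model: a "table" is a list of solutions, and each solution is a dict
# mapping variable names to values. All names here are illustrative only.

def union(left, right):
    """UNION: one table appended to the other."""
    return left + right

def filter_rows(table, condition):
    """FILTER: keep only the rows for which the condition holds."""
    return [row for row in table if condition(row)]

# Two intermediate tables, as they might come out of two graph patterns.
people = [{"name": "Alice", "age": 31}, {"name": "Bob", "age": 17}]
robots = [{"name": "R2", "age": 40}]

everyone = union(people, robots)
adults = filter_rows(everyone, lambda row: row["age"] >= 18)

print([row["name"] for row in adults])
```

Both operators take tables and return a table, which is what lets them be chained in any combination.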
In SPARQL semantics, when curly braces are involved, evaluation is
“inside-out”. Whatever is inside curly braces is completely evaluated
independently from anything outside the braces, before being combined with
other solutions from outside.
For OPTIONAL this means that the graph pattern inside { } is completely
evaluated independently from what's before or after the OPTIONAL { }. FILTERs
inside the { } will be evaluated as part of this. Once this is complete, the
results will be combined in an OPTIONAL join (left join) with the solutions of
the graph pattern that preceded the OPTIONAL.
> Do variables stay bound
> once bound, a FILTER just filters the final result, as if all FILTERs
> were applied at the end?
Within a group (a { ... } section), one could say that variables in a solution
stay bound once bound. But entire solutions can be removed by FILTERs. Within a
group, the order of *some* constructs matters (OPTIONAL being one). Other
constructs between these order-dependent ones can be rearranged without
changing the result. So, if you had a graph pattern of this shape:
{ ... OPTIONAL {...} triples FILTER triples FILTER triples OPTIONAL {...} ... }
Then the order of the triples and FILTERs can be freely changed, but moving
them before or after the OPTIONALs would potentially affect the result.
> But in the first pattern above, the filter
> nested inside the OPTIONAL{} is able to unbind anything that was bound
> inside that OPTIONAL?
“Unbinding” is not a particularly good way to think of it.
I would say: The FILTER inside the OPTIONAL { } removes certain solutions from
the result of the { } group, and therefore those removed solutions are not
taken into account when the group's result gets left-joined to whatever came
before the OPTIONAL.
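The difference between the two forms can be sketched with a toy Python model of left join and FILTER. This is only an illustration under simplifying assumptions: solutions are dicts, the filter mentions only variables that occur inside the OPTIONAL group, and the helper names (compatible, left_join, filter_rows, not_same) and the sample data are made up.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def left_join(left, right):
    """OPTIONAL: extend each left row with every compatible right row;
    keep the left row unchanged if there is no compatible right row."""
    result = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(matches if matches else [l])
    return result

def filter_rows(table, condition):
    """FILTER: drop rows where the condition is false or raises an error
    (modelled here as a KeyError from an unbound variable)."""
    kept = []
    for row in table:
        try:
            if condition(row):
                kept.append(row)
        except KeyError:
            pass
    return kept

def not_same(row):
    return row["x"] != row["y"]  # stands in for !sameTerm(?x, ?y)

# Result of the pattern before the OPTIONAL.
before = [{"x": "a"}, {"x": "b"}]
# Result of the triples inside the OPTIONAL { } group.
inner = [{"x": "a", "y": "a"}, {"x": "b", "y": "c"}]

# OPTIONAL { <triples> FILTER <filter> }
inside = left_join(before, filter_rows(inner, not_same))
# OPTIONAL { <triples> } FILTER <filter>
outside = filter_rows(left_join(before, inner), not_same)

print(inside)
print(outside)
```

With the filter inside, the row for "a" survives with ?y left without a value, because the offending inner solution was removed before the left join. With the filter outside, the left join first attaches y = "a" to that row, and the filter then removes the whole row.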
> Ignoring performance issues, are there any cases where the order of
> FILTER statements would affect the result of the query? Or are
> OPTIONALs the only thing that have order-dependent semantics?
OPTIONAL, BIND and MINUS are order-dependent. FILTERs can be moved around
between those, but moving them “over” an order-dependent construct potentially
changes the result.
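The order dependence of OPTIONAL can be seen in a small Python sketch of the left join. It is a toy model with invented helper names and data, in which two OPTIONALs both try to bind the same variable ?name.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def left_join(left, right):
    """OPTIONAL: extend each left row with every compatible right row;
    keep the left row unchanged if there is no compatible right row."""
    result = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(matches if matches else [l])
    return result

# Hypothetical data: one subject with a nickname and a full name.
base = [{"s": "p1"}]
nick = [{"s": "p1", "name": "Bobby"}]   # result of OPTIONAL { ?s :nick ?name }
full = [{"s": "p1", "name": "Robert"}]  # result of OPTIONAL { ?s :name ?name }

nick_first = left_join(left_join(base, nick), full)
full_first = left_join(left_join(base, full), nick)

print(nick_first)
print(full_first)
```

Whichever OPTIONAL comes first gets to supply ?name. The second one finds only incompatible solutions, so the left join leaves the row as it was.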
> Do nested braces have any impact on variable name visibility or
> semantics?
It's complicated.
Some occurrences of a variable potentially “bind” the variable, that is, they
may cause a value to appear for that variable in a solution. This includes
variables in triple patterns, in BIND (expr AS ?var), and in VALUES.
Other occurrences of a variable only “use” variables that have been previously
bound. For example, FILTER can never cause new bindings.
“Bound” variables travel inside-out through curly braces, and top-down within a
single level of nesting. “Used” variables don't travel.
Consider this:
{ a b OPTIONAL { c } d OPTIONAL { e } f }
A variable bound in c would be visible at d and f, but not at a, b, or e.
A sub-SELECT only binds the variables in its variable list; other variables
that may occur inside it are not visible outside.
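A sketch of the sub-SELECT case in Python, using the same kind of toy model (solutions as dicts; join, project and the sample data are illustrative names, not anything from SPARQL itself):

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def join(left, right):
    """Plain join: merge every compatible pair of solutions."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def project(table, variables):
    """SELECT list: keep only the listed variables in each solution."""
    return [{v: row[v] for v in variables if v in row} for row in table]

# Both patterns happen to use a variable called ?tmp for different things.
outer = [{"x": 1, "tmp": "outer"}]
sub_pattern = [{"x": 1, "tmp": "inner"}]

# Joined directly, the two ?tmp values clash and no solution survives.
direct = join(outer, sub_pattern)
# Wrapped in a sub-SELECT that lists only ?x, the inner ?tmp is not visible.
via_subselect = join(outer, project(sub_pattern, ["x"]))

print(direct)
print(via_subselect)
```

The variable name is the same in both places, but after projection the two occurrences of ?tmp never meet in a join, so they do not interact.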
(One way to test all this is by inserting “BIND (1 AS ?var)” at various points
in the query. This is a syntax error if ?var is already in scope, that is,
potentially bound, at that place.)
> Are all variables appearing anywhere in a WHERE{} in the
> same global namespace, or are there cases where nested {} have some
> namespace semantics?
All variables are in the same namespace. But queries can be written so that two
occurrences of the same variable never interact. The question is what operators
connect them.
Excellent questions by the way!
(What I describe above is one way of determining the correct result of a SPARQL
query. There are other ways that produce the same correct results. The language
in the spec is different. Implementations may do something different. What
matters is that all get the same results.)
Richard