Hi Andy,

Thank you very much for your kind explanation. I now remember having
read somewhere about the bottom-up evaluation of subqueries (it's in the
spec ...), but it hadn't sunk in.

Unfortunately I had oversimplified my example - there is not one
ambiguous URI, but roughly 60,000. So what I got with the IN expression
was a timeout, even when I extended it to more than 24 hours.
(Nevertheless, I was quite impressed that the running query, which ate up
most but not all of the CPU, did not completely block other queries.)

I have already worked around the problem outside SPARQL (by loading
preprocessed data). However, I'd love to hear that there is a way to
solve it in SPARQL, within a reasonable execution time. 
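
A possible direction, sketched here under assumptions and not tested
against this dataset: if the endpoint supports the SPARQL 1.1 VALUES
clause, the candidate URIs can be supplied as inline data in the same
group as the triple pattern, so that ?uri is joined against the
candidates instead of being filtered after all gnd:CorporateBody
instances have been matched. The two URIs below are the ones from the
original example; whether an engine handles a block of roughly 60,000
entries within a reasonable time is an open question.

```sparql
PREFIX gnd: <http://d-nb.info/standards/elementset/gnd#>

SELECT ?uri
WHERE {
  # Inline candidate list (assumption: extended to the full set of URIs)
  VALUES ?uri {
    <http://d-nb.info/gnd/10244669>
    <http://d-nb.info/gnd/1024466-9>
  }
  # Same group, same variable: joined with the candidates above
  ?uri a gnd:CorporateBody .
}
```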

Cheers, Joachim

> -----Original Message-----
> From: Andy Seaborne [mailto:[email protected]] 
> On Behalf Of Andy Seaborne
> Sent: Tuesday, July 10, 2012 11:54 AM
> To: [email protected]
> Subject: Re: Variables not bound in subquery
> 
> On 10/07/12 06:05, Neubert Joachim wrote:
> > In the following query
> >
> > PREFIX gnd:     <http://d-nb.info/standards/elementset/gnd#>
> >
> > SELECT ?uri
> > WHERE {
> >    BIND (<http://d-nb.info/gnd/10244669> AS ?uri1)
> >    BIND (<http://d-nb.info/gnd/1024466-9> AS ?uri2)
> >    { {
> >        SELECT (?uri1 AS ?uri)
> >        WHERE {
> >          ?uri1 a gnd:CorporateBody .
> >        }
> >      } UNION {
> >        SELECT (?uri2 AS ?uri)
> >        WHERE {
> >          ?uri2 a gnd:CorporateBody .
> >        }
> >    } }
> > }
> >
> > I'd expect that the ?uri1 and ?uri2 variables are bound in the
> > subqueries, and as a result to get zero, one or two values for ?uri.
> > However, I get every possible gnd:CorporateBody (more than a million).
> >
> > It would be nice if somebody could point out why this happens, and how
> > I could work around it. (Duplicating the BIND part and moving it into
> > the subquery works, but since it involves a query to a remote service
> > and some function calls, I'd prefer not to.)
> >
> > Cheers, Joachim
> >
> 
> Evaluation is bottom-up - subparts are evaluated then combined.
> 
> SELECT (?uri1 AS ?uri) exposes ?uri and any mention of ?uri1
> inside the SELECT is hidden (it's a different ?uri1: strictly
> it's the same name, but it will never meet the ?uri1 BIND).
> 
> So the only thing coming out of SELECT (?uri1 AS ?uri) is a 
> result row of one variable, ?uri.  There is no ?uri1 outside 
> the projection.
> 
> You have the structure:
> 
> BIND ... ?uri1
> BIND ... ?uri2
> {
> SELECT ... ?uri
>     union
> SELECT ... ?uri
> }
> 
> 
> This query
> 
> PREFIX gnd:     <http://d-nb.info/standards/elementset/gnd#>
> 
> SELECT ?uri
> WHERE {
>    ?uri a gnd:CorporateBody .
>    FILTER ( <http://d-nb.info/gnd/10244669> = ?uri ||
>             <http://d-nb.info/gnd/1024466-9> = ?uri ) }
> 
> 
> finds the ?uri that are one of the two URIs.
> 
> Or
> SELECT ?uri
> WHERE {
>    ?uri a gnd:CorporateBody .
>    FILTER ( ?uri IN (<http://d-nb.info/gnd/10244669>,
>                      <http://d-nb.info/gnd/1024466-9> )) }
> 
> which gets to the same execution plan.
> 
> It's optimized as well:
> 
> (project (?uri)
>    (disjunction
>      (assign ((?uri <http://d-nb.info/gnd/10244669>))
>        (bgp (triple <http://d-nb.info/gnd/10244669>
>                     rdf:type gnd:CorporateBody)))
>      (assign ((?uri <http://d-nb.info/gnd/1024466-9>))
>        (bgp (triple <http://d-nb.info/gnd/1024466-9>
>                     rdf:type gnd:CorporateBody)))))
> 
> i.e. it tries one case
> 
> { <http://d-nb.info/gnd/10244669> rdf:type gnd:CorporateBody }
> 
> then tries the other
> 
> { <http://d-nb.info/gnd/1024466-9> rdf:type gnd:CorporateBody }
> 
> which is two probes of the database, not filtering a million items.
> 
>       Andy
> 
