Hi Andy,

Thank you very much for your kind explanation. I now remember that I had read somewhere about the bottom-up evaluation of subqueries (it's in the spec ...), but hadn't got that into my head.
Unfortunately, I had oversimplified my example: there is not one ambiguous URI, but roughly 60,000. So what I got with the IN expression was a timeout, even when I extended the timeout to more than 24 hours. (Nevertheless, I was quite impressed that the running query, which ate up most but not all of the CPU, did not completely block other queries.)

I have already worked around the problem outside SPARQL (by loading preprocessed data). However, I'd love to hear that there is a way to solve it in SPARQL, within a reasonable execution time.

Cheers, Joachim

> -----Original Message-----
> From: Andy Seaborne [mailto:[email protected]] On Behalf Of Andy Seaborne
> Sent: Tuesday, July 10, 2012 11:54 AM
> To: [email protected]
> Subject: Re: Variables not bound in subquery
>
> On 10/07/12 06:05, Neubert Joachim wrote:
> > In the following query
> >
> > PREFIX gnd: <http://d-nb.info/standards/elementset/gnd#>
> >
> > SELECT ?uri
> > WHERE {
> >   BIND (<http://d-nb.info/gnd/10244669> AS ?uri1)
> >   BIND (<http://d-nb.info/gnd/1024466-9> AS ?uri2)
> >   { {
> >     SELECT (?uri1 AS ?uri)
> >     WHERE {
> >       ?uri1 a gnd:CorporateBody .
> >     }
> >   } UNION {
> >     SELECT (?uri2 AS ?uri)
> >     WHERE {
> >       ?uri2 a gnd:CorporateBody .
> >     }
> >   } }
> > }
> >
> > I'd expect that the ?uri1 and ?uri2 variables are bound in the
> > subqueries, and as a result to get zero, one or two values for ?uri.
> > However, I get every possible gnd:CorporateBody (more than a million).
> >
> > It would be nice if somebody could point out why this happens, and how
> > I could work around it. (Duplicating the BIND part and moving it into
> > the subquery works, but since it involves a query to a remote service
> > and some function calls, I'd prefer not to.)
> >
> > Cheers, Joachim
> >
>
> Evaluation is bottom-up: subparts are evaluated, then combined.
>
> SELECT (?uri1 AS ?uri) exposes ?uri, and any mention of ?uri1 inside the
> SELECT is hidden (it's a different ?uri1; strictly it's the same name,
> but it will never meet the ?uri1 from the BIND).
>
> So the only thing coming out of SELECT (?uri1 AS ?uri) is a result row
> of one variable, ?uri. There is no ?uri1 outside the projection.
>
> You have the structure:
>
>   BIND ... ?uri1
>   BIND ... ?uri2
>   {
>     SELECT ... ?uri
>     union
>     SELECT ... ?uri
>   }
>
> This query
>
> PREFIX gnd: <http://d-nb.info/standards/elementset/gnd#>
>
> SELECT ?uri
> WHERE {
>   ?uri a gnd:CorporateBody .
>   FILTER ( <http://d-nb.info/gnd/10244669> = ?uri ||
>            <http://d-nb.info/gnd/1024466-9> = ?uri )
> }
>
> finds the ?uri that are one of the two URIs.
>
> Or
>
> SELECT ?uri
> WHERE {
>   ?uri a gnd:CorporateBody .
>   FILTER ( ?uri IN (<http://d-nb.info/gnd/10244669>,
>                     <http://d-nb.info/gnd/1024466-9>) )
> }
>
> which gets to the same execution plan.
>
> It's optimized as well:
>
> (project (?uri)
>   (disjunction
>     (assign ((?uri <http://d-nb.info/gnd/10244669>))
>       (bgp (triple <http://d-nb.info/gnd/10244669> rdf:type gnd:CorporateBody)))
>     (assign ((?uri <http://d-nb.info/gnd/1024466-9>))
>       (bgp (triple <http://d-nb.info/gnd/1024466-9> rdf:type gnd:CorporateBody)))))
>
> i.e. it tries one case
>
>   { <http://d-nb.info/gnd/10244669> rdf:type gnd:CorporateBody }
>
> then tries the other
>
>   { <http://d-nb.info/gnd/1024466-9> rdf:type gnd:CorporateBody }
>
> which is two probes of the database, not filtering a million items.
>
> Andy
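The two points in Andy's explanation (bottom-up evaluation hides ?uri1 behind the projection, and the FILTER/IN form becomes a set of index probes) can be illustrated with a small toy model. This is a minimal Python sketch under stated assumptions, not how ARQ is implemented: the store is a hypothetical in-memory set of triples, "gnd:other-1" and "gnd:other-2" are made-up stand-ins for the other corporate bodies, and all function names are hypothetical.

```python
# Toy model of SPARQL evaluation (illustrative only; not ARQ's implementation).
# A solution is a dict mapping variable names to values. All data is hypothetical.

RDF_TYPE = "rdf:type"
CORP = "gnd:CorporateBody"

# Hypothetical store: three corporate bodies; "gnd:1024466-9" is absent.
TRIPLES = {
    ("gnd:10244669", RDF_TYPE, CORP),
    ("gnd:other-1", RDF_TYPE, CORP),
    ("gnd:other-2", RDF_TYPE, CORP),
}


def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def join(left, right):
    """SPARQL-style join: merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def subquery(inner_var, out_var):
    """SELECT (?inner AS ?out) WHERE { ?inner a gnd:CorporateBody }.

    Evaluated bottom-up, on its own: ?inner is unbound here, so the
    pattern matches every corporate body in the store.
    """
    rows = [{inner_var: s} for (s, p, o) in TRIPLES if p == RDF_TYPE and o == CORP]
    # The projection keeps only ?out, so ?inner never reaches the outer join.
    return [{out_var: row[inner_var]} for row in rows]


def naive_query(uri1, uri2):
    """BIND ... ?uri1, BIND ... ?uri2, joined with the UNION of two subqueries."""
    binds = [{"uri1": uri1, "uri2": uri2}]
    union = subquery("uri1", "uri") + subquery("uri2", "uri")
    # No shared variables, so the join degenerates into a cross product.
    return sorted(row["uri"] for row in join(binds, union))


def filter_query(candidates):
    """FILTER/IN evaluated literally: scan every corporate body, keep matches."""
    wanted = set(candidates)
    return sorted(s for (s, p, o) in TRIPLES
                  if p == RDF_TYPE and o == CORP and s in wanted)


def probe_query(candidates):
    """The optimized disjunction: one lookup per constant, no scan."""
    return sorted(c for c in candidates if (c, RDF_TYPE, CORP) in TRIPLES)


if __name__ == "__main__":
    print(naive_query("gnd:10244669", "gnd:1024466-9"))
    # ['gnd:10244669', 'gnd:10244669', 'gnd:other-1', 'gnd:other-1', 'gnd:other-2', 'gnd:other-2']
    print(probe_query(["gnd:10244669", "gnd:1024466-9"]))
    # ['gnd:10244669']
```

In this model the naive version returns every corporate body (twice, once per UNION branch) no matter what the BIND values are, while `filter_query` and `probe_query` agree on the answer; the difference is that `probe_query` does one lookup per candidate URI instead of scanning all corporate bodies.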
