On Tue, Feb 12, 2013 at 1:36 PM, Andy Seaborne <[email protected]> wrote:
> On 12/02/13 16:20, Tayfun Gökmen Halaç wrote:
>>
>> Hi,
>>
>> I have the query below which includes filter blocks.
>>
>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>> PREFIX void: <http://rdfs.org/ns/void#>
>> SELECT (COUNT(*) AS ?count) WHERE {
>>   ?referrerDataset rdf:type void:Dataset.
>>   FILTER (?referrerDataset IN(<http://datasets/geonames#indv_0.32581606535856833>,
>>     <http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
>>   ?linkset void:subjectsTarget ?referrerDataset.
>>   ?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
>>   ?linkset void:objectsTarget ?referencedDataset.
>>   ?referencedDataset1 rdf:type void:Dataset.
>>   FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
>>   ?referencedDataset void:uriSpace ?uriSpace.
>> }
>>
>> I placed the filter blocks in specific positions to ensure performance in
>> the query. When executing the query, ARQ changes the positions of the
>> filter blocks, and puts them at the end as seen below.
>>
>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>> PREFIX void: <http://rdfs.org/ns/void#>
>> SELECT (COUNT(*) AS ?count) WHERE {
>>   ?referrerDataset rdf:type void:Dataset.
>>   ?linkset void:subjectsTarget ?referrerDataset.
>>   ?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
>>   ?linkset void:objectsTarget ?referencedDataset.
>>   ?referencedDataset1 rdf:type void:Dataset.
>>   ?referencedDataset void:uriSpace ?uriSpace.
>>   FILTER (?referrerDataset IN(<http://datasets/geonames#indv_0.32581606535856833>,
>>     <http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
>>   FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
>> }
>>
>> I created the query above with the code below. But, the same thing occurs
>> while I am using the QueryExecution.execSelect().
>>
>> Query originalQuery = QueryFactory.create(queryStr);
>> Op op = QueryExecutionFactory.createPlan(originalQuery,
>>     DatasetGraphFactory.createMem(), null).getOp();
>> Query changedQuery = OpAsQuery.asQuery(op);
>> System.out.println(changedQuery);
>>
>> I have read in some threads on the mailing list that ARQ optimizes the
>> query and places the filter blocks in the best position in the query. I
>> use ARQ 2.9.4, and my data is in a Jena in-memory model. Does anybody have
>> an idea why ARQ moves the filter blocks to the end of the query? I don't
>> think this is the best position for the filter blocks.
>
>
> The SPARQL spec says all FILTERs apply to the whole block, not the triple
> patterns before it.
>
> { FILTER ( ?o = 57 )
>   ?s ?p ?o }
>
> is the same algebra expression as
>
> { ?s ?p ?o
>   FILTER ( ?o = 57 )
> }
>
> ARQ then tries to find a better execution order but FILTER NOT EXISTS is
> quite tricky.
>
> If you look at the optimized algebra output (via sparql.org or qprint
> --print=opt) you'll see it does a lot better without the mix of the two
> filters.
>
> You can control this by writing similar, but technically different, queries:
>
>
>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX void: <http://rdfs.org/ns/void#>
> SELECT (COUNT(*) AS ?count) WHERE {
>
>   {
>     ?referrerDataset rdf:type void:Dataset.
>     FILTER (?referrerDataset
>       IN(<http://datasets/geonames#indv_0.32581606535856833>,
>       <http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
>   }
>
>   ?linkset void:subjectsTarget ?referrerDataset.
>   ?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
>   ?linkset void:objectsTarget ?referencedDataset.
>   ?referencedDataset1 rdf:type void:Dataset.
>   FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
>   ?referencedDataset void:uriSpace ?uriSpace.
> }
>
> In the next release codebase it seems to put FILTER/IN in a better place,
> but not an ideal one. With {...} as below the plan looks better:
>
>
>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX void: <http://rdfs.org/ns/void#>
> SELECT (COUNT(*) AS ?count) WHERE {
>   {
>     ?referrerDataset rdf:type void:Dataset.
>     FILTER (?referrerDataset
>       IN(<http://datasets/geonames#indv_0.32581606535856833>,
>       <http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
>
>
>     ?linkset void:subjectsTarget ?referrerDataset.
>     ?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
>     ?linkset void:objectsTarget ?referencedDataset.
>   }
>   ?referencedDataset rdf:type void:Dataset.
>   ?referencedDataset void:uriSpace ?uriSpace.
>
>   FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
> }
>
>
> By the way - you have an unconstrained cross product:
>
>
>   ?referencedDataset1 rdf:type void:Dataset.
>
> This pattern is not linked to anything else in the query.
>
> Andy
>
>
You could also try a similar query that uses the VALUES operator, which may
be faster:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
SELECT (COUNT(*) AS ?count) WHERE {
  VALUES ?referrerDataset {
    <http://datasets/geonames#indv_0.32581606535856833>
    <http://datasets/linkedMdb#indv_0.7447588411027833>
  }
  ?referrerDataset rdf:type void:Dataset.
  ?linkset void:subjectsTarget ?referrerDataset.
  ?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
  ?linkset void:objectsTarget ?referencedDataset.
  ?referencedDataset1 rdf:type void:Dataset.
  FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
  ?referencedDataset void:uriSpace ?uriSpace.
}

-Stephen
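
For anyone reading this thread later, Andy's point that a FILTER applies to its whole group, wherever it is written, can be illustrated with a toy model in plain Java. This is only a sketch and not ARQ code: FilterScopeDemo, Element and evaluate are hypothetical names made up for this illustration, a "pattern" is reduced to the list of ?o values it matches, and the model assumes a single pattern per group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

public class FilterScopeDemo {

    /** One element of a toy group: either a pattern (the ?o values it matches) or a FILTER on ?o. */
    static final class Element {
        final List<Integer> matches; // set for a pattern, null for a filter
        final IntPredicate filter;   // set for a filter, null for a pattern

        private Element(List<Integer> matches, IntPredicate filter) {
            this.matches = matches;
            this.filter = filter;
        }

        static Element pattern(Integer... values) {
            return new Element(Arrays.asList(values), null);
        }

        static Element filter(IntPredicate test) {
            return new Element(null, test);
        }
    }

    /**
     * Evaluates a group in the spirit of the algebra translation: gather the
     * pattern's solutions first, then apply every filter to the result,
     * regardless of where the filter was written inside the group.
     */
    static List<Integer> evaluate(List<Element> group) {
        List<Integer> solutions = new ArrayList<>();
        List<IntPredicate> filters = new ArrayList<>();
        for (Element e : group) {
            if (e.filter != null) {
                filters.add(e.filter);       // remember the filter, do not apply it yet
            } else {
                solutions.addAll(e.matches); // assumption: only one pattern per group
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int o : solutions) {
            boolean keep = true;
            for (IntPredicate f : filters) {
                keep = keep && f.test(o);
            }
            if (keep) {
                result.add(o);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // { FILTER ( ?o = 57 )  ?s ?p ?o }
        List<Element> filterFirst = Arrays.asList(
                Element.filter(o -> o == 57),
                Element.pattern(12, 57, 99));
        // { ?s ?p ?o  FILTER ( ?o = 57 ) }
        List<Element> filterLast = Arrays.asList(
                Element.pattern(12, 57, 99),
                Element.filter(o -> o == 57));

        System.out.println(evaluate(filterFirst));
        System.out.println(evaluate(filterLast));
        System.out.println(evaluate(filterFirst).equals(evaluate(filterLast)));
    }
}
```

Both groups give the same solutions because the filter is collected and applied to the group as a whole. That is also why wrapping part of the query in an inner { ... }, as Andy suggests, changes which patterns a filter is scoped to.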
