Re: SPARQL FILTER placement in ARQ

Stephen Allen Tue, 12 Feb 2013 10:55:05 -0800

On Tue, Feb 12, 2013 at 1:36 PM, Andy Seaborne <[email protected]> wrote:
> On 12/02/13 16:20, Tayfun Gökmen Halaç wrote:
>>
>> Hi,
>>
>> I have the query below which includes filter blocks.
>>
>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>> PREFIX void: <http://rdfs.org/ns/void#>
>> SELECT (COUNT(*) AS ?count) WHERE {
>> ?referrerDataset rdf:type void:Dataset.
>> FILTER (?referrerDataset IN(
>> <http://datasets/geonames#indv_0.32581606535856833>,
>> <http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
>> ?linkset void:subjectsTarget ?referrerDataset.
>> ?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
>> ?linkset void:objectsTarget ?referencedDataset.
>> ?referencedDataset1 rdf:type void:Dataset.
>> FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
>> ?referencedDataset void:uriSpace ?uriSpace.
>> }
>>
>> I placed the filter blocks in specific positions to ensure performance in
>> the query. When executing the query, ARQ changes the positions of the
>> filter blocks, and puts them at the end as seen below.
>>
>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>> PREFIX void: <http://rdfs.org/ns/void#>
>> SELECT (COUNT(*) AS ?count) WHERE {
>> ?referrerDataset rdf:type void:Dataset.
>> ?linkset void:subjectsTarget ?referrerDataset.
>> ?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
>> ?linkset void:objectsTarget ?referencedDataset.
>> ?referencedDataset1 rdf:type void:Dataset.
>> ?referencedDataset void:uriSpace ?uriSpace.
>> FILTER (?referrerDataset IN(
>> <http://datasets/geonames#indv_0.32581606535856833>,
>> <http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
>> FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
>> }
>>
>> I created the query above with the code below. But, the same thing occurs
>> while I am using the QueryExecution.execSelect().
>>
>> Query originalQuery = QueryFactory.create(queryStr);
>> Op op = QueryExecutionFactory.createPlan(originalQuery,
>> DatasetGraphFactory.createMem(), null).getOp();
>> Query changedQuery = OpAsQuery.asQuery(op);
>> System.out.println(changedQuery);
>>
>> I have read in some threads on the mailing list that ARQ optimizes the
>> query and places the filter blocks in the best position in the query. I
>> use ARQ 2.9.4, and my data is in a Jena in-memory model. Does anybody have
>> an idea why ARQ moves the filter blocks to the end of the query? I don't
>> think this is the best position for the filter blocks.
>
>
> The SPARQL spec says all FILTERs apply to the whole block, not just to the
> triple patterns before them.
>
> { FILTER ( ?o = 57 )
>   ?s ?p ?o }
>
> is the same algebra expression as
>
> { ?s ?p ?o
>   FILTER ( ?o = 57 )
> }
>
> ARQ then tries to find a better execution order but FILTER NOT EXISTS is
> quite tricky.
>
> If you look at the optimized algebra output (via sparql.org or qprint
> --print=opt) you'll see it does a lot better without the mix of the two
> filters.
>
> You can control this by writing similar, but technically different, queries:
>
>
>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX void: <http://rdfs.org/ns/void#>
> SELECT (COUNT(*) AS ?count) WHERE {
>
> {
>   ?referrerDataset rdf:type void:Dataset.
>   FILTER (?referrerDataset
>        IN(<http://datasets/geonames#indv_0.32581606535856833>,
>           <http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
> }
>
> ?linkset void:subjectsTarget ?referrerDataset.
> ?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
> ?linkset void:objectsTarget ?referencedDataset.
> ?referencedDataset1 rdf:type void:Dataset.
> FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
> ?referencedDataset void:uriSpace ?uriSpace.
> }
>
> In the codebase for the next release it seems to put FILTER/IN in a better
> place, but not the ideal one; with {...} as below the plan looks better:
>
>
>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX void: <http://rdfs.org/ns/void#>
> SELECT (COUNT(*) AS ?count) WHERE {
> {
> ?referrerDataset rdf:type void:Dataset.
> FILTER (?referrerDataset
> IN(<http://datasets/geonames#indv_0.32581606535856833>,
> <http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
>
>
> ?linkset void:subjectsTarget ?referrerDataset.
> ?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
> ?linkset void:objectsTarget ?referencedDataset.
> }
> ?referencedDataset rdf:type void:Dataset.
> ?referencedDataset void:uriSpace ?uriSpace.
>
> FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
> }
>
>
> By the way - you have an unconstrained cross product:
>
>
> ?referencedDataset1 rdf:type void:Dataset.
>
> This pattern is not linked to anything else in the query.
>
>         Andy
>
>


You could also try a similar query that uses the VALUES operator,
which may be faster:


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
SELECT (COUNT(*) AS ?count) WHERE {

VALUES ?referrerDataset {
   <http://datasets/geonames#indv_0.32581606535856833>
   <http://datasets/linkedMdb#indv_0.7447588411027833>
}

?referrerDataset rdf:type void:Dataset.
?linkset void:subjectsTarget ?referrerDataset.
?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
?linkset void:objectsTarget ?referencedDataset.
?referencedDataset1 rdf:type void:Dataset.
FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
?referencedDataset void:uriSpace ?uriSpace.
}
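
Following Andy's note about the unlinked ?referencedDataset1 pattern,
here is a sketch of the same VALUES query with that variable renamed to
?referencedDataset, as in his second rewrite. This assumes the trailing
"1" was a typo; if the extra pattern was intentional, keep the query
above instead. I have not measured whether this form is faster on your
data.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
SELECT (COUNT(*) AS ?count) WHERE {

  # Bind the two candidate datasets up front instead of filtering with IN.
  VALUES ?referrerDataset {
    <http://datasets/geonames#indv_0.32581606535856833>
    <http://datasets/linkedMdb#indv_0.7447588411027833>
  }

  ?referrerDataset rdf:type void:Dataset .
  ?linkset void:subjectsTarget ?referrerDataset .
  ?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs> .
  ?linkset void:objectsTarget ?referencedDataset .

  # Assumption: originally ?referencedDataset1, which joined with nothing.
  ?referencedDataset rdf:type void:Dataset .
  ?referencedDataset void:uriSpace ?uriSpace .

  FILTER NOT EXISTS { ?referencedDataset void:sparqlEndpoint ?endpoint . }
}
```

Note that the count will differ from the original query: the unlinked
pattern multiplied every solution by the number of void:Dataset
resources, and the renamed pattern instead requires ?referencedDataset
to be a void:Dataset.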


-Stephen
