On 12/02/13 16:20, Tayfun Gökmen Halaç wrote:
Hi,
I have the query below, which includes FILTER blocks.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
SELECT (COUNT(*) AS ?count) WHERE {
?referrerDataset rdf:type void:Dataset.
FILTER (?referrerDataset IN(<http://datasets/geonames#indv_0.32581606535856833>,
<http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
?linkset void:subjectsTarget ?referrerDataset.
?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
?linkset void:objectsTarget ?referencedDataset.
?referencedDataset1 rdf:type void:Dataset.
FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
?referencedDataset void:uriSpace ?uriSpace.
}
I placed the FILTER blocks in specific positions to improve the query's
performance. When the query is executed, however, ARQ changes the positions
of the FILTER blocks and moves them to the end, as seen below.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
SELECT (COUNT(*) AS ?count) WHERE {
?referrerDataset rdf:type void:Dataset.
?linkset void:subjectsTarget ?referrerDataset.
?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
?linkset void:objectsTarget ?referencedDataset.
?referencedDataset1 rdf:type void:Dataset.
?referencedDataset void:uriSpace ?uriSpace.
FILTER (?referrerDataset IN(<http://datasets/geonames#indv_0.32581606535856833>,
<http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
}
I produced the query above with the code below, but the same reordering
happens when I use QueryExecution.execSelect().
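import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.sparql.algebra.*;
import com.hp.hpl.jena.sparql.core.DatasetGraphFactory;
// Build the plan over an empty in-memory dataset, then turn its algebra back into SPARQL.
Query originalQuery = QueryFactory.create(queryStr);
Op op = QueryExecutionFactory.createPlan(originalQuery,
        DatasetGraphFactory.createMem(), null).getOp();
Query changedQuery = OpAsQuery.asQuery(op);
System.out.println(changedQuery);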
I have read in some threads on the mailing list that ARQ optimizes the query
and moves the FILTER blocks to the best position. I use ARQ 2.9.4, and my
data is in a Jena in-memory model. Does anybody know why ARQ moves the FILTER
blocks to the end of the query? I don't think this is the best position for
them.
The SPARQL spec says every FILTER applies to the whole group it appears in,
not just to the triple patterns before it. For example,
{ FILTER ( ?o = 57 )
?s ?p ?o }
is the same algebra expression as
{ ?s ?p ?o
FILTER ( ?o = 57 )
}
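A toy model may make the rule concrete. The Java sketch below is not ARQ code: `FilterPlacement`, `Triple`, `Filter` and the two data triples are hypothetical, made up for illustration (Java 16 or later). A group is modelled as a list of elements in textual order. `evalGroup` follows the SPARQL rule by joining all triple patterns first and then applying every filter, so both orderings give the same answer. `evalInOrder` applies each filter where it is written, for contrast only: with `FILTER ( ?o = 57 )` first, `?o` is still unbound, the test fails, and nothing is returned.

```java
import java.util.*;
import java.util.function.Predicate;

public class FilterPlacement {
    // Toy model, not ARQ: a group is a list of elements in textual order.
    interface Element {}
    record Triple(String s, String p, String o) implements Element {}
    record Filter(Predicate<Map<String, String>> test) implements Element {}

    // Hypothetical data: two triples, only one with object 57.
    static final List<Triple> DATA = List.of(
        new Triple(":a", ":p", "57"),
        new Triple(":b", ":p", "42"));

    // Match a pattern term against a data term; terms starting with '?' are variables.
    static boolean bind(String term, String value, Map<String, String> sol) {
        if (!term.startsWith("?")) return term.equals(value);
        String bound = sol.putIfAbsent(term, value);
        return bound == null || bound.equals(value);
    }

    // Extend every solution with every data triple that matches the pattern.
    static List<Map<String, String>> join(List<Map<String, String>> sols, Triple pat) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> sol : sols)
            for (Triple t : DATA) {
                Map<String, String> ext = new HashMap<>(sol);
                if (bind(pat.s(), t.s(), ext) && bind(pat.p(), t.p(), ext) && bind(pat.o(), t.o(), ext))
                    out.add(ext);
            }
        return out;
    }

    // Evaluation starts from a single empty solution.
    static List<Map<String, String>> start() {
        List<Map<String, String>> sols = new ArrayList<>();
        sols.add(new HashMap<>());
        return sols;
    }

    // SPARQL rule: join all triple patterns of the group, then apply all its filters.
    static List<Map<String, String>> evalGroup(List<Element> group) {
        List<Map<String, String>> sols = start();
        for (Element e : group) if (e instanceof Triple t) sols = join(sols, t);
        for (Element e : group) if (e instanceof Filter f) sols = sols.stream().filter(f.test()).toList();
        return sols;
    }

    // For contrast only (not SPARQL semantics): apply each filter where it is written.
    static List<Map<String, String>> evalInOrder(List<Element> group) {
        List<Map<String, String>> sols = start();
        for (Element e : group)
            sols = (e instanceof Triple t) ? join(sols, t) : sols.stream().filter(((Filter) e).test()).toList();
        return sols;
    }

    public static void main(String[] args) {
        // FILTER ( ?o = 57 ): an unbound ?o fails the test, as an evaluation error does in SPARQL.
        Filter o57 = new Filter(sol -> "57".equals(sol.get("?o")));
        Triple spo = new Triple("?s", "?p", "?o");
        System.out.println(evalGroup(List.of(o57, spo)).size());   // 1
        System.out.println(evalGroup(List.of(spo, o57)).size());   // 1
        System.out.println(evalInOrder(List.of(o57, spo)).size()); // 0
    }
}
```

This is why an optimizer is free to move a FILTER within its group: its textual position carries no meaning.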
ARQ then tries to find a better execution order, but FILTER NOT EXISTS is
quite tricky.
If you look at the optimized algebra output (via sparql.org or qparse
--print=opt), you'll see it does a lot better without the mix of the two
filters.
You can control this by writing similar, but technically different, queries:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
SELECT (COUNT(*) AS ?count) WHERE {
{
?referrerDataset rdf:type void:Dataset.
FILTER (?referrerDataset
IN(<http://datasets/geonames#indv_0.32581606535856833>,
<http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
}
?linkset void:subjectsTarget ?referrerDataset.
?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
?linkset void:objectsTarget ?referencedDataset.
?referencedDataset1 rdf:type void:Dataset.
FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
?referencedDataset void:uriSpace ?uriSpace.
}
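Why the inner { } helps can be sketched with another toy model (hypothetical Java, not ARQ's API; the data sizes, the `join`/`filter` helpers and the `joinRows` counter are invented for illustration). A FILTER inside the inner group can only see variables bound in that group. Here it uses only ?referrerDataset, which the inner pattern binds, so applying it before joining with the linkset patterns gives the same answers while producing far fewer intermediate rows.

```java
import java.util.*;
import java.util.function.Predicate;

public class FilterPushdown {
    static int joinRows = 0; // total rows produced by joins: a rough cost measure

    // Join two solution sequences: a pair of rows is compatible if all shared variables agree.
    static List<Map<String, String>> join(List<Map<String, String>> left, List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left)
            for (Map<String, String> r : right)
                if (l.keySet().stream().allMatch(k -> !r.containsKey(k) || r.get(k).equals(l.get(k)))) {
                    Map<String, String> m = new HashMap<>(l);
                    m.putAll(r);
                    out.add(m);
                }
        joinRows += out.size();
        return out;
    }

    static List<Map<String, String>> filter(List<Map<String, String>> rows, Predicate<Map<String, String>> p) {
        return rows.stream().filter(p).toList();
    }

    // Hypothetical data: 100 datasets, each the subjectsTarget of 5 linksets.
    static final List<Map<String, String>> DATASETS = new ArrayList<>();
    static final List<Map<String, String>> LINKSETS = new ArrayList<>();
    static {
        for (int d = 0; d < 100; d++) {
            DATASETS.add(Map.of("referrerDataset", "ds" + d));
            for (int i = 0; i < 5; i++)
                LINKSETS.add(Map.of("linkset", "ls" + d + "_" + i, "referrerDataset", "ds" + d));
        }
    }

    // Stands in for FILTER (?referrerDataset IN (...)) with two hypothetical datasets.
    static final Predicate<Map<String, String>> IN_FILTER =
        row -> Set.of("ds1", "ds2").contains(row.get("referrerDataset"));

    public static void main(String[] args) {
        joinRows = 0;
        int late = filter(join(DATASETS, LINKSETS), IN_FILTER).size();  // FILTER applied to the whole group
        System.out.println("filter last:  " + late + " rows, " + joinRows + " join rows");
        joinRows = 0;
        int early = join(filter(DATASETS, IN_FILTER), LINKSETS).size(); // FILTER inside the inner { }
        System.out.println("filter first: " + early + " rows, " + joinRows + " join rows");
    }
}
```

Both plans return 10 rows, but the filter-last plan builds 500 intermediate join rows against 10 for the filter-first plan.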
The codebase for the next release seems to put the FILTER/IN in a better
place, but ideally you would use {...} as below, which gives a better plan:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
SELECT (COUNT(*) AS ?count) WHERE {
{
?referrerDataset rdf:type void:Dataset.
FILTER (?referrerDataset
IN(<http://datasets/geonames#indv_0.32581606535856833>,
<http://datasets/linkedMdb#indv_0.7447588411027833> ) ) .
?linkset void:subjectsTarget ?referrerDataset.
?linkset void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs>.
?linkset void:objectsTarget ?referencedDataset.
}
?referencedDataset rdf:type void:Dataset.
?referencedDataset void:uriSpace ?uriSpace.
FILTER NOT EXISTS {?referencedDataset void:sparqlEndpoint ?endpoint.}
}
By the way - you have an unconstrained cross product:
?referencedDataset1 rdf:type void:Dataset.
This pattern is not linked to anything else in the query.
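The effect on COUNT(*) shows up in a small Java sketch (hypothetical data and a made-up `join` helper, not ARQ code). A pattern that shares no variable with the rest of the query is compatible with every row, so the count is multiplied by the number of datasets it matches. Writing ?referencedDataset instead, as in the second rewritten query above, links the pattern to the rest of the query.

```java
import java.util.*;

public class CrossProduct {
    // Join two solution sequences: a pair of rows is compatible if all shared variables agree.
    static List<Map<String, String>> join(List<Map<String, String>> left, List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left)
            for (Map<String, String> r : right)
                if (l.keySet().stream().allMatch(k -> !r.containsKey(k) || r.get(k).equals(l.get(k)))) {
                    Map<String, String> m = new HashMap<>(l);
                    m.putAll(r);
                    out.add(m);
                }
        return out;
    }

    // Hypothetical rows for "?x rdf:type void:Dataset", bound to the given variable name.
    static List<Map<String, String>> typedDatasets(String var) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String ds : List.of("dsA", "dsB", "dsC", "dsD")) rows.add(Map.of(var, ds));
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical results of the connected part of the query: 3 rows.
        List<Map<String, String>> connected = List.of(
            Map.of("referencedDataset", "dsA"),
            Map.of("referencedDataset", "dsB"),
            Map.of("referencedDataset", "dsC"));

        // ?referencedDataset1 shares no variable: every pair joins, 3 * 4 rows.
        System.out.println(join(connected, typedDatasets("referencedDataset1")).size()); // 12
        // ?referencedDataset is shared: each row joins only with its own dataset.
        System.out.println(join(connected, typedDatasets("referencedDataset")).size());  // 3
    }
}
```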
Andy
Thank you.
Best regards,
Tayfun Gokmen Halac