On 12/08/13 10:26, Wolfgang wrote:
Hi,
(sorry - I didn't reply to the original; the lists are approaching the
levels of July and it's only the 12th)
Which version are you using?
The last release now calculates stats for rdf:type - if you don't see
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ...) ...)
in the stats file, then it was presumably built before that change.
I have a question about how the query optimizer decides which triple pattern to
evaluate first. My basic query is:
PREFIX ec: <http://www.eurocat.info/ontology/eurocat.owl#>
PREFIX nci: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
SELECT *
WHERE {
?pat a ec:Patient .
?pat ec:Has_Disease ?Disease .
?Disease a ?DiseaseType .
?DiseaseType ec:descendantOf nci:Diseases_and_Disorders .
}
My stats.opt file shows:
(where do the ";" come from?)
(<http://www.eurocat.info/ontology/eurocat.owl#Has_Disease>; 755)
(<http://www.eurocat.info/ontology/eurocat.owl#descendantOf>; 917730)
The execution plan for this query is:
(?DiseaseType <http://www.eurocat.info/ontology/eurocat.owl#descendantOf>;
<http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Diseases_and_Disorders>;)
(?Patient <http://www.eurocat.info/ontology/eurocat.owl#Has_Disease>; ?Disease)
(?Patient rdf:type <http://www.eurocat.info/ontology/eurocat.owl#Patient>;)
(?Disease rdf:type ?DiseaseType)
You can add to the stats file to help it.
((VAR rdf:type <http://www.eurocat.info/ontology/eurocat.owl#Patient>) COUNT1)
where COUNT1 comes from:
SELECT (count(*) AS ?COUNT1)
{ ?v a <http://www.eurocat.info/ontology/eurocat.owl#Patient> }
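If it helps, here is a minimal Python sketch of turning such a count into a stats rule. The helper name and the count value are made up for illustration and are not part of Jena; the rule shape follows the ((VAR <...#type> ...) ...) form mentioned earlier, with the full rdf:type IRI written out.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def type_stats_rule(class_iri: str, count: int) -> str:
    """Format a stats rule for the pattern (VAR rdf:type <class_iri>).

    Hypothetical helper, not part of Jena. `count` is the value returned
    by the SELECT (count(*) ...) query for that class.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return f"((VAR <{RDF_TYPE}> <{class_iri}>) {count})"


# 1234 is a placeholder, not a figure from the real data set.
rule = type_stats_rule(
    "http://www.eurocat.info/ontology/eurocat.owl#Patient", 1234
)
print(rule)
```

The printed line is what would be pasted into the stats file, with the placeholder replaced by the real count.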
The optimizer had a mild aversion to using rdf:type prior to 2.10.1.
With my current test data, this query takes 43 seconds and returns
777 rows. Since this struck me as very long and the execution plan seems
inefficient, I removed the stats.opt file to make it use the query path
"as is" and the execution time was reduced to 1 second!
descendantOf has a much bigger count in stats.opt than Has_Disease.
Is that the reason why it chose to evaluate it first even though
Has_Disease is the better choice for narrowing down the result set quickly?
The query "?DiseaseType ec:descendantOf nci:Diseases_and_Disorders ."
by itself takes 20 seconds. So this is obviously the expensive operation.
-Wolfgang
P.S.: descendantOf is just rdfs:subClassOf* asserted directly.
Could you post the complete stats file please?
Andy