Attached is the output from QueryExecution.getContext().set(ARQ.symLogExec, Explain.InfoLevel.ALL) as well as the stats.opt file.
Based on what I understand, it seems to be executing the triples in the following order:

  ?pat ec:Has_Id ?patId .
  ?pat a nci:Patient .
  ?findingProp rdfs:subPropertyOf ec:Has_Finding .
  ?finding a ?findingType .
  ?pat ?findingProp ?finding .

Instead of:

  ?pat a nci:Patient .
  ?pat ec:Has_Id ?patId .
  ?findingProp rdfs:subPropertyOf ec:Has_Finding .
  ?pat ?findingProp ?finding .
  ?finding a ?findingType .

The very general triple pattern "?finding a ?findingType" is executed before the more specific pattern "?pat ?findingProp ?finding". What are my options for influencing this? Remove the stats file and use none.opt? Modify something else?

Btw, I would also expect it to match "?pat a nci:Patient" first and then "?pat ec:Has_Id ?patId", since the ec:Has_Id property is used for all individuals, not only patients.

-Wolfgang

-----Original Message-----
From: Andy Seaborne <[email protected]>
To: users <[email protected]>
Sent: Wed, May 15, 2013 6:10 pm
Subject: Re: Unexpectedly slow query

On 15/05/13 15:36, Damian Steer wrote:
>
> On 15 May 2013, at 14:10, [email protected] wrote:
>
>> Hello,
>>
>> I am using the following SPARQL query against a TDB store:
>>
>> SELECT *
>> WHERE {
>>   ?pat a nci:Patient .
>>   ?pat ec:Has_Id ?patId .
>>   ?findingProp rdfs:subPropertyOf ec:Has_Finding .
>>   ?pat ?findingProp ?finding .
>>   ?finding a ?findingType .
>> }
>>
>> When I run this query WITHOUT the last triple (the bolded line), it returns the correct result within seconds.

Assuming "?finding a ?findingType" was bold - the email was quoted-printable - so no markup.

>> But when I run this query WITH the last triple, the query runs a very long time. I do not know how long b/c I cancelled it after 1 hour.
>
> I wonder whether the optimiser sees '?finding a ?findingType .' as more ground than the previous, and thus reorders the query?
>
> Have a look at [1] which explains some of the diagnostic features of TDB. Getting hold of the query plan would be very useful.
>
> Damian
>
> [1]
> <https://jena.apache.org/documentation/tdb/optimizer.html#Investigating_what_is_going_on>

If you could do an "explain" as per Damian's suggestion, that would be great.

A couple of questions:

1/ Which version is this?

--explain may be available if working from the command line.

2/ Is there a stats file?

There was a recent fix in this area - if you are using stats.opt, you may find that either adding rules to capture your usage or recalculating helps. It may not - ungrounded predicates may be confusing things.

Andy
12:01:34 INFO exec :: QUERY
  PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX ecIn: <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl/instances#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  PREFIX nci: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX ncicp: <http://ncicb.nci.nih.gov/xml/owl/EVS/ComplexProperties.xsd#>
  PREFIX list: <http://jena.hpl.hp.com/ARQ/list#>
  PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
  PREFIX ec: <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#>

  SELECT *
  WHERE
    { ?pat rdf:type nci:Patient .
      ?pat ecIn:Has_Id ?patId .
      ?findingProp rdfs:subPropertyOf ec:Has_Finding .
      ?pat ?findingProp ?finding .
      ?finding rdf:type ?findingType
    }
12:01:34 INFO exec :: ALGEBRA
  (quadpattern
    (quad <urn:x-arq:DefaultGraphNode> ?pat <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Patient>)
    (quad <urn:x-arq:DefaultGraphNode> ?pat <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl/instances#Has_Id> ?patId)
    (quad <urn:x-arq:DefaultGraphNode> ?findingProp <http://www.w3.org/2000/01/rdf-schema#subPropertyOf> <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#Has_Finding>)
    (quad <urn:x-arq:DefaultGraphNode> ?pat ?findingProp ?finding)
    (quad <urn:x-arq:DefaultGraphNode> ?finding <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?findingType)
  )
12:01:34 INFO exec :: Execute ::
  (?pat <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl/instances#Has_Id> ?patId)
  (?pat rdf:type <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Patient>)
  (?findingProp rdfs:subPropertyOf <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#Has_Finding>)
  (?finding rdf:type ?findingType)
  (?pat ?findingProp ?finding)
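
To make the concern about the two orderings concrete, here is a minimal Java sketch that counts intermediate rows for the order as written and for the order shown in the Execute log. It is only a toy: the evaluator is a naive nested loop over an in-memory list and says nothing about how TDB actually executes or reorders patterns. The data set, the class name JoinOrderSketch, the shortened term names and the hasDiagnosis property are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: a naive nested-loop evaluator, NOT how TDB runs queries.
class JoinOrderSketch {

    // A triple or triple pattern as {subject, predicate, object}; "?x" marks a variable.
    static String[] t(String s, String p, String o) { return new String[] {s, p, o}; }

    // Hypothetical data: `patients` patients with one finding each, plus `others`
    // non-patient individuals that also carry Has_Id.
    static List<String[]> buildData(int patients, int others) {
        List<String[]> data = new ArrayList<>();
        data.add(t("hasDiagnosis", "subPropertyOf", "Has_Finding"));
        for (int i = 0; i < patients; i++) {
            data.add(t("pat" + i, "type", "Patient"));
            data.add(t("pat" + i, "Has_Id", "pid" + i));
            data.add(t("pat" + i, "hasDiagnosis", "finding" + i));
            data.add(t("finding" + i, "type", "Diagnosis"));
        }
        for (int j = 0; j < others; j++) {
            data.add(t("other" + j, "type", "Device"));
            data.add(t("other" + j, "Has_Id", "oid" + j));
        }
        return data;
    }

    // Pattern order as written in the query.
    static List<String[]> writtenOrder() {
        return Arrays.asList(
            t("?pat", "type", "Patient"),
            t("?pat", "Has_Id", "?patId"),
            t("?findingProp", "subPropertyOf", "Has_Finding"),
            t("?pat", "?findingProp", "?finding"),
            t("?finding", "type", "?findingType"));
    }

    // Pattern order as reported on the Execute line of the log.
    static List<String[]> observedOrder() {
        return Arrays.asList(
            t("?pat", "Has_Id", "?patId"),
            t("?pat", "type", "Patient"),
            t("?findingProp", "subPropertyOf", "Has_Finding"),
            t("?finding", "type", "?findingType"),
            t("?pat", "?findingProp", "?finding"));
    }

    // Match one pattern term against one data term, extending the binding if needed.
    static boolean bind(Map<String, String> b, String term, String value) {
        if (!term.startsWith("?")) return term.equals(value);
        String prev = b.putIfAbsent(term, value);
        return prev == null || prev.equals(value);
    }

    // Evaluates patterns left to right.
    // Returns {sum of row counts after every step, final row count}.
    static long[] evaluate(List<String[]> data, List<String[]> patterns) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(new HashMap<>());
        long total = 0;
        for (String[] pat : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> row : rows) {
                for (String[] triple : data) {
                    Map<String, String> b = new HashMap<>(row);
                    if (bind(b, pat[0], triple[0]) && bind(b, pat[1], triple[1])
                            && bind(b, pat[2], triple[2])) {
                        next.add(b);
                    }
                }
            }
            total += next.size();
            rows = next;
        }
        return new long[] {total, rows.size()};
    }

    public static void main(String[] args) {
        List<String[]> data = buildData(3, 2);
        long[] written = evaluate(data, writtenOrder());
        long[] observed = evaluate(data, observedOrder());
        System.out.println("written order:  intermediate=" + written[0] + " final=" + written[1]);
        System.out.println("observed order: intermediate=" + observed[0] + " final=" + observed[1]);
    }
}
```

Both orders return the same final rows, but in the observed order the step "?finding type ?findingType" runs while ?finding is still unbound, so every row so far is paired with every typed resource before the last pattern filters them out again. In this toy model that step grows with patients times typed resources, which is at least consistent with a query that finishes in seconds without the last triple and runs for a very long time with it.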
