Attached is the output from QueryExecution.getContext().set(ARQ.symLogExec, 
Explain.InfoLevel.ALL) as well as the stats.opt file.

Based on what I understand, it seems to be executing the triples in the 
following order:


?pat ec:Has_Id ?patId .
?pat a nci:Patient .
?findingProp rdfs:subPropertyOf ec:Has_Finding .
?finding a ?findingType .
?pat ?findingProp ?finding .


Instead of:

?pat a nci:Patient .
?pat ec:Has_Id ?patId .
?findingProp rdfs:subPropertyOf ec:Has_Finding .
?pat ?findingProp ?finding .
?finding a ?findingType .

The very general triple pattern "?finding a ?findingType" is executed before 
the more specific pattern "?pat ?findingProp ?finding". What are my options for 
influencing this? Remove the stats file and use none.opt? Modify something else?

Btw, I would also expect it to match "?pat a nci:Patient" first and then "?pat 
ec:Has_Id ?patId" since the ec:Has_Id property is used for all individuals, not 
only patients.
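
To make the expected order above concrete, here is a minimal sketch of a toy greedy heuristic: at each step it picks the triple pattern with the fewest positions that are still unbound variables, given the variables bound by the patterns already chosen. This is an illustration only and is not how TDB's stats-based or fixed reordering actually works; the class `GreedyReorderSketch`, its `Triple` helper, and the method names are hypothetical and not part of Jena.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: a toy greedy heuristic, NOT TDB's reordering code.
class GreedyReorderSketch {

    // A triple pattern; any term starting with '?' is treated as a variable.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        String[] terms() { return new String[] { s, p, o }; }
        @Override public String toString() { return s + " " + p + " " + o + " ."; }
    }

    // Number of positions that are variables not yet bound by earlier patterns.
    static int unbound(Triple t, Set<String> bound) {
        int n = 0;
        for (String term : t.terms()) {
            if (term.startsWith("?") && !bound.contains(term)) n++;
        }
        return n;
    }

    // Repeatedly pick the pattern with the fewest unbound positions.
    // Ties are resolved in favour of the pattern that comes first in the input.
    static List<Triple> reorder(List<Triple> input) {
        List<Triple> remaining = new ArrayList<>(input);
        List<Triple> ordered = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            Triple best = remaining.get(0);
            for (Triple t : remaining) {
                if (unbound(t, bound) < unbound(best, bound)) best = t;
            }
            remaining.remove(best);
            ordered.add(best);
            // Every variable in the chosen pattern is bound for later patterns.
            for (String term : best.terms()) {
                if (term.startsWith("?")) bound.add(term);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Input: the five patterns in the order reported by the Execute log line.
        List<Triple> observed = Arrays.asList(
            new Triple("?pat", "ec:Has_Id", "?patId"),
            new Triple("?pat", "rdf:type", "nci:Patient"),
            new Triple("?findingProp", "rdfs:subPropertyOf", "ec:Has_Finding"),
            new Triple("?finding", "rdf:type", "?findingType"),
            new Triple("?pat", "?findingProp", "?finding"));
        for (Triple t : reorder(observed)) {
            System.out.println(t);
        }
    }
}
```

Under this toy heuristic, "?pat ?findingProp ?finding" is placed before "?finding a ?findingType", because ?pat and ?findingProp are already bound by the time it is considered, leaving only one unbound position. It counts variables only and ignores how many triples each predicate actually has, which is the information a stats file is meant to supply.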

-Wolfgang


-----Original Message-----
From: Andy Seaborne <[email protected]>
To: users <[email protected]>
Sent: Wed, May 15, 2013 6:10 pm
Subject: Re: Unexpectedly slow query


On 15/05/13 15:36, Damian Steer wrote:
>
> On 15 May 2013, at 14:10, [email protected] wrote:
>
>> Hello,
>>
>>
>>
>> I am using the following Sparql query against a TDB store:
>>
>> SELECT *
>> WHERE {
>>    ?pat a nci:Patient .
>>    ?pat ec:Has_Id ?patId .
>>    ?findingProp rdfs:subPropertyOf ec:Has_Finding .
>>    ?pat ?findingProp ?finding .
>>    ?finding a ?findingType .
>> }
>>
>>
>> When I run this query WITHOUT the last triple (the bolded line), it returns
>> the correct result within seconds.

Assuming "?finding a ?findingType" was bold - the email was
quoted-printable - so no markup.

>>
>> But when I run this query WITH the last triple, the query runs a very long
>> time. I do not know how long b/c I cancelled it after 1 hour.
>
> I wonder whether the optimiser sees '?finding a ?findingType .' as more ground
> than the previous, and thus reorders the query?
>
> Have a look at [1] which explains some of the diagnostic features of TDB.
> Getting hold of the query plan would be very useful.
>
> Damian
>
> [1] 
> <https://jena.apache.org/documentation/tdb/optimizer.html#Investigating_what_is_going_on>

If you could do an "explain" as per Damian's suggestion that would be great

A couple of questions:

1/ Which version is this?
    --explain may be available if working from the command line

2/ Is there a stats file?

There was a recent fix in this area - if you are using stats.opt, you
may find that either adding rules to capture your usage or recalculating
the statistics helps. It may not - ungrounded predicates may be confusing
things.

        Andy

 
12:01:34 INFO  exec                 :: QUERY
  PREFIX  afn:  <http://jena.hpl.hp.com/ARQ/function#>
  PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX  ecIn: <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl/instances#>
  PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
  PREFIX  nci:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
  PREFIX  owl:  <http://www.w3.org/2002/07/owl#>
  PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX  ncicp: <http://ncicb.nci.nih.gov/xml/owl/EVS/ComplexProperties.xsd#>
  PREFIX  list: <http://jena.hpl.hp.com/ARQ/list#>
  PREFIX  fn:   <http://www.w3.org/2005/xpath-functions#>
  PREFIX  ec:   <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#>
  
  SELECT  *
  WHERE
    { ?pat rdf:type nci:Patient .
      ?pat ecIn:Has_Id ?patId .
      ?findingProp rdfs:subPropertyOf ec:Has_Finding .
      ?pat ?findingProp ?finding .
      ?finding rdf:type ?findingType
    }
12:01:34 INFO  exec                 :: ALGEBRA
  (quadpattern
    (quad <urn:x-arq:DefaultGraphNode> ?pat 
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
<http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Patient>)
    (quad <urn:x-arq:DefaultGraphNode> ?pat 
<http://www.siemens.com/euroCAT/2011/8/euroCAT.owl/instances#Has_Id> ?patId)
    (quad <urn:x-arq:DefaultGraphNode> ?findingProp 
<http://www.w3.org/2000/01/rdf-schema#subPropertyOf> 
<http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#Has_Finding>)
    (quad <urn:x-arq:DefaultGraphNode> ?pat ?findingProp ?finding)
    (quad <urn:x-arq:DefaultGraphNode> ?finding 
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?findingType)
  )
12:01:34 INFO  exec                 :: Execute :: (?pat 
<http://www.siemens.com/euroCAT/2011/8/euroCAT.owl/instances#Has_Id> ?patId) 
(?pat rdf:type <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Patient>) 
(?findingProp rdfs:subPropertyOf 
<http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#Has_Finding>) (?finding 
rdf:type ?findingType) (?pat ?findingProp ?finding)
