Sorry about the ecIn: vs. ec: namespaces/prefixes. I removed the ecIn namespace 
in the process. But in general, the Has_Id property is not the crucial one, 
even though it is also put into the wrong place in the execution plan.

My original query (modified for prefixes) is:


 SELECT  *
   WHERE
     { ?pat rdf:type nci:Patient .
       ?pat ec:Has_Id ?patId .
       ?findingProp rdfs:subPropertyOf ec:Has_Finding .
       ?pat ?findingProp ?finding .
       ?finding rdf:type ?findingType
     }

To me, this also represents the most efficient order of triple patterns for the 
execution plan. But the execution plan I initially got, before regenerating the 
stats file after inserting all my data triples, was:

(?pat <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl/instances#Has_Id> 
?patId)
(?pat rdf:type <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Patient>)
(?findingProp rdfs:subPropertyOf 
<http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#Has_Finding>)
(?finding rdf:type ?findingType)
(?pat ?findingProp ?finding)

 
The main problem here is that (?pat ?findingProp ?finding) comes after 
(?finding rdf:type ?findingType) .
The other problem is that (?pat ec:Has_Id ?patId) comes first, even though 
(?pat rdf:type nci:Patient) is more restrictive.

After regenerating the stats file I got this:

(?findingProp rdfs:subPropertyOf 
<http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#Has_Finding>)
(?pat rdf:type <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Patient>)
(?pat <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#Has_Id> ?patId)
(?finding rdf:type ?findingType)
(?pat ?findingProp ?finding)



This version fixed the ec:Has_Id problem since it is now located after the 
(?pat a nci:Patient) triple pattern. But we still have the problem with the 
last two lines. They should be reversed since (?pat ?findingProp ?finding) is 
way more restrictive than (?finding a ?findingType). The new stats file did 
however cause the (?findingProp rdfs:subPropertyOf ec:Has_Finding) triple 
pattern to be moved to the top.

?findingProp is bound to the predicates ec:Has_Dysnpea_Score and 
ec:Has_Dysphagia_Score, which are sub-properties of ec:Has_Finding. The stats 
file shows entries for ec:Has_Dysnpea_Score and ec:Has_Dysphagia_Score (counts 
are 7 and 8). But the optimizer still chooses to put (?pat ?findingProp ?finding) 
last and process (?finding a ?findingType) first, which matches pretty much 
every typed resource in the entire store, since ?finding is not yet bound to 
anything more specific.
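
To illustrate why the order of the last two patterns matters, here is a minimal 
sketch in Python. It is not how TDB evaluates queries (TDB uses index lookups, 
not list scans); it only counts how many intermediate bindings each ordering 
produces on a small invented data set. The helper names (build_store, evaluate, 
canonical), the ex: terms, and all the data are assumptions made up for the 
illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def evaluate(patterns: List[Triple], store: List[Triple]):
    """Join patterns left to right; return (solutions, bindings after each step)."""
    bindings: List[Binding] = [{}]
    sizes: List[int] = []
    for pattern in patterns:
        bindings = [m for b in bindings for t in store
                    if (m := match(pattern, t, b)) is not None]
        sizes.append(len(bindings))
    return bindings, sizes


def build_store(n_patients: int = 3, n_noise: int = 50) -> List[Triple]:
    """Invented toy data: a few patients plus many unrelated typed terms."""
    store: List[Triple] = [
        ("ec:Has_Dysphagia_Score", "rdfs:subPropertyOf", "ec:Has_Finding"),
    ]
    for i in range(n_patients):
        store += [
            (f"pat{i}", "rdf:type", "nci:Patient"),
            (f"pat{i}", "ec:Has_Id", f"id{i}"),
            (f"pat{i}", "ec:Has_Dysphagia_Score", f"score{i}"),
            (f"score{i}", "rdf:type", "ex:DysphagiaScore"),
        ]
    # Stand-in for the thousands of typed thesaurus terms in the real store.
    store += [(f"term{j}", "rdf:type", "ex:Concept") for j in range(n_noise)]
    return store


def canonical(solutions: List[Binding]):
    """Order-independent form of a solution list, for comparison."""
    return sorted(tuple(sorted(b.items())) for b in solutions)


P_PATIENT = ("?pat", "rdf:type", "nci:Patient")
P_ID = ("?pat", "ec:Has_Id", "?patId")
P_SUB = ("?findingProp", "rdfs:subPropertyOf", "ec:Has_Finding")
P_LINK = ("?pat", "?findingProp", "?finding")
P_TYPE = ("?finding", "rdf:type", "?findingType")

# Order chosen by the optimizer after regenerating the stats file.
OBSERVED_ORDER = [P_SUB, P_PATIENT, P_ID, P_TYPE, P_LINK]
# Order as written in the query.
PREFERRED_ORDER = [P_PATIENT, P_ID, P_SUB, P_LINK, P_TYPE]

STORE = build_store()
observed_solutions, observed_sizes = evaluate(OBSERVED_ORDER, STORE)
preferred_solutions, preferred_sizes = evaluate(PREFERRED_ORDER, STORE)

print("observed order, bindings per step: ", observed_sizes)
print("preferred order, bindings per step:", preferred_sizes)
print("same answers:", canonical(observed_solutions) == canonical(preferred_solutions))
```

Both orders return the same solutions, but in the observed order the unbound 
(?finding rdf:type ?findingType) step is joined against every rdf:type triple 
in the store for each patient before (?pat ?findingProp ?finding) prunes the 
result again. Raising n_noise in build_store makes that intermediate step grow 
while the preferred order stays flat.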

To sum things up:
Regenerating the stats file after I imported all my data individuals solved the 
ec:Has_Id problem. But it does not address the ?findingProp problem.

I hope this helps clarify the overall picture, and thank you very much for 
your help!

-Wolfgang


 

-----Original Message-----
From: Andy Seaborne <[email protected]>
To: users <[email protected]>
Sent: Thu, May 23, 2013 8:40 pm
Subject: Re: Unexpectedly slow query


Wolfgang,

I'm confused as to what the setup is that you have.  Let's step back and 
establish what the setup is here.

1/ The original query you sent was:

   SELECT  *
   WHERE
     { ?pat rdf:type nci:Patient .
       ?pat ecIn:Has_Id ?patId .
       ?findingProp rdfs:subPropertyOf ec:Has_Finding .
       ?pat ?findingProp ?finding .
       ?finding rdf:type ?findingType
     }

but your emails refer to  ec:Has_Id (not ecIn:)

ecIn: <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl/instances#>
ec:   <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#>

which is it?

There are no properties 'euroCAT.owl/instances#' in stats file.

2/ The stats file was not generated by one of the recent releases, which handle

?? rdf:type :SomeType

much better.  You only need to regenerate the stats file using the 
latest release - it'll work well with previous releases.

3/ The database has no inference capabilities itself.
What are you expecting the :Has_Finding to do to influence the rest of 
the query plan?

        Andy

On 23/05/13 08:46, [email protected] wrote:
> I am using Sparql queries via Dataset and QueryExecution. No in-memory
> inference defined. The stats file with my predicates is attached. It has
> those for Has_Id, Has_Dyspnea_Score and Has_Dysphagia_Score, but it
> apparently cannot infer that the latter two are also Has_Finding and
> therefore can be used to narrow down the result set.
>
> -Wolfgang
>
>
>
> -----Original Message-----
> From: Andy Seaborne <[email protected]>
> To: users <[email protected]>
> Sent: Sat, May 18, 2013 1:17 pm
> Subject: Re: Unexpectedly slow query
>
> On 17/05/13 13:27,[email protected]  wrote:
>>
>>
>> I ran tdbstats again on the fully loaded triple store (with all the patient
> data as individuals and their relationships). My properties appear now in the
> stats file. But only the properties with explicit triples, not the inferred
> parent properties. E.g. I am using the following property type hierarchy:
>
> What is your inference setup?
>
>>
>> Has_Finding
>>     - Has_Dysnpea_Score
>>     - Has_Dysphagia_Score
>>     - Is_Dead
>>
>>
>> There are no explicit triples stating e.g. that a patient Has_Finding
> Dyspnea_Score_2. But there are triples using the sub-properties, e.g. Patient
> Has_Dysnpea_Score Dyspnea_Score_2.
>
> Can you share the stats file?  I can't investigate the situation without
> a test case.
>
>>
>> The stats file now contains entries for the sub-properties, but not for
> Has_Finding.
>>
>> The execution plan changed slightly though, but the crucial triple patterns
> are still in the "wrong" order.
>>
>> It used to be:
>> (?pat <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl/instances#Has_Id>
> ?patId)
>
> The original stats file had no mention of #Has_Id and actually said (at
> the end) that missing predicates were to be counted as having zero
> occurrences.  The optimizer puts one of these first because the rest of
> the pattern will never be reached if it's accurate.
>
>> (?pat rdf:type <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Patient>)
>> (?findingProp rdfs:subPropertyOf 
>> <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#Has_Finding>)
>> (?finding rdf:type ?findingType)
>> (?pat ?findingProp ?finding)
>>
>> Now it is:
>> (?findingProp rdfs:subPropertyOf 
>> <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#Has_Finding>)
>> (?pat rdf:type <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Patient>)
>> (?pat <http://www.siemens.com/euroCAT/2011/8/euroCAT.owl#Has_Id> ?patId)
>> (?finding rdf:type ?findingType)
>> (?pat ?findingProp ?finding)
>
>>
>> The triple pattern for the Has_Finding sub-properties moved to the start, but
> the crucial (?finding rdf:type ?findingType) is still evaluated before (?pat
> ?findingProp ?finding) -> the query is still taking a very long time.
>>
>> I can go ahead and use "fixed.opt" instead of "stats.opt", but I am still
> interested in whether there is a solution to this problem. I am using Jena
> 2.10.0.
>>
>> Hope this info helps!
>>
>> -Wolfgang
>>
>>
>>
>>
>> -----Original Message-----
>> From: hueyl16 <[email protected]>
>> To: users <[email protected]>
>> Sent: Fri, May 17, 2013 1:43 pm
>> Subject: Re: Unexpectedly slow query
>>
>>
>>   I was wondering about that too. I could only find entries related to NCI
> terms. How or when is the stats file generated?
>>
>> I am using the .bat versions of the tdbloader for importing the NCIt first and
> then my own ontology, which contains Has_Id and Has_Finding plus more.
>>
>>
>> I also ran tdbstats once but it did not change the stats file, just printed
> it.
>>
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Andy Seaborne <[email protected]>
>> To: users <[email protected]>
>> Sent: Fri, May 17, 2013 1:25 pm
>> Subject: Re: Unexpectedly slow query
>>
>>
>> (I now have the stats file)
>>
>> Wolfgang,
>>
>> I don't see entries for ec:Has_Id and ec:Has_Finding.
>>
>>      Andy
>>
>>
>>
>>
>>
>>
>


 