Thank you.
This will need some detailed investigation.
The disconnected
(?finding rdf:type ?findingType)
looks to be causing the explosion, which is undesirable.
You may find that "fixed.opt" does an adequate job.
You can always take control by using none.opt, in which case the query
order will be respected.
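As a rough sketch of switching strategies: TDB picks the reorder strategy from a control file in the database directory, and as far as I recall stats.opt takes precedence over fixed.opt, which takes precedence over none.opt, so the stats file has to be moved out of the way for none.opt to take effect. The class name, the ".bak" suffix and the command-line argument below are illustrative assumptions, not part of Jena. The database should be closed while doing this and reopened afterwards.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Sketch only: switch a TDB database directory to the "none" strategy. */
class ReorderControl {

    /**
     * Moves any stats.opt / fixed.opt aside (hypothetical ".bak" suffix)
     * and creates an empty none.opt in the given directory.
     */
    static void useNoneOpt(Path dbDir) throws IOException {
        for (String name : new String[] {"stats.opt", "fixed.opt"}) {
            Path f = dbDir.resolve(name);
            if (Files.exists(f)) {
                // Keep the old file so the change can be undone.
                Files.move(f, dbDir.resolve(name + ".bak"),
                           StandardCopyOption.REPLACE_EXISTING);
            }
        }
        Path none = dbDir.resolve("none.opt");
        if (Files.notExists(none)) {
            // The file only has to exist; it can be zero-length.
            Files.createFile(none);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ReorderControl <tdb-database-directory>");
            return;
        }
        useNoneOpt(Paths.get(args[0]));
    }
}
```

Creating fixed.opt instead of none.opt in the same way would select the built-in fixed reordering.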
Andy
On 16/05/13 11:10, [email protected] wrote:
Attached is the output from
QueryExecution.getContext().set(ARQ.symLogExec, Explain.InfoLevel.ALL) as
well as the stats.opt file.
Based on what I understand, it seems to be executing the triples in the
following order:
?pat ec:Has_Id ?patId .
?pat a nci:Patient .
?findingProp rdfs:subPropertyOf ec:Has_Finding .
?finding a ?findingType .
?pat ?findingProp ?finding .
Instead of:
?pat a nci:Patient .
?pat ec:Has_Id ?patId .
?findingProp rdfs:subPropertyOf ec:Has_Finding .
?pat ?findingProp ?finding .
?finding a ?findingType .
The very general triple pattern "?finding a ?findingType" is executed
before the more specific pattern "?pat ?findingProp ?finding". What are
my options for influencing this? Remove the stats file and use none.opt?
Modify something else?
Btw, I would also expect it to match "?pat a nci:Patient" first and then
"?pat ec:Has_Id ?patId" since the ec:Has_Id property is used for all
individuals, not only patients.
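To illustrate why the position of "?finding a ?findingType" matters, here is a toy nested-loop evaluator in plain Java. It is not how TDB works internally, and the data (ec:Has_Disease, nci:Disease, nci:Thing, the pat/finding/other names) is invented for the example. It only counts how many intermediate rows each ordering produces: when the type pattern runs before ?finding is bound, every row is multiplied by every rdf:type triple in the store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy nested-loop evaluator over an in-memory triple list (illustration only). */
class JoinOrderDemo {

    static String[] t(String s, String p, String o) { return new String[] {s, p, o}; }

    /** Patterns in the order written in the query ("a" spelled as rdf:type). */
    static final List<String[]> QUERY_ORDER = Arrays.asList(
        t("?pat", "rdf:type", "nci:Patient"),
        t("?pat", "ec:Has_Id", "?patId"),
        t("?findingProp", "rdfs:subPropertyOf", "ec:Has_Finding"),
        t("?pat", "?findingProp", "?finding"),
        t("?finding", "rdf:type", "?findingType"));

    /** Patterns in the order reported by the explain output. */
    static final List<String[]> EXECUTED_ORDER = Arrays.asList(
        t("?pat", "ec:Has_Id", "?patId"),
        t("?pat", "rdf:type", "nci:Patient"),
        t("?findingProp", "rdfs:subPropertyOf", "ec:Has_Finding"),
        t("?finding", "rdf:type", "?findingType"),
        t("?pat", "?findingProp", "?finding"));

    /** Invented sample data: patients with one finding each, plus unrelated individuals. */
    static List<String[]> sampleData(int patients, int others) {
        List<String[]> data = new ArrayList<>();
        data.add(t("ec:Has_Disease", "rdfs:subPropertyOf", "ec:Has_Finding"));
        for (int i = 0; i < patients; i++) {
            data.add(t("pat" + i, "rdf:type", "nci:Patient"));
            data.add(t("pat" + i, "ec:Has_Id", "id" + i));
            data.add(t("pat" + i, "ec:Has_Disease", "finding" + i));
            data.add(t("finding" + i, "rdf:type", "nci:Disease"));
        }
        for (int j = 0; j < others; j++) {
            data.add(t("other" + j, "rdf:type", "nci:Thing"));
            data.add(t("other" + j, "ec:Has_Id", "oid" + j));
        }
        return data;
    }

    /** Extends the binding so the pattern equals the triple, or returns null on mismatch. */
    static Map<String, String> unify(String[] pattern, String[] triple, Map<String, String> binding) {
        Map<String, String> out = new HashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                String bound = out.get(term);
                if (bound == null) {
                    out.put(term, triple[i]);
                } else if (!bound.equals(triple[i])) {
                    return null;
                }
            } else if (!term.equals(triple[i])) {
                return null;
            }
        }
        return out;
    }

    /** Returns { total rows produced across all steps, rows in the final result }. */
    static long[] run(List<String[]> data, List<String[]> patterns) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(new HashMap<>());
        long total = 0;
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> row : rows) {
                for (String[] triple : data) {
                    Map<String, String> ext = unify(pattern, triple, row);
                    if (ext != null) {
                        next.add(ext);
                    }
                }
            }
            total += next.size();
            rows = next;
        }
        return new long[] {total, rows.size()};
    }

    public static void main(String[] args) {
        List<String[]> data = sampleData(20, 30);
        long[] q = run(data, QUERY_ORDER);
        long[] e = run(data, EXECUTED_ORDER);
        System.out.println("query order:    intermediate=" + q[0] + " results=" + q[1]);
        System.out.println("executed order: intermediate=" + e[0] + " results=" + e[1]);
    }
}
```

Both orderings return the same final rows; only the intermediate work differs, and the gap grows with the number of typed resources in the store.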
-Wolfgang
-----Original Message-----
From: Andy Seaborne <[email protected]>
To: users <[email protected]>
Sent: Wed, May 15, 2013 6:10 pm
Subject: Re: Unexpectedly slow query
On 15/05/13 15:36, Damian Steer wrote:
On 15 May 2013, at 14:10, [email protected] wrote:
Hello,
I am using the following SPARQL query against a TDB store:
SELECT *
WHERE {
?pat a nci:Patient .
?pat ec:Has_Id ?patId .
?findingProp rdfs:subPropertyOf ec:Has_Finding .
?pat ?findingProp ?finding .
?finding a ?findingType .
}
When I run this query WITHOUT the last triple (the bolded line), it returns
the correct result within seconds.
Assuming "?finding a ?findingType" was bold - the email was
quoted-printable - so no markup.
But when I run this query WITH the last triple, the query runs a very long
time. I do not know how long b/c I cancelled it after 1 hour.
I wonder whether the optimiser sees '?finding a ?findingType .' as more ground
than the previous, and thus reorders the query?
Have a look at [1] which explains some of the diagnostic features of TDB.
Getting hold of the query plan would be very useful.
Damian
[1]
<https://jena.apache.org/documentation/tdb/optimizer.html#Investigating_what_is_going_on>
If you could do an "explain" as per Damian's suggestion that would be great.
A couple of questions:
1/ Which version is this?
--explain may be available if working from the command line
2/ Is there a stats file?
There was a recent fix in this area - if you are using stats.opt, you
may find that either adding rules to capture your usage or recalculating
the statistics helps. It may not - ungrounded predicates may be confusing
things.
Andy