On 27/07/13 13:00, [email protected] wrote:
Hi Andy,

thanks for the UNION suggestion. I am going to give this a shot.
Ultimately I am looking at 40-50 of these optional triple patterns in
one query for about 50,000 patients. I am a little worried about how
the query performance is going to be.

The UNION approach will scale linearly. UNION can be read as "and then do", which is pretty much how it is implemented.

As the number of blocks grows, the effect of UNION vs OPTIONAL will increase as well.
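
A rough sketch of why the gap widens, as a simple row-count model rather than anything ARQ actually runs. It assumes each block binds its own variable and matches independently of the others; the function names and the example patient are made up for illustration.

```python
from math import prod


def optional_rows(matches_per_block):
    """Rows for one patient when every block is a separate OPTIONAL.

    An OPTIONAL that matches nothing still keeps the row (factor 1);
    one that matches m times multiplies the row count by m.
    """
    return prod(max(1, m) for m in matches_per_block)


def union_rows(matches_per_block):
    """Rows for one patient when every block is a UNION branch: one row per match."""
    return sum(matches_per_block)


# Hypothetical patient: 5 blocks, each matching 2 findings.
matches = [2] * 5
print(optional_rows(matches), union_rows(matches))  # 32 10
```

With 40-50 blocks the product grows very quickly as soon as a few blocks match more than once, while the sum only grows by one row per extra match.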

I noticed the increased redundancy. It seemed a little unexpected to
me coming from the relational DB world where a query like that would
be quite fast and create a fairly efficient result set. But I guess
the DB has more hierarchical knowledge and would be designed for
exactly this purpose whereas a triple store has to accommodate a lot
more angles.

That's certainly true - a patient could be a single row of a highly denormalised table (I've seen this done to support call centers for speed and per-customer isolation) - the table had all the details in a row

(patient, dyspneaType, dyspneaType, ...)

and lots of nulls. The denormalisation assumes exactly one of each type. It's like precalculating the results for a particular access pattern.
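
The same wide shape can be rebuilt in post-processing from one-row-per-finding results. This is a minimal sketch only: the (patient, finding type, value) row layout, the column names and the sample data are assumptions for illustration, and it relies on the same "at most one of each type" assumption as the denormalised table.

```python
from collections import defaultdict


def pivot(rows, columns):
    """Fold (patient, finding_type, value) rows into one wide record per patient."""
    # Every patient starts with all columns set to None (the "nulls").
    wide = defaultdict(lambda: dict.fromkeys(columns))
    for patient, finding_type, value in rows:
        # Assumes at most one value per type; a later row overwrites an earlier one.
        wide[patient][finding_type] = value
    return dict(wide)


# Hypothetical rows, e.g. as returned by a UNION-style query.
rows = [
    ("p1", "dyspnea", "mild"),
    ("p1", "dysphagia", "severe"),
    ("p2", "dyspnea", "none"),
]
table = pivot(rows, ["dyspnea", "dysphagia"])
```

Here table["p2"] ends up with a None entry for "dysphagia", which is the null in the wide row.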

Would more specific sub-properties be of any help? Instead of
Has_Finding use Has_Dyspnea, Has_Dysphagia etc. Something like that
would only affect the query itself (easier to find the triples and
the rdfs:subClassOf* triple pattern is not required anymore), not the
number of rows in the result set, correct? Since it still has to do
the cross-product?

The rdfs:subClassOf* may be expensive, depending on the vocabulary. It's not likely to alter the scaling, but it does add a constant, and maybe noticeable, per-result cost.


Would it be better to just get all Has_Finding triples and format the
result in custom post-processing?

Or should I split it into multiple queries and combine the results in
post-processing?

I was playing around with sub-queries a little bit, but I noticed
that a variable bound in the main query is not passed to the
sub-query. So I did not make much progress there.

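Evaluation is "inside outwards": logically, each {} block is evaluated on its own and then the results are combined, which is why a sub-query does not see a variable bound in the outer query. We all read queries top to bottom as written, and ARQ happens to prefer to execute like that, though it checks that doing so is legal first.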

-Wolfgang
