Re: Performance issue with OPTIONAL in CONSTRUCT

Andy Seaborne Wed, 11 Jul 2012 01:47:30 -0700

SELECT vs CONSTRUCT will make little difference - a CONSTRUCT is a SELECT with DISTINCT followed by making the RDF.


It's easier to work with SELECT * for analysis of the query pattern.


Having the optional first is a possible cause - your query asks for all of:

OPTIONAL {?xyz oboe-core:hasMeasurement ?object. }

but none of ?xyz or ?object are used in the following part of the query.

That is an unbounded cross product of the data from the OPTIONAL with that of the rest of the pattern. If you use the SELECT * form, you should see a huge number of results. I suspect that your large query has a similar effect, as well as the OPTIONAL being in a less than ideal place.

A pattern involving ?x ?p ?z is going to be slow unless the optimizer can ground one of the terms, which it can sometimes do when there is a


FILTER(?p = <uri1> || ?p = <uri2>)

but it can't do that if the FILTER is outside the OPTIONAL and the pattern inside.


{ ?x ?p ?z . FILTER(?p = <uri1> || ?p = <uri2>)  }

is converted to what is effectively

{ ?x <uri1> ?z .
  BIND(<uri1> AS ?p) }
UNION
{ ?x <uri2> ?z .
  BIND(<uri2> AS ?p) }

Usually you want OPTIONALs at the end of the query.
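
As a rough sketch of that advice applied to a shortened version of the reduced query quoted further down: the required triple patterns come first and the OPTIONAL goes last, sharing a variable with the rest of the pattern so it no longer forms a cross product. Joining it on ?tempObservation is only an assumption about what was intended, and the oboe-core: and testing: namespace URIs are hypothetical placeholders because the original query gave none. SELECT * is used so the rows can be inspected.

```sparql
PREFIX rdf:       <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Hypothetical placeholder namespaces - the original query did not include them.
PREFIX oboe-core: <http://example.org/oboe-core#>
PREFIX testing:   <http://example.org/testing#>

SELECT *
WHERE {
  # Required part first (a shortened version of the reduced query).
  ?tempObservation oboe-core:ofEntity ?temp .
  ?temp rdf:type testing:Fir .
  ?tempObservation oboe-core:hasMeasurement ?tempMeasurement .
  ?tempMeasurement oboe-core:ofCharacteristic ?tempPlaceholder0 .
  ?tempPlaceholder0 rdf:type testing:Height .

  # OPTIONAL last, joined to the rest via ?tempObservation (an assumption).
  OPTIONAL { ?tempObservation oboe-core:hasMeasurement ?object . }
}
```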


CONSTRUCT {?tempObservation0 oboe-core:ofEntity ?temp0 }

Neither ?tempObservation0 nor ?temp0 appears in the pattern, so they will not be bound: the result is going to be the empty model, calculated very slowly.
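
A minimal sketch of a template that can actually produce triples, assuming the intended variables were ?tempObservation and ?temp (that is a guess, and the namespace URI is a hypothetical placeholder):

```sparql
# Hypothetical placeholder namespace - the original query did not include it.
PREFIX oboe-core: <http://example.org/oboe-core#>

# Every variable in the template is bound by the WHERE pattern.
CONSTRUCT { ?tempObservation oboe-core:ofEntity ?temp }
WHERE {
  ?tempObservation oboe-core:ofEntity ?temp .
}
```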


        Andy

This query is still not complete - no namespaces. One of the first things I'm likely to do is feed it into various tools such as


http://www.sparql.org/query-validator.html

which needs a complete query, or the command line arq.qparse (use --print=opt to see the optimized algebra). Complete, minimal examples are appreciated.



On 10/07/12 23:16, Jewell, Paul wrote:

Sure thing.

1.)
I reduced the queries down so that they should (hopefully) be a bit more
readable.

The query with the OPTIONAL, which is sluggish, is as follows:

     CONSTRUCT {?tempObservation0 oboe-core:ofEntity ?temp0 }

     WHERE{
        OPTIONAL {?xyz oboe-core:hasMeasurement ?object. }

        ?tempObservation oboe-core:ofEntity ?temp .
        ?temp rdf:type testing:Fir .
        ?tempObservation oboe-core:hasMeasurement ?tempMeasurement .
        ?tempMeasurement oboe-core:ofCharacteristic ?tempPlaceholder0 .
        ?tempPlaceholder0 rdf:type testing:Height .
        ?tempMeasurement oboe-core:hasValue ?tempPlaceholder1 .
        ?tempPlaceholder1 oboe-core:hasCode ?hCode .
        ?tempMeasurement oboe-core:usesStandard ?tempStandard .
        ?tempStandard rdf:type ?tempStandardType .
     }

This query takes a very long time, but by simply removing the
OPTIONAL clause, it completes in less than a second. Also,
inside a SELECT query, the same WHERE statement, with the optional,
is nearly instant as well.

2.) The database is just over 100 MB, containing about 250,000 triples.

3.) Also, the FILTERS were a hacked-together workaround to avoid the
use of OPTIONALS, but the query above presents the same problem.

Paul,

A few questions:

1/ Do you have some readable versions of those queries?
2/ What size is the data?
3/ Why is it written with lots of FILTERs?  Grounded patterns?

The presence of the OPTIONAL stops FILTER optimization - whether that's
because the optimizer is too dumb to know it can optimize through the
leftjoin (OPTIONAL) or it's the structure of the query, I can't tell.

     FILTER (?predicate3 = oboe-core:hasCode || ?predicate3 = rdf:type)
     ?object2 ?predicate3 ?object3 .
     OPTIONAL{ ?object3 rdf:type ?object4.
               ?object3 ?predicate3 ?object4 .}

Restricting on ?predicate3 outside and then

There look like places where property paths might help.

oboe-core:hasCode and rdf:type being equivalent seems odd modelling.

        Andy
