SPARQL query optimization question

Mark D Wood Mon, 08 Aug 2016 09:53:23 -0700

I am trying to piece together three different but connected portions of a graph
extracted from a large triple store, and I am surprised by the performance
results I am seeing. Some guidance would be appreciated.


The most obvious way to construct the desired data is the following, where I'm
trying to extract all the predicates pertaining to the resources ?pc, ?e and ?fc,
and where the subjects ?pc are the critical links. The values for ?pc are defined
by the subquery.

CONSTRUCT {
    ?pc ?p ?o .
    ?e ?ep ?eo .
    ?fc ?fp ?fo
} WHERE {
    GRAPH <urn:guid:wood>
    {
        ?pc ?p ?o .
        ?e sem:AnotherPred+ ?pc .
        ?e ?ep ?eo .
        ?pc  sem:SomePred ?fc .
        ?fc ?fp ?fo
        {
            SELECT ?pc
            WHERE
            {
                ?pc  sem:MyPred "2"
            }
        }
    }
}

where the three different patterns in the CONSTRUCT template correspond to the 
three different types of (related) data that I'm extracting.  The subquery 
imposes a restriction on the subjects that I'm interested in.

The above form takes about 70 seconds to run, whereas if I restructure it to 
use the UNION construct, it executes in less than a second:

CONSTRUCT {
    ?pc ?p ?o .
    ?fc ?fp ?fo .
    ?e ?ep ?eo
}
WHERE {
    GRAPH <urn:guid:wood>
    {
        {
            ?pc ?p ?o .
            {
                SELECT ?pc
                WHERE
                {
                    ?pc  sem:MyPred "2"
                }
            }
        } UNION {
            ?pc  sem:SomePred ?fc .
            ?fc ?fp ?fo
            {
                SELECT ?pc
                WHERE
                {
                    ?pc  sem:MyPred "2"
                }
            }
        } UNION {
            ?e sem:AnotherPred+ ?pc .
            ?e ?ep ?eo
            {
                SELECT ?pc
                WHERE
                {
                    ?pc  sem:MyPred "2"
                }
            }
        }
    }
}
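A rough way to see why the two forms can differ so much, assuming the engine materializes the full solution sequence before instantiating the CONSTRUCT template: for one binding of ?pc, write n_p for the number of (?p, ?o) matches, n_e for the number of (?e, ?ep, ?eo) matches and n_f for the number of (?fc, ?fp, ?fo) matches. In the first form the three parts share only ?pc, so joining them yields the product of the counts, while the UNION form yields their sum. The numbers in the second line are made up for illustration, not measurements from this store.

```latex
\[
N_{\text{join}} = \sum_{pc} n_p(pc)\, n_e(pc)\, n_f(pc),
\qquad
N_{\text{union}} = \sum_{pc} \bigl( n_p(pc) + n_e(pc) + n_f(pc) \bigr)
\]
% Hypothetical single ?pc with n_p = 20, n_e = 150, n_f = 60:
\[
20 \cdot 150 \cdot 60 = 180\,000 \quad \text{vs.} \quad 20 + 150 + 60 = 230
\]
```

Since a CONSTRUCT result is a set of triples, the repeated triples in the join form are only collapsed after they have been generated. The two forms also return the same graph only when every ?pc has at least one match in each branch. If, say, some ?pc has no ?fc, the join form drops all of that ?pc's triples, while the UNION form still returns its ?pc ?p ?o triples.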


* Why is the second form so much faster?

* Is the SPARQL engine smart enough to see that the subquery is the same
across the three different UNION branches? (Given the speed at which it
executes, I would assume so!)

* Is there any syntactic sugar I can use to avoid repeating the subquery?
(I'm guessing no, but perhaps something is planned for a future version of
SPARQL?)
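On the third question, one possible way to avoid repeating the subquery is to state the restriction once and join it with the UNION of the three branches. This relies on the SPARQL algebra identity that a join distributes over UNION. Because the subquery projects only ?pc, replacing it with the plain triple pattern gives the same solutions. This is an untested sketch. The sem: prefix still needs the PREFIX declaration that the original queries also omit, and whether a given engine plans this form as efficiently as the repeated-subquery form would have to be measured.

```sparql
# Sketch: the ?pc restriction appears once and is joined with all three branches.
CONSTRUCT {
    ?pc ?p ?o .
    ?fc ?fp ?fo .
    ?e ?ep ?eo
}
WHERE {
    GRAPH <urn:guid:wood>
    {
        # Shared restriction on ?pc (was the repeated subquery)
        ?pc  sem:MyPred "2" .
        {
            ?pc ?p ?o
        } UNION {
            ?pc  sem:SomePred ?fc .
            ?fc ?fp ?fo
        } UNION {
            ?e sem:AnotherPred+ ?pc .
            ?e ?ep ?eo
        }
    }
}
```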

Thanks,
-Mark