I am trying to piece together three different but connected portions of a graph
extracted from a large triple store, and I am surprised by some performance
results that I see. Some guidance would be appreciated.
The most obvious way to construct the desired data is the following, where I'm
trying to extract all the predicates pertaining to resources ?pc, ?e and ?fc,
and where the subjects ?pc are the critical links. The values for ?pc are
defined by the subquery.
CONSTRUCT {
  ?pc ?p ?o .
  ?e ?ep ?eo .
  ?fc ?fp ?fo
} WHERE {
  GRAPH <urn:guid:wood>
  {
    ?pc ?p ?o .
    ?e sem:AnotherPred+ ?pc .
    ?e ?ep ?eo .
    ?pc sem:SomePred ?fc .
    ?fc ?fp ?fo
    {
      SELECT ?pc
      WHERE
      {
        ?pc sem:MyPred "2"
      }
    }
  }
}
where the three different patterns in the CONSTRUCT template correspond to the
three different types of (related) data that I'm extracting. The subquery
imposes a restriction on the subjects that I'm interested in.
The above form takes about 70 seconds to run, whereas if I restructure it to
use the UNION construct, it executes in less than a second:
CONSTRUCT {
  ?pc ?p ?o .
  ?fc ?fp ?fo .
  ?e ?ep ?eo
}
WHERE {
  GRAPH <urn:guid:wood>
  {
    {
      ?pc ?p ?o .
      {
        SELECT ?pc
        WHERE
        {
          ?pc sem:MyPred "2"
        }
      }
    } UNION {
      ?pc sem:SomePred ?fc .
      ?fc ?fp ?fo
      {
        SELECT ?pc
        WHERE
        {
          ?pc sem:MyPred "2"
        }
      }
    } UNION {
      ?e sem:AnotherPred+ ?pc .
      ?e ?ep ?eo
      {
        SELECT ?pc
        WHERE
        {
          ?pc sem:MyPred "2"
        }
      }
    }
  }
}
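One way to gauge how much work each form does is to count the solutions its
WHERE clause produces before the CONSTRUCT template is applied. In the first
form every ?pc has to match all three patterns jointly, so the number of rows
per ?pc is the product of the matches for each pattern rather than their sum.
The query below is only a sketch of such a count for the first form. It assumes
the sem: prefix is declared as in the queries above (the declaration is not
shown in this post), and ?rows is a name chosen purely for illustration.

```sparql
# Sketch: count the joined solutions behind the first (single-pattern) form.
# Assumes a PREFIX declaration for sem: is supplied, as in the original queries.
SELECT (COUNT(*) AS ?rows)
WHERE {
  GRAPH <urn:guid:wood>
  {
    ?pc ?p ?o .
    ?e sem:AnotherPred+ ?pc .
    ?e ?ep ?eo .
    ?pc sem:SomePred ?fc .
    ?fc ?fp ?fo
    {
      # Same restriction on ?pc as in the CONSTRUCT queries
      SELECT ?pc
      WHERE
      {
        ?pc sem:MyPred "2"
      }
    }
  }
}
```

Wrapping each UNION branch of the second form in the same kind of count would
give the figures to compare against. Note that the count itself may take as
long to run as the slow CONSTRUCT.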
* Why is the second form so much faster?
* Is the SPARQL engine smart enough to see that the subquery is the same
across all three UNION branches? (Given the speed at which it executes, I
would assume so!)
* Is there any syntactic sugar that would let me avoid repeating the
subquery? (I'm guessing not, but perhaps something is planned for a future
version of SPARQL?)
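On the last point, one variant that standard SPARQL 1.1 already allows is to
state the restriction on ?pc once, outside the UNION, so that it is joined with
whichever branch matches. The following is a minimal sketch of that idea, not a
tested replacement: it replaces the subquery with the plain triple pattern it
contains, assumes the sem: prefix is declared as in the queries above, and has
not been timed against this data, so whether it is as fast as the repeated
subquery form depends on the engine's optimizer.

```sparql
# Sketch: restriction on ?pc stated once and joined with each UNION branch.
# Assumes a PREFIX declaration for sem: is supplied, as in the original queries.
CONSTRUCT {
  ?pc ?p ?o .
  ?fc ?fp ?fo .
  ?e ?ep ?eo
}
WHERE {
  GRAPH <urn:guid:wood>
  {
    # Shared restriction, written only once
    ?pc sem:MyPred "2" .
    {
      # Predicates of ?pc itself
      ?pc ?p ?o
    } UNION {
      # Predicates of the resources ?pc points to
      ?pc sem:SomePred ?fc .
      ?fc ?fp ?fo
    } UNION {
      # Predicates of the resources that reach ?pc via the property path
      ?e sem:AnotherPred+ ?pc .
      ?e ?ep ?eo
    }
  }
}
```

Each solution binds the variables of only one branch, and a CONSTRUCT template
triple with an unbound variable is simply skipped, so this should yield the
same triples as the repeated-subquery form.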
Thanks,
-Mark