Mark,

The key thing to understand when talking about SPARQL performance is that, strictly speaking, evaluation is bottom-up from the leftmost child operator of the query. It is easiest to talk about these things by looking at the algebra form of your two queries. For the first:

    (base <http://example/base/>
      (prefix ((sem: <urn:sem:>))
        (graph <urn:guid:wood>
          (sequence
            (bgp (triple ?pc ?p ?o))
            (path ?e (path+ sem:AnotherPred) ?pc)
            (bgp
              (triple ?e ?ep ?eo)
              (triple ?pc sem:SomePred ?fc)
              (triple ?fc ?fp ?fo)
            )
            (project (?pc)
              (bgp (triple ?pc sem:MyPred "2")))))))

So in this first case the leftmost child is the very generic triple pattern that matches everything in the graph, followed by a potentially expensive property path and another generic triple pattern. Finally, your specific subquery is the rightmost child, so it will be evaluated last. Moving the subquery earlier in your query may significantly improve performance.

For your second query:

    (base <http://example/base/>
      (prefix ((sem: <urn:sem:>))
        (graph <urn:guid:wood>
          (union
            (union
              (sequence
                (bgp (triple ?pc ?p ?o))
                (project (?pc)
                  (bgp (triple ?pc sem:MyPred "2"))))
              (sequence
                (bgp
                  (triple ?pc sem:SomePred ?fc)
                  (triple ?fc ?fp ?fo)
                )
                (project (?pc)
                  (bgp (triple ?pc sem:MyPred "2")))))
            (sequence
              (path ?e (path+ sem:AnotherPred) ?pc)
              (bgp (triple ?e ?ep ?eo))
              (project (?pc)
                (bgp (triple ?pc sem:MyPred "2"))))))))

Here your specific subqueries get evaluated sooner, since they are further left in the operator tree.

In both of these queries you see the use of the sequence operator, which is essentially a streaming index join: where possible, solutions from the earlier operators in the sequence are substituted into the later operators to reduce the search space. The ordering of operators in the second query presumably produces a much smaller solution space, hence the faster evaluation time.

Subquery results are never reused in Jena.

Unfortunately there is no syntactic sugar to make repeated use of a subquery easier, nor have I yet seen any proposal for what such a syntax should look like.
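
As an aside on the sequence operator described above, here is a toy sketch in Python of why operator order matters when solutions are substituted left to right. This is not ARQ code and does not reflect how ARQ is implemented: the in-memory triple list, the `extend` and `sequence` helpers, and the `ex:` subjects and `sem:name` predicate are all made up for illustration. Each step is materialised as a list only so that its size can be reported; a real engine would stream.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]   # terms starting with "?" are variables
Binding = Dict[str, str]


def extend(pattern: Triple, triples: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every extension of `binding` that makes `pattern` match a triple."""
    for triple in triples:
        candidate = dict(binding)
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # Variable: bind it, or check it against the value already bound.
                if candidate.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                # Constant term that does not match this triple.
                break
        else:
            yield candidate


def sequence(patterns: List[Triple], triples: List[Triple]) -> Tuple[List[Binding], List[int]]:
    """Evaluate patterns left to right; return final solutions and per-step sizes."""
    solutions: List[Binding] = [{}]
    sizes: List[int] = []
    for pattern in patterns:
        # Substitute each earlier solution into the next pattern.
        solutions = [b for s in solutions for b in extend(pattern, triples, s)]
        sizes.append(len(solutions))
    return solutions, sizes


# Hypothetical data: 100 subjects with two triples each; only two subjects
# have sem:MyPred "2".
triples: List[Triple] = []
for i in range(100):
    subject = f"ex:s{i}"
    triples.append((subject, "sem:name", f"name{i}"))
    triples.append((subject, "sem:MyPred", "2" if i < 2 else "1"))

generic = ("?pc", "?p", "?o")             # matches everything
selective = ("?pc", "sem:MyPred", "2")    # plays the role of the subquery

generic_first, generic_first_sizes = sequence([generic, selective], triples)
selective_first, selective_first_sizes = sequence([selective, generic], triples)

print("generic first, solutions per step:  ", generic_first_sizes)
print("selective first, solutions per step:", selective_first_sizes)
```

Both orders produce the same final solutions, but the generic-first order pushes every triple in the data through to the second step, while the selective-first order only carries the two matching subjects forward.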
For anything to be incorporated into a future standard there typically needs to be a clear use case (which there is), but also one or more existing extensions to the language that demonstrate such an extension is actively used. Experimenting with this in ARQ would be a nice future submission or student project.

Rob

On 08/08/2016 17:52, "Mark D Wood" <[email protected]> wrote:

I am trying to piece together three different but connected portions of a graph extracted from a large triple store, and I am surprised by some performance results that I see. Some guidance would be appreciated.

The most obvious way to construct the desired data is the following, where I'm trying to extract all the predicates pertaining to resources ?pc, ?e and ?fc, and where subjects ?pc are the critical links. The values for ?pc are defined by the subquery.

    CONSTRUCT {
      ?pc ?p ?o .
      ?e ?ep ?eo .
      ?fc ?fp ?fo
    }
    WHERE {
      GRAPH <urn:guid:wood> {
        ?pc ?p ?o .
        ?e sem:AnotherPred+ ?pc .
        ?e ?ep ?eo .
        ?pc sem:SomePred ?fc .
        ?fc ?fp ?fo
        { SELECT ?pc WHERE { ?pc sem:MyPred "2" } }
      }
    }

where the three different patterns in the CONSTRUCT template correspond to the three different types of (related) data that I'm extracting. The subquery imposes a restriction on the subjects that I'm interested in.

The above form takes about 70 seconds to run, whereas if I restructure it to use the UNION construct, it executes in less than a second:

    CONSTRUCT {
      ?pc ?p ?o .
      ?fc ?fp ?fo .
      ?e ?ep ?eo
    }
    WHERE {
      GRAPH <urn:guid:wood> {
        {
          ?pc ?p ?o .
          { SELECT ?pc WHERE { ?pc sem:MyPred "2" } }
        }
        UNION
        {
          ?pc sem:SomePred ?fc .
          ?fc ?fp ?fo
          { SELECT ?pc WHERE { ?pc sem:MyPred "2" } }
        }
        UNION
        {
          ?e sem:AnotherPred+ ?pc .
          ?e ?ep ?eo
          { SELECT ?pc WHERE { ?pc sem:MyPred "2" } }
        }
      }
    }

* Why is the second form so much faster?
* Is the SPARQL engine smart enough to see that the subquery is the same across the three different UNION statements? (Given the speed at which it executes, I would assume so!)
* Is there any syntactic sugar-coating that I can do, to avoid repeating the subquery? (I'm guessing no, but perhaps something is planned for a future version of SPARQL?)

Thanks,
-Mark
