Hello, the query algebra has the following structure:

(project (?book ?title)
  (filter (|| (&& (&& (bound ?endvol) (<= ?startvol 2)) (>= ?endvol 2)) (&& (! (bound ?endvol)) (= ?startvol 2)))
    (leftjoin
      (bgp
        (triple ?loc :locatedInWork ex:Work1)
        (triple ?loc :startVolume ?startvol)
      )
      (bgp (triple ?loc :endVolume ?endvol)))))

(optimized)

(project (?book ?title)
  (filter (|| (&& (&& (bound ?endvol) (<= ?startvol 2)) (>= ?endvol 2)) (&& (! (bound ?endvol)) (= ?startvol 2)))
    (conditional
      (bgp
        (triple ?loc :locatedInWork ex:Work1)
        (triple ?loc :startVolume ?startvol)
      )
      (bgp (triple ?loc :endVolume ?endvol)))))

As you can see, an OPTIONAL is basically a left outer join.

If you're using TDB, the optimizer can take statistics on the data into account. You can check this by following the steps here [1].

[1] https://jena.apache.org/documentation/tdb/optimizer.html

> Dear Jena users,
>
> In short, I'm wondering if there could be an option somewhere for a
> top-down SPARQL evaluation mechanism.
>
> Long version: the dataset I'm dealing with contains data in the following
> form:
>
> ex:Loc1 a :Location ;
>     :locatedInWork ex:Work1 ;
>     :startPage 123 ;
>     :endPage 234 ;
>     :startVolume 1 .
>
> ex:Loc2 a :Location ;
>     :locatedInWork ex:Work1 ;
>     :startPage 234 ;
>     :endPage 345 ;
>     :startVolume 1 ;
>     :endVolume 2 .
>
> where the absence of :endVolume denotes that the endVolume is equal to
> the startVolume. This might not be kosher in terms of semantics but
> that's the dataset I'm dealing with.
>
> Now, I want to select all the locations in volume 2 (including those
> starting before volume 2 and ending after volume 2), the most natural
> for me is to write something like:
>
> ?loc :locatedInWork ex:Work1 ;
>      :startVolume ?startvol .
> OPTIONAL { ?loc :endVolume ?endvol . }
> FILTER ((BOUND(?endvol) && ?startvol <= 2 && ?endvol >= 2) ||
>         (!BOUND(?endvol) && ?startvol = 2))
>
> which works fine, but is slow to the extreme (about 8s) due to the
> very large amount of triples with the :endVolume property. Now, I
> understand the slow performance is sort of expected due to what's
> referred to as the bottom-up semantics of SPARQL. My understanding is
> that the first thing that will get evaluated will be ?loc :endVolume
> ?endvol which will return a huge amount of results.
>
> Here are a few questions:
>
> - Is my analysis correct?
>
> - In your experience of writing queries, how often do you rely on the
> bottom-up semantics? (my experience is never)
>
> - The bottom-up semantics are very counter-intuitive to me, what do
> you think is the reason it got into the SPARQL specs?
>
> - I suppose digging into the Jena code to optimize this kind of
> requests in Jena must be very deep dive, am I right?
>
> - Is there any plan or dedicated resources to optimize this kind of requests?
>
> - What would be the complexity of writing an alternate query
> evaluation mechanism using top-down semantics?
>
> - Would having an option to evaluate a sparql query using top-down
> semantics make sense? (we can have discussions of where the option
> would be handled, but I think it's helpful for me to get a general
> answer)
>
> - Blazegraph advertises that they are first evaluating if the results
> of a query would be the same when using a top-down and bottom-up
> semantics, and if they are the same they automatically switch to the
> top-down semantics, how much time do you estimate one would have to
> dive into the Jena code to propose a pull request for that?
>
> Best,

--
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center
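
P.S. To make the "OPTIONAL is basically a left outer join" point concrete, here is a toy sketch in plain Python. It is not how Jena evaluates the query and uses no Jena API; the helper names (match, join, left_join, in_volume) are made up for illustration, and the data is just the two example locations from the quoted message. It mirrors the shape of the algebra: a join of the two required triple patterns, a left join with the :endVolume pattern, then the filter.

```python
# Toy data: the two example locations from the quoted message.
TRIPLES = [
    ("ex:Loc1", ":locatedInWork", "ex:Work1"),
    ("ex:Loc1", ":startVolume", 1),
    ("ex:Loc2", ":locatedInWork", "ex:Work1"),
    ("ex:Loc2", ":startVolume", 1),
    ("ex:Loc2", ":endVolume", 2),
]

def match(pred, var=None, const=None):
    """Solutions for (?loc pred ?var) or (?loc pred const). Hypothetical helper."""
    rows = []
    for s, p, o in TRIPLES:
        if p != pred or (const is not None and o != const):
            continue
        row = {"loc": s}
        if var is not None:
            row[var] = o
        rows.append(row)
    return rows

def join(left, right):
    """Inner join: merge rows that agree on every shared variable."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

def left_join(left, right):
    """Left outer join: keep a left row unchanged if nothing on the right matches."""
    out = []
    for l in left:
        extended = join([l], right)
        out.extend(extended if extended else [l])
    return out

def in_volume(row, vol):
    """The FILTER: a missing endvol means the location ends in its start volume."""
    if "endvol" in row:  # BOUND(?endvol)
        return row["startvol"] <= vol and row["endvol"] >= vol
    return row["startvol"] == vol

# (bgp ...) with the two required patterns
required = join(match(":locatedInWork", const="ex:Work1"),
                match(":startVolume", var="startvol"))
# (bgp (triple ?loc :endVolume ?endvol))
optional = match(":endVolume", var="endvol")

solutions = [r for r in left_join(required, optional) if in_volume(r, 2)]
print([r["loc"] for r in solutions])  # ['ex:Loc2']
```

ex:Loc1 survives the left join without an ?endvol binding and is then dropped by the filter (it starts and ends in volume 1), while ex:Loc2 spans volumes 1 to 2 and is kept.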
