Dear Jena users,

In short, I'm wondering if there could be an option somewhere for a
top-down SPARQL evaluation mechanism.

Long version: the dataset I'm dealing with contains data in the following form:

ex:Loc1 a :Location ;
        :locatedInWork ex:Work1 ;
        :startPage 123 ;
        :endPage   234 ;
        :startVolume 1 .

ex:Loc2 a :Location ;
        :locatedInWork ex:Work1 ;
        :startPage 234 ;
        :endPage   345 ;
        :startVolume 1 ;
        :endVolume   2 .

where the absence of :endVolume denotes that the endVolume is equal to
the startVolume. This might not be kosher in terms of semantics but
that's the dataset I'm dealing with.

Now, suppose I want to select all the locations in volume 2 (including
those starting before volume 2 and ending after it). The most natural
way for me to write this is something like:

  SELECT ?loc WHERE {
    ?loc :locatedInWork ex:Work1 ;
         :startVolume ?startvol .
    OPTIONAL { ?loc :endVolume ?endvol . }
    FILTER ((BOUND(?endvol) && ?startvol <= 2 && ?endvol >= 2) ||
            (!BOUND(?endvol) && ?startvol = 2))
  }
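As a sanity check on the FILTER logic, here is a small Python sketch
(not Jena code; representing each location as a hypothetical
(startVolume, endVolume-or-None) pair is my own simplification of the
RDF data). It checks that the condition is the same as treating a
missing :endVolume as equal to :startVolume, i.e.
startVolume <= 2 <= coalesce(endVolume, startVolume):

```python
from typing import Optional


def filter_as_written(start: int, end: Optional[int], vol: int) -> bool:
    """Mirrors the FILTER above; 'end is None' plays the role of !BOUND(?endvol)."""
    return (end is not None and start <= vol and end >= vol) or (
        end is None and start == vol
    )


def filter_with_default(start: int, end: Optional[int], vol: int) -> bool:
    """Same test, defaulting a missing end volume to the start volume
    (the dataset convention described above)."""
    effective_end = end if end is not None else start
    return start <= vol <= effective_end


# The two sample locations from the Turtle snippet (page numbers omitted).
locations = {
    "ex:Loc1": (1, None),  # no :endVolume, so it ends in volume 1
    "ex:Loc2": (1, 2),
}

in_volume_2 = [
    loc for loc, (start, end) in locations.items() if filter_as_written(start, end, 2)
]
print(in_volume_2)  # ['ex:Loc2']

# Exhaustive check over small volume numbers: both formulations agree.
for start in range(1, 5):
    for end in [None, *range(1, 5)]:
        for vol in range(1, 5):
            assert filter_as_written(start, end, vol) == filter_with_default(start, end, vol)
```

In SPARQL 1.1 the defaulted form could be written with
COALESCE(?endvol, ?startvol); that only makes the FILTER more compact
and, on its own, should not be expected to change how the OPTIONAL is
evaluated.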

This works, but it is extremely slow (about 8 s) because of the very
large number of triples with the :endVolume property. I understand
that the slow performance is somewhat expected because of what is
referred to as the bottom-up semantics of SPARQL. My understanding is
that the pattern ?loc :endVolume ?endvol inside the OPTIONAL is
evaluated first, on its own, and returns a huge number of results.
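To make this analysis concrete, here is a toy Python model of the two
ways the OPTIONAL could be evaluated. It illustrates the two strategies
only and is not how Jena's ARQ engine is actually implemented; the
graph contents, the predicate/subject index and the "scanned" counter
are all hypothetical. Bottom-up evaluates { ?loc :endVolume ?endvol }
on its own against the whole graph and then left-joins it with the
outer solutions; top-down (substitution) pushes each outer binding of
?loc into the optional pattern, so only the :endVolume triples of the
matching locations are touched:

```python
from collections import defaultdict

# Toy graph of (subject, predicate, object) triples: only two locations
# belong to ex:Work1, but many other subjects carry :endVolume.
graph = [
    ("ex:Loc1", ":locatedInWork", "ex:Work1"),
    ("ex:Loc1", ":startVolume", 1),
    ("ex:Loc2", ":locatedInWork", "ex:Work1"),
    ("ex:Loc2", ":startVolume", 1),
    ("ex:Loc2", ":endVolume", 2),
]
graph += [(f"ex:Other{i}", ":endVolume", 3) for i in range(10_000)]

# Index by predicate, then subject (a stand-in for a triple-store index).
by_pred_subj = defaultdict(lambda: defaultdict(list))
for s, p, o in graph:
    by_pred_subj[p][s].append(o)


def outer_solutions():
    """?loc :locatedInWork ex:Work1 ; :startVolume ?startvol ."""
    for loc, works in by_pred_subj[":locatedInWork"].items():
        if "ex:Work1" in works:
            for start in by_pred_subj[":startVolume"].get(loc, []):
                yield {"loc": loc, "startvol": start}


def bottom_up():
    """Evaluate the optional pattern independently, then left-join."""
    inner = [
        {"loc": loc, "endvol": end}
        for loc, ends in by_pred_subj[":endVolume"].items()
        for end in ends
    ]
    results = []
    for left in outer_solutions():
        matches = [r for r in inner if r["loc"] == left["loc"]]
        # Keep the left solution unextended when nothing matches (OPTIONAL).
        results += [{**left, **m} for m in matches] or [left]
    return results, len(inner)


def top_down():
    """Substitute each outer ?loc into the optional pattern before matching."""
    results, scanned = [], 0
    for left in outer_solutions():
        ends = by_pred_subj[":endVolume"].get(left["loc"], [])
        scanned += len(ends)
        results += [{**left, "endvol": e} for e in ends] or [left]
    return results, scanned


bu_rows, bu_scanned = bottom_up()
td_rows, td_scanned = top_down()
assert bu_rows == td_rows  # same answers for this query
print(bu_scanned, td_scanned)  # 10001 1
```

The two strategies agree here because ?loc is bound by the outer
pattern and ?endvol is only used in the FILTER of the same group. When
an OPTIONAL shares variables with parts of the query other than its
left-hand side, the two can give different answers, which is
presumably the kind of case a check like the one Blazegraph advertises
has to rule out.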

Here are a few questions:

- Is my analysis correct?

- In your experience of writing queries, how often do you rely on the
bottom-up semantics? (In my experience, never.)

- The bottom-up semantics are very counter-intuitive to me. What do
you think is the reason they made it into the SPARQL specs?

- I suppose that digging into the Jena code to optimize this kind of
request would be a very deep dive, am I right?

- Are there any plans or dedicated resources to optimize this kind of request?

- What would be the complexity of writing an alternate query
evaluation mechanism using top-down semantics?

- Would an option to evaluate a SPARQL query using top-down semantics
make sense? (We can discuss later where the option would be handled,
but a general answer would already help me.)

- Blazegraph advertises that it first checks whether a query would
give the same results under top-down and bottom-up semantics, and if
so, automatically switches to top-down evaluation. How much time do
you estimate one would need to spend in the Jena code to propose a
pull request for that?

Best,
-- 
Elie
