Dear Jena users,
In short, I'm wondering if there could be an option somewhere for a
top-down SPARQL evaluation mechanism.
Long version: the dataset I'm dealing with contains data in the following form:
ex:Loc1 a :Location ;
    :locatedInWork ex:Work1 ;
    :startPage 123 ;
    :endPage 234 ;
    :startVolume 1 .
ex:Loc2 a :Location ;
    :locatedInWork ex:Work1 ;
    :startPage 234 ;
    :endPage 345 ;
    :startVolume 1 ;
    :endVolume 2 .
where the absence of :endVolume denotes that the endVolume is equal to
the startVolume. This might not be kosher in terms of semantics but
that's the dataset I'm dealing with.
Now, I want to select all the locations in volume 2 (including those
starting before volume 2 and ending after it). The most natural way
for me is to write something like:
?loc :locatedInWork ex:Work1 ;
     :startVolume ?startvol .
OPTIONAL { ?loc :endVolume ?endvol . }
FILTER ((BOUND(?endvol) && ?startvol <= 2 && ?endvol >= 2) ||
        (!BOUND(?endvol) && ?startvol = 2))
which works fine, but is extremely slow (about 8s) due to the
very large number of triples with the :endVolume property. Now, I
understand the slow performance is sort of expected due to what's
referred to as the bottom-up semantics of SPARQL. My understanding is
that the first thing to be evaluated is ?loc :endVolume ?endvol,
which returns a huge number of results.
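
To make the two strategies concrete, here is a minimal Python sketch over a toy in-memory model of the data. It only illustrates the difference in evaluation order as I understand it; it is not how Jena is implemented. The Loc3 row, the "OtherWork" rows and all function names are made up for the illustration.

```python
def build_locations(n_other=1000):
    """Toy dataset: endVolume is omitted when it equals startVolume."""
    locations = {
        "Loc1": {"work": "Work1", "startVolume": 1},
        "Loc2": {"work": "Work1", "startVolume": 1, "endVolume": 2},
        "Loc3": {"work": "Work1", "startVolume": 2},  # hypothetical extra row
    }
    # Hypothetical rows standing in for the many :endVolume triples of other works.
    for i in range(n_other):
        locations[f"Other{i}"] = {"work": "OtherWork", "startVolume": 1, "endVolume": 2}
    return locations


def volume_filter(startvol, endvol, volume=2):
    """Same condition as the FILTER above; endvol is None when unbound."""
    if endvol is not None:
        return startvol <= volume and endvol >= volume
    return startvol == volume


def bottom_up(locations, work):
    """Evaluate both sides independently, then left-join them on ?loc."""
    left = [(loc, p["startVolume"]) for loc, p in locations.items() if p["work"] == work]
    # The OPTIONAL side is evaluated on its own, so it sees every :endVolume triple.
    right = {loc: p["endVolume"] for loc, p in locations.items() if "endVolume" in p}
    results = [loc for loc, start in left if volume_filter(start, right.get(loc))]
    return results, len(right)


def top_down(locations, work):
    """Evaluate the left side first, then look up :endVolume for each bound ?loc."""
    results, lookups = [], 0
    for loc, p in locations.items():
        if p["work"] != work:
            continue
        lookups += 1  # one :endVolume lookup per left-hand solution
        if volume_filter(p["startVolume"], p.get("endVolume")):
            results.append(loc)
    return results, lookups
```

On this toy data both functions return Loc2 and Loc3 for Work1, but bottom_up materialises 1001 :endVolume entries while top_down performs 3 lookups.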
Here are a few questions:
- Is my analysis correct?
- In your experience of writing queries, how often do you rely on the
bottom-up semantics? (In my experience, never.)
- The bottom-up semantics are very counter-intuitive to me. What do
you think is the reason they got into the SPARQL spec?
- I suppose digging into the Jena code to optimize this kind of
query must be a very deep dive, am I right?
- Are there any plans or dedicated resources to optimize this kind of query?
- What would be the complexity of writing an alternate query
evaluation mechanism using top-down semantics?
- Would having an option to evaluate a SPARQL query using top-down
semantics make sense? (We can discuss where the option would be
handled, but I think it's helpful for me to get a general answer.)
- Blazegraph advertises that it first checks whether the results of a
query would be the same under top-down and bottom-up semantics and,
if they are, automatically switches to top-down evaluation. How much
time do you estimate one would need to dive into the Jena code to
propose a pull request for that?
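
For the last point, one criterion I know of from the literature is the "well-designed pattern" condition: for every sub-pattern (P1 OPTIONAL P2), any variable of P2 that also occurs outside that sub-pattern must occur in P1. As I understand it, the two semantics agree on such patterns. Below is a rough Python sketch of that check over a made-up tuple representation of patterns, covering only basic graph patterns, joins and OPTIONAL (no FILTER or UNION). I do not know whether Blazegraph uses exactly this test, and nothing here is Jena or Blazegraph API.

```python
# Hypothetical pattern representation (not a real library's algebra):
#   ("bgp", {"x", "y"})      basic graph pattern with its variable names
#   ("join", P1, P2)         P1 . P2
#   ("opt", P1, P2)          P1 OPTIONAL { P2 }

def variables(pattern):
    """All variable names occurring anywhere in the pattern."""
    if pattern[0] == "bgp":
        return set(pattern[1])
    return variables(pattern[1]) | variables(pattern[2])


def well_designed(pattern, outside=frozenset()):
    """True if no OPTIONAL right-hand side shares a variable with the rest
    of the query unless its left-hand side also mentions that variable."""
    if pattern[0] == "bgp":
        return True
    left, right = pattern[1], pattern[2]
    if pattern[0] == "opt":
        # Variables of the right side seen outside this OPTIONAL but not on its left.
        leaked = (variables(right) & outside) - variables(left)
        if leaked:
            return False
    # When descending, the sibling's variables count as "outside" as well.
    return (well_designed(left, outside | variables(right))
            and well_designed(right, outside | variables(left)))
```

For example, ("opt", ("bgp", {"loc", "startvol"}), ("bgp", {"loc", "endvol"})) passes the check, while a join of ("bgp", {"x", "v"}) with ("opt", ("bgp", {"y", "w"}), ("bgp", {"y", "v"})) does not, because ?v appears inside the OPTIONAL and outside it but not on its left-hand side.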
Best,
--
Elie