Hello

The query algebra has the following structure:


  (project (?book ?title)
    (filter (|| (&& (&& (bound ?endvol) (<= ?startvol 2)) (>= ?endvol 2))
                (&& (! (bound ?endvol)) (= ?startvol 2)))
      (leftjoin
        (bgp
          (triple ?loc :locatedInWork ex:Work1)
          (triple ?loc :startVolume ?startvol)
        )
        (bgp (triple ?loc :endVolume ?endvol)))))

(optimized)

  (project (?book ?title)
    (filter (|| (&& (&& (bound ?endvol) (<= ?startvol 2)) (>= ?endvol 2))
                (&& (! (bound ?endvol)) (= ?startvol 2)))
      (conditional
        (bgp
          (triple ?loc :locatedInWork ex:Work1)
          (triple ?loc :startVolume ?startvol)
        )
        (bgp (triple ?loc :endVolume ?endvol)))))
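The filter is the same in both plans and encodes the convention from the quoted data model: a missing :endVolume means the location ends in the volume it starts in. The small Python transcription below is only a sketch. None stands in for an unbound ?endvol, and the helper names are made up for illustration. It checks that the two branches of the || amount to "startvol <= 2 <= effective end volume":

```python
from typing import Optional


def in_volume(startvol: int, endvol: Optional[int], vol: int = 2) -> bool:
    """Literal transcription of the FILTER; None stands for an unbound ?endvol."""
    if endvol is not None:                       # (bound ?endvol)
        return startvol <= vol and endvol >= vol
    return startvol == vol                       # (! (bound ?endvol))


def in_volume_effective(startvol: int, endvol: Optional[int], vol: int = 2) -> bool:
    """Same test via an 'effective' end volume that defaults to the start volume."""
    end = endvol if endvol is not None else startvol
    return startvol <= vol <= end


# ex:Loc1 (start 1, no end) is not in volume 2; ex:Loc2 (start 1, end 2) is.
assert not in_volume(1, None)
assert in_volume(1, 2)

# Both formulations agree on a small grid of cases.
for s in range(1, 5):
    for e in [None, 1, 2, 3, 4]:
        assert in_volume(s, e) == in_volume_effective(s, e)
```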

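As you can see, an OPTIONAL is basically a left outer join (leftjoin). In the optimized plan, the leftjoin has been rewritten into a conditional, which evaluates the OPTIONAL part once per solution of the left-hand side with ?loc already bound. So ?loc :endVolume ?endvol is not evaluated on its own first.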
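To illustrate the difference, here is a toy Python model of the two operators over an in-memory list of triples. It is not Jena's code, and the data and helper names are hypothetical. The leftjoin function evaluates the optional side on its own, as in the bottom-up reading, and then joins. The conditional function evaluates the optional side once per left-hand solution, substituting the bindings it already has. On this query both return the same solutions, but the substitution version produces far fewer intermediate rows for the optional part. ARQ applies this rewrite only when it considers substitution safe, which is the case here.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, object]
Binding = Dict[str, object]

# Hypothetical toy data shaped like the quoted example, plus 1000 unrelated
# :endVolume triples standing in for the "very large" rest of the dataset.
DATA: List[Triple] = [
    ("ex:Loc1", ":locatedInWork", "ex:Work1"), ("ex:Loc1", ":startVolume", 1),
    ("ex:Loc2", ":locatedInWork", "ex:Work1"), ("ex:Loc2", ":startVolume", 1),
    ("ex:Loc2", ":endVolume", 2),
] + [(f"ex:Other{i}", ":endVolume", 3) for i in range(1000)]


def match(pattern: Tuple[str, str, str], binding: Binding) -> List[Binding]:
    """All extensions of `binding` that match one triple pattern ('?x' = variable)."""
    out = []
    for triple in DATA:
        b = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if b.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant does not match
        else:
            out.append(b)
    return out


def bgp(patterns, binding=None) -> List[Binding]:
    """Evaluate a basic graph pattern, optionally starting from a given binding."""
    solutions = [dict(binding or {})]
    for pattern in patterns:
        solutions = [s2 for s in solutions for s2 in match(pattern, s)]
    return solutions


def compatible(a: Binding, b: Binding) -> bool:
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def leftjoin(left: List[Binding], right_patterns):
    """Bottom-up: evaluate the optional side independently, then join."""
    right = bgp(right_patterns)
    out = []
    for row in left:
        out.extend([{**row, **r} for r in right if compatible(row, r)] or [row])
    return out, len(right)


def conditional(left: List[Binding], right_patterns):
    """Substitution: evaluate the optional side once per left-hand solution."""
    out, rhs_rows = [], 0
    for row in left:
        matches = bgp(right_patterns, row)  # ?loc is already bound here
        rhs_rows += len(matches)
        out.extend(matches or [row])
    return out, rhs_rows


LEFT = [("?loc", ":locatedInWork", "ex:Work1"), ("?loc", ":startVolume", "?startvol")]
RIGHT = [("?loc", ":endVolume", "?endvol")]

left = bgp(LEFT)
lj, lj_rhs = leftjoin(left, RIGHT)
cond, cond_rhs = conditional(left, RIGHT)
print(lj == cond, lj_rhs, cond_rhs)  # → True 1001 1
```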

If you're using TDB, the optimizer can also take statistics about the data
into account. You can try this by following the steps described here [1].
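Roughly, statistics let the optimizer reorder the triple patterns inside a basic graph pattern so that the most selective ones run first. The Python sketch below is a deliberately simplified heuristic, not TDB's actual algorithm. It sorts patterns by a hypothetical per-predicate triple count, loosely in the spirit of the per-predicate counts TDB's statistics record, and breaks ties by preferring patterns with fewer variables. The counts are invented, and the example BGP makes the :endVolume pattern mandatory. Such reordering happens within one BGP; it does not move a pattern out of an OPTIONAL.

```python
from typing import Dict, List, Tuple

TriplePattern = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def order_by_stats(patterns: List[TriplePattern],
                   predicate_counts: Dict[str, int]) -> List[TriplePattern]:
    """Greedy reordering: patterns whose predicate occurs in fewer triples go first;
    ties are broken by preferring patterns with fewer variables."""
    def key(pattern: TriplePattern) -> Tuple[float, int]:
        _, predicate, _ = pattern
        # Unknown predicates are treated as maximally frequent (pessimistic default).
        count = predicate_counts.get(predicate, float("inf"))
        return (count, sum(is_var(t) for t in pattern))
    return sorted(patterns, key=key)


# Hypothetical statistics and a hypothetical BGP (the :endVolume pattern is mandatory here).
counts = {":endVolume": 2_000_000, ":locatedInWork": 5_000, ":startVolume": 5_000}
query_bgp = [("?loc", ":endVolume", "?endvol"),
             ("?loc", ":startVolume", "?startvol"),
             ("?loc", ":locatedInWork", "ex:Work1")]
print(order_by_stats(query_bgp, counts))
# → [('?loc', ':locatedInWork', 'ex:Work1'), ('?loc', ':startVolume', '?startvol'), ('?loc', ':endVolume', '?endvol')]
```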


[1] https://jena.apache.org/documentation/tdb/optimizer.html



> Dear Jena users,
>
> In short, I'm wondering if there could be an option somewhere for a
> top-down SPARQL evaluation mechanism.
>
> Long version: the dataset I'm dealing with contains data in the following 
> form:
>
> ex:Loc1 a :Location ;
>         :locatedInWork ex:Work1 ;
>         :startPage 123 ;
>         :endPage   234 ;
>         :startVolume 1 .
>
> ex:Loc2 a :Location ;
>         :locatedInWork ex:Work1 ;
>         :startPage 234 ;
>         :endPage   345 ;
>         :startVolume 1 ;
>         :endVolume   2 .
>
> where the absence of :endVolume denotes that the endVolume is equal to
> the startVolume. This might not be kosher in terms of semantics but
> that's the dataset I'm dealing with.
>
> Now, I want to select all the locations in volume 2 (including those
> starting before volume 2 and ending after volume 2). The most natural
> way for me is to write something like:
>
>   ?loc :locatedInWork ex:Work1 ;
>        :startVolume ?startvol .
>   OPTIONAL { ?loc  :endVolume   ?endvol . }
>   FILTER ((BOUND(?endvol) && ?startvol <= 2 && ?endvol >= 2) ||
>           (!BOUND(?endvol) && ?startvol = 2))
>
> which works fine, but is extremely slow (about 8 s) due to the very
> large number of triples with the :endVolume property. Now, I
> understand the slow performance is sort of expected due to what's
> referred to as the bottom-up semantics of SPARQL. My understanding is
> that the first thing that will get evaluated will be ?loc :endVolume
> ?endvol, which will return a huge number of results.
>
> Here are a few questions:
>
> - Is my analysis correct?
>
> - In your experience of writing queries, how often do you rely on the
> bottom-up semantics? (my experience is never)
>
> - The bottom-up semantics are very counter-intuitive to me. What do
> you think is the reason they got into the SPARQL specs?
>
> - I suppose digging into the Jena code to optimize this kind of
> request would be a very deep dive, am I right?
>
> - Are there any plans or dedicated resources to optimize this kind of request?
>
> - What would be the complexity of writing an alternate query
> evaluation mechanism using top-down semantics?
>
> - Would having an option to evaluate a SPARQL query using top-down
> semantics make sense? (we can have discussions of where the option
> would be handled, but I think it's helpful for me to get a general
> answer)
>
> - Blazegraph advertises that it first checks whether the results
> of a query would be the same under top-down and bottom-up
> semantics, and if they are, it automatically switches to the
> top-down semantics. How much time do you estimate one would have to
> spend diving into the Jena code to propose a pull request for that?
>
> Best,

-- 
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center
