Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

2015-03-10 Thread Rose Beck
Dear Andy,

Thanks a lot for your reply :)
Well, these questions do not have any particular background; I just asked
them out of curiosity.

The ORDER BY clause in Jena sorts in ascending order by default. Is there
some way to specify in my SPARQL query that the results should be ordered
in descending order? It would be great if someone could explain how to do
this with an example.
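
For reference, the SPARQL 1.1 ORDER BY clause accepts a DESC() modifier
around the sort expression (ascending is the default). The sketch below
only holds such a query as a plain Java string. The class name and the
example.org predicates are hypothetical placeholders, since the predicates
of the query discussed in this thread are not given.

```java
// Hypothetical holder class for illustration only; it just builds query text.
final class DescendingOrderQuery {

    // DESC(?c) sorts by ?c in descending order; plain "ORDER BY ?c" is ascending.
    // The <http://example.org/...> predicates are placeholders, not from the thread.
    static final String QUERY =
          "SELECT ?a ?b ?c\n"
        + "WHERE {\n"
        + "  ?a <http://example.org/p1> ?c .\n"
        + "  ?a <http://example.org/p2> ?b .\n"
        + "}\n"
        + "ORDER BY DESC(?c)\n"
        + "LIMIT 10\n";

    public static void main(String[] args) {
        // Prints the query text so it can be pasted into any SPARQL tool.
        System.out.print(QUERY);
    }
}
```

With Jena on the classpath, a string like this would typically be parsed
with QueryFactory.create(...) before execution.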

On Tue, Mar 10, 2015 at 5:46 PM, Andy Seaborne  wrote:
> What's the background for all these questions ??!!
>
> Index Joins have the useful feature that they take constant memory overhead.
> This means one major area where RAM would otherwise run out is removed.
>
> ARQ does not currently use hash joins.  It should, though SPARQL tends quite
> strongly to produce chains of joins. (There are cases in SPARQL involving
> OPTIONALs where index joins, with scope elimination, do not work as an
> execution strategy.  See the literature.)
>
> I have an alternative library of a collection of various join algorithms
> that can be used to build alternative query execution strategies.  This
> evaluator is not ready for production and not part of the project (yet?).
>
> Pipeline joins are interesting as a sort of two-sided hash join which is
> non-blocking.  Useful for multi-core and distributed evaluation.
>
> Andy
>
> On 10/03/15 11:49, Rose Beck wrote:
>>
>> Dear Rob,
>>
>> Thanks a lot again :)
>>
>> It seems to me that index joins are an efficient solution when the
>> query plan is a left-deep join tree. However, when the query plan is
>> bushy and the RHS itself includes many index join operators
>> (essentially forming a sub-tree), don't you think that computing the
>> entire RHS sub-tree for each join result of the LHS sub-tree is
>> expensive? Please correct me if I am wrong. Or does Jena only build
>> left-deep join trees?
>>
>> On Tue, Mar 10, 2015 at 5:11 PM, Rob Vesse  wrote:
>>>
>>> Jena does use hash joins where necessary, which are indeed blocking.
>>>
>>> However, in most cases it can instead use index joins, which are
>>> essentially a form of merge join whereby you substitute candidate
>>> solutions from the LHS into the RHS and evaluate the RHS for each LHS
>>> solution.
>>>
>>> Rob
>>>
>>> On 10/03/2015 11:10, "Rose Beck"  wrote:
>>>
>>>> Dear Rob,
>>>>
>>>> Thanks a lot for the reply again.
>>>>
>>>> But I am curious: does Jena implement hash joins, which are
>>>> essentially blocking in nature? If it does not, then how is a join
>>>> between two unsorted lists (of intermediate results) done in Jena?
>>>>
>>>>
>>>>
>>>> On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse  wrote:
>>>>>
>>>>> Yes, that is pretty much what happens.
>>>>>
>>>>> Note though that most query evaluation in ARQ is done in a streaming
>>>>> fashion, so the full set of solutions is typically never held in
>>>>> memory for any query unless an operator which requires full/partial
>>>>> materialisation, e.g. DISTINCT, is encountered.
>>>>>
>>>>> Rob
>>>>>
>>>>> On 10/03/2015 10:24, "Rose Beck"  wrote:
>>>>>
>>>>>> Dear Rob,
>>>>>>
>>>>>> First and foremost thank you for such a wonderful explanation.
>>>>>>
>>>>>> Just to clarify, say my example query is:
>>>>>>
>>>>>> select ?a?b?c where{?a  ?c. ?a  ?b} order by ?c limit 10
>>>>>>
>>>>>> Then, for the query above, all the solutions are generated just as
>>>>>> they would be for the SPARQL query: select ?a?b?c where{?a  ?c. ?a  ?b}.
>>>>>> But the priority queue (the last operation applied before results are
>>>>>> output to the user) holds at most 10 solutions, ordered by ?c, at any
>>>>>> point. Please correct me if I am wrong.
>>>>>>
>>>>>> Cheers,
>>>>>> Rose
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse 
>>>>>> wrote:
>>>>>>>
>>>>>>> Query execution in ARQ is based on nested iterators so QueryIterTopN
>>>>>>> will always apply over another iterator
>

Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

2015-03-10 Thread Rose Beck
Dear Rob,

Thanks a lot again :)

It seems to me that index joins are an efficient solution when the
query plan is a left-deep join tree. However, when the query plan is
bushy and the RHS itself includes many index join operators
(essentially forming a sub-tree), don't you think that computing the
entire RHS sub-tree for each join result of the LHS sub-tree is
expensive? Please correct me if I am wrong. Or does Jena only build
left-deep join trees?
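
To make the cost question concrete, here is a minimal sketch of the
substitution idea behind an index join: each LHS solution is substituted
into the RHS pattern, which is then answered by an index lookup. The class
name, the map standing in for an index, and the variable names are
assumptions made for illustration. This is not ARQ's implementation, and a
real engine would stream results rather than build a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an index (substitution) join; not ARQ code.
final class IndexJoinSketch {

    // lhs: solutions binding ?a (and possibly other variables).
    // rhsIndex: stands in for a store index answering the RHS pattern
    //           (?a <p> ?b) once ?a is bound: subject -> list of objects.
    static List<Map<String, String>> join(List<Map<String, String>> lhs,
                                          Map<String, List<String>> rhsIndex) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> left : lhs) {
            // Substitute the LHS value of ?a into the RHS and evaluate it.
            List<String> objects = rhsIndex.getOrDefault(left.get("a"), List.of());
            for (String b : objects) {
                Map<String, String> merged = new HashMap<>(left);
                merged.put("b", b);
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> lhs = List.of(
            Map.of("a", "s1", "c", "10"),
            Map.of("a", "s2", "c", "20"));
        Map<String, List<String>> rhsIndex = Map.of("s1", List.of("x", "y"));
        // s1 matches twice and s2 has no RHS match, so two solutions result.
        System.out.println(join(lhs, rhsIndex).size());
    }
}
```

In a bushy plan the single lookup on rhsIndex would be replaced by the
evaluation of a whole sub-plan, repeated once per LHS solution, which is
the cost the question above is asking about.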

On Tue, Mar 10, 2015 at 5:11 PM, Rob Vesse  wrote:
> Jena does use hash joins where necessary, which are indeed blocking.
>
> However, in most cases it can instead use index joins, which are
> essentially a form of merge join whereby you substitute candidate
> solutions from the LHS into the RHS and evaluate the RHS for each LHS
> solution.
>
> Rob
>
> On 10/03/2015 11:10, "Rose Beck"  wrote:
>
>>Dear Rob,
>>
>>Thanks a lot for the reply again.
>>
>>But I am curious: does Jena implement hash joins, which are
>>essentially blocking in nature? If it does not, then how is a join
>>between two unsorted lists (of intermediate results) done in Jena?
>>
>>
>>
>>On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse  wrote:
>>> Yes, that is pretty much what happens.
>>>
>>> Note though that most query evaluation in ARQ is done in a streaming
>>> fashion, so the full set of solutions is typically never held in memory
>>> for any query unless an operator which requires full/partial
>>> materialisation, e.g. DISTINCT, is encountered.
>>>
>>> Rob
>>>
>>> On 10/03/2015 10:24, "Rose Beck"  wrote:
>>>
>>>>Dear Rob,
>>>>
>>>>First and foremost thank you for such a wonderful explanation.
>>>>
>>>>Just to clarify, say my example query is:
>>>>
>>>>select ?a?b?c where{?a  ?c. ?a  ?b} order by ?c limit 10
>>>>
>>>>Then, for the query above, all the solutions are generated just as
>>>>they would be for the SPARQL query: select ?a?b?c where{?a  ?c. ?a  ?b}.
>>>>But the priority queue (the last operation applied before results are
>>>>output to the user) holds at most 10 solutions, ordered by ?c, at any
>>>>point. Please correct me if I am wrong.
>>>>
>>>>Cheers,
>>>>Rose
>>>>
>>>>
>>>>On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse  wrote:
>>>>> Query execution in ARQ is based on nested iterators so QueryIterTopN
>>>>> will always apply over another iterator
>>>>>
>>>>> A PriorityQueue is used internally as temporary storage within
>>>>> QueryIterTopN while it exhausts the inner iterator allowing it to only
>>>>> use at most the limit amount of storage in the priority queue plus
>>>>> whatever temporary storage the inner iterator(s) may need.
>>>>>
>>>>> There is still a "total sort" in the sense that every possible
>>>>> solution has to be compared to see if it should be placed into the
>>>>> priority queue however there is not a "total sort" in the sense of
>>>>> needing to materialise all possible solutions into memory and then sort.
>>>>>
>>>>> Rob
>>>>>
>>>>> p.s. Please don't post identical questions to both users@ and dev@ -
>>>>> one list is sufficient as the developers are on both lists.  As a
>>>>> general rule general support questions should go to users@ and
>>>>> technical/architecture questions like this should go to dev@
>>>>>
>>>>>
>>>>>
>>>>> On 10/03/2015 05:54, "Rose Beck"  wrote:
>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>I saw the following issue posted on Jena website (which has been
>>>>>>recently resolved):
>>>>>>Avoid a total sort for ORDER BY + LIMIT queries
>>>>>>(https://issues.apache.org/jira/browse/JENA-89).
>>>>>>
>>>>>>I am very interested in understanding how Jena-ARQ avoids a total
>>>>>>sort for ORDER BY + LIMIT queries. The issue mentions that Jena-ARQ
>>>>>>uses a priority queue to avoid a final sort; however, it also says
>>>>>>that "ARQ's algebra package contains already a OpTopN [3] operator.
>>>>>>The OpExecutor [4] will need to use a new QueryIterTopN instead of
>>>>>>QueryIterSort + QueryIterSlice." It is not clear how the priority
>>>>>>queue benefits from the OpTopN operator and QueryIterTopN. The links
>>>>>>[3] and [4] given in the issue do not work, so I am not able to
>>>>>>understand how these classes operate or how they help in avoiding a
>>>>>>total sort.
>>>>>>
>>>>>>Can someone please explain how Jena-ARQ executes queries containing
>>>>>>an ORDER BY + LIMIT clause?
>>>>>>
>>>>>>With Warm Regards,
>>>>>>Rose

-- 
With Warm Regards,
Rose


Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

2015-03-10 Thread Rose Beck
Dear Rob,

Thanks a lot for the reply again.

But I am curious: does Jena implement hash joins, which are essentially
blocking in nature? If it does not, then how is a join between two
unsorted lists (of intermediate results) done in Jena?
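
To show what "blocking" means here, the following is a minimal sketch of
a classic hash join over two unsorted inputs. The class name, the map-based
solutions and the single join variable are assumptions made for
illustration; this is not taken from ARQ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a classic hash join; not ARQ code.
final class HashJoinSketch {

    // Joins two streams of solutions on one shared variable `key`.
    // Assumes every solution binds `key`.
    static List<Map<String, String>> join(Iterator<Map<String, String>> build,
                                          Iterator<Map<String, String>> probe,
                                          String key) {
        // Build phase (blocking): the whole build side is read into a hash
        // table before any result can be produced.
        Map<String, List<Map<String, String>>> table = new HashMap<>();
        while (build.hasNext()) {
            Map<String, String> row = build.next();
            table.computeIfAbsent(row.get(key), k -> new ArrayList<>()).add(row);
        }
        // Probe phase: the other side is read one solution at a time.
        List<Map<String, String>> out = new ArrayList<>();
        while (probe.hasNext()) {
            Map<String, String> row = probe.next();
            for (Map<String, String> match : table.getOrDefault(row.get(key), List.of())) {
                Map<String, String> merged = new HashMap<>(match);
                merged.putAll(row);
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> build = List.of(
            Map.of("a", "s1", "c", "10"),
            Map.of("a", "s2", "c", "20"));
        List<Map<String, String>> probe = List.of(Map.of("a", "s1", "b", "x"));
        // Only s1 appears on both sides, so one joined solution results.
        System.out.println(join(build.iterator(), probe.iterator(), "a").size());
    }
}
```

The memory cost is the size of the build side, which is why the blocking
behaviour matters for large intermediate results.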



On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse  wrote:
> Yes, that is pretty much what happens.
>
> Note though that most query evaluation in ARQ is done in a streaming
> fashion, so the full set of solutions is typically never held in memory
> for any query unless an operator which requires full/partial
> materialisation, e.g. DISTINCT, is encountered.
>
> Rob
>
> On 10/03/2015 10:24, "Rose Beck"  wrote:
>
>>Dear Rob,
>>
>>First and foremost thank you for such a wonderful explanation.
>>
>>Just to clarify, say my example query is:
>>
>>select ?a?b?c where{?a  ?c. ?a  ?b} order by ?c limit 10
>>
>>Then, for the query above, all the solutions are generated just as
>>they would be for the SPARQL query: select ?a?b?c where{?a  ?c. ?a  ?b}.
>>But the priority queue (the last operation applied before results are
>>output to the user) holds at most 10 solutions, ordered by ?c, at any
>>point. Please correct me if I am wrong.
>>
>>Cheers,
>>Rose
>>
>>
>>On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse  wrote:
>>> Query execution in ARQ is based on nested iterators so QueryIterTopN
>>> will always apply over another iterator
>>>
>>> A PriorityQueue is used internally as temporary storage within
>>> QueryIterTopN while it exhausts the inner iterator allowing it to only
>>> use at most the limit amount of storage in the priority queue plus
>>> whatever temporary storage the inner iterator(s) may need.
>>>
>>> There is still a "total sort" in the sense that every possible solution
>>> has to be compared to see if it should be placed into the priority queue
>>> however there is not a "total sort" in the sense of needing to
>>> materialise all possible solutions into memory and then sort.
>>>
>>> Rob
>>>
>>> p.s. Please don't post identical questions to both users@ and dev@ - one
>>> list is sufficient as the developers are on both lists.  As a general
>>> rule general support questions should go to users@ and
>>> technical/architecture questions like this should go to dev@
>>>
>>>
>>>
>>> On 10/03/2015 05:54, "Rose Beck"  wrote:
>>>
>>>>Hi,
>>>>
>>>>I saw the following issue posted on Jena website (which has been
>>>>recently resolved):
>>>>Avoid a total sort for ORDER BY + LIMIT queries
>>>>(https://issues.apache.org/jira/browse/JENA-89).
>>>>
>>>>I am very interested in understanding how Jena-ARQ avoids a total
>>>>sort for ORDER BY + LIMIT queries. The issue mentions that Jena-ARQ
>>>>uses a priority queue to avoid a final sort; however, it also says
>>>>that "ARQ's algebra package contains already a OpTopN [3] operator.
>>>>The OpExecutor [4] will need to use a new QueryIterTopN instead of
>>>>QueryIterSort + QueryIterSlice." It is not clear how the priority
>>>>queue benefits from the OpTopN operator and QueryIterTopN. The links
>>>>[3] and [4] given in the issue do not work, so I am not able to
>>>>understand how these classes operate or how they help in avoiding a
>>>>total sort.
>>>>
>>>>Can someone please explain how Jena-ARQ executes queries containing
>>>>an ORDER BY + LIMIT clause?
>>>>
>>>>With Warm Regards,
>>>>Rose

-- 
With Warm Regards,
Rose


Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

2015-03-10 Thread Rose Beck
Dear Rob,

First and foremost thank you for such a wonderful explanation.

Just to clarify, say my example query is:

select ?a?b?c where{?a  ?c. ?a  ?b} order by ?c limit 10

Then, for the query above, all the solutions are generated just as they
would be for the SPARQL query: select ?a?b?c where{?a  ?c. ?a  ?b}.
But the priority queue (the last operation applied before results are
output to the user) holds at most 10 solutions, ordered by ?c, at any
point. Please correct me if I am wrong.
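
The idea can be sketched in a few lines of plain Java: a bounded
priority queue sits on top of an inner iterator, compares every item once,
and never holds more than the limit. The class and method names below are
hypothetical and chosen for illustration. This is not the source of ARQ's
QueryIterTopN, only a sketch of the technique described in this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for illustration; this is not ARQ's QueryIterTopN.
final class TopNSketch {

    // Returns the first `limit` items of `inner` in `order`, holding at most
    // `limit` items in the heap at any time.
    static <T> List<T> topN(Iterator<T> inner, int limit, Comparator<T> order) {
        if (limit <= 0) {
            return new ArrayList<>();
        }
        // Reversed comparator: the head of the heap is the worst item kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(limit, order.reversed());
        while (inner.hasNext()) {
            T item = inner.next();
            if (heap.size() < limit) {
                heap.add(item);
            } else if (order.compare(item, heap.peek()) < 0) {
                // The new item beats the worst kept item, so it replaces it.
                heap.poll();
                heap.add(item);
            }
        }
        // Only the (at most `limit`) survivors are sorted for output.
        List<T> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(42, 7, 19, 3, 88, 1, 56);
        // prints [1, 3, 7]
        System.out.println(topN(values.iterator(), 3, Comparator.naturalOrder()));
    }
}
```

Every input item is still compared against the heap, but only the
survivors are ever stored and sorted.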

Cheers,
Rose


On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse  wrote:
> Query execution in ARQ is based on nested iterators so QueryIterTopN will
> always apply over another iterator
>
> A PriorityQueue is used internally as temporary storage within
> QueryIterTopN while it exhausts the inner iterator allowing it to only use
> at most the limit amount of storage in the priority queue plus whatever
> temporary storage the inner iterator(s) may need.
>
> There is still a "total sort" in the sense that every possible solution
> has to be compared to see if it should be placed into the priority queue
> however there is not a "total sort" in the sense of needing to materialise
> all possible solutions into memory and then sort.
>
> Rob
>
> p.s. Please don't post identical questions to both users@ and dev@ - one
> list is sufficient as the developers are on both lists.  As a general rule
> general support questions should go to users@ and technical/architecture
> questions like this should go to dev@
>
>
>
> On 10/03/2015 05:54, "Rose Beck"  wrote:
>
>>Hi,
>>
>>I saw the following issue posted on Jena website (which has been
>>recently resolved):
>>Avoid a total sort for ORDER BY + LIMIT queries
>>(https://issues.apache.org/jira/browse/JENA-89).
>>
>>I am very interested in understanding how Jena-ARQ avoids a total
>>sort for ORDER BY + LIMIT queries. The issue mentions that Jena-ARQ
>>uses a priority queue to avoid a final sort; however, it also says
>>that "ARQ's algebra package contains already a OpTopN [3] operator.
>>The OpExecutor [4] will need to use a new QueryIterTopN instead of
>>QueryIterSort + QueryIterSlice." It is not clear how the priority
>>queue benefits from the OpTopN operator and QueryIterTopN. The links
>>[3] and [4] given in the issue do not work, so I am not able to
>>understand how these classes operate or how they help in avoiding a
>>total sort.
>>
>>Can someone please explain how Jena-ARQ executes queries containing
>>an ORDER BY + LIMIT clause?
>>
>>With Warm Regards,
>>Rose

-- 
With Warm Regards,
Rose


Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

2015-03-10 Thread Rose Beck
Sorry for yet another mail. Is there some document from which I can
understand how the ORDER BY and LIMIT clauses are evaluated by Jena? I
read the documentation here:
http://jena.apache.org/documentation/query/arq-query-eval.html but I
am not able to work out the answer from it.

On Tue, Mar 10, 2015 at 11:24 AM, Rose Beck  wrote:
> Hi,
>
> I saw the following issue posted on Jena website (which has been
> recently resolved):
> Avoid a total sort for ORDER BY + LIMIT queries
> (https://issues.apache.org/jira/browse/JENA-89).
>
> I am very interested in understanding how Jena-ARQ avoids a total
> sort for ORDER BY + LIMIT queries. The issue mentions that Jena-ARQ
> uses a priority queue to avoid a final sort; however, it also says
> that "ARQ's algebra package contains already a OpTopN [3] operator.
> The OpExecutor [4] will need to use a new QueryIterTopN instead of
> QueryIterSort + QueryIterSlice." It is not clear how the priority
> queue benefits from the OpTopN operator and QueryIterTopN. The links
> [3] and [4] given in the issue do not work, so I am not able to
> understand how these classes operate or how they help in avoiding a
> total sort.
>
> Can someone please explain how Jena-ARQ executes queries containing
> an ORDER BY + LIMIT clause?
>
> With Warm Regards,
> Rose



-- 
With Warm Regards,
Rose


How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

2015-03-09 Thread Rose Beck
Hi,

I saw the following issue posted on the Jena website (it has recently
been resolved):
Avoid a total sort for ORDER BY + LIMIT queries
(https://issues.apache.org/jira/browse/JENA-89).

I am very interested in understanding how Jena-ARQ avoids a total sort
for ORDER BY + LIMIT queries. The issue mentions that Jena-ARQ uses a
priority queue to avoid a final sort; however, it also says that "ARQ's
algebra package contains already a OpTopN [3] operator. The OpExecutor [4]
will need to use a new QueryIterTopN instead of QueryIterSort +
QueryIterSlice." It is not clear how the priority queue benefits from the
OpTopN operator and QueryIterTopN. The links [3] and [4] given in the
issue do not work, so I am not able to understand how these classes
operate or how they help in avoiding a total sort.

Can someone please explain how Jena-ARQ executes queries containing an
ORDER BY + LIMIT clause?

With Warm Regards,
Rose


Query regarding Jena's indexes

2013-07-29 Thread Rose Beck
Hi

I read the classic paper "Efficient RDF Storage and Retrieval in Jena2".
However, from the paper I am unable to understand how the GSPO, GOSP, etc.
indexes (employed within Jena TDB) are stored.

Could you please give me pointers to where I can learn more about these
indexes in Jena TDB? I would be very grateful.


With warm regards,
Rose