[jira] [Commented] (SOLR-11831) Skip second grouping step if group.limit is 1 (aka Las Vegas patch)

2019-03-14 Thread Christine Poerschke (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793052#comment-16793052
 ] 

Christine Poerschke commented on SOLR-11831:


Hello [~mjosephidou] and [~diegoceccarelli], thank you for opening this ticket 
and associated pull request!

Just to say that I've started (a little bit) to take a look at the changes (and 
hope to look more next week):
 * The ASF GitHub Bot mirrors pull request comments into the 'Work Log' but not 
'Comments' for clarity.
 * Of course anyone can comment on the pull request or on the JIRA ticket here, 
whatever works.
 * Your pull request currently includes changes to both Lucene and Solr code; 
nothing wrong with that. I've opened LUCENE-8728 separately (for clarity, 
hopefully) to explore whether the classes in question are best modified, 
extended, or handled some other way.

> Skip second grouping step if group.limit is 1 (aka Las Vegas patch)
>
> Key: SOLR-11831
> URL: https://issues.apache.org/jira/browse/SOLR-11831
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Malvina Josephidou
> Priority: Minor
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> In cases where we do grouping and ask only for {{group.limit=1}}, it is
> possible to skip the second grouping step. In our test datasets this improved
> speed by around 40%.
> Essentially, in the first grouping step each shard returns its top K groups,
> ranked by the highest scoring document in each group. The top K groups from
> each shard are merged in the federator, and in the second step we ask all the
> shards to return the top documents from each of the top ranking groups.
> If we only want to return the highest scoring document per group, we can
> return the top document id in the first step, merge results in the federator
> to retain the top K groups, and then skip the second grouping step entirely.
> This is possible provided that:
> a) We do not need to know the total number of matching documents per group.
> b) The within-group sort and the between-group sort are the same.
> c) We are not doing reranking (because reranking is done in the second
> grouping step; it is also possible to get this to work with reranking, but
> more work and some additional assumptions are required).
>
> This patch applies the grouping optimisation in cases where a)-c) apply and
> we are only sorting by relevance. It is also possible to extend this work to
> handle multiple sorting criteria and also reranking.
> P.S. Diego and I called this patch "Las Vegas" because we started to write it
> on the flight to Las Vegas for Lucene/Solr Revolution.
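
The conditions a)-c) in the description, together with the relevance-only scope
of the patch, could be sketched as a simple eligibility check. This is an
illustrative Python sketch, not the patch's actual code: `GroupingRequest`, its
fields, and `can_skip_second_step` are hypothetical names that do not come from
Solr, and the sort is modelled as a plain string for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupingRequest:
    """Hypothetical stand-in for the grouping parameters of one request."""
    group_limit: int           # documents requested per group
    group_sort: str            # sort between groups, e.g. "score desc"
    within_group_sort: str     # sort inside each group
    needs_group_counts: bool   # caller wants total matches per group
    reranking: bool            # a reranker is applied to the results


def can_skip_second_step(req: GroupingRequest) -> bool:
    """Return True when the second grouping step could be skipped."""
    return (
        req.group_limit == 1
        and not req.needs_group_counts                # condition a)
        and req.group_sort == req.within_group_sort   # condition b)
        and not req.reranking                         # condition c)
        and req.group_sort == "score desc"            # patch scope: relevance only
    )
```

For example, a request with `group_limit=1`, both sorts set to `"score desc"`,
no per-group counts and no reranking qualifies; changing any one of those
disqualifies it.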



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11831) Skip second grouping step if group.limit is 1 (aka Las Vegas patch)

2018-05-10 Thread Diego Ceccarelli (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470605#comment-16470605
 ] 

Diego Ceccarelli commented on SOLR-11831:
-

It applies only to the distributed case. I was discussing with [~romseygeek] 
the possibility of doing the same inside Lucene for the non-distributed case. 




[jira] [Commented] (SOLR-11831) Skip second grouping step if group.limit is 1 (aka Las Vegas patch)

2018-05-10 Thread Ilayaraja (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470603#comment-16470603
 ] 

Ilayaraja commented on SOLR-11831:
--

Does this apply to both distributed and non-distributed Solr setups?




[jira] [Commented] (SOLR-11831) Skip second grouping step if group.limit is 1 (aka Las Vegas patch)

2018-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316636#comment-16316636
 ] 

ASF GitHub Bot commented on SOLR-11831:
---

GitHub user mjosephidou opened a pull request:

https://github.com/apache/lucene-solr/pull/300

SOLR-11831: Skip second grouping step if group.limit is 1 (aka Las Vegas 
Patch)

Summary:
In cases where we do grouping and ask only for {{group.limit=1}}, it is 
possible to skip the second grouping step. In our test datasets this improved 
speed by around 40%.

Essentially, in the first grouping step each shard returns the top K groups 
based on the highest scoring document in each group. The top K groups from each 
shard are merged in the federator and in the second step we ask all the shards 
to return the top documents from each of the top ranking groups.

If we only want to return the highest scoring document per group we can 
return the top document id in the first step, merge results in the federator to 
retain the top K groups and then skip the second grouping step entirely.
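
The single-step flow described in the summary can be sketched as below. This is
a simplified Python model that assumes sorting by score only and ignores ties;
it is not the code in the pull request. `shard_first_step`, `merge_top_groups`,
and the `(doc_id, group, score)` tuples are hypothetical names and shapes chosen
for illustration.

```python
import heapq
from typing import Dict, List, Tuple

# One shard's first-step response: group value -> (score, doc id) of its top document.
ShardResponse = Dict[str, Tuple[float, str]]


def shard_first_step(docs: List[Tuple[str, str, float]], k: int) -> ShardResponse:
    """Return this shard's top k groups, each with its highest scoring document.

    docs is a list of (doc_id, group, score) tuples.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for doc_id, group, score in docs:
        if group not in best or score > best[group][0]:
            best[group] = (score, doc_id)
    # Rank groups by the score of their best document and keep the top k.
    return dict(heapq.nlargest(k, best.items(), key=lambda item: item[1][0]))


def merge_top_groups(responses: List[ShardResponse], k: int) -> List[Tuple[str, str, float]]:
    """Federator side: merge shard responses and retain the overall top k groups.

    Because each group already carries its top document id, no second
    request to the shards is needed when only one document per group is wanted.
    """
    merged: Dict[str, Tuple[float, str]] = {}
    for response in responses:
        for group, (score, doc_id) in response.items():
            if group not in merged or score > merged[group][0]:
                merged[group] = (score, doc_id)
    top = heapq.nlargest(k, merged.items(), key=lambda item: item[1][0])
    return [(group, doc_id, score) for group, (score, doc_id) in top]


shard1 = [("d1", "a", 0.9), ("d2", "a", 0.5), ("d3", "b", 0.4)]
shard2 = [("d4", "b", 0.8), ("d5", "c", 0.7), ("d6", "a", 0.3)]
result = merge_top_groups([shard_first_step(s, 2) for s in (shard1, shard2)], 2)
print(result)  # [('a', 'd1', 0.9), ('b', 'd4', 0.8)]
```

A group that belongs in the overall top K must also be in the top K of the
shard holding its best document, so taking the per-group maximum across shard
responses is enough to recover both the ranking and the document to return.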

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr SOLR-11831

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/300.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #300


commit 6b918c86cd0f37320c32eb669eca722a9e74f768
Author: Malvina Josephidou 
Date:   2018-01-04T15:00:35Z

SOLR-11831: Skip second grouping step if group.limit is 1 (aka Las Vegas 
patch)








[jira] [Commented] (SOLR-11831) Skip second grouping step if group.limit is 1 (aka Las Vegas patch)

2018-01-08 Thread Diego Ceccarelli (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316605#comment-16316605
 ] 

Diego Ceccarelli commented on SOLR-11831:
-

patch is coming, give us a few minutes :D




[jira] [Commented] (SOLR-11831) Skip second grouping step if group.limit is 1 (aka Las Vegas patch)

2018-01-08 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316593#comment-16316593
 ] 

Anshum Gupta commented on SOLR-11831:
-

[~mjosephidou] I think you forgot to attach the patch :)
