[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2017-05-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005060#comment-16005060
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 5/10/17 5:40 PM:
-

Hi all, I updated the PR (https://github.com/apache/lucene-solr/pull/162), 
highlights: 

[~romseygeek],[~martijn.v.groningen] now the patch relies on the new grouping 
code :) I had to add a new {{protected}} constructor to {{TopGroupsCollector}} 
to inject my own {{GroupReducer}}. Could you please take a look and let me know 
if it makes sense? Also, in 
[SecondPassGroupingCollector#L54|https://github.com/bloomberg/lucene-solr/blob/c22a9017649406c5673c9b72878ad66a20d9b8d2/lucene/grouping/src/java/org/apache/lucene/search/grouping/SecondPassGroupingCollector.java#L54]
 
{code:title=SecondPassGroupingCollector.java|borderStyle=solid}
public SecondPassGroupingCollector(GroupSelector groupSelector, Collection groups, GroupReducer reducer) {

  //System.out.println("SP init");
  //Do we want to check if groups is null here? instead of checking at line 62?
  if (groups.isEmpty()) {
    throw new IllegalArgumentException("no groups to collect (groups is empty)");
  }

  this.groupSelector = Objects.requireNonNull(groupSelector);
  this.groupSelector.setGroups(groups);
  this.groups = Objects.requireNonNull(groups);
{code}

I would check if {{groups != null}} before {{groups.isEmpty()}}.
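
A minimal sketch of that ordering, assuming a hypothetical helper {{GroupArgs.requireGroups}} (it is not part of Lucene, only an illustration): the null check runs first, so a {{null}} collection fails with an explicit message before {{isEmpty()}} is ever called.

```java
import java.util.Collection;
import java.util.Objects;

final class GroupArgs {
  private GroupArgs() {}

  /** Hypothetical helper: validates null first, then emptiness. */
  static <T> Collection<T> requireGroups(Collection<T> groups) {
    // Fail early with a descriptive message instead of an NPE from isEmpty().
    Objects.requireNonNull(groups, "groups must not be null");
    if (groups.isEmpty()) {
      throw new IllegalArgumentException("no groups to collect (groups is empty)");
    }
    return groups;
  }
}
```

In the constructor this would amount to moving the {{Objects.requireNonNull(groups)}} call above the {{isEmpty()}} check.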

2.  I changed the logic to rerank groups and not only documents. For example, if 
a user asks to rerank the top 100 documents with {{q=greetings&rows=10&rq=\{!rerank 
reRankQuery=$rqq reRankDocs=100 reRankWeight=3\}&rqq=(hi+hello+hey+hiya)}}:
  * the top 100 groups matching {{greetings}} are retrieved;
  * the top 100 groups are reranked by {{rqq}};
  * finally, the top 10 reranked groups are returned;
  * inside each group, documents are reranked as well.
(It is worth noting that, for simplicity, in distributed mode the first pass 
retrieves the top 100 groups from all the shards, the federator computes the 
top 100 groups and sends them to the shards to get the reranking scores, and 
finally the federator returns the top 10.) 
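
To illustrate the federator side of the distributed flow, here is a rough sketch. The class {{FederatorSketch}}, the per-shard {{group -> score}} maps and the rule of keeping the best score per group across shards are all assumptions made for this example; this is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FederatorSketch {
  private FederatorSketch() {}

  /** Merges per-shard (group -> score) maps and returns the n best group values. */
  static List<String> topGroups(List<Map<String, Double>> shardResults, int n) {
    Map<String, Double> merged = new HashMap<>();
    for (Map<String, Double> shard : shardResults) {
      for (Map.Entry<String, Double> e : shard.entrySet()) {
        // Assumption: a group's global score is its best score on any shard.
        merged.merge(e.getKey(), e.getValue(), Double::max);
      }
    }
    List<String> groups = new ArrayList<>(merged.keySet());
    groups.sort((a, b) -> {
      int byScore = Double.compare(merged.get(b), merged.get(a)); // descending score
      return byScore != 0 ? byScore : a.compareTo(b);             // stable tie-break
    });
    return new ArrayList<>(groups.subList(0, Math.min(n, groups.size())));
  }
}
```

With the numbers from the example, the federator would call {{topGroups(firstPassScores, 100)}}, send those groups to the shards for rescoring, and then call {{topGroups(rerankedScores, 10)}} on what comes back.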

IMO the patch is now complete and I have working unit tests. Could someone 
please review it? 









> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: 6.0
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
>

[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-05-26 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302188#comment-15302188
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 5/26/16 3:01 PM:
-

Thanks [~aanilpala], a file was missing from the patch. I just submitted a new 
patch that includes it, and I tested it against the latest upstream version 
(last commit 268da5be4). Please do not hesitate to contact me if you have 
comments :) 


was (Author: diegoceccarelli):
add Add RerankTermSecondPassGroupingCollector


> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: 6.0
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the non 
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-11 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/11/16 12:21 PM:
--

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must stay 
there, that's why I wrote "partially moved" :) Just as there are 
{{IndexSearcher}} and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene 
and created {{SolrRankQuery}}. The reason is that {{RankQuery}} works by 
manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd,
    IndexSearcher searcher) throws IOException;
{code}

At the moment, {{SolrIndexSearcher}} has a special case for when the query is a 
{{RankQuery}}:
{code:java}
private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) throws IOException {

  Query q = cmd.getQuery();
  if (q instanceof RankQuery) {
    RankQuery rq = (RankQuery) q;
    return rq.getTopDocsCollector(len, cmd, this);
  }
  ..
{code}

Instead of creating a top collector using {{TopScoreDocCollector.create}}, 
we wrap a top-score collector in a 'RankQuery collector'.

Let me recall that grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{<group -> score>}} where score is the highest score of a document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, for each group, the documents in the group are ranked 
and the top-n documents for each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 
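
The two stages can be sketched in plain Java as follows. This is only an illustration of the idea: {{TwoPassGroupingSketch}} and its {{Doc}} type are hypothetical and work on an in-memory list, whereas the real Lucene collectors work segment by segment on the index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TwoPassGroupingSketch {
  private TwoPassGroupingSketch() {}

  /** Hypothetical scored document: id, group value, relevance score. */
  static final class Doc {
    final String id;
    final String group;
    final float score;
    Doc(String id, String group, float score) {
      this.id = id;
      this.group = group;
      this.score = score;
    }
  }

  /** First pass: map each group to its best document score, keep the top-k groups. */
  static List<String> firstPass(List<Doc> docs, int topK) {
    Map<String, Float> best = new HashMap<>();
    for (Doc d : docs) {
      best.merge(d.group, d.score, Float::max);
    }
    List<String> groups = new ArrayList<>(best.keySet());
    groups.sort((a, b) -> {
      int byScore = Float.compare(best.get(b), best.get(a)); // descending score
      return byScore != 0 ? byScore : a.compareTo(b);        // tie-break on group value
    });
    return new ArrayList<>(groups.subList(0, Math.min(topK, groups.size())));
  }

  /** Second pass: for each selected group, rank its documents and keep the top-n ids. */
  static Map<String, List<String>> secondPass(List<Doc> docs, List<String> groups, int topN) {
    Map<String, List<Doc>> byGroup = new LinkedHashMap<>();
    for (String g : groups) {
      byGroup.put(g, new ArrayList<>());
    }
    for (Doc d : docs) {
      List<Doc> bucket = byGroup.get(d.group);
      if (bucket != null) {
        bucket.add(d); // documents of non-selected groups are skipped
      }
    }
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<Doc>> e : byGroup.entrySet()) {
      List<Doc> bucket = e.getValue();
      bucket.sort((a, b) -> Float.compare(b.score, a.score));
      List<String> ids = new ArrayList<>();
      for (Doc d : bucket.subList(0, Math.min(topN, bucket.size()))) {
        ids.add(d.id);
      }
      result.put(e.getKey(), ids);
    }
    return result;
  }
}
```

The point to notice is that the second pass creates one ranking per group, which is where a per-group collector comes into play.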

We should probably discuss what reranking means for groups. In my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so "group-reranking" 
should: 
   * in the first stage, iterate over the documents, scoring them as usual, and 
keep a map {{<group -> score>}};
   * for each group, apply RankQuery to the top documents in the group;
   * rerank the groups according to the new scores.
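
A small sketch of the last two steps, under stated assumptions: {{GroupRerankSketch}} is hypothetical, the rerank scorer is a plain function from document id to score, and the new score is taken to be the original score plus the weighted rerank score. None of this is the patch's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class GroupRerankSketch {
  private GroupRerankSketch() {}

  /** Hypothetical scored document inside a group. */
  static final class Doc {
    final String id;
    final double score;
    Doc(String id, double score) {
      this.id = id;
      this.score = score;
    }
  }

  static Map<String, List<Doc>> rerank(Map<String, List<Doc>> topDocsByGroup,
                                       ToDoubleFunction<String> rerankScore,
                                       double weight) {
    // Step 2: rescore only the top documents already collected for each group.
    Map<String, List<Doc>> rescored = new LinkedHashMap<>();
    for (Map.Entry<String, List<Doc>> e : topDocsByGroup.entrySet()) {
      List<Doc> docs = new ArrayList<>();
      for (Doc d : e.getValue()) {
        // Assumed combination: original score + weight * rerank score.
        docs.add(new Doc(d.id, d.score + weight * rerankScore.applyAsDouble(d.id)));
      }
      docs.sort((a, b) -> Double.compare(b.score, a.score));
      rescored.put(e.getKey(), docs);
    }
    // Step 3: reorder the groups by the new best score of their documents.
    List<Map.Entry<String, List<Doc>>> entries = new ArrayList<>(rescored.entrySet());
    entries.sort((a, b) -> Double.compare(best(b.getValue()), best(a.getValue())));
    Map<String, List<Doc>> ordered = new LinkedHashMap<>();
    for (Map.Entry<String, List<Doc>> e : entries) {
      ordered.put(e.getKey(), e.getValue());
    }
    return ordered;
  }

  private static double best(List<Doc> docs) {
    return docs.isEmpty() ? Double.NEGATIVE_INFINITY : docs.get(0).score;
  }
}
```

Only the documents that survived the first two stages are rescored, which keeps the rerank query away from the rest of the collection.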

In this patch, I'm able to perform step 2. I had to move {{RankQuery}} into Lucene, 
because in {{AbstractSecondPassGroupingCollector}} a collector is created for 
each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
    ...
{code}

... so there is no way to 'inject' the RankQuery collector from Solr. After moving 
{{RankQuery}} into Lucene, I changed the code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery) {
  collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now documents in groups are reranked based on the RankQuery scores. I'll 
now work on step 3, i.e., reordering the groups based on the new RankQuery score (I 
added a new test that fails at the moment). 
Happy to discuss this first change, if you have comments.
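
For readers unfamiliar with the wrapping idea, here is a self-contained sketch. The {{TopCollector}} interface and both implementations are hypothetical stand-ins (this is not Lucene's {{TopDocsCollector}} API), and the additive rescoring is an assumption; the sketch only shows how a rerank collector can delegate collection to the per-group collector and rescore what that collector kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

final class RerankCollectorSketch {
  private RerankCollectorSketch() {}

  static final class ScoredDoc {
    final int doc;
    final double score;
    ScoredDoc(int doc, double score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Hypothetical stand-in for a top-docs collector. */
  interface TopCollector {
    void collect(int doc, double score);
    List<ScoredDoc> topDocs();
  }

  /** Plain collector: keeps the n best documents by score. */
  static final class TopScoreCollector implements TopCollector {
    private final int n;
    private final List<ScoredDoc> hits = new ArrayList<>();
    TopScoreCollector(int n) {
      this.n = n;
    }
    @Override
    public void collect(int doc, double score) {
      hits.add(new ScoredDoc(doc, score));
    }
    @Override
    public List<ScoredDoc> topDocs() {
      List<ScoredDoc> sorted = new ArrayList<>(hits);
      sorted.sort((a, b) -> Double.compare(b.score, a.score));
      return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
  }

  /** Wrapper: delegates collection, then rescores what the wrapped collector kept. */
  static final class RerankCollector implements TopCollector {
    private final TopCollector inner;
    private final IntToDoubleFunction rerankScore; // hypothetical rerank scorer
    RerankCollector(TopCollector inner, IntToDoubleFunction rerankScore) {
      this.inner = inner;
      this.rerankScore = rerankScore;
    }
    @Override
    public void collect(int doc, double score) {
      inner.collect(doc, score);
    }
    @Override
    public List<ScoredDoc> topDocs() {
      List<ScoredDoc> rescored = new ArrayList<>();
      for (ScoredDoc d : inner.topDocs()) {
        // Assumed combination: original score + rerank score.
        rescored.add(new ScoredDoc(d.doc, d.score + rerankScore.applyAsDouble(d.doc)));
      }
      rescored.sort((a, b) -> Double.compare(b.score, a.score));
      return rescored;
    }
  }
}
```

Because the wrapper receives the already created collector, the grouping code does not need to know how the rerank collector is built.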

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. Otherwise {{RankQuery}} could perhaps 
become an interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} was in Solr but used only for getting {{Sort}}, and {{len}} was 
never used. I now pass the previous collector as input, instead of creating a new 
{{TopScoreDocCollector}} inside {{RankQuery}}. 
  - Please keep in mind that, as a starting point, I'm trying to solve the issue 
in the non-distributed setting and when grouping on a field. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:13 PM:
-

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must stay 
there, that's why I wrote "partially moved" :) Just as there are 
{{IndexSearcher}} and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene and 
created {{SolrRankQuery}}. The reason is that {{RankQuery}} works 
by manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd,
    IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
SolrIndexSearcher: 
{code:java}
private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd)
    throws IOException {

  Query q = cmd.getQuery();
  if (q instanceof RankQuery) {
    RankQuery rq = (RankQuery) q;
    return rq.getTopDocsCollector(len, cmd, this);
  }
  ..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, 
we wrap the top-score collector in a 'RankQuery collector'.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{group -> score}}, where score is the highest score of a document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, the documents in each of the top groups are ranked and 
the top documents for each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 
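
The two stages can be sketched on an in-memory list of documents. This is only an illustration of the idea, not the Lucene collectors: {{Doc}}, {{firstPass}} and {{secondPass}} are hypothetical names, and a real collector would not keep all documents in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical in-memory illustration of two-pass grouping. */
final class TwoPassGroupingSketch {

  static final class Doc {
    final int id;
    final String group;
    final double score;
    Doc(int id, String group, double score) {
      this.id = id;
      this.group = group;
      this.score = score;
    }
  }

  /** First pass: map each group to its best document score, keep the top-k groups. */
  static List<String> firstPass(List<Doc> docs, int topGroups) {
    Map<String, Double> best = new HashMap<>();
    for (Doc d : docs) {
      best.merge(d.group, d.score, Math::max);
    }
    return best.entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
        .limit(topGroups)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  /** Second pass: for each selected group, keep its top documents by score. */
  static Map<String, List<Doc>> secondPass(List<Doc> docs, List<String> groups,
      int maxDocsPerGroup) {
    Map<String, List<Doc>> result = new LinkedHashMap<>();
    for (String g : groups) {
      result.put(g, new ArrayList<>());
    }
    for (Doc d : docs) {
      List<Doc> bucket = result.get(d.group);
      if (bucket != null) { // documents of groups outside the top-k are skipped
        bucket.add(d);
      }
    }
    for (Map.Entry<String, List<Doc>> e : result.entrySet()) {
      e.setValue(e.getValue().stream()
          .sorted(Comparator.comparingDouble((Doc d) -> d.score).reversed())
          .limit(maxDocsPerGroup)
          .collect(Collectors.toList()));
    }
    return result;
  }
}
```

Calling {{firstPass}} and then passing its result to {{secondPass}} corresponds to the two stages: the second pass only looks at documents whose group survived the first pass.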

We should probably discuss what reranking means for groups. In my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so 
"group-reranking" should work as follows: 

   * in the first stage, we iterate over the documents, scoring them as usual, and 
keep a map {{group -> score}};
   * for each group, the RankQuery is applied to the top documents in the group;
   * the groups are reranked according to the new scores.
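
A rough sketch of the last two steps, again without the Lucene classes: {{Doc}} and {{rerankGroups}} are hypothetical names, the input is assumed to be the top documents per group produced by the usual grouping, and the combined score (original + weight * rerank score) is an assumption for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

/** Hypothetical illustration of reranking inside groups, then reordering the groups. */
final class GroupRerankSketch {

  static final class Doc {
    final int id;
    final String group;
    final double score;
    Doc(int id, String group, double score) {
      this.id = id;
      this.group = group;
      this.score = score;
    }
    double score() { return score; }
  }

  /** Best score of a list sorted by descending score; empty groups sort last. */
  private static double bestScore(List<Doc> docs) {
    return docs.isEmpty() ? Double.NEGATIVE_INFINITY : docs.get(0).score;
  }

  static Map<String, List<Doc>> rerankGroups(Map<String, List<Doc>> topGroups,
      ToDoubleFunction<Doc> rerankScore, double weight) {
    // Re-score and re-sort only the top documents already kept for each group.
    Map<String, List<Doc>> rescored = new LinkedHashMap<>();
    for (Map.Entry<String, List<Doc>> e : topGroups.entrySet()) {
      List<Doc> docs = e.getValue().stream()
          .map(d -> new Doc(d.id, d.group, d.score + weight * rerankScore.applyAsDouble(d)))
          .sorted(Comparator.comparingDouble(Doc::score).reversed())
          .collect(Collectors.toList());
      rescored.put(e.getKey(), docs);
    }
    // Reorder the groups by the best new score found in each group.
    List<String> order = new ArrayList<>(rescored.keySet());
    order.sort(Comparator.comparingDouble((String g) -> bestScore(rescored.get(g))).reversed());
    Map<String, List<Doc>> result = new LinkedHashMap<>();
    for (String g : order) {
      result.put(g, rescored.get(g));
    }
    return result;
  }
}
```

Note that the rerank function is never applied to documents outside the per-group top lists, which is the property the proposal is trying to keep.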

In this patch, I'm able to perform the second step. I had to move {{RankQuery}} 
into Lucene because {{AbstractSecondPassGroupingCollector}} creates a collector 
for each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the RankQuery collector from Solr. After moving 
{{RankQuery}} into Lucene, I changed that code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery) {
  collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now the documents within each group are reranked based on the RankQuery scores. 
I'll now work on the third step, i.e., reordering the groups based on the new 
RankQuery score (I added a new test that fails at the moment). 
Happy to discuss this first change if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. {{RankQuery}} could perhaps become an interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and {{len}} was 
never used. The method now takes the previous collector as input, instead of creating 
a new {{TopScoreDocCollector}} inside {{RankQuery}}. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:12 PM:
-

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must stay 
there, that's why I wrote "partially moved" :) Just as there are 
{{IndexSearcher}} and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene and 
created {{SolrRankQuery}}. The reason is that {{RankQuery}} works 
by manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd,
    IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
SolrIndexSearcher: 
{code:java}
private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd)
    throws IOException {

  Query q = cmd.getQuery();
  if (q instanceof RankQuery) {
    RankQuery rq = (RankQuery) q;
    return rq.getTopDocsCollector(len, cmd, this);
  }
  ..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, 
we wrap the top-score collector in a 'RankQuery collector'.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{group -> score}}, where score is the highest score of a document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, the documents in each of the top groups are ranked and 
the top documents for each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 

We should probably discuss what reranking means for groups. In my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so 
"group-reranking" should work as follows: 

   * in the first stage, we iterate over the documents, scoring them as usual, and 
keep a map {{group -> score}};
   * for each group, the RankQuery is applied to the top documents in the group;
   * the groups are reranked according to the new scores.

In this patch, I'm able to perform the second step. I had to move {{RankQuery}} 
into Lucene because {{AbstractSecondPassGroupingCollector}} creates a collector 
for each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the reranking collector from Solr. After moving 
{{RankQuery}} into Lucene, I changed that code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery) {
  collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now the documents within each group are reranked. I'll now work on the third 
step, i.e., reordering the groups based on the new rerank score 
(I added a new test that fails at the moment). 
Happy to discuss this first change if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. {{RankQuery}} could perhaps become an interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and {{len}} was 
never used. The method now takes the previous collector as input, instead of creating 
a new {{TopScoreDocCollector}} inside {{RankQuery}}. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:10 PM:
-

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must stay 
there, that's why I wrote "partially moved" :) Just as there are 
{{IndexSearcher}} and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene and 
created {{SolrRankQuery}}. The reason is that {{RankQuery}} works 
by manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd,
    IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
SolrIndexSearcher: 
{code:java}
private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd)
    throws IOException {

  Query q = cmd.getQuery();
  if (q instanceof RankQuery) {
    RankQuery rq = (RankQuery) q;
    return rq.getTopDocsCollector(len, cmd, this);
  }
  ..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, 
we wrap the top-score collector in a 'RankQuery collector'.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{group -> score}}, where score is the highest score of a document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, the documents in each of the top groups are ranked and 
the top documents for each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 

We should probably discuss what reranking means for groups. In my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so 
"group-reranking" should work as follows: 

   1. in the first stage, we iterate over the documents, scoring them as usual, and 
keep a map {{group -> score}};
   2. for each group, the RankQuery is applied to the top documents in the group;
   3. the groups are reranked according to the new scores.

In this patch, I'm able to perform step 2. I had to move {{RankQuery}} 
into Lucene because {{AbstractSecondPassGroupingCollector}} creates a collector 
for each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the reranking collector from Solr. After moving 
{{RankQuery}} into Lucene, I changed that code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery) {
  collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now the documents within each group are reranked. I'll now work on step 3, 
i.e., reordering the groups based on the new rerank score 
(I added a new test that fails at the moment). 
Happy to discuss this first change if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. {{RankQuery}} could perhaps become an interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and {{len}} was 
never used. The method now takes the previous collector as input, instead of creating 
a new {{TopScoreDocCollector}} inside {{RankQuery}}. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:08 PM:
-

[~joel.bernstein] thanks for pointing out the {MergeStrategy}. I uploaded 
a new patch with a first step. I agree that the merge strategy must stay there, 
that's why I wrote "partially moved" :) Just as there are IndexSearcher and 
SolrIndexSearcher, I moved {RankQuery} into Lucene and created 
{SolrRankQuery}. The reason is that {RankQuery} works by manipulating the 
collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd,
    IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
SolrIndexSearcher: 
{code:java}
private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd)
    throws IOException {

  Query q = cmd.getQuery();
  if (q instanceof RankQuery) {
    RankQuery rq = (RankQuery) q;
    return rq.getTopDocsCollector(len, cmd, this);
  }
  ..
{code}

Instead of creating a top collector with {TopScoreDocCollector.create}, we 
wrap the top-score collector in a 'RankQuery' collector.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {group -> score}, where score is the highest score of a document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, the documents in each of the top groups are ranked and 
the top documents for each group are returned.

This logic is mainly implemented in 
{Abstract(First|Second)PassGroupingCollector} (within Lucene). 

We should probably discuss what reranking means for groups. In my opinion we 
should keep in mind that the idea behind RankQuery is that you don't want to 
apply the query to all the documents in the collection, so "group-reranking" 
should work as follows: 

   1. in the first stage, we iterate over the documents, scoring them as usual, and 
keep a map {group -> score};
   2. for each group, the RankQuery is applied to the top documents in the group;
   3. the groups are reranked according to the new scores.

In this patch, I'm able to perform step 2. I had to move RankQuery into Lucene 
because {AbstractSecondPassGroupingCollector} creates a collector for each 
group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the reranking collector from Solr. After moving 
RankQuery into Lucene, I changed that code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery) {
  collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now the documents within each group are reranked. I'll now work on step 3, 
i.e., reordering the groups based on the new rerank score 
(I added a new test that fails at the moment). 
Happy to discuss this first change if you have comments.

Minor notes: 
  - At the moment {SolrRankQuery} doesn't extend {ExtendedQueryBase}; I have to 
check whether that is a problem. RankQuery could perhaps become an interface.
  - I made some changes to the signature of {RankQuery.getTopDocsCollector}: 
{QueryCommand} lives in Solr but was used only to get the {Sort}, and len was never 
used. The method now takes the previous collector as input, instead of creating a 
new TopScoreDocCollector inside {RankQuery}. 


