[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005060#comment-16005060 ]

Diego Ceccarelli edited comment on SOLR-8776 at 5/10/17 5:40 PM:
-----------------------------------------------------------------

Hi all, I updated the PR (https://github.com/apache/lucene-solr/pull/162). Highlights:

1. [~romseygeek], [~martijn.v.groningen]: the patch now relies on the new grouping code :) I had to add a new {{protected}} constructor to {{TopGroupsCollector}} to inject my own {{GroupReducer}}. Could you please take a look and let me know if it makes sense? Also, in [SecondPassGroupingCollector#L54|https://github.com/bloomberg/lucene-solr/blob/c22a9017649406c5673c9b72878ad66a20d9b8d2/lucene/grouping/src/java/org/apache/lucene/search/grouping/SecondPassGroupingCollector.java#L54]:

{code:title=SecondPassGroupingCollector.java|borderStyle=solid}
public SecondPassGroupingCollector(GroupSelector groupSelector, Collection groups, GroupReducer reducer) {
  //System.out.println("SP init");
  //Do we want to check if groups is null here? instead of checking at line 62?
  if (groups.isEmpty()) {
    throw new IllegalArgumentException("no groups to collect (groups is empty)");
  }
  this.groupSelector = Objects.requireNonNull(groupSelector);
  this.groupSelector.setGroups(groups);
  this.groups = Objects.requireNonNull(groups);
{code}

I would check that {{groups != null}} before calling {{groups.isEmpty()}}; as written, a null {{groups}} fails on {{isEmpty()}} before {{Objects.requireNonNull(groups)}} is ever reached.

2. I changed the logic to rerank groups and not only documents. For example, if a user asks to rerank the top 100 documents with {{q=greetings&rows=10&rq=\{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3\}&rqq=(hi+hello+hey+hiya)}}:
* the top 100 groups matching {{greetings}} are retrieved;
* the top 100 groups are reranked by {{rqq}};
* finally, the top 10 reranked groups are returned;
* inside each group, the documents are reranked as well.
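The group-level reranking described in item 2 can be sketched roughly as follows. This is only an illustration, not code from the patch: {{GroupRerankSketch}}, {{ScoredGroup}} and {{rerankGroups}} are hypothetical names, the {{rows}} argument stands for the number of groups finally returned (10 in the example), and the combined score (first-pass score plus {{reRankWeight}} times the rerank score) is an assumption modeled on how Solr's rerank parser combines document scores. Reranking of the documents inside each group is not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical illustration; these types are not part of Lucene, Solr, or the patch. */
final class GroupRerankSketch {

  /** A group value together with its score (first-pass or combined). */
  static final class ScoredGroup {
    final String groupValue;
    final double score;

    ScoredGroup(String groupValue, double score) {
      this.groupValue = groupValue;
      this.score = score;
    }
  }

  private static final Comparator<ScoredGroup> BY_SCORE_DESC =
      Comparator.comparingDouble((ScoredGroup g) -> g.score).reversed();

  static List<ScoredGroup> rerankGroups(List<ScoredGroup> firstPass, int reRankDocs, int rows,
                                        double reRankWeight, ToDoubleFunction<String> reRankScore) {
    // Step 1: keep only the top reRankDocs groups by first-pass score.
    List<ScoredGroup> candidates = new ArrayList<>(firstPass);
    candidates.sort(BY_SCORE_DESC);
    candidates = candidates.subList(0, Math.min(reRankDocs, candidates.size()));

    // Step 2: rescore those groups only (assumed formula: original + weight * rerank score).
    List<ScoredGroup> reranked = new ArrayList<>();
    for (ScoredGroup g : candidates) {
      double combined = g.score + reRankWeight * reRankScore.applyAsDouble(g.groupValue);
      reranked.add(new ScoredGroup(g.groupValue, combined));
    }
    reranked.sort(BY_SCORE_DESC);

    // Step 3: return the top rows groups in the reranked order.
    return new ArrayList<>(reranked.subList(0, Math.min(rows, reranked.size())));
  }
}
```

Note that a group falling outside the first-pass top {{reRankDocs}} is never rescored, however well it would match the rerank query; that is the usual trade-off of reranking only a prefix of the results.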
(It's worth noting that, for simplicity, in distributed mode the first pass retrieves the top 100 groups from all the shards; the federator then computes the overall top 100 groups and sends them to the shards to get the reranking scores; finally, the federator returns the top 10.)

IMO the patch is now complete and I have working unit tests. Could someone please review it?

> Support RankQuery in grouping
> -----------------------------
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 6.0
> Reporter: Diego Ceccarelli
> Priority: Minor
> Fix For: 6.0
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch,
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch,
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch,
>
[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302188#comment-15302188 ]

Diego Ceccarelli edited comment on SOLR-8776 at 5/26/16 3:01 PM:
-----------------------------------------------------------------

Thanks [~aanilpala], a file was missing from the patch. I just submitted a new patch with the missing file, and I tested it on the latest upstream version (last commit 268da5be4). Please do not hesitate to contact me if you have comments :)

was (Author: diegoceccarelli):
add Add RerankTermSecondPassGroupingCollector

> Support RankQuery in grouping
> -----------------------------
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 6.0
> Reporter: Diego Ceccarelli
> Priority: Minor
> Fix For: 6.0
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch,
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch,
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch,
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch,
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together
> (see also [3]). In some situations Grouping can be replaced by Collapse and
> Expand Results [4] (which supports reranking), but i) collapse cannot
> guarantee that at least a minimum number of groups will be returned for a
> query, and ii) in the SolrCloud setting you will have constraints on how to
> partition the documents among the shards.
>
> I'm going to start working on supporting RankQuery in grouping. I'll start
> by attaching a patch with a test that fails because grouping does not support
> the rank query, and then I'll try to fix the problem, starting from the
> non-distributed setting (GroupingSearch).
>
> My feeling is that since grouping is mostly performed by Lucene, RankQuery
> should be refactored and moved (or partially moved) there.
>
> Any feedback is welcome.
>
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189552#comment-15189552 ]

Diego Ceccarelli edited comment on SOLR-8776 at 3/11/16 12:21 PM:
------------------------------------------------------------------

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I uploaded a new patch with a first step. I agree that the merge strategy must stay there; that's why I wrote "partially moved" :) Just as there are {{IndexSearcher}} and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene and created {{SolrRankQuery}}.

The reason is that {{RankQuery}} works by manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, IndexSearcher searcher) throws IOException;
{code}

At the moment, {{SolrIndexSearcher}} has a special case for when the query is a {{RankQuery}}:

{code:java}
private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) throws IOException {
  Query q = cmd.getQuery();
  if (q instanceof RankQuery) {
    RankQuery rq = (RankQuery) q;
    return rq.getTopDocsCollector(len, cmd, this);
  }
  ..
{code}

Instead of creating a top collector using {{TopScoreDocCollector.create}}, we wrap a top-score collector in a 'RankQuery collector'.

Let me recall that grouping works in two separate stages:
* in the first stage, we iterate over the documents, scoring them, and keep a map {{group -> score}}, where score is the highest score of a document in the group (the map contains only the top-k groups with the highest scores);
* in the second stage, for each group, the documents in the group are ranked and the top-n documents of each group are returned.

This logic is mainly implemented in {{Abstract(First|Second)PassGroupingCollector}} (within Lucene).
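As a rough illustration of the two grouping stages just described, here is a minimal in-memory sketch. It does not use the Lucene collector API: {{TwoPassGroupingSketch}}, {{Doc}}, {{firstPass}} and {{secondPass}} are hypothetical names, and real collectors work per segment on doc IDs instead of on a list of objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of two-pass grouping; not the Lucene implementation. */
final class TwoPassGroupingSketch {

  static final class Doc {
    final int id;
    final String group;
    final float score;

    Doc(int id, String group, float score) {
      this.id = id;
      this.group = group;
      this.score = score;
    }
  }

  /** First pass: track the best score per group, then keep the top-k group values. */
  static List<String> firstPass(List<Doc> docs, int topKGroups) {
    Map<String, Float> bestScore = new HashMap<>();
    for (Doc d : docs) {
      bestScore.merge(d.group, d.score, Float::max);
    }
    List<Map.Entry<String, Float>> entries = new ArrayList<>(bestScore.entrySet());
    entries.sort(Map.Entry.<String, Float>comparingByValue().reversed());
    List<String> topGroups = new ArrayList<>();
    for (Map.Entry<String, Float> e : entries.subList(0, Math.min(topKGroups, entries.size()))) {
      topGroups.add(e.getKey());
    }
    return topGroups;
  }

  /** Second pass: for each selected group, keep its top-n documents by score. */
  static Map<String, List<Doc>> secondPass(List<Doc> docs, List<String> topGroups, int topNDocs) {
    Map<String, List<Doc>> perGroup = new LinkedHashMap<>();
    for (String g : topGroups) {
      perGroup.put(g, new ArrayList<>());
    }
    for (Doc d : docs) {
      List<Doc> bucket = perGroup.get(d.group);
      if (bucket != null) { // documents of non-selected groups are ignored
        bucket.add(d);
      }
    }
    for (Map.Entry<String, List<Doc>> e : perGroup.entrySet()) {
      List<Doc> bucket = e.getValue();
      bucket.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
      e.setValue(new ArrayList<>(bucket.subList(0, Math.min(topNDocs, bucket.size()))));
    }
    return perGroup;
  }
}
```

The second pass needs the list of selected groups before it can start, which is why the two stages cannot be merged into a single iteration over the documents.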
We should probably discuss what reranking means for groups. In my opinion we should keep in mind that the idea behind {{RankQuery}} is that you don't want to apply the query to all the documents in the collection, so "group-reranking" should:
1. in the first stage, iterate over the documents, scoring them as usual, and keep a map {{group -> score}};
2. for each group, apply the RankQuery to the top documents in the group;
3. rerank the groups according to the new scores.

In this patch, I'm able to perform step 2. I had to move {{RankQuery}} into Lucene because in {{AbstractSecondPassGroupingCollector}} a collector is created for each group:

{code:java}
for (SearchGroup group : groups) {
  //System.out.println(" prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
    ...
{code}

... so there is no way to 'inject' the RankQuery collector from Solr. After moving {{RankQuery}} into Lucene, I changed the code to:

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery){
  collector = ((RankQuery)query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now the documents in each group are reranked based on the RankQuery scores. I'll now work on step 3, i.e., reordering the groups based on the new RankQuery score (I added a new test that fails at the moment). Happy to discuss this first change if you have comments.

Minor notes:
- At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I have to check whether that is a problem. Otherwise {{RankQuery}} could maybe become an interface.
- I made some changes to the interface of {{RankQuery.getTopDocsCollector}}: {{QueryCommand}} was in Solr but was used only for getting the {{Sort}}, and {{len}} was never used.
  I added the previous collector as an input, instead of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}.
- Please keep in mind that, as a starting point, I'm trying to solve the issue in the non-distributed setting and only when grouping on a field.
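The change of passing the previous collector into {{getTopDocsCollector}} amounts to a decorator: the rank query receives the already-built top-score collector and returns a collector that wraps it. The sketch below illustrates only that pattern with simplified, hypothetical types ({{TopCollector}}, {{RankQueryLike}}, {{RerankingCollector}}); it is not the Lucene or Solr API, and the rescoring rule (adding a per-document rerank score) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

/** Hypothetical illustration of wrapping a top-score collector; not the Lucene API. */
final class RerankCollectorSketch {

  static final class Hit {
    final int doc;
    final double score;

    Hit(int doc, double score) {
      this.doc = doc;
      this.score = score;
    }
  }

  private static final Comparator<Hit> BY_SCORE_DESC =
      Comparator.comparingDouble((Hit h) -> h.score).reversed();

  interface TopCollector {
    void collect(int doc, double score);

    List<Hit> topDocs();
  }

  /** Plain collector: remembers every hit and returns the numHits best ones. */
  static final class TopScoreCollector implements TopCollector {
    private final int numHits;
    private final List<Hit> hits = new ArrayList<>();

    TopScoreCollector(int numHits) {
      this.numHits = numHits;
    }

    @Override
    public void collect(int doc, double score) {
      hits.add(new Hit(doc, score));
    }

    @Override
    public List<Hit> topDocs() {
      List<Hit> sorted = new ArrayList<>(hits);
      sorted.sort(BY_SCORE_DESC);
      return new ArrayList<>(sorted.subList(0, Math.min(numHits, sorted.size())));
    }
  }

  /** Decorator: delegates collection, then rescores only the delegate's top hits. */
  static final class RerankingCollector implements TopCollector {
    private final TopCollector previous;
    private final IntToDoubleFunction rerankScore;

    RerankingCollector(TopCollector previous, IntToDoubleFunction rerankScore) {
      this.previous = previous;
      this.rerankScore = rerankScore;
    }

    @Override
    public void collect(int doc, double score) {
      previous.collect(doc, score);
    }

    @Override
    public List<Hit> topDocs() {
      List<Hit> reranked = new ArrayList<>();
      for (Hit h : previous.topDocs()) {
        reranked.add(new Hit(h.doc, h.score + rerankScore.applyAsDouble(h.doc)));
      }
      reranked.sort(BY_SCORE_DESC);
      return reranked;
    }
  }

  /** Stand-in for a rank query: it receives the previous collector and wraps it. */
  interface RankQueryLike {
    TopCollector getTopDocsCollector(TopCollector previous);
  }

  static TopCollector buildCollector(Object query, int maxDocsPerGroup) {
    TopCollector collector = new TopScoreCollector(maxDocsPerGroup);
    // instanceof is already false for null, so no separate null check is needed here.
    if (query instanceof RankQueryLike) {
      collector = ((RankQueryLike) query).getTopDocsCollector(collector);
    }
    return collector;
  }
}
```

With this shape the grouping code stays unaware of how reranking works: it always builds its usual collector and only offers the query a chance to wrap it.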
[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552 ] Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:13 PM: - [~joel.bernstein] thanks for pointing out about the {{MergeStrategy}}. I uploaded a new patch with a first step. I agree that merge strategy must stay there, that's why I wrote "partially moved" :) as well as there's IndexSearcher and {{SolrIndexSearcher}}, I moved {{RankQuery}} in Lucene and created lucene {{SolrRankQuery}}. The reason is that the {{RankQuery}} works by manipulating the collector, through this method: {code:java} public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, IndexSearcher searcher) throws IOException; {code} At the moment what happens is that if the query is a RankQuery, and into the SolrIndexSearcher: {code:java} private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) throws IOException { Query q = cmd.getQuery(); if (q instanceof RankQuery) { RankQuery rq = (RankQuery) q; return rq.getTopDocsCollector(len, cmd, this); } .. {code} Instead of creating a top collector using the {{TopScoreDocCollector.create}}, we wrap a topScoreCollector into a 'RankQuery collector'. Let me remind that grouping works in two separate stages: * in the first stage, we iterate on the documents scoring them and keep a map {{ score>}} where score is the highest score of a document in the group (the map contains only the TOP-k groups with the highest scores); * for each group in the top groups documents in the group are ranked and top documents for each group are returned. This logic is mainly implemented into {{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 
We should probably discuss what reranking means for groups. In my opinion we should keep in mind that the idea behind {{RankQuery}} is that you don't want to apply the query to all the documents in the collection, so "group reranking" should work as follows:
# in the first stage, we iterate over the documents, scoring them as usual, and keep a map {{group -> score}};
# for each group, the {{RankQuery}} is applied to the top documents in the group;
# the groups are reranked according to the new scores.

In this patch, I'm able to perform step 2. I had to move {{RankQuery}} into Lucene because {{AbstractSecondPassGroupingCollector}} creates a collector for each group:

{code:java}
for (SearchGroup group : groups) {
  //System.out.println(" prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
    ...
{code}

... so there is no way to 'inject' the RankQuery collector from Solr. After moving {{RankQuery}} into Lucene, I modified the code to:

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery){
  collector = ((RankQuery)query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now documents in groups are reranked based on the RankQuery scores. I'll work now on step 3, i.e., reordering the groups based on the new RankQuery score (I added a new test that fails at the moment). Happy to discuss this first change, if you have comments.

Minor notes:
- At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I have to check whether that is a problem. {{RankQuery}} could maybe become an interface.
- I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: {{QueryCommand}} was in Solr but was used only for getting the {{Sort}}, and {{len}} was never used.
I now pass the previous collector as input, instead of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}.
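The group-reranking steps discussed in this comment (rescore the top documents of each group, then reorder the groups by their new scores) can be sketched as follows. This is a minimal illustration, not the patch: {{ScoredDoc}}, {{rerankDocs}} and {{rerankGroups}} are hypothetical names, and combining scores as original + weight * rerank score is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

/** Toy illustration of group reranking; names are hypothetical, not Lucene or Solr API. */
class GroupRerankSketch {

  static final class ScoredDoc {
    final int doc;
    final double score;
    ScoredDoc(int doc, double score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Rescore the top documents of one group and sort them by the new score. */
  static List<ScoredDoc> rerankDocs(List<ScoredDoc> topDocs, IntToDoubleFunction rerankScore, double weight) {
    List<ScoredDoc> out = new ArrayList<>();
    for (ScoredDoc d : topDocs) {
      // Assumed combination: original score plus weighted rerank score.
      out.add(new ScoredDoc(d.doc, d.score + weight * rerankScore.applyAsDouble(d.doc)));
    }
    out.sort((a, b) -> Double.compare(b.score, a.score));
    return out;
  }

  /** Best score in an already sorted list, or negative infinity for an empty group. */
  static double best(List<ScoredDoc> sortedDocs) {
    return sortedDocs.isEmpty() ? Double.NEGATIVE_INFINITY : sortedDocs.get(0).score;
  }

  /** Rerank the documents inside every group, then reorder the groups by their best new score. */
  static Map<String, List<ScoredDoc>> rerankGroups(
      Map<String, List<ScoredDoc>> groups, IntToDoubleFunction rerankScore, double weight) {
    Map<String, List<ScoredDoc>> reranked = new HashMap<>();
    for (Map.Entry<String, List<ScoredDoc>> e : groups.entrySet()) {
      reranked.put(e.getKey(), rerankDocs(e.getValue(), rerankScore, weight));
    }
    List<String> order = new ArrayList<>(reranked.keySet());
    order.sort((a, b) -> {
      int byScore = Double.compare(best(reranked.get(b)), best(reranked.get(a)));
      return byScore != 0 ? byScore : a.compareTo(b);
    });
    Map<String, List<ScoredDoc>> result = new LinkedHashMap<>();
    for (String g : order) {
      result.put(g, reranked.get(g));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<ScoredDoc>> groups = new LinkedHashMap<>();
    groups.put("a", Arrays.asList(new ScoredDoc(1, 2.0), new ScoredDoc(2, 1.0)));
    groups.put("b", Arrays.asList(new ScoredDoc(3, 1.5)));
    IntToDoubleFunction rerank = doc -> doc == 3 ? 1.0 : (doc == 2 ? 0.5 : 0.0);
    System.out.println(rerankGroups(groups, rerank, 3.0).keySet());
  }
}
```

Only the documents already selected for each group are rescored, which matches the stated goal of not applying the rerank query to every document in the collection.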