[jira] [Created] (SOLR-13676) Reduce log verbosity in TestDistributedGrouping using ignoreException

2019-08-02 Thread Diego Ceccarelli (JIRA)

Diego Ceccarelli created SOLR-13676:
---

 Summary: Reduce log verbosity in TestDistributedGrouping using 
ignoreException
 Key: SOLR-13676
 URL: https://issues.apache.org/jira/browse/SOLR-13676
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Reporter: Diego Ceccarelli


SOLR-13404 added a test that expects Solr to fail if grouping is called with 
{{group.offset < 0}}. When the test runs it succeeds but the whole stack trace 
is printed out in the logs. This small patch avoids printing the stack trace by 
using {{ignoreException}}.
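
The idea behind {{ignoreException}} can be illustrated with a small plain-Java sketch. This is not Solr's test-framework code: the class name ExpectedExceptionFilter, its methods, and the error message used below are hypothetical. The sketch only assumes that the helper registers a regular expression so that matching errors stay out of the log while a test triggers them on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a test helper that mutes expected errors. */
public class ExpectedExceptionFilter {
    private final List<Pattern> ignored = new ArrayList<>();
    private final List<String> logged = new ArrayList<>();

    /** Register a regex; errors whose message matches it are not logged. */
    public void ignoreException(String regex) {
        ignored.add(Pattern.compile(regex));
    }

    /** Forget all registered patterns, e.g. once the test has finished. */
    public void resetExceptionIgnores() {
        ignored.clear();
    }

    /** Log the error unless it matches an ignored pattern; returns true if it was logged. */
    public boolean log(Throwable t) {
        String msg = String.valueOf(t.getMessage());
        for (Pattern p : ignored) {
            if (p.matcher(msg).find()) {
                return false; // expected failure: keep the log quiet
            }
        }
        logged.add(msg);
        return true;
    }

    public List<String> loggedMessages() {
        return logged;
    }

    public static void main(String[] args) {
        ExpectedExceptionFilter filter = new ExpectedExceptionFilter();
        filter.ignoreException("group\\.offset");
        // The message text is made up for this example.
        filter.log(new IllegalArgumentException("'group.offset' must be >= 0"));
        filter.resetExceptionIgnores();
        filter.log(new IllegalStateException("unexpected failure"));
        System.out.println(filter.loggedMessages());
    }
}
```

Resetting the patterns after the test matters: a pattern left registered could hide a genuine failure in a later test.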



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-13576) factor out a TopGroupsShardResponseProcessor.fillResultIds method

2019-07-05 Thread Diego Ceccarelli (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879333#comment-16879333
 ] 

Diego Ceccarelli commented on SOLR-13576:
-

LGTM, thanks [~cpoerschke]

> factor out a TopGroupsShardResponseProcessor.fillResultIds method
> -
>
> Key: SOLR-13576
> URL: https://issues.apache.org/jira/browse/SOLR-13576
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: master (9.0), 8.2
>
> Attachments: SOLR-13576.patch
>
>
> The {{TopGroupsShardResponseProcessor.process}} method e.g. 
> [#L54-L215|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.1.1/solr/core/src/java/org/apache/solr/search/grouping/distributed/responseprocessor/TopGroupsShardResponseProcessor.java#L54-L215]
>  does quite a few things and factoring out a {{fillResultIds}} (or similarly 
> named) method for the logically distinct 
> [#L192-L214|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.1.1/solr/core/src/java/org/apache/solr/search/grouping/distributed/responseprocessor/TopGroupsShardResponseProcessor.java#L192-L214]
>  portion could help with code comprehension.




[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2019-05-13 Thread Diego Ceccarelli (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838896#comment-16838896
 ] 

Diego Ceccarelli commented on SOLR-8776:


[~ichattopadhyaya] are you still interested in moving this forward? It is 
really very painful to update it upstream when it gets stale :D 

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start by 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the 
> non-distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2019-04-26 Thread Diego Ceccarelli (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826814#comment-16826814
 ] 

Diego Ceccarelli commented on SOLR-8776:


Good morning, I updated the patch to address the comment by Ilaygit; the number 
of groups retrieved is now correct. 

While I was fixing the issue I realized that this patch won't work with 
pagination :(

Making it work with pagination also requires modifying the first-step 
collector - I don't mind doing that but I would do it in a separate patch, 
because otherwise this will become very hard to review. 

I might throw an exception if someone tries to use grouping + LTR + pagination.

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h

[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2019-04-23 Thread Diego Ceccarelli (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824409#comment-16824409
 ] 

Diego Ceccarelli commented on SOLR-8776:


I updated the patch to the latest version upstream: 

[https://github.com/apache/lucene-solr/pull/162/files]

I spent several days hunting a bug :P

{{ant precommit}} is happy and tests are successful - there is a comment by 
Ilaygit about the number of groups retrieved in the non-distributed setting 
that I want to double check tomorrow, but all the rest is done. 
[~ichattopadhyaya] [~joel.bernstein] [~romseygeek] can you take a look? I 
can work on it over the next few days. Please let's close this! :D :D :D

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h

[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2019-04-11 Thread Diego Ceccarelli (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815399#comment-16815399
 ] 

Diego Ceccarelli commented on SOLR-8776:


I'm updating the patch

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h

[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2019-04-11 Thread Diego Ceccarelli (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815174#comment-16815174
 ] 

Diego Ceccarelli commented on SOLR-8776:


I would like to move this forward too. I can start rebasing it onto the current 
master; what do you think [~romseygeek] [~joel.bernstein] [~ichattopadhyaya]? 

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h

[jira] [Created] (LUCENE-8539) Fix typos and style in TestStopFilter

2018-10-22 Thread Diego Ceccarelli (JIRA)

Diego Ceccarelli created LUCENE-8539:


 Summary: Fix typos and style in TestStopFilter
 Key: LUCENE-8539
 URL: https://issues.apache.org/jira/browse/LUCENE-8539
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Diego Ceccarelli


This patch fixes some typos in TestStopFilter; it also contains some 
refactoring of the tests to make them clearer. 




[jira] [Commented] (SOLR-11831) Skip second grouping step if group.limit is 1 (aka Las Vegas patch)

2018-05-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470605#comment-16470605
 ] 

Diego Ceccarelli commented on SOLR-11831:
-

It applies only to the distributed case; I was discussing with [~romseygeek] the 
possibility of doing the same inside Lucene for the non-distributed case. 

>  Skip second grouping step if group.limit is 1 (aka Las Vegas patch)
> 
>
> Key: SOLR-11831
> URL: https://issues.apache.org/jira/browse/SOLR-11831
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Malvina Josephidou
>Priority: Minor
>
> In cases where we do grouping and ask for  {{group.limit=1}} only it is 
> possible to skip the second grouping step. In our test datasets it improved 
> speed by around 40%.
> Essentially, in the first grouping step each shard returns the top K groups 
> based on the highest scoring document in each group. The top K groups from 
> each shard are merged in the federator and in the second step we ask all the 
> shards to return the top documents from each of the top ranking groups.
> If we only want to return the highest scoring document per group we can 
> return the top document id in the first step, merge results in the federator 
> to retain the top K groups and then skip the second grouping step entirely. 
> This is possible provided that:
> a) We do not need to know the total number of matching documents per group
> b) The within-group sort and the between-group sort are the same. 
> c) We are not doing reranking (this is because this is done in the second 
> grouping step. It is also possible to get this to work with reranking but 
> more work and some additional assumptions are required)
>  
> This patch applies the grouping optimisation in cases where a)-c) apply and 
> we are only sorting by relevance. It is also possible to extend this work to 
> handle multiple sorting criteria and also reranking. 
> P.S. Diego and I called this patch "las vegas" because we started to write it 
> on the flight to Las Vegas for Lucene/Solr revolution. 
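
The single-step merge described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: the names SingleStepGroupMerge, ShardGroup and mergeTopGroups are hypothetical, and the sketch assumes sorting by score only, with each shard already reporting the id and score of the best document in each of its top groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SingleStepGroupMerge {

    /** One group as reported by a shard in the first step (hypothetical shape). */
    public static final class ShardGroup {
        public final String groupValue;
        public final String topDocId;
        public final float topScore;

        public ShardGroup(String groupValue, String topDocId, float topScore) {
            this.groupValue = groupValue;
            this.topDocId = topDocId;
            this.topScore = topScore;
        }
    }

    /**
     * Merge the per-shard top groups and keep the best k overall.
     * Every entry already carries its top document id, so no second
     * request to the shards is needed when only one document per group is wanted.
     */
    public static List<ShardGroup> mergeTopGroups(List<List<ShardGroup>> shardResponses, int k) {
        Map<String, ShardGroup> best = new HashMap<>();
        for (List<ShardGroup> response : shardResponses) {
            for (ShardGroup g : response) {
                ShardGroup current = best.get(g.groupValue);
                // The same group can come from several shards: keep the highest-scoring document.
                if (current == null || g.topScore > current.topScore) {
                    best.put(g.groupValue, g);
                }
            }
        }
        List<ShardGroup> merged = new ArrayList<>(best.values());
        merged.sort(Comparator.comparingDouble((ShardGroup g) -> g.topScore).reversed());
        return merged.size() > k ? new ArrayList<>(merged.subList(0, k)) : merged;
    }

    public static void main(String[] args) {
        List<ShardGroup> shardA = Arrays.asList(
                new ShardGroup("brand-x", "doc-1", 3.2f),
                new ShardGroup("brand-y", "doc-7", 1.1f));
        List<ShardGroup> shardB = Arrays.asList(
                new ShardGroup("brand-x", "doc-42", 4.0f),
                new ShardGroup("brand-z", "doc-9", 2.5f));
        for (ShardGroup g : mergeTopGroups(Arrays.asList(shardA, shardB), 2)) {
            System.out.println(g.groupValue + " -> " + g.topDocId + " (" + g.topScore + ")");
        }
    }
}
```

The sketch does not count matching documents per group and does no reranking, which mirrors conditions a) and c) in the description.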




[jira] [Commented] (SOLR-11831) Skip second grouping step if group.limit is 1 (aka Las Vegas patch)

2018-01-08 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316605#comment-16316605
 ] 

Diego Ceccarelli commented on SOLR-11831:
-

patch is coming, give us a few minutes :D


[jira] [Commented] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2018-01-05 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313284#comment-16313284
 ] 

Diego Ceccarelli commented on LUCENE-8118:
--

I agree, that was just a workaround for [~laura-dietz] :) 

> ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
> -
>
> Key: LUCENE-8118
> URL: https://issues.apache.org/jira/browse/LUCENE-8118
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.2
> Environment: Debian/Stretch
> java version "1.8.0_144"
> Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>Reporter: Laura Dietz
>
> Indexing a large collection of about 20 million paragraph-sized documents 
> results in an ArrayIndexOutOfBoundsException in 
> org.apache.lucene.index.TermsHashPerField.writeByte  (full stack trace 
> below). 
> The bug is possibly related to issues described in 
> [here|http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-65536-td3661945.html]
>   and [SOLR-10936|https://issues.apache.org/jira/browse/SOLR-10936] -- but I 
> am not using SOLR, I am directly using Lucene Core.
> The issue can be reproduced using code from  [GitHub 
> trec-car-tools-example|https://github.com/TREMA-UNH/trec-car-tools/tree/lucene-bug/trec-car-tools-example]
>  
> - compile with `mvn compile assembly:single`
> - run with `java -cp 
> ./target/treccar-tools-example-0.1-jar-with-dependencies.jar 
> edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`
> Where paragraphCorpus.cbor is contained in this 
> [archive|http://trec-car.cs.unh.edu/datareleases/v2.0-snapshot/archive-paragraphCorpus.tar.xz]
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
>     at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
>     at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
>     at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:159)
>     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
>     at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
>     at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
>     at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532)
>     at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1508)
>     at edu.unh.cs.TrecCarBuildLuceneIndex.main(TrecCarBuildLuceneIndex.java:55)




[jira] [Comment Edited] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2018-01-05 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312971#comment-16312971
 ] 

Diego Ceccarelli edited comment on LUCENE-8118 at 1/5/18 11:53 AM:
---

Looking at your code it seems that there is only one commit at the end, and 
your collection is big. What if you try to commit every, let's say, 50k docs?  


was (Author: diegoceccarelli):
Looking at your code it seems that there is only one commit at the end, and 
your collection is big. Could you please try to commit every, let's say, 50k 
docs?  
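
The suggestion to commit periodically could look roughly like the sketch below. It is an illustration only: PeriodicCommit and its Writer interface are hypothetical stand-ins rather than Lucene classes (Lucene's IndexWriter has addDocument and commit methods, but they throw IOException, which the sketch leaves out), and 50k is simply the figure mentioned in the comment.

```java
import java.util.Arrays;
import java.util.List;

public class PeriodicCommit {

    /** Hypothetical minimal view of an index writer, used instead of Lucene's IndexWriter. */
    public interface Writer<D> {
        void addDocument(D doc);
        void commit();
    }

    /**
     * Add every document and commit after each batch of commitEvery documents,
     * plus one final commit for a trailing partial batch.
     * Returns the number of commits performed.
     */
    public static <D> int indexWithPeriodicCommits(Iterable<D> docs, Writer<D> writer, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        int commits = 0;
        int pending = 0;
        for (D doc : docs) {
            writer.addDocument(doc);
            pending++;
            if (pending == commitEvery) {
                writer.commit(); // flush the batch instead of buffering the whole collection
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            writer.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("p1", "p2", "p3", "p4", "p5");
        Writer<String> printing = new Writer<String>() {
            @Override public void addDocument(String doc) { System.out.println("add " + doc); }
            @Override public void commit() { System.out.println("commit"); }
        };
        // With a real collection the batch size would be much larger, e.g. 50_000.
        int commits = indexWithPeriodicCommits(docs, printing, 2);
        System.out.println("commits: " + commits);
    }
}
```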


[jira] [Commented] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2018-01-05 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312971#comment-16312971
 ] 

Diego Ceccarelli commented on LUCENE-8118:
--

Looking at your code it seems that there is only one commit at the end, and 
your collection is big. Could you please try to commit every, let's say, 50k 
docs?  


[jira] [Updated] (SOLR-11804) Test RankQuery in distributed mode

2017-12-29 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-11804:

Description: Currently {{RankQuery}} is not tested in distribute mode. I 
added a few tests in `TestDistributedSearch` to check that it works properly in 
distributed mode.   (was: Currently RankQuery is not tested in distribute mode. 
I added a few tests in `TestDistributedSearch` to check that it works properly 
in distributed mode. )

> Test RankQuery in distributed mode
> --
>
> Key: SOLR-11804
> URL: https://issues.apache.org/jira/browse/SOLR-11804
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Diego Ceccarelli
>Priority: Minor
>
> Currently {{RankQuery}} is not tested in distribute mode. I added a few tests 
> in `TestDistributedSearch` to check that it works properly in distributed 
> mode. 




[jira] [Updated] (SOLR-11804) Test RankQuery in distributed mode

2017-12-29 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-11804:

Description: Currently {{RankQuery}} is not tested in distributed mode. I 
added a few tests in `TestDistributedSearch` to check that it works properly.  
(was: Currently {{RankQuery}} is not tested in distribute mode. I added a few 
tests in `TestDistributedSearch` to check that it works properly in distributed 
mode. )

> Test RankQuery in distributed mode
> --
>
> Key: SOLR-11804
> URL: https://issues.apache.org/jira/browse/SOLR-11804
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Diego Ceccarelli
>Priority: Minor
>
> Currently {{RankQuery}} is not tested in distributed mode. I added a few 
> tests in `TestDistributedSearch` to check that it works properly.




[jira] [Updated] (SOLR-11804) Test RankQuery in distributed mode

2017-12-29 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-11804:

Summary: Test RankQuery in distributed mode  (was: Test RankQuery query in 
distribute mode)

> Test RankQuery in distributed mode
> --
>
> Key: SOLR-11804
> URL: https://issues.apache.org/jira/browse/SOLR-11804
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Diego Ceccarelli
>Priority: Minor
>
> Currently RankQuery is not tested in distribute mode. I added a few tests in 
> `TestDistributedSearch` to check that it works properly in distributed mode. 




[jira] [Created] (SOLR-11804) Test RankQuery query in distribute mode

2017-12-29 Thread Diego Ceccarelli (JIRA)

Diego Ceccarelli created SOLR-11804:
---

 Summary: Test RankQuery query in distribute mode
 Key: SOLR-11804
 URL: https://issues.apache.org/jira/browse/SOLR-11804
 Project: Solr
  Issue Type: Test
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Diego Ceccarelli
Priority: Minor


Currently RankQuery is not tested in distribute mode. I added a few tests in 
`TestDistributedSearch` to check that it works properly in distributed mode. 




[jira] [Updated] (SOLR-11800) Improve error message when parsing RankQuery

2017-12-28 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-11800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-11800:

Description: When a user specifies something wrong for the parameter {{rq}} 
sometimes it is hard to understand where is the problem, this patch attempts to 
improve the error message returned in the response.  (was: when a user specify 
something wrong for the parameter rq sometimes it is hard to understand where 
is the problem, this patch attempts to improve the error message returned in 
the response.)

> Improve error message when parsing RankQuery
> 
>
> Key: SOLR-11800
> URL: https://issues.apache.org/jira/browse/SOLR-11800
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Diego Ceccarelli
>Priority: Minor
>
> When a user specifies something wrong for the parameter {{rq}} sometimes it 
> is hard to understand where is the problem, this patch attempts to improve 
> the error message returned in the response.




[jira] [Created] (SOLR-11800) Improve error message when parsing RankQuery

2017-12-28 Thread Diego Ceccarelli (JIRA)

Diego Ceccarelli created SOLR-11800:
---

 Summary: Improve error message when parsing RankQuery
 Key: SOLR-11800
 URL: https://issues.apache.org/jira/browse/SOLR-11800
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Diego Ceccarelli
Priority: Minor


when a user specify something wrong for the parameter rq sometimes it is hard 
to understand where is the problem, this patch attempts to improve the error 
message returned in the response.




[jira] [Commented] (SOLR-10811) Speed up MultipleAdditiveTreesModel by using QuickScorer algorithm

2017-10-01 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187423#comment-16187423
 ] 

Diego Ceccarelli commented on SOLR-10811:
-

Please note that QuickScorer is undergoing a patent process. 
http://learningtorank.isti.cnr.it

> Speed up MultipleAdditiveTreesModel by using QuickScorer algorithm
> --
>
> Key: SOLR-10811
> URL: https://issues.apache.org/jira/browse/SOLR-10811
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Reporter: Yuki Yano
>Priority: Minor
> Attachments: quickscorer_model.pdf, SOLR-10811_master.patch, 
> SOLR-10811.patch
>
>
> QuickScorer is an algorithm which can calculate multiple additive trees fast 
> by using bitvectors for detecting target leaves.
> It was first published in SIGIR 2015 and won the best paper award of the 
> conference.
> refs: 
> http://zola.di.unipi.it/rossano/wp-content/papercite-data/pdf/sigir15.pdf
> We implemented QuickScorer as one of LTRScoringModel.
> This model uses same configuration of MultipleAdditiveTreesModel, thus it is 
> easy to replace the model.
> Our experiments show our model can calculate scores about twice faster than 
> MultipleAdditiveTreesModel.




[jira] [Created] (SOLR-11137) LTR: misleading error message when loading a model

2017-07-23 Thread Diego Ceccarelli (JIRA)

Diego Ceccarelli created SOLR-11137:
---

 Summary: LTR: misleading error message when loading a model
 Key: SOLR-11137
 URL: https://issues.apache.org/jira/browse/SOLR-11137
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Diego Ceccarelli
Priority: Minor


Loading a model can fail for several reasons when calling the model 
constructor, but the error message always reports that the Model type does not 
exist.

https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/src/java/org/apache/solr/ltr/model/LTRScoringModel.java#L103
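
A minimal sketch of the distinction this issue asks for, not the actual LTRScoringModel code: when a model is instantiated reflectively, a missing class and a failing constructor can be reported separately. All names below (ModelLoadSketch, loadModel, FailingModel) are hypothetical and only serve the illustration.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical sketch: names below are illustrative, not Solr's API.
class ModelLoadSketch {

  /** A stand-in model whose constructor rejects its (missing) parameters. */
  static class FailingModel {
    public FailingModel() {
      throw new IllegalArgumentException("weights are missing");
    }
  }

  /** Returns "ok" on success, otherwise a message naming the real cause. */
  static String loadModel(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      clazz.getDeclaredConstructor().newInstance();
      return "ok";
    } catch (ClassNotFoundException e) {
      // Only this case means that the model type does not exist.
      return "Model type does not exist: " + className;
    } catch (InvocationTargetException e) {
      // The constructor itself threw: surface its cause instead.
      return "Model loading failed: " + e.getCause().getMessage();
    } catch (ReflectiveOperationException e) {
      return "Model cannot be instantiated: " + e;
    }
  }

  public static void main(String[] args) {
    System.out.println(loadModel("no.such.Model"));
    System.out.println(loadModel(FailingModel.class.getName()));
  }
}
```

With this split, a constructor failure no longer hides behind a "type does not exist" message.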




[jira] [Commented] (SOLR-10990) QueryComponent.process breakup (for readability)

2017-07-03 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072153#comment-16072153
 ] 

Diego Ceccarelli commented on SOLR-10990:
-

Could we also move the grouping logic into a separate method? See 
https://github.com/apache/lucene-solr/blob/jira/solr-10990/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L355-L372

> QueryComponent.process breakup (for readability)
> 
>
> Key: SOLR-10990
> URL: https://issues.apache.org/jira/browse/SOLR-10990
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>
> The method is currently very long i.e. 
> https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L300-L565
>  and breaking it up along logical lines (ids, grouped distributed first 
> phase, grouped distributed second phase, undistributed grouped, ungrouped) 
> would make it more readable.




[jira] [Commented] (SOLR-10710) LTR contrib failures

2017-05-24 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023755#comment-16023755
 ] 

Diego Ceccarelli commented on SOLR-10710:
-

Thanks [~tomasflobbe], I'll provide better tests! :) 

> LTR contrib failures
> 
>
> Key: SOLR-10710
> URL: https://issues.apache.org/jira/browse/SOLR-10710
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Reporter: Steve Rowe
>Priority: Blocker
> Fix For: master (7.0)
>
>
> Reproducing failures 
> [https://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/1304/] - {{git 
> bisect}} says {{06a6034d9}}, the commit on LUCENE-7730, is where the 
> {{TestFieldLengthFeature.testRanking()}} failure started:
> {noformat}
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestFieldLengthFeature -Dtests.method=testRanking 
> -Dtests.seed=740EF58DAA5926DA -Dtests.slow=true -Dtests.locale=ja-JP 
> -Dtests.timezone=America/Port_of_Spain -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
>[junit4] ERROR   0.06s J1 | TestFieldLengthFeature.testRanking <<<
>[junit4]> Throwable #1: java.lang.RuntimeException: mismatch: '8'!='1' 
> @ response/docs/[0]/id
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([740EF58DAA5926DA:EB385C1332233915]:0)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:248)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:192)
>[junit4]>  at 
> org.apache.solr.ltr.feature.TestFieldLengthFeature.testRanking(TestFieldLengthFeature.java:117)
> {noformat}
> {noformat}
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestParallelWeightCreation 
> -Dtests.method=testLTRScoringQueryParallelWeightCreationResultOrder 
> -Dtests.seed=740EF58DAA5926DA -Dtests.slow=true -Dtests.locale=ar-SD 
> -Dtests.timezone=Europe/Skopje -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
>[junit4] ERROR   1.59s J1 | 
> TestParallelWeightCreation.testLTRScoringQueryParallelWeightCreationResultOrder
>  <<<
>[junit4]> Throwable #1: java.lang.RuntimeException: mismatch: '3'!='4' 
> @ response/docs/[0]/id
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([740EF58DAA5926DA:1142D5ED603B4132]:0)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:248)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:192)
>[junit4]>  at 
> org.apache.solr.ltr.TestParallelWeightCreation.testLTRScoringQueryParallelWeightCreationResultOrder(TestParallelWeightCreation.java:45)
> {noformat}
> {noformat}
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestSelectiveWeightCreation 
> -Dtests.method=testSelectiveWeightsRequestFeaturesFromDifferentStore 
> -Dtests.seed=740EF58DAA5926DA -Dtests.slow=true -Dtests.locale=hr-HR 
> -Dtests.timezone=Australia/Victoria -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
>[junit4] ERROR   0.03s J1 | 
> TestSelectiveWeightCreation.testSelectiveWeightsRequestFeaturesFromDifferentStore
>  <<<
>[junit4]> Throwable #1: java.lang.RuntimeException: mismatch: '3'!='4' 
> @ response/docs/[0]/id
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([740EF58DAA5926DA:293FE248276551B1]:0)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:248)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:192)
>[junit4]>  at 
> org.apache.solr.ltr.TestSelectiveWeightCreation.testSelectiveWeightsRequestFeaturesFromDifferentStore(TestSelectiveWeightCreation.java:230)
> {noformat}
> {noformat}
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestLTRQParserPlugin -Dtests.method=ltrMoreResultsThanReRankedTest 
> -Dtests.seed=740EF58DAA5926DA -Dtests.slow=true -Dtests.locale=es-NI 
> -Dtests.timezone=Africa/Mogadishu -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
>[junit4] ERROR   0.03s J1 | 
> TestLTRQParserPlugin.ltrMoreResultsThanReRankedTest <<<
>[junit4]> Throwable #1: java.lang.RuntimeException: mismatch: 
> '0.09271725'!='0.105360515' @ response/docs/[3]/score
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([740EF58DAA5926DA:BD7644EA7596711B]:0)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:248)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:192)
>[junit4]>  at 
> org.apache.solr.ltr.TestLTRQParserPlugin.ltrMoreResultsThanReRankedTest(TestLTRQParserPlugin.java:94)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (SOLR-10717) Learning to rank: a query will fail if the feature vector is requested without providing external feature information parameters

2017-05-19 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-10717:

Summary: Learning to rank: a query will fail if the feature vector is 
requested without providing external feature information parameters  (was: 
Learning to rank: Solr a query will fail if the feature vector is requested 
without providing external feature information parameters)

> Learning to rank: a query will fail if the feature vector is requested 
> without providing external feature information parameters
> 
>
> Key: SOLR-10717
> URL: https://issues.apache.org/jira/browse/SOLR-10717
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.5.1
>Reporter: Diego Ceccarelli
>Priority: Minor
>
> In ltr some features can depend on External Feature Informations that have to 
> be provided at query time. If we query solr only to retrieve the feature 
> vectors for the documents (without doing reranking), and without providing 
> all the external feature informations used in the feature store the query 
> will fail. 




[jira] [Updated] (SOLR-10717) Learning to rank: Solr a query will fail if the feature vector is requested without providing external feature information parameters

2017-05-19 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-10717:

Summary: Learning to rank: Solr a query will fail if the feature vector is 
requested without providing external feature information parameters  (was: 
Learning to rank: Solr query fails if you ask for the feature vector without 
providing external feature information parameters)

> Learning to rank: Solr a query will fail if the feature vector is requested 
> without providing external feature information parameters
> -
>
> Key: SOLR-10717
> URL: https://issues.apache.org/jira/browse/SOLR-10717
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.5.1
>Reporter: Diego Ceccarelli
>Priority: Minor
>
> In ltr some features can depend on External Feature Informations that have to 
> be provided at query time. If we query solr only to retrieve the feature 
> vectors for the documents (without doing reranking), and without providing 
> all the external feature informations used in the feature store the query 
> will fail. 




[jira] [Commented] (SOLR-10717) Learning to rank: Solr query fails if you ask for the feature vector without providing external feature information parameters

2017-05-19 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017651#comment-16017651
 ] 

Diego Ceccarelli commented on SOLR-10717:
-

I can see different ways to fix this:

1) Return only the features for which it was possible to compute the value 
(ignore, or set to NaN features for which we had errors)
2) Return a proper response error
3) Add a parameter ({{ignoreEfiErrors}}) that the user can set to {{true}} if 
s/he wants 1) otherwise 2). 
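
The three options can be sketched as follows. This is only an illustration under stated assumptions: {{ignoreEfiErrors}} is the parameter proposed above, while the class, the method name and the map-based feature representation are hypothetical and not the LTR contrib API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of options 1-3; not the Solr LTR API.
class EfiErrorsSketch {

  /**
   * featureToEfi maps each feature name to the efi parameter it needs.
   * efi holds the external feature information supplied with the query.
   */
  static Map<String, Float> featureVector(Map<String, String> featureToEfi,
                                          Map<String, Float> efi,
                                          boolean ignoreEfiErrors) {
    Map<String, Float> vector = new LinkedHashMap<>();
    for (Map.Entry<String, String> feature : featureToEfi.entrySet()) {
      Float value = efi.get(feature.getValue());
      if (value == null) {
        if (!ignoreEfiErrors) {
          // Option 2: fail with a proper error naming the missing parameter.
          throw new IllegalArgumentException(
              "missing efi." + feature.getValue() + " for feature " + feature.getKey());
        }
        // Option 1: keep going and mark the feature as not computable.
        value = Float.NaN;
      }
      vector.put(feature.getKey(), value);
    }
    return vector;
  }
}
```

Option 3 is the boolean flag itself: the caller chooses between the lenient and the strict behaviour per request.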

[~cpoerschke], [~alessandro.benedetti], [~mnilsson] comments? 

> Learning to rank: Solr query fails if you ask for the feature vector without 
> providing external feature information parameters
> --
>
> Key: SOLR-10717
> URL: https://issues.apache.org/jira/browse/SOLR-10717
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.5.1
>Reporter: Diego Ceccarelli
>Priority: Minor
>
> In ltr some features can depend on External Feature Informations that have to 
> be provided at query time. If we query solr only to retrieve the feature 
> vectors for the documents (without doing reranking), and without providing 
> all the external feature informations used in the feature store the query 
> will fail. 




[jira] [Updated] (SOLR-10717) Learning to rank: Solr query fails if you ask for the feature vector without providing external feature information parameters

2017-05-19 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-10717:

Priority: Minor  (was: Major)

> Learning to rank: Solr query fails if you ask for the feature vector without 
> providing external feature information parameters
> --
>
> Key: SOLR-10717
> URL: https://issues.apache.org/jira/browse/SOLR-10717
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.5.1
>Reporter: Diego Ceccarelli
>Priority: Minor
>
> In ltr some features can depend on External Feature Informations that have to 
> be provided at query time. If we query solr only to retrieve the feature 
> vectors for the documents (without doing reranking), and without providing 
> all the external feature informations used in the feature store the query 
> will fail. 




[jira] [Updated] (SOLR-10717) Learning to rank: Solr query fails if you ask for the feature vector without providing external feature information parameters

2017-05-19 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-10717:

Summary: Learning to rank: Solr query fails if you ask for the feature 
vector without providing external feature information parameters  (was: 
Learning to rank: Solr query fails if you ask for the feature vector without 
providing external feature informations)

> Learning to rank: Solr query fails if you ask for the feature vector without 
> providing external feature information parameters
> --
>
> Key: SOLR-10717
> URL: https://issues.apache.org/jira/browse/SOLR-10717
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.5.1
>Reporter: Diego Ceccarelli
>
> In ltr some features can depend on External Feature Informations that have to 
> be provided at query time. If we query solr only to retrieve the feature 
> vectors for the documents (without doing reranking), and without providing 
> all the external feature informations used in the feature store the query 
> will fail. 




[jira] [Created] (SOLR-10717) Learning to rank: Solr query fails if you ask for the feature vector without providing external feature informations

2017-05-19 Thread Diego Ceccarelli (JIRA)

Diego Ceccarelli created SOLR-10717:
---

 Summary: Learning to rank: Solr query fails if you ask for the 
feature vector without providing external feature informations
 Key: SOLR-10717
 URL: https://issues.apache.org/jira/browse/SOLR-10717
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 6.5.1
Reporter: Diego Ceccarelli


In LTR, some features can depend on external feature information that has to 
be provided at query time. If we query Solr only to retrieve the feature 
vectors for the documents (without reranking) and do not provide all the 
external feature information used in the feature store, the query fails. 




[jira] [Commented] (SOLR-10710) LTR contrib failures

2017-05-19 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017608#comment-16017608
 ] 

Diego Ceccarelli commented on SOLR-10710:
-

Thanks Steve. This patch fixes the failing LTR tests. More work is needed on 
the tests because some rely on absolute scores from Apache Solr, so a change in 
the index or the scoring function could break them (in this case it was 
LUCENE-7730). I would merge this to fix the failing tests, and then open a new 
Jira item to improve the tests. 

> LTR contrib failures
> 
>
> Key: SOLR-10710
> URL: https://issues.apache.org/jira/browse/SOLR-10710
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Reporter: Steve Rowe
>
> Reproducing failures 
> [https://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/1304/] - {{git 
> bisect}} says {{06a6034d9}}, the commit on LUCENE-7730, is where the 
> {{TestFieldLengthFeature.testRanking()}} failure started:
> {noformat}
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestFieldLengthFeature -Dtests.method=testRanking 
> -Dtests.seed=740EF58DAA5926DA -Dtests.slow=true -Dtests.locale=ja-JP 
> -Dtests.timezone=America/Port_of_Spain -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
>[junit4] ERROR   0.06s J1 | TestFieldLengthFeature.testRanking <<<
>[junit4]> Throwable #1: java.lang.RuntimeException: mismatch: '8'!='1' 
> @ response/docs/[0]/id
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([740EF58DAA5926DA:EB385C1332233915]:0)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:248)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:192)
>[junit4]>  at 
> org.apache.solr.ltr.feature.TestFieldLengthFeature.testRanking(TestFieldLengthFeature.java:117)
> {noformat}
> {noformat}
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestParallelWeightCreation 
> -Dtests.method=testLTRScoringQueryParallelWeightCreationResultOrder 
> -Dtests.seed=740EF58DAA5926DA -Dtests.slow=true -Dtests.locale=ar-SD 
> -Dtests.timezone=Europe/Skopje -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
>[junit4] ERROR   1.59s J1 | 
> TestParallelWeightCreation.testLTRScoringQueryParallelWeightCreationResultOrder
>  <<<
>[junit4]> Throwable #1: java.lang.RuntimeException: mismatch: '3'!='4' 
> @ response/docs/[0]/id
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([740EF58DAA5926DA:1142D5ED603B4132]:0)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:248)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:192)
>[junit4]>  at 
> org.apache.solr.ltr.TestParallelWeightCreation.testLTRScoringQueryParallelWeightCreationResultOrder(TestParallelWeightCreation.java:45)
> {noformat}
> {noformat}
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestSelectiveWeightCreation 
> -Dtests.method=testSelectiveWeightsRequestFeaturesFromDifferentStore 
> -Dtests.seed=740EF58DAA5926DA -Dtests.slow=true -Dtests.locale=hr-HR 
> -Dtests.timezone=Australia/Victoria -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
>[junit4] ERROR   0.03s J1 | 
> TestSelectiveWeightCreation.testSelectiveWeightsRequestFeaturesFromDifferentStore
>  <<<
>[junit4]> Throwable #1: java.lang.RuntimeException: mismatch: '3'!='4' 
> @ response/docs/[0]/id
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([740EF58DAA5926DA:293FE248276551B1]:0)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:248)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:192)
>[junit4]>  at 
> org.apache.solr.ltr.TestSelectiveWeightCreation.testSelectiveWeightsRequestFeaturesFromDifferentStore(TestSelectiveWeightCreation.java:230)
> {noformat}
> {noformat}
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestLTRQParserPlugin -Dtests.method=ltrMoreResultsThanReRankedTest 
> -Dtests.seed=740EF58DAA5926DA -Dtests.slow=true -Dtests.locale=es-NI 
> -Dtests.timezone=Africa/Mogadishu -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
>[junit4] ERROR   0.03s J1 | 
> TestLTRQParserPlugin.ltrMoreResultsThanReRankedTest <<<
>[junit4]> Throwable #1: java.lang.RuntimeException: mismatch: 
> '0.09271725'!='0.105360515' @ response/docs/[3]/score
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([740EF58DAA5926DA:BD7644EA7596711B]:0)
>[junit4]>  at 
> org.apache.solr.util.RestTestBase.assertJQ(RestTestBase.java:248)
>[junit4]>  at 
>

[jira] [Updated] (SOLR-10703) Add prepare() and finish() into DocTransformer

2017-05-18 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-10703:

Description: This patch adds a {{prepare}} and a {{finish}} method to the 
interface of {{DocTransformer}} allowing a developer to perform actions 
before/after a doc transformer is applied to a result set. My use case was to 
benchmark the performance of a transformer, since transformer time is not part 
of {{QTime}}.   (was: This patch add a `prepare` and a `finish` method to the 
interface of `DocTransformer` allowing a developer to perform actions 
before/after a doc transformer is applied to a result set. My use case was to 
benchmark the performance of a transformer, since transformer time is not part 
of `QTime`. )

> Add prepare() and finish() into DocTransformer 
> ---
>
> Key: SOLR-10703
> URL: https://issues.apache.org/jira/browse/SOLR-10703
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: master (7.0)
>
>
> This patch adds a {{prepare}} and a {{finish}} method to the interface of 
> {{DocTransformer}} allowing a developer to perform actions before/after a doc 
> transformer is applied to a result set. My use case was to benchmark the 
> performance of a transformer, since transformer time is not part of 
> {{QTime}}. 




[jira] [Created] (SOLR-10703) Add prepare() and finish() into DocTransformer

2017-05-18 Thread Diego Ceccarelli (JIRA)

Diego Ceccarelli created SOLR-10703:
---

 Summary: Add prepare() and finish() into DocTransformer 
 Key: SOLR-10703
 URL: https://issues.apache.org/jira/browse/SOLR-10703
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Diego Ceccarelli
Priority: Minor
 Fix For: master (7.0)


This patch add a `prepare` and a `finish` method to the interface of 
`DocTransformer` allowing a developer to perform actions before/after a doc 
transformer is applied to a result set. My use case was to benchmark the 
performance of a transformer, since transformer time is not part of `QTime`. 
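
As an illustration of the benchmarking use case, here is a self-contained sketch. It does not use Solr's `DocTransformer` class: `SketchTransformer`, `TimingTransformer` and `applyAll` are hypothetical stand-ins, and only the `prepare`/`finish` hook names come from the description above.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a doc transformer with the two proposed hooks.
abstract class SketchTransformer {
  void prepare() {}                       // called once, before the first document
  abstract void transform(Map<String, Object> doc);
  void finish() {}                        // called once, after the last document
}

// Records the time elapsed between prepare() and finish().
class TimingTransformer extends SketchTransformer {
  private long startNanos;
  long elapsedNanos = -1;
  int docsSeen;

  @Override void prepare() { startNanos = System.nanoTime(); docsSeen = 0; }

  @Override void transform(Map<String, Object> doc) {
    doc.put("transformed", true);
    docsSeen++;
  }

  @Override void finish() { elapsedNanos = System.nanoTime() - startNanos; }
}

class TransformerDriver {
  // Applies a transformer to a whole result set, invoking the hooks around it.
  static void applyAll(SketchTransformer t, List<Map<String, Object>> docs) {
    t.prepare();
    for (Map<String, Object> doc : docs) {
      t.transform(doc);
    }
    t.finish();
  }
}
```

After `applyAll` returns, `elapsedNanos` holds the time spent in the transformer for that result set, which is the figure that `QTime` does not include.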




[jira] [Updated] (SOLR-8776) Support RankQuery in grouping

2017-05-11 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-8776:
---
Fix Version/s: (was: 6.0)
   master (7.0)

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the non 
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2017-05-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005060#comment-16005060
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 5/10/17 5:40 PM:
-

Hi all, I updated the PR (https://github.com/apache/lucene-solr/pull/162). 
Highlights: 

[~romseygeek], [~martijn.v.groningen], the patch now relies on the new grouping 
code :) I had to add a new {{protected}} constructor to {{TopGroupsCollector}} 
to inject my own {{GroupReducer}}. Could you please take a look and let me know 
if it makes sense? Also, in 
[SecondPassGroupingCollector#L54|https://github.com/bloomberg/lucene-solr/blob/c22a9017649406c5673c9b72878ad66a20d9b8d2/lucene/grouping/src/java/org/apache/lucene/search/grouping/SecondPassGroupingCollector.java#L54]
 
{code:title=SecondPassGroupingCollector.java|borderStyle=solid}
public SecondPassGroupingCollector(GroupSelector groupSelector, Collection groups, GroupReducer reducer) {

  //System.out.println("SP init");
  //Do we want to check if groups is null here? instead of checking at line 62?
  if (groups.isEmpty()) {
    throw new IllegalArgumentException("no groups to collect (groups is empty)");
  }

  this.groupSelector = Objects.requireNonNull(groupSelector);
  this.groupSelector.setGroups(groups);
  this.groups = Objects.requireNonNull(groups);
{code}

I would check if {{groups != null}} before {{groups.isEmpty()}}.
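
A minimal sketch of the ordering suggested above, assuming a stand-alone helper rather than the real constructor. The class name {{GroupsValidator}} and the method {{requireNonEmpty}} are hypothetical and are not part of Lucene; the point is only that the null check runs first, so a null argument fails with an explicit message instead of an unlabelled NullPointerException from {{groups.isEmpty()}}.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Objects;

// Hypothetical helper, not part of Lucene: shows the null check placed before isEmpty().
final class GroupsValidator {

  // Returns the collection unchanged if it is neither null nor empty.
  static <T> Collection<T> requireNonEmpty(Collection<T> groups) {
    // Null check first, so a null argument fails with an explicit message.
    Objects.requireNonNull(groups, "groups must not be null");
    // Only then the emptiness check, which is now safe to call.
    if (groups.isEmpty()) {
      throw new IllegalArgumentException("no groups to collect (groups is empty)");
    }
    return groups;
  }

  public static void main(String[] args) {
    System.out.println(requireNonEmpty(Collections.singletonList("group-a")));
    try {
      requireNonEmpty(null);
    } catch (NullPointerException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With this ordering, an empty collection still raises the IllegalArgumentException from the original snippet, while a null one is reported separately.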

2.  I changed the logic to rerank groups and not only documents. For example, if 
a user asks to rerank the top 100 documents with {{q=greetings&rows=10&rq=\{!rerank 
reRankQuery=$rqq reRankDocs=100 reRankWeight=3\}&rqq=(hi+hello+hey+hiya)}}:
  * the top 100 groups matching {{greetings}} are retrieved;
  * those 100 groups are reranked by {{rqq}};
  * finally, the top 10 reranked groups are returned;
  * inside each group, documents are reranked as well.
(Note that, for simplicity, in distributed mode the first pass retrieves the 
top 100 groups from all the shards, the federator computes the top 100 groups 
and sends them to the shards to get the reranking scores, and finally the 
federator returns the top 10.) 

IMO the patch is now complete and I have working unit tests. Could someone 
please review it? 





was (Author: diegoceccarelli):
Hi all, I updated the PR (https://github.com/apache/lucene-solr/pull/162). 
Highlights: 

[~romseygeek], [~martijn.v.groningen], the patch now relies on the new grouping 
code :) I had to add a new {{protected}} constructor to {{TopGroupsCollector}} 
to inject my own {{GroupReducer}}. Could you please take a look and let me know 
if it makes sense? Also, in 
[SecondPassGroupingCollector#L54|https://github.com/bloomberg/lucene-solr/blob/c22a9017649406c5673c9b72878ad66a20d9b8d2/lucene/grouping/src/java/org/apache/lucene/search/grouping/SecondPassGroupingCollector.java#L54]
 
{code:title=SecondPassGroupingCollector.java|borderStyle=solid}
public SecondPassGroupingCollector(GroupSelector groupSelector, Collection groups, GroupReducer reducer) {

  //System.out.println("SP init");
  //Do we want to check if groups is null here? instead of checking at line 62?
  if (groups.isEmpty()) {
    throw new IllegalArgumentException("no groups to collect (groups is empty)");
  }

  this.groupSelector = Objects.requireNonNull(groupSelector);
  this.groupSelector.setGroups(groups);
  this.groups = Objects.requireNonNull(groups);
{code}

I would check if {{groups != null}} before {{groups.isEmpty()}}.

2.  I changed the logic to rerank groups and not only documents. For example, if 
a user asks to rerank the top 100 documents with {{q=greetings&rows=10&rq=\{!rerank 
reRankQuery=$rqq reRankDocs=100 reRankWeight=3\}&rqq=(hi+hello+hey+hiya)}}:
  * the top 100 groups matching {{greetings}} are retrieved;
  * those 100 groups are reranked by {{rqq}};
  * finally, the top 10 reranked groups are returned;
  * inside each group, documents are reranked as well.
(Note that, for simplicity, in distributed mode the first pass retrieves the 
top 100 groups from all the shards, the federator computes the top 100 groups 
and sends them to the shards to get the reranking scores, and finally the 
federator selects the top 10.) 

IMO the patch is now complete and I have working unit tests. Could someone 
please review it? 




> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: 6.0
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
>

[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2017-05-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005060#comment-16005060
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 5/10/17 5:39 PM:
-

Hi all, I updated the PR (https://github.com/apache/lucene-solr/pull/162). 
Highlights: 

[~romseygeek], [~martijn.v.groningen], the patch now relies on the new grouping 
code :) I had to add a new {{protected}} constructor to {{TopGroupsCollector}} 
to inject my own {{GroupReducer}}. Could you please take a look and let me know 
if it makes sense? Also, in 
[SecondPassGroupingCollector#L54|https://github.com/bloomberg/lucene-solr/blob/c22a9017649406c5673c9b72878ad66a20d9b8d2/lucene/grouping/src/java/org/apache/lucene/search/grouping/SecondPassGroupingCollector.java#L54]
 
{code:title=SecondPassGroupingCollector.java|borderStyle=solid}
public SecondPassGroupingCollector(GroupSelector groupSelector, Collection groups, GroupReducer reducer) {

  //System.out.println("SP init");
  //Do we want to check if groups is null here? instead of checking at line 62?
  if (groups.isEmpty()) {
    throw new IllegalArgumentException("no groups to collect (groups is empty)");
  }

  this.groupSelector = Objects.requireNonNull(groupSelector);
  this.groupSelector.setGroups(groups);
  this.groups = Objects.requireNonNull(groups);
{code}

I would check if {{groups != null}} before {{groups.isEmpty()}}.

2.  I changed the logic to rerank groups and not only documents. For example, if 
a user asks to rerank the top 100 documents with {{q=greetings&rows=10&rq=\{!rerank 
reRankQuery=$rqq reRankDocs=100 reRankWeight=3\}&rqq=(hi+hello+hey+hiya)}}:
  * the top 100 groups matching {{greetings}} are retrieved;
  * those 100 groups are reranked by {{rqq}};
  * finally, the top 10 reranked groups are returned;
  * inside each group, documents are reranked as well.
(Note that, for simplicity, in distributed mode the first pass retrieves the 
top 100 groups from all the shards, the federator computes the top 100 groups 
and sends them to the shards to get the reranking scores, and finally the 
federator selects the top 10.) 

IMO the patch is now complete and I have working unit tests. Could someone 
please review it? 





was (Author: diegoceccarelli):
Hi all, I updated the PR (https://github.com/apache/lucene-solr/pull/162). 
Highlights: 

[~romseygeek], [~martijn.v.groningen], the patch now relies on the new grouping 
code :) I had to add a new {{protected}} constructor to {{TopGroupsCollector}} 
to inject my own {{GroupReducer}}. Could you please take a look and let me know 
if it makes sense? Also, in 
[SecondPassGroupingCollector#L54|https://github.com/bloomberg/lucene-solr/blob/c22a9017649406c5673c9b72878ad66a20d9b8d2/lucene/grouping/src/java/org/apache/lucene/search/grouping/SecondPassGroupingCollector.java#L54]
 
{code:title=SecondPassGroupingCollector.java|borderStyle=solid}
public SecondPassGroupingCollector(GroupSelector groupSelector, Collection groups, GroupReducer reducer) {

  //System.out.println("SP init");
  //Do we want to check if groups is null here? instead of checking at line 62?
  if (groups.isEmpty()) {
    throw new IllegalArgumentException("no groups to collect (groups is empty)");
  }

  this.groupSelector = Objects.requireNonNull(groupSelector);
  this.groupSelector.setGroups(groups);
  this.groups = Objects.requireNonNull(groups);
{code}

I would check if {{groups != null}} before {{groups.isEmpty()}}.

2.  I changed the reranking logic to rerank groups. For example, if a user 
asks to rerank the top 100 documents with {{q=greetings&rows=10&rq=\{!rerank 
reRankQuery=$rqq reRankDocs=100 reRankWeight=3\}&rqq=(hi+hello+hey+hiya)}}:
  * the top 100 groups matching {{greetings}} are retrieved;
  * those 100 groups are reranked by {{rqq}};
  * finally, the top 10 reranked groups are returned;
  * inside each group, documents are reranked as well.

IMO the patch is now complete and I have working unit tests. Could someone 
please review my patch? 




> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: 6.0
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results

[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2017-05-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005060#comment-16005060
 ] 

Diego Ceccarelli commented on SOLR-8776:


Hi all, I updated the PR (https://github.com/apache/lucene-solr/pull/162). 
Highlights: 

[~romseygeek], [~martijn.v.groningen], the patch now relies on the new grouping 
code :) I had to add a new {{protected}} constructor to {{TopGroupsCollector}} 
to inject my own {{GroupReducer}}. Could you please take a look and let me know 
if it makes sense? Also, in 
[SecondPassGroupingCollector#L54|https://github.com/bloomberg/lucene-solr/blob/c22a9017649406c5673c9b72878ad66a20d9b8d2/lucene/grouping/src/java/org/apache/lucene/search/grouping/SecondPassGroupingCollector.java#L54]
 
{code:title=SecondPassGroupingCollector.java|borderStyle=solid}
public SecondPassGroupingCollector(GroupSelector groupSelector, Collection groups, GroupReducer reducer) {

  //System.out.println("SP init");
  //Do we want to check if groups is null here? instead of checking at line 62?
  if (groups.isEmpty()) {
    throw new IllegalArgumentException("no groups to collect (groups is empty)");
  }

  this.groupSelector = Objects.requireNonNull(groupSelector);
  this.groupSelector.setGroups(groups);
  this.groups = Objects.requireNonNull(groups);
{code}

I would check if {{groups != null}} before {{groups.isEmpty()}}.

2.  I changed the reranking logic to rerank groups. For example, if a user 
asks to rerank the top 100 documents with {{q=greetings&rows=10&rq=\{!rerank 
reRankQuery=$rqq reRankDocs=100 reRankWeight=3\}&rqq=(hi+hello+hey+hiya)}}:
  * the top 100 groups matching {{greetings}} are retrieved;
  * those 100 groups are reranked by {{rqq}};
  * finally, the top 10 reranked groups are returned;
  * inside each group, documents are reranked as well.

IMO the patch is now complete and I have working unit tests. Could someone 
please review my patch? 




> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: 6.0
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> by attaching a patch with a test that fails because grouping does not support 
> the rank query, and then I'll try to fix the problem, starting from the 
> non-distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Updated] (SOLR-10607) Improve RTimerTree documentation

2017-05-04 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-10607:

Attachment: 0001-SOLR-10607-Improve-RTimerTree-documentation.patch

> Improve RTimerTree documentation
> 
>
> Key: SOLR-10607
> URL: https://issues.apache.org/jira/browse/SOLR-10607
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: 0001-SOLR-10607-Improve-RTimerTree-documentation.patch
>
>
> The comment on {{public RTimerTree sub(String desc)}} is a bit misleading: it 
> states that the method creates a new subtimer with the given name and status 
> {{START}}. This is only partially true, because if the timer was previously 
> requested and stopped, the old timer is returned with status {{STOP}}. I 
> changed the comment to clarify the behaviour.




[jira] [Created] (SOLR-10607) Improve RTimerTree documentation

2017-05-04 Thread Diego Ceccarelli (JIRA)

Diego Ceccarelli created SOLR-10607:
---

 Summary: Improve RTimerTree documentation
 Key: SOLR-10607
 URL: https://issues.apache.org/jira/browse/SOLR-10607
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Diego Ceccarelli
Priority: Minor


The comment on {{public RTimerTree sub(String desc)}} is a bit misleading: it 
states that the method creates a new subtimer with the given name and status 
{{START}}. This is only partially true, because if the timer was previously 
requested and stopped, the old timer is returned with status {{STOP}}. I 
changed the comment to clarify the behaviour.
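
The behaviour described above can be illustrated with a simplified stand-in. This is a sketch under assumptions, not Solr's {{RTimerTree}}: the class {{SketchTimer}}, its {{State}} enum and the map-based child lookup are hypothetical, and only the status names {{START}} and {{STOP}} are taken from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for a timer tree; not Solr's RTimerTree.
final class SketchTimer {

  enum State { START, STOP }

  private final Map<String, SketchTimer> children = new LinkedHashMap<>();
  private State state = State.START; // a new timer begins in status START

  // Returns the child registered under desc, creating it only if it does not exist yet.
  SketchTimer sub(String desc) {
    SketchTimer child = children.get(desc);
    if (child == null) {
      child = new SketchTimer();
      children.put(desc, child);
    }
    // An existing child is returned as-is, so it may already be in status STOP.
    return child;
  }

  void stop() {
    state = State.STOP;
  }

  State state() {
    return state;
  }

  public static void main(String[] args) {
    SketchTimer root = new SketchTimer();
    SketchTimer first = root.sub("query");
    first.stop();
    // Same name again: the old, stopped timer comes back rather than a fresh one.
    System.out.println(root.sub("query").state()); // prints STOP
  }
}
```

A caller that expects a freshly started subtimer on every call would be surprised by the second lookup, which is the case the clarified comment is meant to cover.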




[jira] [Closed] (LUCENE-7816) Improve RTimerTree documentation

2017-05-04 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli closed LUCENE-7816.

Resolution: Invalid

> Improve RTimerTree documentation
> 
>
> Key: LUCENE-7816
> URL: https://issues.apache.org/jira/browse/LUCENE-7816
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: 0001-LUCENE-7816-Improve-RTimerTree-documentation.patch
>
>
> The comment on {{public RTimerTree sub(String desc)}} is a bit misleading: it 
> states that the method {{creates new subtimer with given name}} and status 
> {{START}}. This is only partially true, because if the timer was previously 
> requested and stopped, the old timer is returned with status {{STOP}}. I 
> changed the comment to clarify the behaviour.




[jira] [Updated] (LUCENE-7816) Improve RTimerTree documentation

2017-05-04 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated LUCENE-7816:
-
Description: The comment on {{public RTimerTree sub(String desc)}} is a bit 
misleading: it states that the method {{creates new subtimer with given name}} 
and status {{START}}. This is only partially true, because if the timer was 
previously requested and stopped, the old timer is returned with status 
{{STOP}}. I changed the comment to clarify the behaviour.  (was: 
`public RTimerTree sub(String desc)` is a bit misleading, stating that the 
method 'creates new subtimer with given name' and status 'START'. This is only 
partially true, because if the timer was previously requested and stopped, the 
old timer is returned with status 'STOP'. I changed the comment to clarify 
the behaviour.)

> Improve RTimerTree documentation
> 
>
> Key: LUCENE-7816
> URL: https://issues.apache.org/jira/browse/LUCENE-7816
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: 0001-LUCENE-7816-Improve-RTimerTree-documentation.patch
>
>
> The comment on {{public RTimerTree sub(String desc)}} is a bit misleading: it 
> states that the method {{creates new subtimer with given name}} and status 
> {{START}}. This is only partially true, because if the timer was previously 
> requested and stopped, the old timer is returned with status {{STOP}}. I 
> changed the comment to clarify the behaviour.




[jira] [Updated] (LUCENE-7816) Improve RTimerTree documentation

2017-05-04 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated LUCENE-7816:
-
Attachment: 0001-LUCENE-7816-Improve-RTimerTree-documentation.patch

> Improve RTimerTree documentation
> 
>
> Key: LUCENE-7816
> URL: https://issues.apache.org/jira/browse/LUCENE-7816
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: 0001-LUCENE-7816-Improve-RTimerTree-documentation.patch
>
>
> `public RTimerTree sub(String desc)` is a bit misleading, stating that the 
> method 'creates new subtimer with given name' and status 'START'. This is only 
> partially true, because if the timer was previously requested and stopped, the 
> old timer is returned with status 'STOP'. I changed the comment to clarify 
> the behaviour.




[jira] [Created] (LUCENE-7816) Improve RTimerTree documentation

2017-05-04 Thread Diego Ceccarelli (JIRA)

Diego Ceccarelli created LUCENE-7816:


 Summary: Improve RTimerTree documentation
 Key: LUCENE-7816
 URL: https://issues.apache.org/jira/browse/LUCENE-7816
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Diego Ceccarelli
Priority: Minor


`public RTimerTree sub(String desc)` is a bit misleading, stating that the 
method 'creates new subtimer with given name' and status 'START'. This is only 
partially true, because if the timer was previously requested and stopped, the 
old timer is returned with status 'STOP'. I changed the comment to clarify 
the behaviour.




[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2017-04-03 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953586#comment-15953586
 ] 

Diego Ceccarelli commented on SOLR-8776:


Hi all, I updated the PR (https://github.com/apache/lucene-solr/pull/162). It 
now supports reranking by field and by function query, and reranking by field in 
the distributed setting. I have working tests for all the use cases. 
There are still some details to fix:

  * I had to remove {{final}} from {{GroupDocs::maxScore}} and {{GroupDocs::score}} 
(I think I can easily fix this).
  * {{Rerank(Function|Group)SecondPassGroupingCollector}} have the number of 
documents to rerank hardcoded in the class, because {{RankQuery}} doesn't expose 
that value (I think it should).

[~joel.bernstein], [~martijn.v.groningen], [~romseygeek] any feedback? 

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: 6.0
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> by attaching a patch with a test that fails because grouping does not support 
> the rank query, and then I'll try to fix the problem, starting from the 
> non-distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Commented] (SOLR-10359) User Events Logger Component

2017-03-25 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941694#comment-15941694
 ] 

Diego Ceccarelli commented on SOLR-10359:
-

Thanks for opening this issue. I like the idea and would be happy to help. 

@[~arafalov]
{quote}
The second one seems to be happening well out of Solr control (UI clicks, what 
user selected, etc). I am not sure if that fits into Solr itself. Commercial 
platforms (such as Fusion) might be integrating it, but they control more of a 
stack.
{quote}

Solr could expose an API (e.g. {{addUserInteraction}}) that could be called by 
the UI when the user interacts with the results. 

I like the idea of {{storeDir}} in the configuration; it would also make it 
possible to import/export the collection if there's a need to reindex it. 

Random thoughts/questions:
  * How do we create a unique search id? (Should it be the responsibility of 
Solr? I think yes.)
  * If I want to use a metric like the {{CTR}} (i.e., Click Through Rate, 
{{number of clicks / number of impressions}}) in the scoring formula, how can I 
do that without joining the two collections? (Maybe there could be a way to 
'import' a particular metric into the main collection?)
  * How could this work in the case of multiple shards? 
  * It should be easy to implement complex metrics that are computed from 
simple metrics. Some examples: *1.* the click through rate: for a document, or 
a document and a particular query, collect the number of clicks and divide by 
the number of impressions (ignoring multiple requests from the same user?); *2.* 
time spent on a document after a query: if I log the time of click and the time 
of closure of a document, I can compute how much time the users spent on the 
document; *3.* number of clicks per query.

With respect to the data model, I would add: 
* a {{user-id}}
* a blob containing an optional payload 
* the score of the document
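
As a rough illustration of the click through rate example above, here is a minimal sketch that computes {{number of clicks / number of impressions}} for one document and query. Everything in it is an assumption made for illustration: the names {{CtrSketch}} and {{UserEvent}}, the in-memory list of events, and the choice to count each user at most once per event type, which is only one possible reading of "ignoring multiple requests from the same user".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not a Solr component.
final class CtrSketch {

  // Assumed event shape: who, for which query, on which document, impression or click.
  static final class UserEvent {
    final String userId;
    final String query;
    final String docId;
    final boolean click; // false = impression, true = click

    UserEvent(String userId, String query, String docId, boolean click) {
      this.userId = userId;
      this.query = query;
      this.docId = docId;
      this.click = click;
    }
  }

  // CTR for one (query, document) pair, counting each user once per event type.
  static double clickThroughRate(List<UserEvent> events, String query, String docId) {
    Set<String> impressionUsers = new HashSet<>();
    Set<String> clickUsers = new HashSet<>();
    for (UserEvent e : events) {
      if (!e.query.equals(query) || !e.docId.equals(docId)) {
        continue; // event belongs to another query or document
      }
      if (e.click) {
        clickUsers.add(e.userId);
      } else {
        impressionUsers.add(e.userId);
      }
    }
    // No impressions means the rate is undefined; this sketch reports 0.0 in that case.
    return impressionUsers.isEmpty() ? 0.0 : (double) clickUsers.size() / impressionUsers.size();
  }

  public static void main(String[] args) {
    List<UserEvent> events = Arrays.asList(
        new UserEvent("u1", "solr", "doc1", false),
        new UserEvent("u1", "solr", "doc1", false), // repeated impression, counted once
        new UserEvent("u2", "solr", "doc1", false),
        new UserEvent("u1", "solr", "doc1", true));
    System.out.println(clickThroughRate(events, "solr", "doc1")); // prints 0.5
  }
}
```

Deduplicating by {{user-id}} is what makes that field useful here; without it, the repeated impression from the same user would lower the rate.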




> User Events Logger Component
> 
>
> Key: SOLR-10359
> URL: https://issues.apache.org/jira/browse/SOLR-10359
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Alessandro Benedetti
>  Labels: CTR, evaluation
>
> *Introduction*
> Being able to evaluate the quality of your search engine is becoming more and 
> more important day by day.
> This issue is to put a milestone to integrate online evaluation metrics with 
> Solr.
> *Scope*
> Scope of this issue is to provide a set of components able to :
> 1) Collect search result impressions (results shown per query)
> 2) Collect user events (user interactions on the search results per query, 
> e.g. clicks, bookmarking, etc.)
> 3) Calculate evaluation metrics on demand, such as Click Through Rate, DCG ...
> *Technical Design*
> A SearchComponent can be designed :
> *UsersEventsLoggerComponent*
> A property (such as storeDir) will define where the data collected will be 
> stored.
> Different data structures can be explored, to keep it simple, a first 
> implementation can be a Lucene Index.
> *Data Model*
> The user event can be modelled in the following way :
>  - the user query the event is related to
>  - the ID of the search result involved in the interaction
>  - the position in the ranking of the search result involved 
> in the interaction
>  - time when the interaction happened
>  - 0 for impressions, a value between 1-5 to identify the 
> type of user event, the semantic will depend on the domain and use cases
>  - this can identify a variant, in A/B testing
> *Impressions Logging*
> When the SearchComponent is assigned to a request handler, every time it 
> processes a request and returns a result set for a query to the user, the 
> component will collect the impressions (results returned) and index them in 
> the auxiliary Lucene index.
> This will happen in parallel as soon as you return the results to avoid 
> affecting the query time.
> Of course an impact on CPU load and memory is expected, will be interesting 
> to minimise it.
> * User Events Logging *
> An UpdateHandler will be exposed to accept POST requests and collect user 
> events.
> Every time a request is sent, the user event will be indexed in the underlying 
> auxiliary Lucene index.
> * Stats Calculation *
> A RequestHandler will be exposed to be able to calculate stats and 
> aggregations for the metrics :
> /evaluation?metric=ctr=query=testA,testB
> This request could calculate the CTR for our testA and testB to compare.
> Showing stats in total and per query ( to highlight the queries with 
> lower/higher CTR).
> The calculations will happen separating the  for an easy 
> comparison.
> Will be important to keep it as simple as possible for a first version, to 
> then extend it as much as we like




[jira] [Comment Edited] (SOLR-9929) Documentation and sample code about how to train the model using user clicks when use ltr module

2017-01-06 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805240#comment-15805240
 ] 

Diego Ceccarelli edited comment on SOLR-9929 at 1/6/17 6:41 PM:


Thanks [~jefferyyuan] for opening the issue. I submitted a patch to the 
Learning to Rank example README that tries to explain better how a user can 
produce a training set from feedback data. The new version is available here: 
https://github.com/bloomberg/lucene-solr/blob/master-ltr/solr/contrib/ltr/example/README.md

Please let me know if you have comments or more questions. Thanks! 


was (Author: diegoceccarelli):
Improve Learning to Rank example readme

> Documentation and sample code about how to train the model using user clicks 
> when use ltr module
> 
>
> Key: SOLR-9929
> URL: https://issues.apache.org/jira/browse/SOLR-9929
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: jefferyyuan
>Assignee: Christine Poerschke
>  Labels: learning-to-rank, machine_learning, solr
> Fix For: master (7.0), 6.4
>
> Attachments: 0001-Improve-Learning-to-Rank-example-Readme.patch
>
>
> Thanks very much for integrating machine learning to Solr.
> https://issues.apache.org/jira/browse/SOLR-8542
> I tried to integrate it, but I have difficulty figuring out how to translate the 
> partial pairwise feedback to the importance or relevance of that doc.
> https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
> In the Assemble training data part: the third column indicates the relative 
> importance or relevance of that doc
> Could you please give more info about how to give a score based on what user 
> clicks?
> I have read 
> https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
> http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
> http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
> But still have no clue yet.
> From a user's perspective, steps such as setting up the features and model in 
> Solr are simple, but collecting the feedback data and training/updating the model 
> is much more complex. Without it, we can't really use the learning-to-rank 
> function in Solr.
> It would be great if Solr can provide some detailed instruction and sample 
> code about how to translate the partial pairwise feedback and use it to train 
> and update model.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-9929) Documentation and sample code about how to train the model using user clicks when use ltr module

2017-01-06 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-9929:
---
Attachment: 0001-Improve-Learning-to-Rank-example-Readme.patch

Improve Learning to Rank example readme

> Documentation and sample code about how to train the model using user clicks 
> when use ltr module
> 
>
> Key: SOLR-9929
> URL: https://issues.apache.org/jira/browse/SOLR-9929
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: jefferyyuan
>Assignee: Christine Poerschke
>  Labels: learning-to-rank, machine_learning, solr
> Fix For: master (7.0), 6.4
>
> Attachments: 0001-Improve-Learning-to-Rank-example-Readme.patch
>
>
> Thanks very much for integrating machine learning to Solr.
> https://issues.apache.org/jira/browse/SOLR-8542
> I tried to integrate it. But have difficult figuring out how to translate the 
> partial pairwise feedback to the importance or relevance of that doc.
> https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
> In the Assemble training data part: the third column indicates the relative 
> importance or relevance of that doc
> Could you please give more info about how to give a score based on what user 
> clicks?
> I have read 
> https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
> http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
> http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
> But still have no clue yet.
> From a user's perspective, the steps such as setup the feature and model in 
> Solr is simple, but collecting the feedback data and train/update the model 
> is much more complex. Without it, we can't really use the learning-to-rank 
> function in Solr.
> It would be great if Solr can provide some detailed instruction and sample 
> code about how to translate the partial pairwise feedback and use it to train 
> and update model.
> Thanks




[jira] [Updated] (SOLR-8542) Integrate Learning to Rank into Solr

2016-05-27 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-8542:
---
Attachment: (was: README.md)

> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-8542-branch_5x.patch, SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example of indices. In order to test the plugin 
> with 
> the techproducts example please follow these steps
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar 
> example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml 
> example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
> 
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore'  
> --data-binary "@./contrib/ltr/example/techproducts-features.json"  -H 
> 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore'  
> --data-binary "@./contrib/ltr/example/techproducts-model.json"  -H 
> 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on=test=json={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}=*,[features],price,score,name=true




[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-05-26 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302188#comment-15302188
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 5/26/16 3:01 PM:
-

Thanks [~aanilpala], a file was missing from the patch. I just submitted a new 
patch that includes it, and I tested it against the latest upstream version 
(last commit 268da5be4). Please do not hesitate to contact me if you have 
comments :) 


was (Author: diegoceccarelli):
Add RerankTermSecondPassGroupingCollector


> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: 6.0
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the non 
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Updated] (SOLR-8776) Support RankQuery in grouping

2016-05-26 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-8776:
---
Attachment: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch

Add RerankTermSecondPassGroupingCollector


> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: 6.0
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the non 
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

2016-04-21 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252110#comment-15252110
 ] 

Diego Ceccarelli commented on SOLR-8542:


Great! 

> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example of indices. In order to test the plugin 
> with 
> the techproducts example please follow these steps
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar 
> example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml 
> example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
> 
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore'  
> --data-binary "@./contrib/ltr/example/techproducts-features.json"  -H 
> 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore'  
> --data-binary "@./contrib/ltr/example/techproducts-model.json"  -H 
> 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on=test=json={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}=*,[features],price,score,name=true




[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

2016-04-21 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252108#comment-15252108
 ] 

Diego Ceccarelli commented on SOLR-8542:


Thanks Alessandro, 
Please refer to the plugin master branch, 
https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-rfc; we are 
going to merge Christine's changes there. 

> How can I contribute back improvements/bug-fix ?
GitHub PRs are welcome. 

>Could make sense to create a separate repo, containing only the plugin, self 
>contained without the entire Solr.

I'm not against having a separate repo containing only the plugin. What do you 
think, [~cpoerschke]? 

> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example of indices. In order to test the plugin 
> with 
> the techproducts example please follow these steps
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar 
> example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml 
> example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
> 
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore'  
> --data-binary "@./contrib/ltr/example/techproducts-features.json"  -H 
> 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore'  
> --data-binary "@./contrib/ltr/example/techproducts-model.json"  -H 
> 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on=test=json={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}=*,[features],price,score,name=true




[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2016-03-14 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193892#comment-15193892
 ] 

Diego Ceccarelli commented on SOLR-8776:


Update: I found a way to change the behavior of the collectors without moving 
RankQuery into Lucene, so this new patch performs the group reranking without 
changing Lucene. The only difference is that if the query is a RankQuery, 
{{RerankTermSecondPassGroupingCollector}} is used instead of 
{{TermSecondPassGroupingCollector}}. The new collector scans the per-group 
collectors and wraps them in 'rerank collectors': 
{code:java}
for (SearchGroup group : groups) {
  if (query != null) {
    collector = groupMap.get(group.groupValue).collector;
    collector = query.getTopDocsCollector(collector, groupSort, searcher);
    groupMap.put(group.groupValue, new SearchGroupDocs(group.groupValue, collector));
  }
}
{code}
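
The wrapping step described above is essentially a decorator applied to each 
per-group collector. The following is a minimal, self-contained sketch of that 
pattern, not the code from the patch: {{DocCollector}}, {{ScoreCollector}}, 
{{RerankCollector}} and {{wrapAll}} are hypothetical stand-ins for the 
Lucene/Solr classes, and the rerank score is assumed to be a plain function of 
the document id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-ins for the Lucene/Solr classes; this is not the real API.
final class RerankWrapSketch {

  /** Per-group collector: gathers (doc, score) pairs and reports docs best-first. */
  interface DocCollector {
    void collect(int doc, double score);
    List<Integer> topDocs();
  }

  /** First-pass collector: orders docs by the score they were collected with. */
  static final class ScoreCollector implements DocCollector {
    private final Map<Integer, Double> scores = new LinkedHashMap<>();

    @Override
    public void collect(int doc, double score) {
      scores.put(doc, score);
    }

    @Override
    public List<Integer> topDocs() {
      List<Integer> docs = new ArrayList<>(scores.keySet());
      docs.sort(Comparator.comparingDouble((Integer d) -> scores.get(d)).reversed());
      return docs;
    }
  }

  /** Decorator: delegates collection, then reorders the inner result by a rerank score. */
  static final class RerankCollector implements DocCollector {
    private final DocCollector inner;
    private final ToDoubleFunction<Integer> rerankScore;

    RerankCollector(DocCollector inner, ToDoubleFunction<Integer> rerankScore) {
      this.inner = inner;
      this.rerankScore = rerankScore;
    }

    @Override
    public void collect(int doc, double score) {
      inner.collect(doc, score);
    }

    @Override
    public List<Integer> topDocs() {
      List<Integer> docs = new ArrayList<>(inner.topDocs());
      docs.sort(Comparator.comparingDouble(rerankScore).reversed());
      return docs;
    }
  }

  /** Replaces every group's collector with a reranking wrapper around it. */
  static void wrapAll(Map<String, DocCollector> groupMap,
                      ToDoubleFunction<Integer> rerankScore) {
    if (rerankScore == null) {
      return; // plain query: keep the original collectors
    }
    groupMap.replaceAll((group, collector) -> new RerankCollector(collector, rerankScore));
  }
}
```

The rest of the second pass keeps feeding documents into the map's collectors 
as before; only the order in which each group reports its documents changes.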

 

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: master
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: master
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the non 
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Updated] (SOLR-8776) Support RankQuery in grouping

2016-03-14 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-8776:
---
Attachment: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: master
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: master
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the non 
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2016-03-11 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190864#comment-15190864
 ] 

Diego Ceccarelli commented on SOLR-8776:


I uploaded a new patch. Groups are now reranked according to their maximum 
reranking scores; in the {{finish()}} method of the grouping {{CommandField}} I 
added: 

{code:java}
if (result != null && query instanceof RankQuery && groupSort == Sort.RELEVANCE) {
  // If we are sorting by relevance and the query is a RankQuery, the order
  // of the groups may have changed, so we need to reorder them.
  GroupDocs[] groups = result.groups;
  Arrays.sort(groups, new Comparator<GroupDocs>() {
    @Override
    public int compare(GroupDocs o1, GroupDocs o2) {
      // Descending by the maximum score of each group
      if (o1.maxScore > o2.maxScore) return -1;
      if (o1.maxScore < o2.maxScore) return 1;
      return 0;
    }
  });
}
{code}

This will reorder the groups if we re-rank the documents with the rank query. 
The second test succeeds. 

I'm still thinking about what the correct semantics for reranking + grouping 
should be: 

When you apply a query {{q}} and then a rank-query {{rq}}, you first score all 
the documents with {{q}} and then rescore the top-N documents with {{rq}}. The 
problem with grouping is that in order to get the top groups you first need to 
score the whole collection: a document may score really low with {{q}} but get 
a high score with {{rq}}, and the only way to find it is to rerank the whole 
collection (impractical). There are two possible solutions then:
  - if we want to apply {{rq}} to the top 1000 documents, we can collect the 
groups in the top-1000 documents; they will be the same groups obtained by 
scoring directly with {{rq}}, but in a different order;
  - we can collect more groups than we need, and then rerank the top 
documents in each group - I would call this solution: **Group Reranking**.

In my opinion group reranking is the better solution: imagine a group that 
contains all of the top-1000 documents ranked with {{q}}; with the first 
solution we would rerank all of them, maybe just to return one document. I 
guess the best would be, assuming that we want to apply the rerank query to N 
documents and return the top K groups, to retrieve the top K*y groups (for 
some factor y) and then rerank N/(K*y) documents in each group.
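
To make the arithmetic of that last proposal concrete, here is a small sketch. 
The class and method names are hypothetical, and both the integer division and 
the floor of one document per group are assumptions of this example rather 
than anything decided in the patch.

```java
// Hypothetical helper; the names and the rounding behavior are assumptions.
final class GroupRerankBudget {

  /** Groups to collect in the first pass: K * y. */
  static int groupsToCollect(int topGroups, int factor) {
    if (topGroups < 1 || factor < 1) {
      throw new IllegalArgumentException("topGroups and factor must be >= 1");
    }
    return Math.multiplyExact(topGroups, factor);
  }

  /** Documents to rerank inside each collected group: N / (K * y), at least 1. */
  static int docsPerGroup(int rerankDocs, int topGroups, int factor) {
    if (rerankDocs < 1) {
      throw new IllegalArgumentException("rerankDocs must be >= 1");
    }
    return Math.max(1, rerankDocs / groupsToCollect(topGroups, factor));
  }

  public static void main(String[] args) {
    int n = 1000; // documents we are willing to rerank in total
    int k = 10;   // groups to return
    int y = 2;    // over-collection factor
    System.out.println(groupsToCollect(k, y) + " groups, "
        + docsPerGroup(n, k, y) + " docs each");
  }
}
```

With N = 1000, K = 10 and y = 2 this collects 20 groups and reranks 50 
documents in each, so the total reranking cost stays at N.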



> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: master
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: master
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the non 
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Updated] (SOLR-8776) Support RankQuery in grouping

2016-03-11 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-8776:
---
Attachment: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: master
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: master
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the non 
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-11 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/11/16 12:21 PM:
--

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must 
stay there, that's why I wrote "partially moved" :) Just as there are 
{{IndexSearcher}} and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene 
and created {{SolrRankQuery}}. The reason is that {{RankQuery}} works by 
manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, 
IndexSearcher searcher) throws IOException;
{code}

At the moment, {{SolrIndexSearcher}} has a special case for when the query is a 
{{RankQuery}}:
{code:java}
  private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) 
throws IOException {

Query q = cmd.getQuery();
if (q instanceof RankQuery) {
  RankQuery rq = (RankQuery) q;
  return rq.getTopDocsCollector(len, cmd, this);
}
..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, 
we wrap a top score collector in a 'RankQuery collector'.

Recall that grouping works in two separate stages:
   *  in the first stage, we iterate over the documents, scoring them, and keep 
a map {{<group, score>}} where score is the highest score of a document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, the documents in each group are ranked and the top-n 
documents for each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 
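
As a rough illustration of the two stages (not the Lucene implementation), the 
sketch below works on an in-memory list of hits. {{Hit}}, {{topGroups}} and 
{{topDocsPerGroup}} are hypothetical names; unlike the real first-pass 
collector, this version computes the best score of every group and only then 
truncates to the top k.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration only; Hit and both methods are hypothetical names.
final class TwoPassGroupingSketch {

  static final class Hit {
    final int doc;
    final String group;
    final double score;

    Hit(int doc, String group, double score) {
      this.doc = doc;
      this.group = group;
      this.score = score;
    }
  }

  /** First pass: track the best score per group, return the top-k groups best-first. */
  static List<String> topGroups(List<Hit> hits, int k) {
    Map<String, Double> best = new LinkedHashMap<>();
    for (Hit h : hits) {
      best.merge(h.group, h.score, Double::max);
    }
    List<String> groups = new ArrayList<>(best.keySet());
    groups.sort(Comparator.comparingDouble((String g) -> best.get(g)).reversed());
    return new ArrayList<>(groups.subList(0, Math.min(k, groups.size())));
  }

  /** Second pass: for the selected groups only, keep the top-n docs by score. */
  static Map<String, List<Hit>> topDocsPerGroup(List<Hit> hits, List<String> groups, int n) {
    Map<String, List<Hit>> result = new LinkedHashMap<>();
    for (String g : groups) {
      result.put(g, new ArrayList<>());
    }
    for (Hit h : hits) {
      List<Hit> docs = result.get(h.group);
      if (docs != null) { // hits from groups outside the top-k are skipped
        docs.add(h);
      }
    }
    for (Map.Entry<String, List<Hit>> e : result.entrySet()) {
      List<Hit> docs = e.getValue();
      docs.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
      e.setValue(new ArrayList<>(docs.subList(0, Math.min(n, docs.size()))));
    }
    return result;
  }
}
```

The second pass iterates over the hits again, which mirrors why the grouping 
collectors need two separate searches over the index.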

We should probably discuss what reranking means for groups. In my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so "group-reranking" 
should: 
   *  in the first stage, iterate over the documents, scoring them as usual, 
and keep a map {{<group, score>}};
   * for each group, apply the RankQuery to the top documents in the group;
   * rerank the groups according to the new scores.
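
The second and third steps of this list can be sketched as follows, starting 
from groups whose documents are already in first-pass order. This is only an 
illustration under assumptions: {{rerankGroups}} is a hypothetical name, the 
rerank score is a plain function of the document id, and documents below the 
rerank depth are assumed to keep their first-pass order behind the reranked 
ones.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of "group reranking"; not code from the patch.
final class GroupRerankSketch {

  /**
   * @param firstPass    group -> doc ids, best-first by the original query
   * @param rerankScore  score assigned by the rank query (assumed doc -> double)
   * @param docsPerGroup how many top docs of each group get rescored
   * @return groups ordered by their best rerank score, docs reranked inside
   */
  static Map<String, List<Integer>> rerankGroups(
      Map<String, List<Integer>> firstPass,
      ToDoubleFunction<Integer> rerankScore,
      int docsPerGroup) {
    if (docsPerGroup < 1) {
      throw new IllegalArgumentException("docsPerGroup must be >= 1");
    }
    Map<String, List<Integer>> reranked = new LinkedHashMap<>();
    Map<String, Double> maxScore = new HashMap<>();
    for (Map.Entry<String, List<Integer>> e : firstPass.entrySet()) {
      List<Integer> docs = e.getValue();
      int depth = Math.min(docsPerGroup, docs.size());
      // Step 2: rescore only the top docs of the group.
      List<Integer> head = new ArrayList<>(docs.subList(0, depth));
      head.sort(Comparator.comparingDouble(rerankScore).reversed());
      double best = head.isEmpty()
          ? Double.NEGATIVE_INFINITY
          : rerankScore.applyAsDouble(head.get(0));
      // Assumption: docs below the rerank depth stay at the tail, in first-pass order.
      head.addAll(docs.subList(depth, docs.size()));
      reranked.put(e.getKey(), head);
      maxScore.put(e.getKey(), best);
    }
    // Step 3: reorder the groups by their new maximum score, highest first.
    List<String> order = new ArrayList<>(reranked.keySet());
    order.sort(Comparator.comparingDouble((String g) -> maxScore.get(g)).reversed());
    Map<String, List<Integer>> result = new LinkedHashMap<>();
    for (String g : order) {
      result.put(g, reranked.get(g));
    }
    return result;
  }
}
```

Because only {{docsPerGroup}} documents per group are rescored, the cost of the 
rank query is bounded by the number of groups times the rerank depth rather 
than by the size of the collection.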

In this patch, I'm able to perform the second step. I had to move {{RankQuery}} 
into Lucene because in {{AbstractSecondPassGroupingCollector}} a collector is 
created for each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
    ...
{code}

... so there is no way to 'inject' the RankQuery collector from Solr. After 
moving {{RankQuery}} into Lucene, I modified the code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
// instanceof is already false for null, so no separate null check is needed
if (query instanceof RankQuery) {
  collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now documents in groups are reranked based on the RankQuery scores. I'll 
now work on step 3, i.e., reordering the groups based on the new RankQuery 
score (I added a new test that fails at the moment). 
Happy to discuss this first change, if you have comments.
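
The collector wrapping described above (handing the existing collector to {{getTopDocsCollector}} and getting a reranking one back) can be pictured as a decorator. This is only a sketch under stated assumptions: {{TopCollector}}, {{TopScoreCollector}} and {{RerankingCollector}} are hypothetical stand-ins, not the Lucene or Solr classes, and the reranker is an arbitrary function from document id to score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

/** Decorator-style illustration of wrapping a collector; these are not Lucene types. */
final class CollectorWrapSketch {

  /** Hypothetical minimal collector contract. */
  interface TopCollector {
    void collect(String docId, double score);

    List<String> topDocs(); // document ids, best first
  }

  /** Keeps the n best documents by first-pass score. */
  static final class TopScoreCollector implements TopCollector {
    private final int n;
    private final Map<String, Double> scores = new HashMap<>();

    TopScoreCollector(int n) {
      this.n = n;
    }

    @Override
    public void collect(String docId, double score) {
      scores.put(docId, score);
    }

    @Override
    public List<String> topDocs() {
      return scores.entrySet().stream()
          .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
          .limit(n)
          .map(Map.Entry::getKey)
          .collect(Collectors.toList());
    }
  }

  /** Wraps an existing collector and reorders only the documents it kept. */
  static final class RerankingCollector implements TopCollector {
    private final TopCollector inner;
    private final ToDoubleFunction<String> reranker;

    RerankingCollector(TopCollector inner, ToDoubleFunction<String> reranker) {
      this.inner = inner;
      this.reranker = reranker;
    }

    @Override
    public void collect(String docId, double score) {
      inner.collect(docId, score); // collection itself is delegated unchanged
    }

    @Override
    public List<String> topDocs() {
      List<String> top = new ArrayList<>(inner.topDocs());
      top.sort(Comparator.comparingDouble(reranker).reversed());
      return top;
    }
  }
}
```

Because the wrapper receives the inner collector instead of building it, the caller decides which collector is used per group and the reranking layer does not need to know how it was created.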

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. Otherwise {{RankQuery}} could maybe 
become an interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and 
{{len}} was never used. The method now takes the previous collector as input, 
instead of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}. 
  - Please keep in mind that, as a starting point, I'm trying to solve the issue 
in the non-distributed setting and only when grouping on a field. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:26 PM:
-

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must 
stay there, that's why I wrote "partially moved" :) Just as there are 
IndexSearcher and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene and 
created {{SolrRankQuery}}. The reason is that {{RankQuery}} works by 
manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, 
IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
{{SolrIndexSearcher}}: 
{code:java}
  private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) 
throws IOException {

Query q = cmd.getQuery();
if (q instanceof RankQuery) {
  RankQuery rq = (RankQuery) q;
  return rq.getTopDocsCollector(len, cmd, this);
}
..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, 
we wrap a topScoreCollector in a 'RankQuery collector'.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{group -> score}} where score is the highest score of any document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, the documents in each group are ranked and the top-n 
documents of each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 

We should probably discuss what reranking means for groups: in my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so "group-reranking" 
should: 
   * in the first stage, iterate over the documents, scoring them as usual, and 
keep a map {{group -> score}};
   * for each group, apply RankQuery to the top documents in the group;
   * rerank the groups according to the new scores.

In this patch, I'm able to perform step 2. I had to move {{RankQuery}} into 
Lucene, because {{AbstractSecondPassGroupingCollector}} creates a collector 
for each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the RankQuery collector from Solr. After 
moving {{RankQuery}} into Lucene, I changed the code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery){
  collector = ((RankQuery)query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now documents in groups are reranked based on the RankQuery scores. I'll 
now work on step 3, i.e., reordering the groups based on the new RankQuery 
score (I added a new test that fails at the moment). 
Happy to discuss this first change, if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. Otherwise {{RankQuery}} could maybe 
become an interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and 
{{len}} was never used. The method now takes the previous collector as input, 
instead of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:16 PM:
-

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must 
stay there, that's why I wrote "partially moved" :) Just as there are 
IndexSearcher and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene and 
created {{SolrRankQuery}}. The reason is that {{RankQuery}} works by 
manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, 
IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
{{SolrIndexSearcher}}: 
{code:java}
  private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) 
throws IOException {

Query q = cmd.getQuery();
if (q instanceof RankQuery) {
  RankQuery rq = (RankQuery) q;
  return rq.getTopDocsCollector(len, cmd, this);
}
..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, 
we wrap a topScoreCollector in a 'RankQuery collector'.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{group -> score}} where score is the highest score of any document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, the documents in each group are ranked and the top-k 
documents of each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 

We should probably discuss what reranking means for groups: in my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so "group-reranking" 
should work as follows: 

   * in the first stage, we iterate over the documents, scoring them as usual, 
and keep a map {{group -> score}};
   * for each group, RankQuery is applied to the top documents in the group;
   * groups are reranked according to the new scores.

In this patch, I'm able to perform step 2. I had to move {{RankQuery}} into 
Lucene, because {{AbstractSecondPassGroupingCollector}} creates a collector 
for each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the RankQuery collector from Solr. After 
moving {{RankQuery}} into Lucene, I changed the code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery){
  collector = ((RankQuery)query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now documents in groups are reranked based on the RankQuery scores. I'll 
now work on step 3, i.e., reordering the groups based on the new RankQuery 
score (I added a new test that fails at the moment). 
Happy to discuss this first change, if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. {{RankQuery}} could maybe become an 
interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and len 
was never used. The method now takes the previous collector as input, instead 
of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:16 PM:
-

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must 
stay there, that's why I wrote "partially moved" :) Just as there are 
IndexSearcher and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene and 
created {{SolrRankQuery}}. The reason is that {{RankQuery}} works by 
manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, 
IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
{{SolrIndexSearcher}}: 
{code:java}
  private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) 
throws IOException {

Query q = cmd.getQuery();
if (q instanceof RankQuery) {
  RankQuery rq = (RankQuery) q;
  return rq.getTopDocsCollector(len, cmd, this);
}
..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, 
we wrap a topScoreCollector in a 'RankQuery collector'.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{group -> score}} where score is the highest score of any document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, the documents in each group are ranked and the top-n 
documents of each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 

We should probably discuss what reranking means for groups: in my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so "group-reranking" 
should work as follows: 

   * in the first stage, we iterate over the documents, scoring them as usual, 
and keep a map {{group -> score}};
   * for each group, RankQuery is applied to the top documents in the group;
   * groups are reranked according to the new scores.

In this patch, I'm able to perform step 2. I had to move {{RankQuery}} into 
Lucene, because {{AbstractSecondPassGroupingCollector}} creates a collector 
for each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the RankQuery collector from Solr. After 
moving {{RankQuery}} into Lucene, I changed the code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery){
  collector = ((RankQuery)query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now documents in groups are reranked based on the RankQuery scores. I'll 
now work on step 3, i.e., reordering the groups based on the new RankQuery 
score (I added a new test that fails at the moment). 
Happy to discuss this first change, if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. {{RankQuery}} could maybe become an 
interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and len 
was never used. The method now takes the previous collector as input, instead 
of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:13 PM:
-

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must 
stay there, that's why I wrote "partially moved" :) Just as there are 
IndexSearcher and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene and 
created {{SolrRankQuery}}. The reason is that {{RankQuery}} works by 
manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, 
IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
{{SolrIndexSearcher}}: 
{code:java}
  private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) 
throws IOException {

Query q = cmd.getQuery();
if (q instanceof RankQuery) {
  RankQuery rq = (RankQuery) q;
  return rq.getTopDocsCollector(len, cmd, this);
}
..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, 
we wrap a topScoreCollector in a 'RankQuery collector'.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{group -> score}} where score is the highest score of any document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, for each of the top groups, the documents in the group 
are ranked and the top documents of each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 

We should probably discuss what reranking means for groups: in my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so "group-reranking" 
should work as follows: 

   * in the first stage, we iterate over the documents, scoring them as usual, 
and keep a map {{group -> score}};
   * for each group, RankQuery is applied to the top documents in the group;
   * groups are reranked according to the new scores.

In this patch, I'm able to perform step 2. I had to move {{RankQuery}} into 
Lucene, because {{AbstractSecondPassGroupingCollector}} creates a collector 
for each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the RankQuery collector from Solr. After 
moving {{RankQuery}} into Lucene, I changed the code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery){
  collector = ((RankQuery)query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now documents in groups are reranked based on the RankQuery scores. I'll 
now work on step 3, i.e., reordering the groups based on the new RankQuery 
score (I added a new test that fails at the moment). 
Happy to discuss this first change, if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. {{RankQuery}} could maybe become an 
interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and len 
was never used. The method now takes the previous collector as input, instead 
of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:12 PM:
-

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must 
stay there, that's why I wrote "partially moved" :) Just as there are 
IndexSearcher and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene and 
created {{SolrRankQuery}}. The reason is that {{RankQuery}} works by 
manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, 
IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
{{SolrIndexSearcher}}: 
{code:java}
  private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) 
throws IOException {

Query q = cmd.getQuery();
if (q instanceof RankQuery) {
  RankQuery rq = (RankQuery) q;
  return rq.getTopDocsCollector(len, cmd, this);
}
..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, 
we wrap a topScoreCollector in a 'RankQuery collector'.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{group -> score}} where score is the highest score of any document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, for each of the top groups, the documents in the group 
are ranked and the top documents of each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 

We should probably discuss what reranking means for groups: in my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so "group-reranking" 
should work as follows: 

   * in the first stage, we iterate over the documents, scoring them as usual, 
and keep a map {{group -> score}};
   * for each group, RankQuery is applied to the top documents in the group;
   * groups are reranked according to the new scores.

In this patch, I'm able to perform step 2. I had to move {{RankQuery}} into 
Lucene, because {{AbstractSecondPassGroupingCollector}} creates a collector 
for each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the reranking collector from Solr. After 
moving {{RankQuery}} into Lucene, I changed the code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery){
  collector = ((RankQuery)query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now documents in groups are reranked. I'll now work on step 3, i.e., 
reordering the groups based on the new rerank score 
(I added a new test that fails at the moment). 
Happy to discuss this first change, if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. {{RankQuery}} could maybe become an 
interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and len 
was never used. The method now takes the previous collector as input, instead 
of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:10 PM:
-

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must 
stay there, that's why I wrote "partially moved" :) Just as there are 
IndexSearcher and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene and 
created {{SolrRankQuery}}. The reason is that {{RankQuery}} works by 
manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, 
IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
{{SolrIndexSearcher}}: 
{code:java}
  private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) 
throws IOException {

Query q = cmd.getQuery();
if (q instanceof RankQuery) {
  RankQuery rq = (RankQuery) q;
  return rq.getTopDocsCollector(len, cmd, this);
}
..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, 
we wrap a topScoreCollector in a 'RankQuery collector'.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{group -> score}} where score is the highest score of any document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, for each of the top groups, the documents in the group 
are ranked and the top documents of each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 

We should probably discuss what reranking means for groups: in my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so "group-reranking" 
should work as follows: 

   1. in the first stage, we iterate over the documents, scoring them as usual, 
and keep a map {{group -> score}};
   2. for each group, RankQuery is applied to the top documents in the group;
   3. groups are reranked according to the new scores.

In this patch, I'm able to perform 2. I had to move {{RankQuery}} into Lucene, 
because {{AbstractSecondPassGroupingCollector}} creates a collector for each 
group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the reranking collector from Solr. After 
moving {{RankQuery}} into Lucene, I changed the code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery){
  collector = ((RankQuery)query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now documents in groups are reranked. I'll now work on 3, i.e., reordering 
the groups based on the new rerank score 
(I added a new test that fails at the moment). 
Happy to discuss this first change, if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. {{RankQuery}} could maybe become an 
interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and len 
was never used. The method now takes the previous collector as input, instead 
of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}. 



[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:08 PM:
-

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I uploaded 
a new patch with a first step. I agree that the merge strategy must stay there, 
that's why I wrote "partially moved" :) Just as there are IndexSearcher and 
SolrIndexSearcher, I moved {{RankQuery}} into Lucene and created 
{{SolrRankQuery}}. The reason is that {{RankQuery}} works by manipulating the 
collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, 
IndexSearcher searcher) throws IOException;
{code}

At the moment, if the query is a RankQuery, this is what happens in 
SolrIndexSearcher: 
{code:java}
  private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) 
throws IOException {

Query q = cmd.getQuery();
if (q instanceof RankQuery) {
  RankQuery rq = (RankQuery) q;
  return rq.getTopDocsCollector(len, cmd, this);
}
..
{code}

Instead of creating a top collector using the {TopScoreDocCollector.create}, we 
wrap a topScoreCollector into a 'RankQuery' collector.

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {group -> score}, where score is the highest score of a document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, for each of the top groups, the documents in the group 
are ranked and the top documents for each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 
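
To make the two stages concrete, here is a minimal sketch in plain Java. It does not use any Lucene class: the {{Doc}} type and the {{topGroups}} / {{topDocsPerGroup}} methods are names made up for this illustration, and the real collectors work per segment and are considerably more involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of two-pass grouping; none of these types are Lucene classes. */
final class TwoPassGroupingSketch {

    /** Hypothetical scored hit: document id, value of the grouping field, score. */
    static final class Doc {
        final int id;
        final String group;
        final float score;

        Doc(int id, String group, float score) {
            this.id = id;
            this.group = group;
            this.score = score;
        }
    }

    /** First pass: return the k groups whose best document has the highest score. */
    static List<String> topGroups(List<Doc> docs, int k) {
        Map<String, Float> best = new HashMap<>();
        for (Doc d : docs) {
            best.merge(d.group, d.score, Float::max); // keep the highest score per group
        }
        List<String> groups = new ArrayList<>(best.keySet());
        groups.sort(Comparator.comparing((String g) -> best.get(g)).reversed()
                .thenComparing(Comparator.<String>naturalOrder())); // ties broken by group name
        return new ArrayList<>(groups.subList(0, Math.min(k, groups.size())));
    }

    /** Second pass: for each selected group, keep its best documents, sorted by score. */
    static Map<String, List<Doc>> topDocsPerGroup(List<Doc> docs, List<String> groups, int perGroup) {
        Map<String, List<Doc>> result = new LinkedHashMap<>();
        for (String g : groups) {
            result.put(g, new ArrayList<>());
        }
        for (Doc d : docs) {
            List<Doc> bucket = result.get(d.group);
            if (bucket != null) {
                bucket.add(d); // documents of groups that did not make the top k are skipped
            }
        }
        for (Map.Entry<String, List<Doc>> e : result.entrySet()) {
            List<Doc> bucket = e.getValue();
            bucket.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
            e.setValue(new ArrayList<>(bucket.subList(0, Math.min(perGroup, bucket.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc(1, "a", 0.9f), new Doc(2, "a", 0.5f), new Doc(3, "b", 0.8f));
        List<String> groups = topGroups(docs, 2);
        System.out.println(groups + " " + topDocsPerGroup(docs, groups, 1).keySet());
    }
}
```

Note that the second pass needs the result of the first one, which is why the documents are visited twice.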

We should probably discuss what reranking means for groups. In my opinion we 
should keep in mind that the idea behind RankQuery is that you don't want to 
apply the query to all the documents in the collection, so "group reranking" 
should work as follows: 

   1. in the first stage, we iterate over the documents, scoring them as usual, and 
keep a map {group -> score};
   2. for each group, the RankQuery is applied to the top documents in the group;
   3. the groups are reranked according to the new scores.
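
Steps 2 and 3 could look roughly like the sketch below. This is only an illustration under my own assumptions: the {{Doc}} type, the {{rerankGroups}} method and the use of a {{ToDoubleFunction}} as the expensive reranking model are hypothetical and are not part of the patch or of Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Plain-Java illustration of reranking inside groups and then reordering the groups. */
final class GroupRerankSketch {

    /** Hypothetical scored hit: document id and score. */
    static final class Doc {
        final int id;
        final double score;

        Doc(int id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Best score of an already sorted group; empty groups sort last. */
    static double bestScore(List<Doc> sortedDocs) {
        return sortedDocs.isEmpty() ? Double.NEGATIVE_INFINITY : sortedDocs.get(0).score;
    }

    /**
     * Step 2: apply the reranker only to the top documents already selected for each group.
     * Step 3: order the groups by the best new score they contain.
     */
    static Map<String, List<Doc>> rerankGroups(Map<String, List<Doc>> topDocsPerGroup,
                                               ToDoubleFunction<Doc> reranker) {
        Map<String, List<Doc>> rescored = new HashMap<>();
        for (Map.Entry<String, List<Doc>> e : topDocsPerGroup.entrySet()) {
            List<Doc> docs = new ArrayList<>();
            for (Doc d : e.getValue()) {
                docs.add(new Doc(d.id, reranker.applyAsDouble(d))); // new score from the reranker
            }
            docs.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
            rescored.put(e.getKey(), docs);
        }
        List<String> order = new ArrayList<>(rescored.keySet());
        order.sort(Comparator.comparingDouble((String g) -> bestScore(rescored.get(g))).reversed()
                .thenComparing(Comparator.<String>naturalOrder())); // ties broken by group name
        Map<String, List<Doc>> result = new LinkedHashMap<>();
        for (String g : order) {
            result.put(g, rescored.get(g));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Doc>> groups = new LinkedHashMap<>();
        groups.put("a", List.of(new Doc(1, 0.9), new Doc(2, 0.5)));
        groups.put("b", List.of(new Doc(3, 0.8)));
        // Toy reranker: boosts document 3, halves everything else.
        System.out.println(rerankGroups(groups, d -> d.id == 3 ? 2.0 : d.score * 0.5).keySet());
    }
}
```

The reranker is never called on documents outside the per-group top lists, which is the point of using a RankQuery in the first place.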

In this patch I'm able to perform step 2. I had to move RankQuery into Lucene 
because in {{AbstractSecondPassGroupingCollector}} a collector is created for 
each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the reranking collector from Solr. After moving 
RankQuery into Lucene, I changed the code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery) {
  collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now the documents within each group are reranked. I'll now work on step 3, 
i.e., reordering the groups based on the new rerank score 
(I added a new test that fails at the moment). 
Happy to discuss this first change if you have comments.
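
The idea of passing the previous collector in, instead of creating it inside the RankQuery, is essentially a decorator. A minimal sketch, assuming hypothetical {{TopCollector}} and {{RankQueryLike}} interfaces that only stand in for Lucene's {{TopDocsCollector}} and for RankQuery (they are not the real APIs):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

/** Plain-Java illustration of wrapping a top-docs collector with a reranking one. */
final class CollectorWrappingSketch {

    /** Hypothetical stand-in for a top-docs collector (not Lucene's TopDocsCollector). */
    interface TopCollector {
        void collect(int docId, double score);

        List<Integer> topDocIds();
    }

    /** Hypothetical counterpart of the changed method: it receives the previous collector. */
    interface RankQueryLike {
        TopCollector wrap(TopCollector previous);
    }

    /** Keeps the n best documents according to the score passed to collect. */
    static final class TopScoreCollector implements TopCollector {
        private final int n;
        private final List<double[]> hits = new ArrayList<>(); // each entry is {docId, score}

        TopScoreCollector(int n) {
            this.n = n;
        }

        @Override
        public void collect(int docId, double score) {
            hits.add(new double[] {docId, score});
        }

        @Override
        public List<Integer> topDocIds() {
            List<double[]> sorted = new ArrayList<>(hits);
            sorted.sort(Comparator.comparingDouble((double[] h) -> h[1]).reversed());
            List<Integer> ids = new ArrayList<>();
            for (double[] h : sorted.subList(0, Math.min(n, sorted.size()))) {
                ids.add((int) h[0]);
            }
            return ids;
        }
    }

    /** Delegates collection to the previous collector and reorders only its top documents. */
    static final class RerankingCollector implements TopCollector {
        private final TopCollector previous;
        private final IntToDoubleFunction rerankScore;

        RerankingCollector(TopCollector previous, IntToDoubleFunction rerankScore) {
            this.previous = previous;
            this.rerankScore = rerankScore;
        }

        @Override
        public void collect(int docId, double score) {
            previous.collect(docId, score);
        }

        @Override
        public List<Integer> topDocIds() {
            List<Integer> ids = new ArrayList<>(previous.topDocIds());
            ids.sort(Comparator.comparingDouble((Integer id) -> rerankScore.applyAsDouble(id)).reversed());
            return ids;
        }
    }

    public static void main(String[] args) {
        RankQueryLike rq = previous -> new RerankingCollector(previous, id -> id); // toy rerank score
        TopCollector collector = rq.wrap(new TopScoreCollector(2));
        collector.collect(1, 0.2);
        collector.collect(2, 0.9);
        collector.collect(3, 0.8);
        System.out.println(collector.topDocIds());
    }
}
```

With this shape the code that creates the per-group collector does not need to know anything about reranking; it only has to hand its collector to the query.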

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I have to 
check whether that is a problem. RankQuery could maybe become an interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} is a Solr class but was used only to get the {{Sort}}, and len was never 
used. The method now takes the previous collector as input, instead of creating a new 
TopScoreDocCollector inside {{RankQuery}}. 



[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552
 ] 

Diego Ceccarelli commented on SOLR-8776:


[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I uploaded a 
new patch with a first step. 
I agree that the merge strategy must stay where it is, that's why I wrote "partially 
moved" :) 
Just as there are both IndexSearcher and SolrIndexSearcher, I moved {{RankQuery}} into 
Lucene and created a separate {{SolrRankQuery}}. 
The reason is that {{RankQuery}} works by manipulating the collector, through 
this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd,
    IndexSearcher searcher) throws IOException;
{code}

At the moment, this is what happens in SolrIndexSearcher when the query is a 
RankQuery: 
{code:java}
private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) throws IOException {

  Query q = cmd.getQuery();
  if (q instanceof RankQuery) {
    RankQuery rq = (RankQuery) q;
    return rq.getTopDocsCollector(len, cmd, this);
  }
  ..
{code}

Instead of creating a top collector with {{TopScoreDocCollector.create}}, we 
wrap a top-score collector in a reranking collector.

As a reminder, grouping works in two separate stages:
  1. in the first stage, we iterate over the documents, scoring them, and keep a 
map {group -> score}, where score is the highest score of a document in the 
group (the map contains only the top-k groups with the highest scores);
  2. in the second stage, for each of the top groups, the documents in the group 
are ranked and the top documents for each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 

We should probably discuss what reranking means for groups. In my opinion we 
should keep in mind that the idea behind RankQuery is that you don't want to 
apply the query to all the documents in the collection, so "group reranking" 
should work as follows: 

   1. in the first stage, we iterate over the documents, scoring them as usual, and 
keep a map {group -> score};
   2. for each group, the RankQuery is applied to the top documents in the group;
   3. the groups are reranked according to the new scores.

In this patch I'm able to perform step 2. I had to move RankQuery into Lucene 
because in {{AbstractSecondPassGroupingCollector}} a collector is created for 
each group: 

{code:java}
for (SearchGroup group : groups) {
  //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
...
{code}

... so there is no way to 'inject' the reranking collector from Solr. After moving 
RankQuery into Lucene, I changed the code to: 

{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery) {
  collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
}
{code}

and now the documents within each group are reranked. I'll now work on step 3, 
i.e., reordering the groups based on the new rerank score 
(I added a new test that fails at the moment). 
Happy to discuss this first change if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I have to 
check whether that is a problem. RankQuery could maybe become an interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} is a Solr class but was used only to get the {{Sort}}, and len was never 
used. The method now takes the previous collector as input, instead of creating a new 
TopScoreDocCollector inside {{RankQuery}}. 

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: master
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: master
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch

[jira] [Updated] (SOLR-8776) Support RankQuery in grouping

2016-03-10 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-8776:
---
Attachment: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: master
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: master
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the non 
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

2016-03-08 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184876#comment-15184876
 ] 

Diego Ceccarelli commented on SOLR-8542:


I had the same idea. My only concern is: would it then be possible to update 
{{solrconfig.xml}} without bouncing Solr? With managed resources we would 
be able to add a feature/model at runtime and start using it. Would it be 
possible to get the same behavior with the Solr config? (...and first, do we 
want it? :) ) 

> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. To test the plugin with 
> the techproducts example, please follow these steps:
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar 
> example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml 
> example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
> 
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore'  
> --data-binary "@./contrib/ltr/example/techproducts-features.json"  -H 
> 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore'  
> --data-binary "@./contrib/ltr/example/techproducts-model.json"  -H 
> 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on=test=json={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}=*,[features],price,score,name=true




[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-03-08 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184865#comment-15184865
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 3/8/16 12:49 PM:
-

Alessandro, thanks for the questions: 

  # At the moment RankQuery (on which LTR relies) is not supported in grouping 
(but we are working on that, see SOLR-8776). I think the correct solution 
would be to perform steps 1, 2 and 3. Maybe we can move the discussion to 
SOLR-8776, since it affects RankQueries and grouping in general. The easy 
solution is to use collapsing instead of grouping: collapsing is supported by 
RankQuery, and we tested that LTR works with it as well. 
  # Join - Parent Search. I would say that if RankQuery supports block join it should 
work, but we didn't check.







[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

2016-03-08 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184865#comment-15184865
 ] 

Diego Ceccarelli commented on SOLR-8542:


Alessandro, thanks for the questions: 

  # At the moment RankQuery (on which LTR relies) is not supported in grouping 
(but we are working on that, see SOLR-8776). I think the correct solution 
would be to perform steps 1, 2 and 3. Maybe we can move the discussion to 
SOLR-8776, since it affects RankQueries and grouping in general. The easy 
solution is to use collapsing instead of grouping: collapsing is supported by 
RankQuery, and we tested that LTR works with it as well. 

  # Join - Parent Search. I would say that if RankQuery supports block join it should 
work, but we didn't check.





[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-03-08 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184848#comment-15184848
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 3/8/16 12:38 PM:
-

We decided to decouple models and features because:

  * the general use case is that you use a particular model (relying on a set 
of features) to rank your documents, but you also want to compute (and log) new 
features for training a new model to use in the future. All the features in a 
feature store are computed, but the model receives only the features it 
requests (which also allows adding new features to the feature store without 
affecting the model); 
  * two models could use the same feature but normalize the feature values in 
different ways (see the {{Normalizer}} class). 
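
A small sketch of what this decoupling buys, in plain Java. It is not the plugin's code: the {{computeAll}} and {{score}} methods, the linear model and the feature names are assumptions made for this example; only the idea of a per-model normalizer comes from the {{Normalizer}} class mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;
import java.util.function.ToDoubleFunction;

/** Plain-Java illustration of a feature store shared by models with different normalizers. */
final class FeatureStoreSketch {

    /** Computes every feature in the store for a document, whether or not a model uses it. */
    static Map<String, Double> computeAll(Map<String, ToDoubleFunction<String>> store, String doc) {
        Map<String, Double> values = new LinkedHashMap<>();
        for (Map.Entry<String, ToDoubleFunction<String>> f : store.entrySet()) {
            values.put(f.getKey(), f.getValue().applyAsDouble(doc));
        }
        return values;
    }

    /**
     * Hypothetical linear model: it reads only the features it has a weight for and
     * applies its own normalizer to each of them (identity when none is configured).
     */
    static double score(Map<String, Double> allFeatures, Map<String, Double> weights,
                        Map<String, DoubleUnaryOperator> normalizers) {
        double sum = 0.0;
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            double raw = allFeatures.get(w.getKey());
            DoubleUnaryOperator norm =
                    normalizers.getOrDefault(w.getKey(), DoubleUnaryOperator.identity());
            sum += w.getValue() * norm.applyAsDouble(raw);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, ToDoubleFunction<String>> store = new LinkedHashMap<>();
        store.put("titleLength", s -> s.length());
        store.put("hasDigit", s -> s.matches(".*\\d.*") ? 1.0 : 0.0); // logged, unused by the models

        Map<String, Double> features = computeAll(store, "solr 6");

        Map<String, DoubleUnaryOperator> scaled = new LinkedHashMap<>();
        scaled.put("titleLength", x -> x / 10.0);
        // Two models, same feature, different normalization.
        System.out.println(score(features, Map.of("titleLength", 1.0), scaled));
        System.out.println(score(features, Map.of("titleLength", 2.0), new LinkedHashMap<>()));
    }
}
```

Adding a third feature to the store changes what {{computeAll}} returns (and what would be logged), but neither model's score.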







[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-03-08 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184848#comment-15184848
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 3/8/16 12:38 PM:
-

We decided to decouple models and features because:

  1. the general use case is that you use a particular model (relying on a 
set of features) to rank your documents, but you also want to compute (and log) 
new features for training a new model to use in the future. All the features in 
a feature store are computed, but the model receives only the features it 
requests (which also allows adding new features to the feature store without 
affecting the model); 
  2. two models could use the same feature but normalize the feature values in 
different ways (see the {{Normalizer}} class). 







[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-03-08 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184848#comment-15184848
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 3/8/16 12:37 PM:
-

We decided to decouple models and features because a) the general use case is 
that you use a particular model (+ relying on a set of features) to rank your 
documents, but you also want to compute (and log) new features for training a 
new model to use in the future. All the features in a feature store will be 
computed but the model will receive only the requested features (allowing also 
to update the feature store adding new features without affecting the model) b) 
two models could use the same feature, but normalize the feature values in a 
different way (see the {{Normalizer}} class) 
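
To make points a) and b) concrete, here is a minimal sketch. It is not the plugin's code: {{DecoupledModelSketch}}, {{Model}}, {{inputs}} and {{minMax}} are hypothetical names, and the feature values and bounds are invented for illustration. It assumes every requested feature is present in the store output.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch; these are illustrative names, not the plugin's classes. */
final class DecoupledModelSketch {

    /** A model lists the features it needs and how each one is normalized. */
    static final class Model {
        final List<String> featureNames;
        final Map<String, DoubleUnaryOperator> normalizers;

        Model(List<String> featureNames, Map<String, DoubleUnaryOperator> normalizers) {
            this.featureNames = featureNames;
            this.normalizers = normalizers;
        }

        /** Picks only the requested features from the store output and normalizes them. */
        double[] inputs(Map<String, Double> allFeatureValues) {
            double[] out = new double[featureNames.size()];
            for (int i = 0; i < out.length; i++) {
                String name = featureNames.get(i);
                // Features without an explicit normalizer are passed through unchanged.
                DoubleUnaryOperator norm =
                        normalizers.getOrDefault(name, DoubleUnaryOperator.identity());
                out[i] = norm.applyAsDouble(allFeatureValues.get(name));
            }
            return out;
        }
    }

    /** Min-max normalizer built from assumed bounds. */
    static DoubleUnaryOperator minMax(double min, double max) {
        return v -> (v - min) / (max - min);
    }

    public static void main(String[] args) {
        // Every feature in the store is computed (and can be logged) for each document.
        Map<String, Double> store = new LinkedHashMap<>();
        store.put("price", 50.0);
        store.put("originalScore", 2.0);
        store.put("newExperimentalFeature", 7.0); // only logged, for training a future model

        // Two models share the "price" feature but normalize it differently.
        Model a = new Model(List.of("price"), Map.of("price", minMax(0, 100)));
        Model b = new Model(List.of("price", "originalScore"), Map.of("price", minMax(0, 200)));

        System.out.println(Arrays.toString(a.inputs(store))); // prints [0.5]
        System.out.println(Arrays.toString(b.inputs(store))); // prints [0.25, 2.0]
    }
}
```

Adding {{newExperimentalFeature}} to the store changes neither model's input, which is the property we wanted from the decoupling.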


was (Author: diegoceccarelli):
we decided to decouple models and features because a) the general use case is 
that you use a particular model (+ relying on a set of features) to rank your 
documents, but you also want to compute (and log) new features for training a 
new model to use in the future. All the features in a feature store will be 
computed but the model will receive only the requested features (allowing also 
to update the feature store adding new features without affecting the model) b) 
two models could use the same feature, but normalize the feature values in a 
different way (see the {{Normalizer
 class) 

> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. To test the plugin with 
> the techproducts example, please follow these steps:
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar 
> example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml 
> example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
> 
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore'  
> --data-binary "@./contrib/ltr/example/techproducts-features.json"  -H 
> 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore'  
> --data-binary "@./contrib/ltr/example/techproducts-model.json"  -H 
> 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true




[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-03-08 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184848#comment-15184848
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 3/8/16 12:35 PM:
-

we decided to decouple models and features because a) the general use case is 
that you use a particular model (+ relying on a set of features) to rank your 
documents, but you also want to compute (and log) new features for training a 
new model to use in the future. All the features in a feature store will be 
computed but the model will receive only the requested features (allowing also 
to update the feature store adding new features without affecting the model) b) 
two models could use the same feature, but normalize the feature values in a 
different way (see the {{Normalizer}} class) 


was (Author: diegoceccarelli):
we decided to decouple models and features because a) the general use case is 
that you use a particular model (+ relying on a set of features) to rank your 
documents, but you also want to compute (and log) new features for training a 
new model to use in the future. All the features in a feature store will be 
computed but the model will receive only the requested features (allowing also 
to update the feature store adding new features without affecting the model) b) 
two models could use the same feature, but normalize the feature values in a 
different way (see the {Normalizer} class}) 





[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

2016-03-08 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184848#comment-15184848
 ] 

Diego Ceccarelli commented on SOLR-8542:


we decided to decouple models and features because a) the general use case is 
that you use a particular model (+ relying on a set of features) to rank your 
documents, but you also want to compute (and log) new features for training a 
new model to use in the future. All the features in a feature store will be 
computed but the model will receive only the requested features (allowing also 
to update the feature store adding new features without affecting the model) b) 
two models could use the same feature, but normalize the feature values in a 
different way (see the {{Normalizer}} class) 





[jira] [Updated] (SOLR-8776) Support RankQuery in grouping

2016-03-02 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-8776:
---
Attachment: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch

Added a unit test that fails because grouping ignores the RankQuery.

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: master
>Reporter: Diego Ceccarelli
>Priority: Minor
> Fix For: master
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (which supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the SolrCloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start by 
> attaching a patch with a test that fails because grouping does not support 
> RankQuery, and then I'll try to fix the problem, starting from the 
> non-distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Created] (SOLR-8776) Support RankQuery in grouping

2016-03-02 Thread Diego Ceccarelli (JIRA)

Diego Ceccarelli created SOLR-8776:
--

 Summary: Support RankQuery in grouping
 Key: SOLR-8776
 URL: https://issues.apache.org/jira/browse/SOLR-8776
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: master
Reporter: Diego Ceccarelli
Priority: Minor
 Fix For: master


Currently it is not possible to use RankQuery [1] and Grouping [2] together 
(see also [3]). In some situations Grouping can be replaced by Collapse and 
Expand Results [4] (which supports reranking), but i) collapse cannot guarantee 
that at least a minimum number of groups will be returned for a query, and ii) 
in the SolrCloud setting you will have constraints on how to partition the 
documents among the shards.

I'm going to start working on supporting RankQuery in grouping. I'll start by 
attaching a patch with a test that fails because grouping does not support 
RankQuery, and then I'll try to fix the problem, starting from the 
non-distributed setting (GroupingSearch).

My feeling is that since grouping is mostly performed by Lucene, RankQuery 
should be refactored and moved (or partially moved) there. 

Any feedback is welcome.

[1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
[2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
[3] 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
[4] https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-29 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123848#comment-15123848
 ] 

Diego Ceccarelli commented on SOLR-8542:


Hi Tommaso, it was removed during the transition from svn to git. We'll reopen 
the PR today. 





[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-16 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102768#comment-15102768
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 1/16/16 10:23 AM:
--

Thanks Christine and Shawn for your comments. 
The above patch for the current trunk fixes the problems that you highlighted: 
   - the README now fits in 80 columns
   - {{ant validate}} works
   - {{solr/contrib/ltr/test-lib/jcl-over-slf4j-1.7.7.jar}} is no longer part 
of the patch

The patch also contains some example files and an explanation (reported in the 
JIRA description) of how to test the plugin on the Solr techproducts example. 




was (Author: diegoceccarelli):
Thanks Christine and Shawn for your comments, 
The above patch for the current trunk fix the problems that you highlighted: 
   - now the README fits in 80 columns
   - ``ant validate`` works. 
   - ``solr/contrib/ltr/test-lib/jcl-over-slf4j-1.7.7.jar`` is not part of the 
patch

The patch also contains some example files and an explanation (reported in the 
JIRA description) on 
how to test the plugin on the techproducts example of Solr. 



> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. To test the plugin with 
> the techproducts example, please follow these steps:
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar 
> example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml 
> example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
> 
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore'  
> --data-binary "@./contrib/ltr/example/techproducts-features.json"  -H 
> 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore'  
> --data-binary "@./contrib/ltr/example/techproducts-model.json"  -H 
> 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true




[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-16 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103260#comment-15103260
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 1/16/16 5:15 PM:
-

Hi Ishan, thanks for pointing out SOLR-8183. I didn't know about it, and it 
seems quite related. 
We can plug in RankLib by creating a new class that represents the new LTR 
model and extends 
[ModelMetadata|https://github.com/bloomberg/lucene-solr/blob/trunk-learning-to-rank-plugin/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/ModelMetadata.java],
 for example:

{code:java}
public class RankLibModel extends ModelMetadata {

    Ranker rankLibRanker;
    RankerFactory rankerFactory = new RankerFactory();
    // this constructor is missing, we will need a way to create a datapoint
    DenseDataPoint documentFeatures = new DenseDataPoint();

    public RankLibModel(String name, String type, List features,
                        String featureStoreName, Collection allFeatures,
                        NamedParams params) {
        super(name, type, features, featureStoreName, allFeatures, params);
        // the file containing the model is a parameter
        String ranklibModelFile = getParams().getParam("model-file");
        // load the model
        rankLibRanker = rankerFactory.loadModel(ranklibModelFile);
    }

    @Override
    public float score(float[] modelFeatureValuesNormalized) {
        // set the feature vector in the datapoint object
        documentFeatures.setFeatureVector(modelFeatureValuesNormalized);
        // predict the score using the RankLib model (cast in case eval returns a double)
        return (float) rankLibRanker.eval(documentFeatures);
    }

}
{code}

This code will load a particular RankLib model, using the file specified in 
the model store configuration. 
If you send Solr a model configuration file like this:

{code}
{
    "type":"org.apache.solr.ltr.ranking.RankLibModel",
    "name":"ranklib-GBDT",
    "features":[
        {"name":"isInStock"},
        {"name":"price"},
        {"name":"originalScore"},
        {"name":"productNameMatchQuery"}
    ],
    "params":{
        "model-file":"/data/ranking/ranking-GBDT.txt"
    }
}
{code}

The plugin will create a RankLib model from the file 
{{/data/ranking/ranking-GBDT.txt}}, and you'll be able to use it at ranking 
time through its name, {{ranklib-GBDT}}, by adding the {{ltr}} param to the 
query: 

{code}
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr model=ranklib-GBDT reRankDocs=25}
{code}
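
When the request is built from code rather than typed into a browser, the braces, spaces and equals signs in the {{ltr}} local params need percent-encoding. Below is a minimal sketch using only the JDK; {{LtrQueryUrlSketch}}, {{encode}} and {{rerankUrl}} are hypothetical helper names, and the parameter names {{q}}, {{wt}} and {{rq}} are assumed to be the standard Solr ones.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for building a rerank request URL; not part of the plugin. */
final class LtrQueryUrlSketch {

    /** Percent-encodes one query-string value; URLEncoder emits '+' for spaces, so map it to %20. */
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds the URL with the ltr local params in the (assumed) rq parameter. */
    static String rerankUrl(String baseUrl, String userQuery, String modelName, int reRankDocs) {
        String rq = "{!ltr model=" + modelName + " reRankDocs=" + reRankDocs + "}";
        return baseUrl + "/query?indent=on"
                + "&q=" + encode(userQuery)
                + "&wt=json"
                + "&rq=" + encode(rq);
    }

    public static void main(String[] args) {
        System.out.println(rerankUrl(
                "http://localhost:8983/solr/techproducts", "test", "ranklib-GBDT", 25));
    }
}
```

A literal {{+}} inside a value is encoded as {{%2B}} before the replacement runs, so only real spaces become {{%20}}.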

At query time, the features {{isInStock}}, {{price}}, {{originalScore}}, and 
{{productNameMatchQuery}} will be computed and passed to the 
{{score(float[] modelFeatureValuesNormalized)}} method in order to get the new 
predicted score for each document. If RankLib's licence is compatible, I think 
we could add this to the plugin. Any comments? 
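
For readers unfamiliar with the reranking step, this is roughly what happens with the scores returned by {{score}}. It is only a sketch under simplifying assumptions: {{RerankSketch}}, {{Doc}} and {{rerankTop}} are hypothetical names, the model is any function from a feature vector to a score, and the real plugin does this inside Solr's rerank machinery rather than on a plain list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of reranking the top N results with a model score. */
final class RerankSketch {

    static final class Doc {
        final String id;
        final float[] features;

        Doc(String id, float[] features) {
            this.id = id;
            this.features = features;
        }
    }

    /** Re-sorts the first n docs by descending model score; the remaining docs keep their order. */
    static List<Doc> rerankTop(List<Doc> ranked, int n, ToDoubleFunction<float[]> model) {
        int cut = Math.min(n, ranked.size());
        List<Doc> head = new ArrayList<>(ranked.subList(0, cut));
        // For brevity the model is evaluated inside the comparator; a real
        // implementation would score each document once and cache the result.
        head.sort(Comparator.comparingDouble((Doc d) -> model.applyAsDouble(d.features)).reversed());
        List<Doc> out = new ArrayList<>(head);
        out.addAll(ranked.subList(cut, ranked.size()));
        return out;
    }

    public static void main(String[] args) {
        List<Doc> firstPass = List.of(
                new Doc("d1", new float[] {1f, 0f}),
                new Doc("d2", new float[] {0f, 3f}),
                new Doc("d3", new float[] {5f, 5f}));
        // Toy model: the score is the sum of the feature values.
        ToDoubleFunction<float[]> sum = f -> {
            double s = 0;
            for (float v : f) {
                s += v;
            }
            return s;
        };
        for (Doc d : rerankTop(firstPass, 2, sum)) {
            System.out.println(d.id); // prints d2, d1, d3 (one per line)
        }
    }
}
```

Note that {{d3}} stays last even though the toy model scores it highest, because only the top {{reRankDocs}} documents are re-scored.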


was (Author: diegoceccarelli):
Hi Ishan, thanks for pointing out SOLR-8183, I didn't know about that, it seems 
quite related. 
We can plug RankLib creating a new class representing the new LTR model, 
extending 
[ModelMetadata|https://github.com/bloomberg/lucene-solr/blob/trunk-learning-to-rank-plugin/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/ModelMetadata.java],
 for example:

{code:java}
public class RankLibModel extends ModelMetadata {

Ranker rankLibRanker;
RankerFactory rankerFactory = new RankerFactory();
DenseDataPoint documentFeatures = new DenseDataPoint(); // this 
contructor is missing, we will need a way to create a datapoint

public RankLibModel(String name, String type, List features,
  String featureStoreName, Collection allFeatures,
  NamedParams params) {
  super(name, type, features, featureStoreName, allFeatures, 
params);
  // the  file containing the model is  a parameter
  String ranklibModelFile = getParams().getParam("model-file")
  // load the model
  rankLibRanking = rankerFactory.loadModel(ranklibModelFile);
}

@Override
public float score(float[] modelFeatureValuesNormalized) {
// set the feature vector in the datapoint object
documentFeatures.setFeatureVector(modelFeatureValuesNormalized)
// predict the score using the ranklib model
return rankLibRanker.eval(documentFeatures);
}
  
}
{code}

This code will load a particular RankLib model, using the file specified into 
the model store configuration. 
If you send to Solr a model configuration file like this:

{code:json}

{
"type":"org.apache.solr.ltr.ranking.RankLibModel",
"name":"ranklib-GBDT",

[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-16 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103260#comment-15103260
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 1/16/16 5:13 PM:
-

Hi Ishan, thanks for pointing out SOLR-8183. I didn't know about it, and it 
seems quite related. 
We can plug in RankLib by creating a new class that represents the new LTR 
model and extends 
[ModelMetadata|https://github.com/bloomberg/lucene-solr/blob/trunk-learning-to-rank-plugin/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/ModelMetadata.java],
 for example:

{code:java}
public class RankLibModel extends ModelMetadata {

    Ranker rankLibRanker;
    RankerFactory rankerFactory = new RankerFactory();
    // this constructor is missing, we will need a way to create a datapoint
    DenseDataPoint documentFeatures = new DenseDataPoint();

    public RankLibModel(String name, String type, List features,
                        String featureStoreName, Collection allFeatures,
                        NamedParams params) {
        super(name, type, features, featureStoreName, allFeatures, params);
        // the file containing the model is a parameter
        String ranklibModelFile = getParams().getParam("model-file");
        // load the model
        rankLibRanker = rankerFactory.loadModel(ranklibModelFile);
    }

    @Override
    public float score(float[] modelFeatureValuesNormalized) {
        // set the feature vector in the datapoint object
        documentFeatures.setFeatureVector(modelFeatureValuesNormalized);
        // predict the score using the RankLib model (cast in case eval returns a double)
        return (float) rankLibRanker.eval(documentFeatures);
    }

}
{code}

This code will load a particular RankLib model, using the file specified in 
the model store configuration. 
If you send Solr a model configuration file like this:

{code:json}

{
    "type":"org.apache.solr.ltr.ranking.RankLibModel",
    "name":"ranklib-GBDT",
    "features":[
        {"name":"isInStock"},
        {"name":"price"},
        {"name":"originalScore"},
        {"name":"productNameMatchQuery"}
    ],
    "params":{
        "model-file":"/data/ranking/ranking-GBDT.txt"
    }
}
{code}

The plugin will create a RankLib model from the file 
{{/data/ranking/ranking-GBDT.txt}}, and you'll be able to use it at ranking 
time through its name, {{ranklib-GBDT}}, by adding the {{ltr}} param to the 
query: 

{code}
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr model=ranklib-GBDT reRankDocs=25}
{code}

At query time, the features {{isInStock}}, {{price}}, {{originalScore}}, and 
{{productNameMatchQuery}} will be computed and passed to the 
{{score(float[] modelFeatureValuesNormalized)}} method in order to get the new 
predicted score for each document. If RankLib's licence is compatible, I think 
we could add this to the plugin. Any comments? 


[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-16 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103260#comment-15103260
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 1/16/16 5:16 PM:
-

Hi Ishan, thanks for pointing out SOLR-8183. I didn't know about it, and it 
seems quite related. 
We can plug in RankLib by creating a new class that represents the new LTR 
model and extends 
[ModelMetadata|https://github.com/bloomberg/lucene-solr/blob/trunk-learning-to-rank-plugin/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/ModelMetadata.java],
 for example:

{code:java}
public class RankLibModel extends ModelMetadata {

    Ranker rankLibRanker;
    RankerFactory rankerFactory = new RankerFactory();
    // this constructor is missing; we will need a way to create a datapoint
    DenseDataPoint documentFeatures = new DenseDataPoint();

    public RankLibModel(String name, String type, List features,
                        String featureStoreName, Collection allFeatures,
                        NamedParams params) {
        super(name, type, features, featureStoreName, allFeatures, params);
        // the file containing the model is a parameter
        String ranklibModelFile = getParams().getParam("model-file");
        // load the model
        rankLibRanker = rankerFactory.loadModel(ranklibModelFile);
    }

    @Override
    public float score(float[] modelFeatureValuesNormalized) {
        // set the feature vector in the datapoint object
        documentFeatures.setFeatureVector(modelFeatureValuesNormalized);
        // predict the score using the RankLib model
        return (float) rankLibRanker.eval(documentFeatures);
    }

}
{code}

This code will load a particular RankLib model, using the file specified in 
the model store configuration. 
If you send Solr a model configuration file like this:

{code}
{
"type":"org.apache.solr.ltr.ranking.RankLibModel",
"name":"ranklib-GBDT",
"features":[
{"name":"isInStock"},
{"name":"price"},
{"name":"originalScore"},
{"name":"productNameMatchQuery"}
],
"params":{
"model-file":"/data/ranking/ranking-GBDT.txt"
}
}
{code}

The plugin will create a RankLib model by using the model in 
{{/data/ranking/ranking-GBDT.txt}} and you'll be able 
to use it at ranking time using its name {{ranklib-GBDT}}, by adding the 
{{ltr}} param to the query: 

{code}
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr model=ranklib-GBDT reRankDocs=25}
{code}

At query time, the features {{isInStock}}, {{price}}, {{originalScore}}, and 
{{productNameMatchQuery}} will be computed and passed to the 
{{score(float[] modelFeatureValuesNormalized)}} method in order to get the new 
predicted score for each document. If RankLib's licence is compatible, I think 
we could plug this into the plugin. Any comments? 


[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-16 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103260#comment-15103260
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 1/16/16 5:16 PM:
-

Hi Ishan, thanks for pointing out SOLR-8183. I didn't know about it, and it 
seems quite related. 
We can plug in RankLib by creating a new class that represents the new LTR 
model and extends 
[ModelMetadata|https://github.com/bloomberg/lucene-solr/blob/trunk-learning-to-rank-plugin/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/ModelMetadata.java],
 for example:

{code:java}
public class RankLibModel extends ModelMetadata {

    Ranker rankLibRanker;
    RankerFactory rankerFactory = new RankerFactory();
    // this constructor is missing; we will need a way to create a datapoint
    DenseDataPoint documentFeatures = new DenseDataPoint();

    public RankLibModel(String name, String type, List features,
                        String featureStoreName, Collection allFeatures,
                        NamedParams params) {
        super(name, type, features, featureStoreName, allFeatures, params);
        // the file containing the model is a parameter
        String ranklibModelFile = getParams().getParam("model-file");
        // load the model
        rankLibRanker = rankerFactory.loadModel(ranklibModelFile);
    }

    @Override
    public float score(float[] modelFeatureValuesNormalized) {
        // set the feature vector in the datapoint object
        documentFeatures.setFeatureVector(modelFeatureValuesNormalized);
        // predict the score using the RankLib model
        return (float) rankLibRanker.eval(documentFeatures);
    }

}
{code}

This code will load a particular RankLib model, using the file specified in 
the model store configuration. 
If you send Solr a model configuration file like this:

{code}
{
"type":"org.apache.solr.ltr.ranking.RankLibModel",
"name":"ranklib-GBDT",
"features":[
{"name":"isInStock"},
{"name":"price"},
{"name":"originalScore"},
{"name":"productNameMatchQuery"}
],
"params":{
"model-file":"/data/ranking/ranking-GBDT.txt"
}
}
{code}

The plugin will create a RankLib model by using the model in 
{{/data/ranking/ranking-GBDT.txt}} and you'll be able 
to use it at ranking time using its name {{ranklib-GBDT}}, by adding the 
{{ltr}} param to the query: 

{code}
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr model=ranklib-GBDT reRankDocs=25}
{code}

At query time, the features {{isInStock}}, {{price}}, {{originalScore}}, and 
{{productNameMatchQuery}} will be computed and passed to the 
{{score(float[] modelFeatureValuesNormalized)}} method in order to get the new 
predicted score for each document. If RankLib's licence is compatible, I think 
we could plug this into the plugin. Any comments? 


[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-16 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103260#comment-15103260
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 1/16/16 5:17 PM:
-

Hi Ishan, thanks for pointing out SOLR-8183. I didn't know about it, and it 
seems quite related. 
We can plug in RankLib by creating a new class that represents the new LTR 
model and extends 
[ModelMetadata|https://github.com/bloomberg/lucene-solr/blob/trunk-learning-to-rank-plugin/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/ModelMetadata.java],
 for example:

{code:java}
public class RankLibModel extends ModelMetadata {

    Ranker rankLibRanker;
    RankerFactory rankerFactory = new RankerFactory();
    // this constructor is missing; we will need a way to create a datapoint
    DenseDataPoint documentFeatures = new DenseDataPoint();

    public RankLibModel(String name, String type, List features,
                        String featureStoreName, Collection allFeatures,
                        NamedParams params) {
        super(name, type, features, featureStoreName, allFeatures, params);
        // the file containing the model is a parameter
        String ranklibModelFile = getParams().getParam("model-file");
        // load the model
        rankLibRanker = rankerFactory.loadModel(ranklibModelFile);
    }

    @Override
    public float score(float[] modelFeatureValuesNormalized) {
        // set the feature vector in the datapoint object
        documentFeatures.setFeatureVector(modelFeatureValuesNormalized);
        // predict the score using the RankLib model
        return (float) rankLibRanker.eval(documentFeatures);
    }

}
{code}

This code will load a particular RankLib model, using the file specified in 
the model store configuration. 
If you send Solr a model configuration file like this:

{code}
{
"type":"org.apache.solr.ltr.ranking.RankLibModel",
"name":"ranklib-GBDT",
"features":[
{"name":"isInStock"},
{"name":"price"},
{"name":"originalScore"},
{"name":"productNameMatchQuery"}
],
"params":{
"model-file":"/data/ranking/ranking-GBDT.txt"
}
}
{code}

The plugin will create a RankLib model by using the model in 
{{/data/ranking/ranking-GBDT.txt}} and you'll be able 
to use it at ranking time using its name {{ranklib-GBDT}}, by adding the 
{{ltr}} param to the query: 

{code}
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr model=ranklib-GBDT reRankDocs=25}
{code}

At query time, the features {{isInStock}}, {{price}}, {{originalScore}}, and 
{{productNameMatchQuery}} will be computed and passed to the 
{{score(float[] modelFeatureValuesNormalized)}} method in order to get the new 
predicted score for each document. If RankLib's licence is compatible, I think 
we could plug this into the plugin. Any comments? 


[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-16 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103260#comment-15103260
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 1/16/16 5:18 PM:
-

Hi Ishan, thanks for pointing out SOLR-8183. I didn't know about it, and it 
seems quite related. 
We can plug in RankLib by creating a new class that represents the new LTR 
model and extends 
[ModelMetadata|https://github.com/bloomberg/lucene-solr/blob/trunk-learning-to-rank-plugin/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/ModelMetadata.java],
 for example:

{code:java}
public class RankLibModel extends ModelMetadata {

    Ranker rankLibRanker;
    RankerFactory rankerFactory = new RankerFactory();
    // this constructor is missing; we will need a way to create a datapoint
    DenseDataPoint documentFeatures = new DenseDataPoint();

    public RankLibModel(String name, String type, List features,
                        String featureStoreName, Collection allFeatures,
                        NamedParams params) {
        super(name, type, features, featureStoreName, allFeatures, params);
        // the file containing the model is a parameter
        String ranklibModelFile = getParams().getParam("model-file");
        // load the model
        rankLibRanker = rankerFactory.loadModel(ranklibModelFile);
    }

    @Override
    public float score(float[] modelFeatureValuesNormalized) {
        // set the feature vector in the datapoint object
        documentFeatures.setFeatureVector(modelFeatureValuesNormalized);
        // predict the score using the RankLib model
        return (float) rankLibRanker.eval(documentFeatures);
    }

}
{code}

This code will load a particular RankLib model, using the file specified in 
the model store configuration. 
If you send Solr a model configuration file like this:

{code}
{
"type":"org.apache.solr.ltr.ranking.RankLibModel",
"name":"ranklib-GBDT",
"features":[
{"name":"isInStock"},
{"name":"price"},
{"name":"originalScore"},
{"name":"productNameMatchQuery"}
],
"params":{
"model-file":"/data/ranking/ranking-GBDT.txt"
}
}
{code}

The plugin will create a RankLib model by using the model in 
{{/data/ranking/ranking-GBDT.txt}} and you'll be able 
to use it at ranking time using its name {{ranklib-GBDT}}, adding the {{ltr}} 
param to the query: 

{code}
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr model=ranklib-GBDT reRankDocs=25}
{code}
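
When the request is issued from code rather than typed into a browser, the local-params string has to be URL-encoded. A minimal sketch of building such a URL follows; the class and method names are hypothetical, and it assumes the standard Solr parameter names {{q}}, {{wt}} and {{rq}}, which are not spelled out in the comment itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the LTR plugin.
final class LtrQueryUrl {

    // Builds a query URL whose rq parameter carries the {!ltr ...} local params.
    // Assumption: q, wt and rq are the parameter names the plugin expects.
    static String build(String coreUrl, String userQuery, String model, int reRankDocs) {
        String rq = "{!ltr model=" + model + " reRankDocs=" + reRankDocs + "}";
        return coreUrl + "/query?indent=on"
                + "&q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
                + "&wt=json"
                + "&rq=" + URLEncoder.encode(rq, StandardCharsets.UTF_8);
    }
}
```

Calling {{LtrQueryUrl.build("http://localhost:8983/solr/techproducts", "test", "ranklib-GBDT", 25)}} yields the same request as above with the braces, spaces and equals signs of the {{ltr}} expression escaped.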

At query time, the features {{isInStock}} , {{price}} , {{originalScore}} , and 
{{productNameMatchQuery}} will be computed and provided in the {{score(float[] 
modelFeatureValuesNormalized)}} method in order to get the new predicted score 
for each document. If RankLib's licence is compatible I think we could plug 
this into the plugin. Any comments? 
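
To make the re-ranking step concrete, here is a rough sketch of what happens to the top {{reRankDocs}} results once a model exposes {{score(float[])}}. It is only an illustration under assumptions: {{RerankSketch}} and its {{Model}} interface are hypothetical stand-ins, not the plugin's classes, and the real plugin may order or combine scores differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the plugin's implementation.
final class RerankSketch {

    /** Stand-in for an LTR model: maps a document's feature vector to a score. */
    interface Model {
        float score(float[] modelFeatureValuesNormalized);
    }

    /**
     * Re-scores only the first reRankDocs documents (given as feature vectors in
     * their original rank order) and returns the document positions in the new order.
     */
    static List<Integer> rerank(float[][] features, int reRankDocs, Model model) {
        int n = Math.min(reRankDocs, features.length);
        float[] scores = new float[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            scores[i] = model.score(features[i]);
            order.add(i);
        }
        // highest model score first; List.sort is stable, so ties keep their original order
        order.sort((a, b) -> Float.compare(scores[b], scores[a]));
        // documents beyond the re-rank window keep their original positions
        for (int i = n; i < features.length; i++) {
            order.add(i);
        }
        return order;
    }
}
```

A RankLib-backed model would fit in the same slot as {{Model}} here, since all the re-ranker needs is a function from the feature vector to a float.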


[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-16 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103260#comment-15103260
 ] 

Diego Ceccarelli commented on SOLR-8542:


Hi Ishan, thanks for pointing out SOLR-8183. I didn't know about it, and it 
seems quite related. 
We can plug in RankLib by creating a new class that represents the new LTR 
model and extends 
[ModelMetadata|https://github.com/bloomberg/lucene-solr/blob/trunk-learning-to-rank-plugin/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/ModelMetadata.java],
 for example:

{code:java}
public class RankLibModel extends ModelMetadata {

    Ranker rankLibRanker;
    RankerFactory rankerFactory = new RankerFactory();
    // this constructor is missing; we will need a way to create a datapoint
    DenseDataPoint documentFeatures = new DenseDataPoint();

    public RankLibModel(String name, String type, List features,
                        String featureStoreName, Collection allFeatures,
                        NamedParams params) {
        super(name, type, features, featureStoreName, allFeatures, params);
        // the file containing the model is a parameter
        String ranklibModelFile = getParams().getParam("model-file");
        // load the model
        rankLibRanker = rankerFactory.loadModel(ranklibModelFile);
    }

    @Override
    public float score(float[] modelFeatureValuesNormalized) {
        // set the feature vector in the datapoint object
        documentFeatures.setFeatureVector(modelFeatureValuesNormalized);
        // predict the score using the RankLib model
        return (float) rankLibRanker.eval(documentFeatures);
    }

}
{code}

This code will load a particular RankLib model, using the file specified in 
the model store configuration. 
If you send Solr a model configuration file like this:

{code:json}

{
"type":"org.apache.solr.ltr.ranking.RankLibModel",
"name":"ranklib-GBDT",
"features":[
{"name":"isInStock"},
{"name":"price"},
{"name":"originalScore"},
{"name":"productNameMatchQuery"}
],
"params":{
"model-file":"/data/ranking/ranking-GBDT.txt"
}
}
{code}

The plugin will create a RankLib model by using the model in 
{{/data/ranking/ranking-GBDT.txt}} and you'll be able 
to use it at ranking time using its name {{ranklib-GBDT}}, by adding the 
{{ltr}} param to the query: 

{code}
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr model=ranklib-GBDT reRankDocs=25}
{code}

At query time, the features {{isInStock}}, {{price}}, {{originalScore}}, and 
{{productNameMatchQuery}} will be computed and passed to the 
{{score(float[] modelFeatureValuesNormalized)}} method in order to get the new 
predicted score for each document. If RankLib's licence is compatible, I think 
we could plug this into the plugin. Any comments? 

> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. To test the plugin with the 
> techproducts example, please follow these steps:
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar 
> example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml 
> example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
> 
> ./bin/solr -e techproducts
> h4. 5.

[jira] [Comment Edited] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-16 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103260#comment-15103260
 ] 

Diego Ceccarelli edited comment on SOLR-8542 at 1/16/16 5:14 PM:
-

Hi Ishan, thanks for pointing out SOLR-8183. I didn't know about it, and it 
seems quite related. 
We can plug in RankLib by creating a new class that represents the new LTR 
model and extends 
[ModelMetadata|https://github.com/bloomberg/lucene-solr/blob/trunk-learning-to-rank-plugin/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/ModelMetadata.java],
 for example:

{code:java}
public class RankLibModel extends ModelMetadata {

    Ranker rankLibRanker;
    RankerFactory rankerFactory = new RankerFactory();
    // this constructor is missing; we will need a way to create a datapoint
    DenseDataPoint documentFeatures = new DenseDataPoint();

    public RankLibModel(String name, String type, List features,
                        String featureStoreName, Collection allFeatures,
                        NamedParams params) {
        super(name, type, features, featureStoreName, allFeatures, params);
        // the file containing the model is a parameter
        String ranklibModelFile = getParams().getParam("model-file");
        // load the model
        rankLibRanker = rankerFactory.loadModel(ranklibModelFile);
    }

    @Override
    public float score(float[] modelFeatureValuesNormalized) {
        // set the feature vector in the datapoint object
        documentFeatures.setFeatureVector(modelFeatureValuesNormalized);
        // predict the score using the RankLib model
        return (float) rankLibRanker.eval(documentFeatures);
    }

}
{code}

This code will load a particular RankLib model, using the file specified in 
the model store configuration. 
If you send Solr a model configuration file like this:

{code:json}

{
"type":"org.apache.solr.ltr.ranking.RankLibModel",
"name":"ranklib-GBDT",
"features":[
{"name":"isInStock"},
{"name":"price"},
{"name":"originalScore"},
{"name":"productNameMatchQuery"}
],
"params":{
"model-file":"/data/ranking/ranking-GBDT.txt"
}
}
{code}

The plugin will create a RankLib model by using the model in 
{{/data/ranking/ranking-GBDT.txt}} and you'll be able 
to use it at ranking time using its name {{ranklib-GBDT}}, by adding the 
{{ltr}} param to the query: 

{code}
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr model=ranklib-GBDT reRankDocs=25}
{code}

At query time, the features {{isInStock}}, {{price}}, {{originalScore}}, and 
{{productNameMatchQuery}} will be computed and passed to the 
{{score(float[] modelFeatureValuesNormalized)}} method in order to get the new 
predicted score for each document. If RankLib's licence is compatible, I think 
we could plug this into the plugin. Any comments? 


[jira] [Updated] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-15 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-8542:
---
Attachment: README.txt

> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: README.md, README.txt, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. To test the plugin with the 
> techproducts example, please follow these steps:
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar 
> example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml 
> example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
> 
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore'  
> --data-binary "@./contrib/ltr/example/techproducts-features.json"  -H 
> 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore'  
> --data-binary "@./contrib/ltr/example/techproducts-model.json"  -H 
> 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true
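
The two {{curl -XPUT}} uploads in step 5 can also be done from Java. The following is a minimal sketch using the JDK's own HTTP client (Java 11+); the {{StoreUpload}} class and {{buildPut}} method are hypothetical names, and only the endpoint URLs and the {{Content-type}} header come from the steps above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper mirroring the curl commands; not shipped with the plugin.
final class StoreUpload {

    // Builds a PUT request carrying a JSON body, as curl does with
    // -XPUT --data-binary and -H 'Content-type:application/json'.
    static HttpRequest buildPut(String storeUrl, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(storeUrl))
                .header("Content-type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

To actually send it, read the file with {{Files.readString(Path.of("contrib/ltr/example/techproducts-features.json"))}}, pass the result to {{buildPut}} together with the {{fstore}} or {{mstore}} URL, and hand the request to {{HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())}}.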




[jira] [Updated] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-15 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-8542:
---
Attachment: (was: README.txt)

> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example of indices. In order to test the plugin 
> with 
> the techproducts example please follow these steps
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar 
> example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml 
> example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
> 
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true
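
Steps 5 and 6 above can also be scripted. The following is a minimal Python sketch, not part of the attached patch: it only builds the PUT request and the rerank query URL, and sends nothing. The helper names `upload_request` and `rerank_url` are hypothetical, and the query parameter names `q`, `wt`, `rq`, `fl` and `fv` are assumptions that the archived message does not confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Host and core name as used in the techproducts steps above.
SOLR = "http://localhost:8983/solr/techproducts"


def upload_request(store, payload):
    """Build (but do not send) the PUT request for 'fstore' or 'mstore'."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{SOLR}/schema/{store}",
        data=body,
        method="PUT",
        headers={"Content-type": "application/json"},
    )


def rerank_url(query, model="svm", rerank_docs=25):
    """Assemble the rerank query URL (parameter names are assumed)."""
    # Local-params expression selecting the LTR model, as in step 6.
    rq = f"{{!ltr model={model} reRankDocs={rerank_docs} efi.query='{query}'}}"
    params = {
        "indent": "on",
        "q": query,
        "wt": "json",
        "rq": rq,
        "fl": "*,[features],price,score,name",
        "fv": "true",
    }
    # urlencode percent-encodes the braces, spaces and quotes in rq.
    return f"{SOLR}/query?{urlencode(params)}"
```

To actually send a request, pass the result of `upload_request` to `urllib.request.urlopen` once the example server from step 4 is running.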



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-15 Thread Diego Ceccarelli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-8542:
---
Attachment: README.md





[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

2016-01-15 Thread Diego Ceccarelli (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102768#comment-15102768
 ] 

Diego Ceccarelli commented on SOLR-8542:


Thanks Christine and Shawn for your comments.
The above patch for the current trunk fixes the problems that you highlighted: 
   - the README now fits in 80 columns
   - ``ant validate`` works
   - ``solr/contrib/ltr/test-lib/jcl-over-slf4j-1.7.7.jar`` is not part of the 
patch

The patch also contains some example files and an explanation (reported in the 
JIRA description) of how to test the plugin on Solr's techproducts example. 



> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>



