[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...
Github user diegoceccarelli closed the pull request at: https://github.com/apache/lucene-solr/pull/4 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...
Github user diegoceccarelli commented on the pull request: https://github.com/apache/lucene-solr/pull/4#issuecomment-222163577 thanks Alessandro, we integrated part of your PR in the new patch.
[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...
GitHub user mnilsson23 opened a pull request:

    https://github.com/apache/lucene-solr/pull/40

    SOLR-8542: Integrate Learning to Rank into Solr

    Solr Learning to Rank (LTR) provides a way for you to extract features directly inside Solr for use in training a machine learned model. You can then deploy that model to Solr and use it to rerank your top X search results. This concept was previously presented by the authors at Lucene/Solr Revolution 2015.

    See the [README](https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-release/solr/contrib/ltr) for more information on how to get started.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bloomberg/lucene-solr master-ltr-plugin-release

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/40.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #40

commit 073de9b2719abe91e106119b23b977e521e8b32f
Author: Diego Ceccarelli
Date: 2016-01-13T22:29:17Z

    SOLR-8542: Integrate Learning to Rank into Solr

    Solr Learning to Rank (LTR) provides a way for you to extract features directly inside Solr for use in training a machine learned model. You can then deploy that model to Solr and use it to rerank your top X search results. This concept was previously presented by the authors at Lucene/Solr Revolution 2015

commit b2bbe8c13122280ee5a76149bfb55fd1b7324279
Author: Michael Nilsson
Date: 2016-05-25T22:13:05Z

    Learning to Rank plugin updates

    - Updated our documentation about the training phase and how to train a real model for those that are not familiar with this process. We provided a step by step example building a rankSVM model externally, and supplied a sample script which does this using liblinear.
    - Formatted the code based on the lucene eclipse style
    - Updated the hashCode and equals functions of the ModelQuery as [~Alessandro.Benedetti] pointed out
    - Renamed ModelMetadata, the class you would subclass to add a new model for scoring docs, to LTRScoringAlgorithm
    - Cleaned up the LTRScoringAlgorithm to no longer have a type parameter
    - Added IntelliJ support. Thank you [~Alessandro.Benedetti] for adding it
    - Renamed mstore and fstore endpoints to feature-store and model-store as per [~Upayavira]'s suggestion
    - Added support for default efi parameters using the same Solr standard in solrconfig. When defining a feature in the config, put ${isFromManchester:0} to get 0 as a default, and you won't have to specify it in the request's efi params. Thanks for the enhancement suggestion [~Alessandro.Benedetti]
    - Removed the fv=true param requirement for extracting features.
    - You do not have to provide a "dummy model" first for extracting features, so you can request the transformer without the need of an rq ranking query. Inside the transformer you can provide a store=myFeatureStore param, and it will extract all features from that feature store directly. You can also provide local efi params if needed when extracting without an rq.
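The default efi behaviour described in the commit message can be pictured with a small sketch. This is not the plugin's code: the class `EfiDefaults`, the method `resolve`, and the regular expression are assumptions made up for illustration, and only the `${name:default}` syntax and the `isFromManchester` example come from the message above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the LTR plugin: substitutes ${name} and
// ${name:default} placeholders in a feature parameter string.
final class EfiDefaults {
  // Group 1 is the efi name, group 2 the optional default after the colon.
  private static final Pattern PLACEHOLDER =
      Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

  static String resolve(String template, Map<String, String> efi) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String name = m.group(1);
      String fallback = m.group(2); // null when no default was declared
      String value = efi.getOrDefault(name, fallback);
      if (value == null) {
        throw new IllegalArgumentException("Missing efi parameter: " + name);
      }
      // quoteReplacement keeps '$' and '\' in the value from being interpreted
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    String feature = "{!func}${isFromManchester:0}";
    // No efi in the request: the declared default is used.
    System.out.println(resolve(feature, Collections.<String, String>emptyMap()));
    // The request supplies the value: it overrides the default.
    System.out.println(resolve(feature, Collections.singletonMap("isFromManchester", "1")));
  }
}
```

A placeholder without a default still has to be supplied by the request in this sketch; how the plugin itself reports a missing value is not stated in the message.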
[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...
Github user alessandrobenedetti commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/4#discussion_r7481

--- Diff: solr/contrib/ltr/src/java/org/apache/solr/ltr/ranking/ModelQuery.java ---
@@ -0,0 +1,540 @@
+package org.apache.solr.ltr.ranking;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.DisiPriorityQueue;
+import org.apache.lucene.search.DisiWrapper;
+import org.apache.lucene.search.DisjunctionDISIApproximation;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.Explanation;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.Scorer;
+import org.apache.lucene.search.Weight;
+import org.apache.lucene.search.Scorer.ChildScorer;
+import org.apache.solr.ltr.feature.ModelMetadata;
+import org.apache.solr.ltr.feature.norm.Normalizer;
+import org.apache.solr.ltr.feature.norm.impl.IdentityNormalizer;
+import org.apache.solr.ltr.log.FeatureLogger;
+import org.apache.solr.request.SolrQueryRequest;
+
+/**
+ * The ranking query that is run, reranking results using the ModelMetadata
+ * algorithm
+ */
+public class ModelQuery extends Query {
+
+  // contains a description of the model
+  protected ModelMetadata meta;
+  // feature logger to output the features.
+  private FeatureLogger fl = null;
+  // Map of external parameters, such as query intent, that can be used by
+  // features
+  protected Map efi;
+  // Original solr query used to fetch matching documents
+  protected Query originalQuery;
+  // Original solr request
+  protected SolrQueryRequest request;
+
+  public ModelQuery(ModelMetadata meta) {
+    this.meta = meta;
+  }
+
+  public ModelMetadata getMetadata() {
+    return meta;
+  }
+
+  public void setFeatureLogger(FeatureLogger fl) {
+    this.fl = fl;
+  }
+
+  public FeatureLogger getFeatureLogger() {
+    return this.fl;
+  }
+
+  public Collection getAllFeatures() {
+    return meta.getAllFeatures();
+  }
+
+  public void setOriginalQuery(Query mainQuery) {
+    this.originalQuery = mainQuery;
+  }
+
+  public void setExternalFeatureInfo(Map externalFeatureInfo) {
+    this.efi = externalFeatureInfo;
+  }
+
+  public void setRequest(SolrQueryRequest request) {
+    this.request = request;
+  }
+
+  @Override
+  public int hashCode() {
+    final int prime = 31;
+    int result = super.hashCode();
+    result = prime * result + ((meta == null) ? 0 : meta.hashCode());
+    result = prime * result
+        + ((originalQuery == null) ? 0 : originalQuery.hashCode());
+    result = prime * result + ((efi == null) ? 0 : originalQuery.hashCode());
--- End diff --

I think this is a typo. It should be:

    result = prime * result + ((efi == null) ? 0 : efi.hashCode());

This is a small thing, but it currently makes the system unusable when you experiment with different efi variable values. Basically the cache is always hit, even if your efi variables change dynamically. Anyway, it is really a minimal fix :)
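To make the point of the review comment concrete, here is a minimal, self-contained sketch. `RerankKey` is a hypothetical stand-in for ModelQuery, with plain strings in place of the model and the Lucene query; it is not the class from the patch. It assumes equals also compares efi, since a real cache hit depends on both methods agreeing and the diff above shows only hashCode.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for ModelQuery: only the three fields that feed the
// hash are modelled, and strings replace the model and the Lucene query.
final class RerankKey {
  private final String model;
  private final String originalQuery;
  private final Map<String, String> efi;

  RerankKey(String model, String originalQuery, Map<String, String> efi) {
    this.model = model;
    this.originalQuery = originalQuery;
    this.efi = efi;
  }

  // Mirrors the line under review: the efi slot hashes originalQuery again,
  // so the efi values never influence the result.
  int hashWithTypo() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((model == null) ? 0 : model.hashCode());
    result = prime * result + ((originalQuery == null) ? 0 : originalQuery.hashCode());
    result = prime * result + ((efi == null) ? 0 : originalQuery.hashCode());
    return result;
  }

  // Variant with the suggested fix: efi contributes its own hash.
  @Override
  public int hashCode() {
    return Objects.hash(model, originalQuery, efi);
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof RerankKey)) return false;
    RerankKey that = (RerankKey) other;
    return Objects.equals(model, that.model)
        && Objects.equals(originalQuery, that.originalQuery)
        && Objects.equals(efi, that.efi);
  }

  public static void main(String[] args) {
    RerankKey a = new RerankKey("myModel", "title:solr",
        Collections.singletonMap("user_text", "solr"));
    RerankKey b = new RerankKey("myModel", "title:solr",
        Collections.singletonMap("user_text", "lucene"));

    // Same model and query, different efi: the typo variant cannot tell them apart.
    System.out.println("typo hashes equal: " + (a.hashWithTypo() == b.hashWithTypo()));

    // With efi in both hashCode and equals, the two requests get separate entries.
    Map<RerankKey, String> cache = new HashMap<>();
    cache.put(a, "results for solr");
    cache.put(b, "results for lucene");
    System.out.println("distinct cache entries: " + cache.size());
  }
}
```

Equal hashes alone only cause collisions. A stale result is returned when equals also ignores efi, which is consistent with the later patch notes mentioning updates to both hashCode and equals.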
[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...
Github user alessandrobenedetti commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/4#discussion_r55499494

--- Diff: solr/contrib/ltr/README.txt ---
@@ -0,0 +1,330 @@
+Apache Solr Learning to Rank
+
+
+This is the main [learning to rank integrated into solr](http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp)
+repository.
+[Read up on learning to rank](https://en.wikipedia.org/wiki/Learning_to_rank)
+
+Apache Solr Learning to Rank (LTR) provides a way for you to extract features
+directly inside Solr for use in training a machine learned model. You can then
+deploy that model to Solr and use it to rerank your top X search results.
+
+
+# Changes to solrconfig.xml
+```xml
+
+ ...
+
+
+
+
+
+
+
+
+
+
+
+
+ explicit
+ json
+ true
+ id
+
+
+
+ ltrComponent
+
+
+
+
+...
+
+
+
+
+
+
+
+```
+
+
+# Build the plugin
+In the solr/contrib/ltr directory run
+`ant dist`
+
+# Install the plugin
+In your solr installation, navigate to your collection's lib directory.
+In the solr install example, it would be solr/collection1/lib.
+If lib doesn't exist you will have to make it, and then copy the plugin's jar there.
+
+`cp lucene-solr/solr/dist/solr-ltr-X.Y.Z-SNAPSHOT.jar mySolrInstallPath/solr/myCollection/lib`
+
+Restart your collection using the admin page and you are good to go.
+You can find more detailed instructions [here](https://wiki.apache.org/solr/SolrPlugins).
+
+
+# Defining Features
+In the learning to rank plugin, you can define features in a feature space
+using standard Solr queries. As an example:
+
+## features.json
+```json
+[
+{ "name": "isBook",
+  "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
+  "params":{ "fq": ["{!terms f=category}book"] }
+},
+{
+  "name": "documentRecency",
+  "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
+  "params": {
+    "q": "{!func}recip( ms(NOW,publish_date), 3.16e-11, 1, 1)"
+  }
+},
+{
+  "name":"originalScore",
+  "type":"org.apache.solr.ltr.feature.impl.OriginalScoreFeature",
+  "params":{}
+},
+{
+  "name" : "userTextTitleMatch",
+  "type" : "org.apache.solr.ltr.feature.impl.SolrFeature",
+  "params" : { "q" : "{!field f=title}${user_text}" }
+}
+]
+```
+
+Defines four features. Anything that is a valid Solr query can be used to define
+a feature.
+
+### Filter Query Features
+The first feature isBook fires if the term 'book' matches the category field
+for the given examined document. Since in this feature q was not specified,
+either the score 1 (in case of a match) or the score 0 (in case of no match)
+will be returned.
+
+### Query Features
+In the second feature (documentRecency) q was specified using a function query.
+In this case the score for the feature on a given document is whatever the query
+returns (1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated
+2 years ago, etc..) . If both an fq and q is used, documents that don't match
+the fq will receive a score of 0 for the documentRecency feature, all other
+documents will receive the score specified by the query for this feature.
+
+### Original Score Feature
+The third feature (originalScore) has no parameters, and uses the
+OriginalScoreFeature class instead of the SolrFeature class. Its purpose is
+to simply return the score for the original search request against the current
+matching document.
+
+### External Features
+Users can specify external information that can to be passed in as
+part of the query to the ltr ranking framework. In this case, the
+fourth feature (userTextPhraseMatch) will be looking for an external field
+called 'user_text' passed in through the request, and will fire if there is
+a term match for the document field 'title' from the value of the external
+field 'user_text'. See the "Run a Rerank Query" section for how
+to pass in external information.
+
+### Custom Features
+Custom features can be created by extending from
+org.apache.solr.ltr.ranking.Feature, however this is generally not recommended.
+The majority of features should be possible to create using the methods described
+above.
+
+# Defining Models
+Currently the Learning to Rank plugin supports 2 main types of
+ranking models: [Ranking
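The documentRecency values quoted in the README (1, 1/2, 1/3) follow from Solr's recip function query, which computes recip(x, m, a, b) = a / (m*x + b). Below is a small arithmetic sketch of that formula outside Solr. The class and method names are made up for illustration, and the document age is passed in directly instead of being derived from ms(NOW,publish_date).

```java
// Hypothetical illustration of the documentRecency arithmetic; not plugin code.
final class RecencyFeature {
  // Same formula as Solr's recip(x, m, a, b) function query.
  static double recip(double x, double m, double a, double b) {
    return a / (m * x + b);
  }

  // Feature value for a document that is ageMs milliseconds old, using the
  // constants from features.json: recip(ms(NOW,publish_date), 3.16e-11, 1, 1).
  static double documentRecency(double ageMs) {
    return recip(ageMs, 3.16e-11, 1, 1);
  }

  public static void main(String[] args) {
    double yearMs = 365.25 * 24 * 3600 * 1000; // about 3.16e10 ms
    System.out.println("now:     " + documentRecency(0));
    System.out.println("1 year:  " + documentRecency(yearMs));
    System.out.println("2 years: " + documentRecency(2 * yearMs));
  }
}
```

The constant 3.16e-11 is roughly one over the number of milliseconds in a year, which is why m*x comes out close to the age in years.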
[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...
GitHub user diegoceccarelli opened a pull request:

    https://github.com/apache/lucene-solr/pull/4

    SOLR-8542: Integrate Learning to Rank into Solr

    Solr Learning to Rank (LTR) provides a way for you to extract features directly inside Solr for use in training a machine learned model. You can then deploy that model to Solr and use it to rerank your top X search results. This concept was previously presented by the authors at Lucene/Solr Revolution 2015

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bloomberg/lucene-solr master-ltr-plugin-rfc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/4.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4

commit 1bee2ad0ce64b2f091e34f7fb42e00387616c987
Author: Diego Ceccarelli
Date: 2016-01-13T22:29:17Z

    SOLR-8542: Integrate Learning to Rank into Solr

    Solr Learning to Rank (LTR) provides a way for you to extract features directly inside Solr for use in training a machine learned model. You can then deploy that model to Solr and use it to rerank your top X search results. This concept was previously presented by the authors at Lucene/Solr Revolution 2015
[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...
GitHub user diegoceccarelli opened a pull request:

    https://github.com/apache/lucene-solr/pull/217

    SOLR-8542: Integrate Learning to Rank into Solr

    See https://issues.apache.org/jira/i#browse/SOLR-8542

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bloomberg/lucene-solr trunk-learning-to-rank-plugin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/217.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #217

commit 336db4ccf6434e690a745a4af88b5d9c21edc25e
Author: Diego Ceccarelli
Date: 2016-01-13T22:29:17Z

    SOLR-8542: Integrate Learning to Rank into Solr

    Solr Learning to Rank (LTR) provides a way for you to extract features directly inside Solr for use in training a machine learned model. You can then deploy that model to Solr and use it to rerank your top X search results. This concept was previously presented by the authors at Lucene/Solr Revolution 2015