[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...

2016-05-27 Thread diegoceccarelli
Github user diegoceccarelli closed the pull request at:

https://github.com/apache/lucene-solr/pull/4


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...

2016-05-27 Thread diegoceccarelli
Github user diegoceccarelli commented on the pull request:

https://github.com/apache/lucene-solr/pull/4#issuecomment-222163577
  
Thanks, Alessandro. We integrated part of your PR into the new patch.





[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...

2016-05-27 Thread mnilsson23
GitHub user mnilsson23 opened a pull request:

https://github.com/apache/lucene-solr/pull/40

SOLR-8542: Integrate Learning to Rank into Solr

Solr Learning to Rank (LTR) provides a way for you to extract features
directly inside Solr for use in training a machine learned model. You
can then deploy that model to Solr and use it to rerank your top X
search results. This concept was previously presented by the authors at
Lucene/Solr Revolution 2015.

See the [README](https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-release/solr/contrib/ltr) for more information on how to get started.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr master-ltr-plugin-release

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/40.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #40


commit 073de9b2719abe91e106119b23b977e521e8b32f
Author: Diego Ceccarelli 
Date:   2016-01-13T22:29:17Z

SOLR-8542: Integrate Learning to Rank into Solr

Solr Learning to Rank (LTR) provides a way for you to extract features
directly inside Solr for use in training a machine learned model. You
can then deploy that model to Solr and use it to rerank your top X
search results. This concept was previously presented by the authors at
Lucene/Solr Revolution 2015

commit b2bbe8c13122280ee5a76149bfb55fd1b7324279
Author: Michael Nilsson 
Date:   2016-05-25T22:13:05Z

Learning to Rank plugin updates

- Updated our documentation on the training phase and on how to train a real
  model, for those who are not familiar with this process. We provide a
  step-by-step example that builds a rankSVM model externally, and a sample
  script that does this using liblinear.
- Formatted the code according to the Lucene Eclipse style
- Updated the hashCode and equals functions of ModelQuery, as
  [~Alessandro.Benedetti] pointed out
- Renamed ModelMetadata, the class you would subclass to add a new model for
  scoring docs, to LTRScoringAlgorithm
- Cleaned up LTRScoringAlgorithm so that it no longer has a type parameter
- Added IntelliJ support. Thank you [~Alessandro.Benedetti] for adding it
- Renamed the fstore and mstore endpoints to feature-store and model-store, as
  per [~Upayavira]'s suggestion
- Added support for default efi parameters, using the same standard Solr uses
  in solrconfig. When defining a feature in the config, put
  ${isFromManchester:0} to get 0 as a default, and you won't have to specify
  it in the request's efi params. Thanks for the enhancement suggestion
  [~Alessandro.Benedetti]
- Removed the fv=true param requirement for extracting features.
- You no longer have to provide a "dummy model" first to extract features, so
  you can request the transformer without an rq ranking query. Inside the
  transformer you can provide a store=myFeatureStore param, and it will
  extract all features from that feature store directly. You can also provide
  local efi params if needed when extracting without an rq.
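
The default efi behaviour described in the list above can be illustrated with a minimal sketch. It is not the plugin's implementation: the class name EfiDefaults, the resolve method, and the regular expression are hypothetical, and the sketch assumes only that a placeholder has the form ${name} or ${name:default}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only; not part of the LTR plugin. */
public class EfiDefaults {

  // Matches ${name} or ${name:default}; group 1 is the name, group 2 the optional default.
  private static final Pattern PLACEHOLDER =
      Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

  /** Replaces each placeholder with the request's efi value, or with its default. */
  public static String resolve(String template, Map<String, String> efi) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String name = m.group(1);
      String fallback = m.group(2); // null when the placeholder has no default
      String value = efi.containsKey(name) ? efi.get(name) : fallback;
      if (value == null) {
        throw new IllegalArgumentException("Missing efi parameter: " + name);
      }
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> efi = new HashMap<>();
    // No efi value supplied: the default after the colon is used.
    System.out.println(resolve("${isFromManchester:0}", efi));
    // An efi value supplied in the request overrides the default.
    efi.put("isFromManchester", "1");
    System.out.println(resolve("${isFromManchester:0}", efi));
  }
}
```

With an empty efi map the placeholder resolves to its default, 0; once the request supplies isFromManchester=1, that value wins. A placeholder with neither a request value nor a default is rejected in this sketch, which is a design choice of the example rather than documented plugin behaviour.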







[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...

2016-03-09 Thread alessandrobenedetti
Github user alessandrobenedetti commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/4#discussion_r7481
  
--- Diff: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/ranking/ModelQuery.java ---
@@ -0,0 +1,540 @@
+package org.apache.solr.ltr.ranking;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.DisiPriorityQueue;
+import org.apache.lucene.search.DisiWrapper;
+import org.apache.lucene.search.DisjunctionDISIApproximation;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.Explanation;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.Scorer;
+import org.apache.lucene.search.Weight;
+import org.apache.lucene.search.Scorer.ChildScorer;
+import org.apache.solr.ltr.feature.ModelMetadata;
+import org.apache.solr.ltr.feature.norm.Normalizer;
+import org.apache.solr.ltr.feature.norm.impl.IdentityNormalizer;
+import org.apache.solr.ltr.log.FeatureLogger;
+import org.apache.solr.request.SolrQueryRequest;
+
+/**
+ * The ranking query that is run, reranking results using the ModelMetadata
+ * algorithm
+ */
+public class ModelQuery extends Query {
+
+  // contains a description of the model
+  protected ModelMetadata meta;
+  // feature logger to output the features.
+  private FeatureLogger fl = null;
+  // Map of external parameters, such as query intent, that can be used by
+  // features
+  protected Map efi;
+  // Original solr query used to fetch matching documents
+  protected Query originalQuery;
+  // Original solr request
+  protected SolrQueryRequest request;
+
+  public ModelQuery(ModelMetadata meta) {
+this.meta = meta;
+  }
+
+  public ModelMetadata getMetadata() {
+return meta;
+  }
+
+  public void setFeatureLogger(FeatureLogger fl) {
+this.fl = fl;
+  }
+
+  public FeatureLogger getFeatureLogger() {
+return this.fl;
+  }
+
+  public Collection getAllFeatures() {
+return meta.getAllFeatures();
+  }
+
+  public void setOriginalQuery(Query mainQuery) {
+this.originalQuery = mainQuery;
+  }
+
+  public void setExternalFeatureInfo(Map externalFeatureInfo) {
+this.efi = externalFeatureInfo;
+  }
+
+  public void setRequest(SolrQueryRequest request) {
+this.request = request;
+  }
+
+  @Override
+  public int hashCode() {
+final int prime = 31;
+int result = super.hashCode();
+result = prime * result + ((meta == null) ? 0 : meta.hashCode());
+result = prime * result
++ ((originalQuery == null) ? 0 : originalQuery.hashCode());
+result = prime * result + ((efi == null) ? 0 : originalQuery.hashCode());
--- End diff --

I think this is a typo.
It should be:
result = prime * result + ((efi == null) ? 0 : efi.hashCode());

This is a small thing, but it currently makes the system unusable when you
experiment with different efi variable values. Basically the cache is
always hit, even if your efi variables change dynamically.
Anyway, it is really a minimal fix :)
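
To make the effect of the proposed fix concrete, here is a minimal, self-contained sketch. EfiAwareQueryKey is a hypothetical stand-in for ModelQuery, with plain strings in place of the model metadata and the original query, and it assumes the efi map holds string keys and values. It only shows that hashCode and equals both need to take efi into account for two requests with different efi values to be treated as different cache keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for ModelQuery; only the fields relevant to hashing are kept. */
public class EfiAwareQueryKey {
  private final String modelName;        // stands in for the model metadata
  private final String originalQuery;    // stands in for the original Solr query
  private final Map<String, String> efi; // external feature info; may be null

  public EfiAwareQueryKey(String modelName, String originalQuery, Map<String, String> efi) {
    this.modelName = modelName;
    this.originalQuery = originalQuery;
    this.efi = efi;
  }

  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((modelName == null) ? 0 : modelName.hashCode());
    result = prime * result + ((originalQuery == null) ? 0 : originalQuery.hashCode());
    // The fix under discussion: hash the efi map itself, not originalQuery a second time.
    result = prime * result + ((efi == null) ? 0 : efi.hashCode());
    return result;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) return true;
    if (!(obj instanceof EfiAwareQueryKey)) return false;
    EfiAwareQueryKey other = (EfiAwareQueryKey) obj;
    return Objects.equals(modelName, other.modelName)
        && Objects.equals(originalQuery, other.originalQuery)
        && Objects.equals(efi, other.efi);
  }

  public static void main(String[] args) {
    Map<String, String> efiA = new HashMap<>();
    efiA.put("user_text", "solr");
    Map<String, String> efiB = new HashMap<>();
    efiB.put("user_text", "lucene");

    EfiAwareQueryKey a = new EfiAwareQueryKey("myModel", "title:book", efiA);
    EfiAwareQueryKey b = new EfiAwareQueryKey("myModel", "title:book", efiB);
    EfiAwareQueryKey sameAsA = new EfiAwareQueryKey("myModel", "title:book", new HashMap<>(efiA));

    // Same model and query, different efi: the keys must not be equal.
    System.out.println(a.equals(b));
    // Same model, query, and efi: the keys are equal and hash identically.
    System.out.println(a.equals(sameAsA) && a.hashCode() == sameAsA.hashCode());
  }
}
```

In this sketch the two keys that differ only in efi compare as unequal, so a cache keyed on them would not return a stale entry when the efi values change.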



[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...

2016-03-09 Thread alessandrobenedetti
Github user alessandrobenedetti commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/4#discussion_r55499494
  
--- Diff: solr/contrib/ltr/README.txt ---
@@ -0,0 +1,330 @@
+Apache Solr Learning to Rank
+
+
+This is the main [learning to rank integrated into solr](http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp)
+repository.
+[Read up on learning to rank](https://en.wikipedia.org/wiki/Learning_to_rank)
+
+Apache Solr Learning to Rank (LTR) provides a way for you to extract features
+directly inside Solr for use in training a machine learned model.  You can then
+deploy that model to Solr and use it to rerank your top X search results.
+
+
+# Changes to solrconfig.xml
+```xml
+
+  ...
+
+  
+  
+
+  
+  
+
+
+  
+  
+  
+
+  explicit
+  json
+  true
+  id
+
+
+  
+  ltrComponent
+
+  
+
+  
+...
+
+
+
+  
+
+
+
+```
+
+
+# Build the plugin
+In the solr/contrib/ltr directory run
+`ant dist`
+
+# Install the plugin
+In your solr installation, navigate to your collection's lib directory.
+In the solr install example, it would be solr/collection1/lib.
+If lib doesn't exist you will have to make it, and then copy the plugin's jar there.
+
+`cp lucene-solr/solr/dist/solr-ltr-X.Y.Z-SNAPSHOT.jar mySolrInstallPath/solr/myCollection/lib`
+
+Restart your collection using the admin page and you are good to go.
+You can find more detailed instructions [here](https://wiki.apache.org/solr/SolrPlugins).
+
+
+# Defining Features
+In the learning to rank plugin, you can define features in a feature space
+using standard Solr queries. As an example:
+
+## features.json
+```json
+[
+{ "name": "isBook",
+  "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
+  "params":{ "fq": ["{!terms f=category}book"] }
+},
+{
+  "name":  "documentRecency",
+  "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
+  "params": {
+  "q": "{!func}recip( ms(NOW,publish_date), 3.16e-11, 1, 1)"
+  }
+},
+{
+  "name":"originalScore",
+  "type":"org.apache.solr.ltr.feature.impl.OriginalScoreFeature",
+  "params":{}
+},
+{
+  "name" : "userTextTitleMatch",
+  "type" : "org.apache.solr.ltr.feature.impl.SolrFeature",
+  "params" : { "q" : "{!field f=title}${user_text}" }
+}
+]
+```
+
+Defines four features. Anything that is a valid Solr query can be used to define
+a feature.
+
+### Filter Query Features
+The first feature isBook fires if the term 'book' matches the category field
+for the given examined document. Since in this feature q was not specified,
+either the score 1 (in case of a match) or the score 0 (in case of no match)
+will be returned.
+
+### Query Features
+In the second feature (documentRecency) q was specified using a function query.
+In this case the score for the feature on a given document is whatever the query
+returns (1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated
+2 years ago, etc.). If both an fq and a q are used, documents that don't match
+the fq will receive a score of 0 for the documentRecency feature; all other
+documents will receive the score specified by the query for this feature.
+
+### Original Score Feature
+The third feature (originalScore) has no parameters, and uses the
+OriginalScoreFeature class instead of the SolrFeature class.  Its purpose is
+to simply return the score for the original search request against the current
+matching document.
+
+### External Features
+Users can specify external information that can be passed in as
+part of the query to the ltr ranking framework. In this case, the
+fourth feature (userTextTitleMatch) will be looking for an external field
+called 'user_text' passed in through the request, and will fire if there is
+a term match for the document field 'title' from the value of the external
+field 'user_text'. See the "Run a Rerank Query" section for how
+to pass in external information.
+
+### Custom Features
+Custom features can be created by extending from
+org.apache.solr.ltr.ranking.Feature, however this is generally not recommended.
+The majority of features should be possible to create using the methods described
+above.
+
+# Defining Models
+Currently the Learning to Rank plugin supports 2 main types of
+ranking models: [Ranking 

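The documentRecency scores quoted in the README diff above (1, 1/2, 1/3) follow from Solr's recip(x,m,a,b) function query, which computes a/(m*x+b). The sketch below redoes that arithmetic outside Solr. The class and method names are hypothetical, and it assumes a year of 365.25 days, so the results are only approximately 1/2 and 1/3.

```java
/** Hypothetical worked example of the documentRecency arithmetic; not plugin code. */
public class RecencyScore {

  // Assumption: one year is taken as 365.25 days, about 3.156e10 milliseconds.
  static final double MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

  /** Same formula as Solr's recip(x,m,a,b) function query: a / (m*x + b). */
  public static double recip(double x, double m, double a, double b) {
    return a / (m * x + b);
  }

  /** Score of recip(ms(NOW,publish_date), 3.16e-11, 1, 1) for a document of the given age. */
  public static double documentRecency(double ageInYears) {
    double ageInMs = ageInYears * MS_PER_YEAR; // plays the role of ms(NOW,publish_date)
    return recip(ageInMs, 3.16e-11, 1, 1);
  }

  public static void main(String[] args) {
    for (int years = 0; years <= 2; years++) {
      System.out.println(years + " year(s) old: " + documentRecency(years));
    }
  }
}
```

The constant 3.16e-11 is roughly the reciprocal of the number of milliseconds in a year, so m*x is close to the document's age in years and the score behaves like 1/(age + 1).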
[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...

2016-01-29 Thread diegoceccarelli
GitHub user diegoceccarelli opened a pull request:

https://github.com/apache/lucene-solr/pull/4

SOLR-8542: Integrate Learning to Rank into Solr

Solr Learning to Rank (LTR) provides a way for you to extract features
directly inside Solr for use in training a machine learned model. You
can then deploy that model to Solr and use it to rerank your top X
search results. This concept was previously presented by the authors at
Lucene/Solr Revolution 2015

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr master-ltr-plugin-rfc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/4.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4


commit 1bee2ad0ce64b2f091e34f7fb42e00387616c987
Author: Diego Ceccarelli 
Date:   2016-01-13T22:29:17Z

SOLR-8542: Integrate Learning to Rank into Solr

Solr Learning to Rank (LTR) provides a way for you to extract features
directly inside Solr for use in training a machine learned model. You
can then deploy that model to Solr and use it to rerank your top X
search results. This concept was previously presented by the authors at
Lucene/Solr Revolution 2015







[GitHub] lucene-solr pull request: SOLR-8542: Integrate Learning to Rank in...

2016-01-15 Thread diegoceccarelli
GitHub user diegoceccarelli opened a pull request:

https://github.com/apache/lucene-solr/pull/217

SOLR-8542: Integrate Learning to Rank into Solr

See https://issues.apache.org/jira/i#browse/SOLR-8542

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr trunk-learning-to-rank-plugin

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/217.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #217


commit 336db4ccf6434e690a745a4af88b5d9c21edc25e
Author: Diego Ceccarelli 
Date:   2016-01-13T22:29:17Z

SOLR-8542: Integrate Learning to Rank into Solr

Solr Learning to Rank (LTR) provides a way for you to extract features
directly inside Solr for use in training a machine learned model. You
can then deploy that model to Solr and use it to rerank your top X
search results. This concept was previously presented by the authors at
Lucene/Solr Revolution 2015



