Re: change the ranking function

2018-07-27 Thread Joël Trigalo
Hi,

It is not possible in general because similarities compute norms at
index time. (
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java#L46
)
My understanding is that you should duplicate the field and set a different
similarity on the new field in order to be able to choose the similarity
per query. If someone has a better idea, I am also interested.
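
For instance, a minimal schema.xml sketch of that approach (the field and
type names are made up for illustration, and it assumes a source field
named "text"): the content is copied into two fields whose types declare
different similarities, and the query picks whichever field it wants.

  <!-- per-field-type similarity requires the schema-level SchemaSimilarityFactory -->
  <similarity class="solr.SchemaSimilarityFactory"/>

  <fieldType name="text_bm25" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <similarity class="solr.BM25SimilarityFactory"/>
  </fieldType>

  <fieldType name="text_classic" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <similarity class="solr.ClassicSimilarityFactory"/>
  </fieldType>

  <field name="text_bm25" type="text_bm25" indexed="true" stored="false"/>
  <field name="text_classic" type="text_classic" indexed="true" stored="false"/>
  <copyField source="text" dest="text_bm25"/>
  <copyField source="text" dest="text_classic"/>

At query time you then choose the ranking by querying text_bm25 or
text_classic (e.g. qf=text_bm25 with edismax).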


On Thu, Jul 26, 2018 at 8:51 AM Reem  wrote:

> Hello,
>
> Is it possible to change the ranking function (e.g., BM25Similarity,
> ClassicSimilarity, LMDirichletSimilarity, etc.) at search time?
>
> The way I found to change the ranking function is by setting the
> similarity property of text fields in schema.xml as follows:
> ``
>
> However, this means we can only set the similarity/ranking function at
> indexing time. Since Solr is built on Lucene, which allows changing the
> ranking function at search time, I find it illogical that Solr doesn’t
> support it, so it seems I’m missing something here!
>
> Any idea on how to achieve this?
>
> Reem
>


Re: SOLR Score Range Changed

2018-02-23 Thread Joël Trigalo
The difference seems to be due to the fact that the default similarity in
Solr 7 is BM25, while it used to be TF-IDF (ClassicSimilarity) in Solr 4. As
you realised, the BM25 function is smoother.
You can configure schema.xml to use ClassicSimilarity instead, for instance:
https://lucene.apache.org/solr/guide/6_6/major-changes-from-solr-5-to-solr-6.html#default-similarity-changes
https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html#FieldTypeDefinitionsandProperties-FieldTypeSimilarity
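
As a sketch, switching a whole schema back to the 4.x behaviour is just a
schema-level declaration (assuming you want it for every field type):

  <similarity class="solr.ClassicSimilarityFactory"/>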

But as said before, you may be relying on score properties that are not
guaranteed, so it would be better to change the score function or the
sorting (rather than going back to ClassicSimilarity).

2018-02-22 18:39 GMT+01:00 Shawn Heisey :

> On 2/22/2018 9:50 AM, Hodder, Rick wrote:
>
>> I am migrating from SOLR 4.10.2 to SOLR 7.1.
>>
>> All seems to be going well, except for one thing: the scores that are
>> coming back for the resulting documents are different.
>>
>
> The absolute score has no meaning when you change something -- the index,
> the query, the software version, etc.  You can't compare absolute scores.
>
> What matters is the relative score of one document to another *in the same
> query*.  The amount of difference is almost irrelevant -- the goal of
> Lucene's score calculation gymnastics is to have one document score higher
> than another, so the *order* is reasonably correct.
>
> Assuming you're using the default relevancy sort, does the order of your
> search results change dramatically from one version to the other?  If it
> does, is the order generally better from a relevance standpoint, or
> generally worse?  If you are specifying an explicit sort, then the scores
> will likely be ignored.
>
> What I am describing is also why it's strongly recommended that you never
> try to convert scores to percentages:
>
> https://wiki.apache.org/lucene-java/ScoresAsPercentages
>
> Thanks,
> Shawn
>
>


Re: Learn To Rank Questions

2017-05-15 Thread Joël Trigalo
1.
So I think it is a Spark problem first
(https://issues.apache.org/jira/browse/SPARK-10413). What we can do is
create our own model (cf.
https://github.com/apache/lucene-solr/tree/master/solr/contrib/ltr/src/java/org/apache/solr/ltr/model)
that applies the prediction; it should be easy to do for a simple model,
like logistic regression.
For PMML, the idea would also be to implement a Model that reuses a Java
library able to apply PMML.
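
As a sketch of the simple case: a logistic regression learned in Spark is
monotonic in its linear part, so for ranking purposes its weights could
even be uploaded as the existing LinearModel instead of writing a new
class (the feature names and weights below are made up):

  {
    "class": "org.apache.solr.ltr.model.LinearModel",
    "name": "myLogRegModel",
    "features": [
      { "name": "originalScore" },
      { "name": "titleMatch" }
    ],
    "params": {
      "weights": {
        "originalScore": 0.8,
        "titleMatch": 1.3
      }
    }
  }

(uploaded with a PUT to the collection's /schema/model-store endpoint).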

2.
This function query gives you the TF-IDF score of textField vs userQuery for the doc:

 {!edismax qf='textField' mm=100% v=${userQuery} tie=0.1}
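
For reference, wiring that query up as a feature is just a SolrFeature
definition in the feature store (the store and feature names here are
illustrative); userQuery is then passed at request time as efi.userQuery:

  {
    "store": "myFeatureStore",
    "name": "tfidfTextScore",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "{!edismax qf='textField' mm=100% v=${userQuery} tie=0.1}"
    }
  }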

Also, it seems to me LTR only allows float features, which is a limitation.


3.
If the boost value is an index-time boost, I don't think it is possible. You
could put the feature you want in a field at index time and then use
FieldValueFeature to extract it.
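
For example, if the boost were indexed into a field called doc_boost
(hypothetical name), the feature definition would be along these lines:

  {
    "name": "documentBoost",
    "class": "org.apache.solr.ltr.feature.FieldValueFeature",
    "params": { "field": "doc_boost" }
  }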

On Thu, May 11, 2017 at 8:16 PM, Grant Ingersoll 
wrote:

> Hi,
>
> Just getting up to speed on LTR and have a few questions (most of which are
> speculative at this point and exploratory, as I have a couple of talks
> coming up on this and other relevance features):
>
> 1. Has anyone looked at what's involved with supporting SparkML or other
> models (e.g. PMML)?
>
> 2. Has anyone looked at features for text?  i.e. returning TF-IDF vectors
> or similar.  FieldValueFeature is kind of like this, but I might want
> weights for the terms, not just the actual values.  I could get this via
> term vectors, but then it doesn't fit the framework.
>
> 3. How about payloads and/or things like boost values for documents as
> features?
>
> 4. Are there example docs for training and using the
> MultipleAdditiveTreesModel?  I see unit tests for them, but I'm looking for
> something similar to the Python script in the example dir.
>
> On 2 and 3, I imagine some of this can be done creatively via the
> SolrFeature and function queries.
>
> Thanks,
> Grant
>