Hi,

I wondered about the same thing a while ago and failed to decipher TFIDFSimilarity.
Scoring looks like tf*idf*idf to me.

I would appreciate it if someone could shed some light on this.

Thanks,
Ahmet



On Friday, June 10, 2016 12:37 AM, Upayavira <u...@odoko.co.uk> wrote:
I've just done a very simple, single term query against a 4.10 system
and a 5.5 system, each with much the same data.

The score for the 4.10 system was essentially made up of the field
weight, which is:
   score = tf * idf 
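As a sanity check, the 4.10 value can be reproduced by hand. This is a minimal sketch that assumes the classic DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), fieldWeight = tf * idf * fieldNorm); the function names are illustrative only, and the inputs are the docFreq/maxDocs figures from the 4.10 explain output below.

```python
import math

# Assumption: these mirror the classic Lucene DefaultSimilarity formulas.
def tf(freq: float) -> float:
    # tf is the square root of the raw term frequency
    return math.sqrt(freq)

def idf(doc_freq: int, max_docs: int) -> float:
    # idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int,
                 field_norm: float = 1.0) -> float:
    # fieldWeight = tf * idf * fieldNorm, as listed in the explain tree
    return tf(freq) * idf(doc_freq, max_docs) * field_norm

if __name__ == "__main__":
    # Inputs taken from the 4.10 explain output (freq=1, fieldNorm=1)
    print(f"{field_weight(1.0, 56010, 5568765):.4f}")
```

With freq=1 and fieldNorm=1, the result is just the idf, which matches the 5.5993805 reported by 4.10.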

In the 5.5 system, however, there is an additional "query weight", which
is idf * query norm. If query norm is 1, then the final score is now:
   score = query_weight * field_weight
         = (idf * 1) * (tf * idf)
         = tf * idf^2
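The role of queryNorm can be seen in a small sketch. It assumes the classic TF-IDF behaviour where queryNorm = 1 / sqrt(sumOfSquaredWeights) and, for a single-term query, sumOfSquaredWeights is just (idf * boost)^2; coord and top-level boosts are ignored, and `single_term_score` is a hypothetical helper, not a Lucene API. Under that assumption the query weight normalizes to 1 and the score collapses to tf * idf, while forcing queryNorm to 1.0 leaves the idf uncancelled and gives tf * idf^2.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # Assumption: classic DefaultSimilarity idf
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def single_term_score(freq, doc_freq, max_docs, query_norm=None,
                      boost=1.0, field_norm=1.0):
    """Hypothetical helper: queryWeight * fieldWeight for one term."""
    i = idf(doc_freq, max_docs)
    raw_query_weight = i * boost
    if query_norm is None:
        # Assumed classic queryNorm: 1 / sqrt(sum of squared weights);
        # with a single term this is 1 / (idf * boost).
        query_norm = 1.0 / math.sqrt(raw_query_weight ** 2)
    query_weight = raw_query_weight * query_norm
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight

if __name__ == "__main__":
    # Classic normalization on the 4.10 numbers: roughly tf * idf
    print(single_term_score(1.0, 56010, 5568765))
    # queryNorm fixed at 1.0 on the 5.5 numbers: roughly tf * idf^2
    print(single_term_score(1.0, 31905, 2446459, query_norm=1.0))
```

The second call lands on the 28.51136 seen in the 5.5 output, which is consistent with the queryNorm of 1.0 reported there.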

Can anyone explain why this new "query weight" element has appeared in
our scores somewhere between 4.10 and 5.5?

Thanks!

Upayavira

4.10 score ========================================================
      "2937439": {
        "match": true,
        "value": 5.5993805,
        "description": "weight(description:obama in 394012) [DefaultSimilarity], result of:",
        "details": [
          {
            "match": true,
            "value": 5.5993805,
            "description": "fieldWeight in 394012, product of:",
            "details": [
              {
                "match": true,
                "value": 1,
                "description": "tf(freq=1.0), with freq of:",
                "details": [
                  {
                    "match": true,
                    "value": 1,
                    "description": "termFreq=1.0"
                  }
                ]
              },
              {
                "match": true,
                "value": 5.5993805,
                "description": "idf(docFreq=56010, maxDocs=5568765)"
              },
              {
                "match": true,
                "value": 1,
                "description": "fieldNorm(doc=394012)"
              }
            ]
          }
        ]
5.5 score ========================================================
      "2502281":{
        "match":true,
        "value":28.51136,
        "description":"weight(description:obama in 43472) [], result of:",
        "details":[{
            "match":true,
            "value":28.51136,
            "description":"score(doc=43472,freq=1.0), product of:",
            "details":[{
                "match":true,
                "value":5.339603,
                "description":"queryWeight, product of:",
                "details":[{
                    "match":true,
                    "value":5.339603,
                    "description":"idf(docFreq=31905, maxDocs=2446459)"},
                  {
                    "match":true,
                    "value":1.0,
                    "description":"queryNorm"}]},
              {
                "match":true,
                "value":5.339603,
                "description":"fieldWeight in 43472, product of:",
                "details":[{
                    "match":true,
                    "value":1.0,
                    "description":"tf(freq=1.0), with freq of:",
                    "details":[{
                        "match":true,
                        "value":1.0,
                        "description":"termFreq=1.0"}]},
                  {
                    "match":true,
                    "value":5.339603,
                    "description":"idf(docFreq=31905, maxDocs=2446459)"},
                  {
                    "match":true,
                    "value":1.0,
                    "description":"fieldNorm(doc=43472)"}]}]}]},
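To confirm that the 5.5 value really is the idf counted twice, the explain tree can be multiplied out node by node. This is a sketch: the dict is a hand-transcription of the 5.5 excerpt above with the "match" keys dropped, and `check_products` is a hypothetical helper, not part of Solr or Lucene.

```python
import math

# Hand-transcribed from the 5.5 explain excerpt ("match" keys omitted).
EXPLAIN_55 = {
    "value": 28.51136,
    "description": "score(doc=43472,freq=1.0), product of:",
    "details": [
        {"value": 5.339603, "description": "queryWeight, product of:",
         "details": [
             {"value": 5.339603, "description": "idf(docFreq=31905, maxDocs=2446459)"},
             {"value": 1.0, "description": "queryNorm"}]},
        {"value": 5.339603, "description": "fieldWeight in 43472, product of:",
         "details": [
             {"value": 1.0, "description": "tf(freq=1.0), with freq of:",
              "details": [{"value": 1.0, "description": "termFreq=1.0"}]},
             {"value": 5.339603, "description": "idf(docFreq=31905, maxDocs=2446459)"},
             {"value": 1.0, "description": "fieldNorm(doc=43472)"}]},
    ],
}

def check_products(node, rel_tol=1e-5):
    """Return True if every "product of" node equals the product of its children."""
    children = node.get("details", [])
    if "product of" in node["description"] and children:
        product = math.prod(child["value"] for child in children)
        if not math.isclose(product, node["value"], rel_tol=rel_tol):
            return False
    # Recurse into every child, including non-product nodes
    return all(check_products(child, rel_tol) for child in children)

if __name__ == "__main__":
    print(check_products(EXPLAIN_55))
```

The check passes, since 28.51136 is 5.339603 squared: the idf appears once under queryWeight (multiplied by a queryNorm of 1.0) and again under fieldWeight.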
