[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-21 Thread Nathan Meisels (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569289#comment-17569289
 ] 

Nathan Meisels commented on LUCENE-10650:
-

Hi [~jpountz]!

I appreciate your help so far!

I have another question: after reindexing, I get different scores.

The query is:

 
{code:java}
{
  "query": {
    "term": {
      "sessionIds": "1234-1234"
    }
  }
}{code}
 

New index explain:
{code:java}
{
  "_index": "entities-new",
  "_type": "entity",
  "_id": "AWByRrSPIGshPfnDk4hN",
  "matched": true,
  "explanation": {
    "value": 22.941677,
    "description": "weight(sessionIds:1234-1234 in 1400) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 22.941677,
        "description": "score from ScriptedSimilarity(weightScript=[Script{type=inline, lang='painless', idOrCode='return query.boost * Math.log((field.docCount+1.0)/(term.docFreq+0.5)) / Math.log(2);', options={}, params={}}], script=[Script{type=inline, lang='painless', idOrCode='return weight;', options={}, params={}}]) computed from:",
        "details": [
          {
            "value": 22.941677,
            "description": "weight",
            "details": []
          },
          {
            "value": 1.0,
            "description": "query.boost",
            "details": []
          },
          {
            "value": 12084378,
            "description": "field.docCount",
            "details": []
          },
          {
            "value": 4.730932E+7,
            "description": "field.sumDocFreq",
            "details": []
          },
          {
            "value": -1.0,
            "description": "field.sumTotalTermFreq",
            "details": []
          },
          {
            "value": 1.0,
            "description": "term.docFreq",
            "details": []
          },
          {
            "value": -1.0,
            "description": "term.totalTermFreq",
            "details": []
          },
          {
            "value": 1.0,
            "description": "doc.freq",
            "details": []
          },
          {
            "value": 1.0,
            "description": "doc.length",
            "details": []
          }
        ]
      }
    ]
  }
}{code}
 

Old index explain:
{code:java}
{
  "_index" : "entities-old",
  "_type" : "entity",
  "_id" : "AWByRrSPIGshPfnDk4hN",
  "matched" : true,
  "explanation" : {
    "value" : 21.23644,
    "description" : "weight(sessionIds:1234-1234 in 527154) [PerFieldSimilarity], result of:",
    "details" : [
      {
        "value" : 21.23644,
        "description" : "score(DFRSimilarity, doc=527154, freq=1.0), computed from:",
        "details" : [
          {
            "value" : 1.0,
            "description" : "no normalization",
            "details" : [ ]
          },
          {
            "value" : 21.23644,
            "description" : "BasicModelIn, computed from: ",
            "details" : [
              {
                "value" : 1.605901E7,
                "description" : "numberOfDocuments",
                "details" : [ ]
              },
              {
                "value" : 6.0,
                "description" : "docFreq",
                "details" : [ ]
              }
            ]
          },
          {
            "value" : 1.0,
            "description" : "no aftereffect",
            "details" : [ ]
          }
        ]
      }
    ]
  }
}{code}

Does this make sense? I need the scores to stay the same.

Thanks
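
A minimal sketch, assuming the formula tfn * log2((N + 1) / (n + 0.5)) quoted in the issue description with tfn = 1, that plugs the statistics from the two explain outputs above into the same function. It is plain Java rather than Lucene code, and the class and method names are hypothetical. The new index's script is `return weight;` rather than `return weight * doc.freq;`, which makes no difference here because doc.freq is 1.0.

```java
public final class ExplainScores {
    /** Log base 2 built from natural logs, as in the weight script. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** tfn * log2((N + 1) / (n + 0.5)), the formula quoted in the issue description. */
    static double basicModelIn(long numDocs, long docFreq, double tfn) {
        return tfn * log2((numDocs + 1.0) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // New index: field.docCount = 12084378, term.docFreq = 1 (from the explain output).
        double newScore = basicModelIn(12_084_378L, 1L, 1.0);
        // Old index: numberOfDocuments = 1.605901E7, docFreq = 6 (from the explain output).
        // The explain output prints a rounded float, so 16059010 is an approximation.
        double oldScore = basicModelIn(16_059_010L, 6L, 1.0);
        System.out.printf("new index: %.4f%n", newScore);
        System.out.printf("old index: %.4f%n", oldScore);
    }
}
```

Under these assumptions the two results come out at about 22.94 and 21.24, in line with the two explain values. That suggests the gap comes from the differing index statistics (document count and docFreq) rather than from the scripted similarity itself. Why the statistics differ between the two indices is not established in this thread.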

 


[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-14 Thread Nathan Meisels (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566715#comment-17566715
 ] 

Nathan Meisels commented on LUCENE-10650:
-

Thanks for the tip. I will take the reindex path, since I also want to test latency.

Thank you so much for all the help! I really appreciate it.

> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that 
> "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
> different than what we are used to. (We depend heavily on the exact scoring).
> Do you have any advice how we can keep the same scoring as before?
> Thanks
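
A minimal sketch of why switching to "l" changes the scores, using the two formulas quoted above. It is plain Java with hypothetical names and made-up statistics. It assumes that aeTimes1pTfn stands for the after effect multiplied by (1 + tfn), that the removed "no" after effect is the constant 1, and that after effect L is 1 / (1 + tfn) as in the DFR model; none of this is confirmed against the Lucene sources here.

```java
public final class AfterEffectComparison {
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Old formula from the issue: tfn * log2((N + 1) / (n + 0.5)). */
    static double oldScore(long N, long n, double tfn) {
        return tfn * log2((N + 1.0) / (n + 0.5));
    }

    /** New formula from the issue: A * aeTimes1pTfn * (1 - 1 / (1 + tfn)). */
    static double newScore(long N, long n, double tfn, double aeTimes1pTfn) {
        double A = log2((N + 1.0) / (n + 0.5));
        return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
    }

    public static void main(String[] args) {
        long N = 1_000_000L; // made-up number of documents
        long n = 10L;        // made-up document frequency
        for (double tfn : new double[] {1, 2, 5}) {
            // Assumed "no" after effect: constant 1, so aeTimes1pTfn = 1 + tfn.
            double withNo = newScore(N, n, tfn, 1 + tfn);
            // Assumed after effect L: 1 / (1 + tfn), so aeTimes1pTfn = 1.
            double withL = newScore(N, n, tfn, 1.0);
            System.out.printf("tfn=%.0f old=%.4f no=%.4f L=%.4f%n",
                    tfn, oldScore(N, n, tfn), withNo, withL);
        }
    }
}
```

Under these assumptions the "no" column reproduces the old score, because A * (1 + tfn) * (1 - 1 / (1 + tfn)) simplifies to A * tfn, while the L column is A * tfn / (1 + tfn), for example half the old score at tfn = 1.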



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-13 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566422#comment-17566422
 ] 

Adrien Grand commented on LUCENE-10650:
---

Indeed, Elasticsearch changes the after effect to `L` instead of `no` to 
work around the fact that Lucene removed support for `no`. You may not need to 
reindex: I believe it would be possible to close your index, update its settings 
to use this new scripted similarity, and then open the index again to make the 
change effective (I did not test this).




[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-13 Thread Nathan Meisels (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566410#comment-17566410
 ] 

Nathan Meisels commented on LUCENE-10650:
-

For future reference: it seems that if you upgrade to a newer Elasticsearch 
version while still using 
{code:java}
after_effect:no {code}
you get this warning:
{code:java}
[2022-07-13T11:58:16,312][WARN ][o.e.d.i.s.SimilarityProviders] [192.168.1.1] After effect [no] isn't supported anymore and has arbitrarily been replaced with [l].{code}
To solve this, I plan to first reindex on ES 6 with the scripted similarity and 
only then upgrade to ES 7. 




[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-13 Thread Nathan Meisels (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566191#comment-17566191
 ] 

Nathan Meisels commented on LUCENE-10650:
-

Thanks for the response!

I have another question. What happens if you upgrade to a new Lucene version 
with these old settings (e.g. after effect no)?

Does Elasticsearch run multiple Lucene versions so that it knows how to deal 
with them? Or do I have to reindex with the new setting before the upgrade?

Thanks




[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-12 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565384#comment-17565384
 ] 

Adrien Grand commented on LUCENE-10650:
---

{{query.boost}} is the {{stats.getBoost()}} from DFRSimilarity's {{double 
score(BasicStats stats, double freq, double docLen)}}, which computes 
{{stats.getBoost() * basicModel.score(stats, tfn, aeTimes1pTfn)}}.

The division by log(2) is not the tfn but a way to turn Math.log, which is a 
natural log (base e), into a log in base 2.
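
A minimal sketch of that change of base in plain Java (class and method names are hypothetical). `Math.log` in Java is the natural logarithm, and dividing by `Math.log(2)` converts it to base 2; `Math.log10` is the separate base-10 function.

```java
public final class LogBase {
    /** log2(x) = ln(x) / ln(2), the same construction as in the weight script. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        System.out.println("ln(8)    = " + Math.log(8.0));   // natural log
        System.out.println("log10(8) = " + Math.log10(8.0)); // base 10
        System.out.println("log2(8)  = " + log2(8.0));       // base 2, close to 3
    }
}
```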

I wouldn't expect latency to be higher; this should get compiled to more or 
less the same code that you used to rely on in DFRSimilarity.




[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Nathan Meisels (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565003#comment-17565003
 ] 

Nathan Meisels commented on LUCENE-10650:
-

Thanks for the answer!

Just to clarify:
query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?

Thanks!




[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564988#comment-17564988
 ] 

Adrien Grand commented on LUCENE-10650:
---

Hi Nathan. When we introduced dynamic pruning to Lucene, we also introduced the 
requirement that similarities produce scores that are non-decreasing when tf 
increases or when the length norm decreases (all other things being equal). 
Unfortunately, this property could not be retained while keeping DFR 
similarities as pluggable as they were, so we removed support for the no after 
effect and only retained L and B.

It looks like the specific similarity that you are looking for could still be 
implemented in a way that scores are non-decreasing with increasing tf or 
decreasing norm, so you should be able to re-implement it using a scripted 
similarity, for instance 
(https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html#scripted_similarity)
 with something like the following (untested):

{code}
"similarity": {
  "my_dfr_sim": {
    "type": "scripted",
    "weight_script": {
      "source": "return query.boost * Math.log((field.docCount+1.0)/(term.docFreq+0.5)) / Math.log(2);"
    },
    "script": {
      "source": "return weight * doc.freq;"
    }
  }
}
{code}
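
A minimal sketch, in plain Java rather than Painless, of what the two scripts above compute and of the non-decreasing property mentioned earlier. The class and method names and the statistics are hypothetical, and this only mirrors the arithmetic; it does not show how Elasticsearch evaluates the scripts.

```java
public final class ScriptedSimilaritySketch {
    /** Mirrors weight_script: query.boost * log2((docCount + 1) / (docFreq + 0.5)). */
    static double weight(double boost, long docCount, long docFreq) {
        return boost * Math.log((docCount + 1.0) / (docFreq + 0.5)) / Math.log(2);
    }

    /** Mirrors script: weight * doc.freq. */
    static double score(double weight, double freq) {
        return weight * freq;
    }

    /** True if the score never drops as freq grows from 1 to maxFreq. */
    static boolean nonDecreasingInFreq(double weight, int maxFreq) {
        double previous = Double.NEGATIVE_INFINITY;
        for (int freq = 1; freq <= maxFreq; freq++) {
            double current = score(weight, freq);
            if (current < previous) {
                return false;
            }
            previous = current;
        }
        return true;
    }

    public static void main(String[] args) {
        double w = weight(1.0, 1_000_000L, 10L); // made-up statistics
        System.out.println("weight = " + w);
        System.out.println("non-decreasing in freq: " + nonDecreasingInFreq(w, 100));
    }
}
```

The weight is positive whenever docFreq does not exceed docCount, so multiplying it by a growing doc.freq cannot lower the score. The script does not use doc.length, so the length norm has no effect on it.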
