[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997.patch

Updated patch. I hooked in CheckHits for more explain testing, and I also test a nearby norm and a nearby, slightly rarer term to ensure relevance doesn't go backwards in those cases either. I updated the AwaitsFix URL to point to a separate issue, LUCENE-8010, for fixing the buggy sims or moving them to sandbox. Finally, I optimized the tests to have a more reasonable runtime. I think it's ready for now.

> More sanity testing of similarities
> ---
>
> Key: LUCENE-7997
> URL: https://issues.apache.org/jira/browse/LUCENE-7997
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7997.patch, LUCENE-7997_wip.patch,
> LUCENE-7997_wip.patch, LUCENE-7997_wip.patch, LUCENE-7997_wip.patch,
> LUCENE-7997_wip.patch, LUCENE-7997_wip.patch, LUCENE-7997_wip.patch,
> LUCENE-7997_wip.patch, LUCENE-7997_wip.patch, LUCENE-7997_wip.patch,
> LUCENE-7997_wip.patch, LUCENE-7997_wip.patch
>
>
> LUCENE-7993 is a potential optimization that we could only apply if the
> similarity is an increasing function of {{freq}} (all other things like DF
> and length being equal). This sounds like a very reasonable requirement for a
> similarity, so we should test it in the base similarity test case and maybe
> move broken similarities to sandbox?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
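The "nearby norm" and "nearby, slightly rarer term" checks can be illustrated with a small sketch. It assumes textbook BM25 (k1=1.2, b=0.75) evaluated in double precision; the class and method names (`NeighborChecks`, `nearbyNormOk`, `nearbyRarerTermOk`) are hypothetical and this is not Lucene's BM25Similarity or its test code.

```java
/**
 * Hypothetical sketch of "nearby" relevance checks: a slightly longer field or a
 * slightly more common term should never score higher. Textbook BM25 in double;
 * not Lucene's BM25Similarity or BaseSimilarityTestCase.
 */
public class NeighborChecks {
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** idf = log(1 + (N - n + 0.5) / (n + 0.5)) */
    static double idf(long n, long bigN) {
        return Math.log(1 + (bigN - n + 0.5) / (n + 0.5));
    }

    /** tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)) */
    static double tf(double freq, double dl, double avgdl) {
        return freq / (freq + K1 * (1 - B + B * dl / avgdl));
    }

    static double score(double freq, double dl, double avgdl, long n, long bigN) {
        return (K1 + 1) * idf(n, bigN) * tf(freq, dl, avgdl);
    }

    /** Nearby norm: making the field one token longer must not raise the score. */
    static boolean nearbyNormOk(double freq, double dl, double avgdl, long n, long bigN) {
        return score(freq, dl + 1, avgdl, n, bigN) <= score(freq, dl, avgdl, n, bigN);
    }

    /** Nearby rarer term: one fewer matching document must not lower the score. */
    static boolean nearbyRarerTermOk(double freq, double dl, double avgdl, long n, long bigN) {
        return n <= 1 || score(freq, dl, avgdl, n - 1, bigN) >= score(freq, dl, avgdl, n, bigN);
    }

    public static void main(String[] args) {
        System.out.println("nearby norm ok: " + nearbyNormOk(979, 1, 1, 1, 17927));
        System.out.println("nearby rarer term ok: " + nearbyRarerTermOk(5, 40, 100, 50, 1000));
    }
}
```

A real test would randomize these inputs; the fixed values here are only for illustration.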
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Updated patch that also tests floating point tf values. We assume computeSlopFactor has the range {{(0 .. 1]}} for testing. This found a leftover buggy float cast in DFR {{I(F)}}, but also a new bug: Axiomatic model F1 will most likely return NaN values if you use SloppyPhraseQuery! Frequency values < 1 make its first log negative, and values below 1/e (about 0.37) make the argument of the next log negative, so it returns NaN. The formula is {{1 + log(1 + log(freq))}}. Imagine freq=0.3: this is {{1 + log(1 + -1.2)}} = {{1 + log(-0.2)}} = NaN. If we alter the formula to use {{log(1 + freq)}}, then tests pass, but that needs investigation and may not be an appropriate solution, so I marked it AwaitsFix for now.
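The NaN arithmetic above can be reproduced directly. This sketch assumes natural logs (`Math.log`) and reads "use log(1 + freq)" as replacing the inner `log(freq)` with `log(1 + freq)`; the class and method names are hypothetical, not Lucene's AxiomaticF1 code.

```java
/**
 * Sketch of the Axiomatic F1 tf arithmetic described above. Names are hypothetical;
 * the "altered" variant is one reading of the proposed log(1 + freq) change.
 */
public class AxiomaticF1TfDemo {
    /** Original form: 1 + log(1 + log(freq)), natural logs. */
    static double tfOriginal(double freq) {
        return 1 + Math.log(1 + Math.log(freq));
    }

    /** Candidate alteration: 1 + log(1 + log(1 + freq)); finite for any freq > 0. */
    static double tfAltered(double freq) {
        return 1 + Math.log(1 + Math.log(1 + freq));
    }

    public static void main(String[] args) {
        // A sloppy phrase match can produce a fractional freq such as 0.3.
        double freq = 0.3;
        System.out.println("log(0.3)        = " + Math.log(freq));  // about -1.2
        System.out.println("tfOriginal(0.3) = " + tfOriginal(freq)); // NaN
        System.out.println("tfAltered(0.3)  = " + tfAltered(freq));
        // Between 1/e and 1 the original stays finite but drops below 1.
        System.out.println("tfOriginal(0.5) = " + tfOriginal(0.5));
    }
}
```

Whether the altered formula is faithful to the axiomatic model is exactly the open question noted in the comment; the sketch only shows that it avoids NaN for positive freq.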
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Updated to test all sims and parameters.
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Patch randomizing values of parameters and adding missing range checks/docs for these parameters. These are just the valid ranges documented by the formulas. Unbounded parameters (such as normalization c and smoothing parameter mu) are treated the same as BM25's k1: the range check just ensures they are non-negative and finite, and the test uses the range 0..Integer.MAX_VALUE. Still TODO: axiomatic parameters. I need to look at the paper and the existing code (it has some range checks already, so it may be easy).
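A range check of the kind described for unbounded parameters might look like the following sketch. The class name, method names, and message text are hypothetical, not copied from Lucene; it only assumes the rule stated above (non-negative and finite) and a random test value in roughly 0..Integer.MAX_VALUE.

```java
import java.util.Random;

/**
 * Hypothetical sketch of a range check for unbounded parameters (treated like
 * BM25's k1): reject NaN, infinities and negatives. Not Lucene's actual code.
 */
public class ParamRangeCheck {
    static float checkNonNegativeFinite(String name, float value) {
        // Float.isFinite rejects NaN and both infinities; the second test rejects negatives.
        if (Float.isFinite(value) == false || value < 0) {
            throw new IllegalArgumentException(
                "illegal " + name + " value: " + value + ", must be a non-negative finite value");
        }
        return value;
    }

    /** Random test value in roughly the 0..Integer.MAX_VALUE range used by the tests. */
    static float randomUnbounded(Random random) {
        return (float) (random.nextDouble() * Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (int i = 0; i < 5; i++) {
            float c = checkNonNegativeFinite("c", randomUnbounded(random));
            System.out.println("accepted c = " + c);
        }
        try {
            checkNonNegativeFinite("mu", Float.NaN);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```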
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Updated patch with all remaining sims (axiomatic and language models) now tested. The axiomatic F3EXP and F3LOG fail due to their gamma function driving scores negative; I added a warning to their javadocs about this. Also note that these two models don't have default parameter-free ctors. The other 4 models (F1EXP, F1LOG, F2EXP, F2LOG) are all fine; they don't have this gamma function. At least now we have the lay of the land, and it is as expected. We still need to deal with many parameters which aren't yet tested. In many cases these are also missing any range checks: we need to dig up/figure out the valid domain, randomize them, look for issues, etc. But the default values are tested.
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Updated with the information-based models. LL passes the test, and SPL fails as expected (it has warnings in its javadocs).
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Patch with the 3 DFI models passing too.
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Updated patch with DFR passing/failing the new tests as expected:
* scoring models without warnings in the javadocs pass: models {{G}}, {{I(F)}}, {{I\(n)}}, {{I(ne)}}
* ones with warnings in javadocs all fail: models {{BE}}, {{D}}, and {{P}}

I think this is a good sign that it does what we need. To make DFR pass at all, I changed SimilarityBase to use {{double}} everywhere internally, then cast to 32-bit float at the end. This fixed all the numerical errors. I think this makes sense, as this subclass is supposed to be simple and easy to use (separately, we should take another look at the whole thing now that a lot of ClassicSimilarity's complexity has been removed). It makes the formulas more elegant in many cases too, because constants like {{5.0}} are naturally doubles and all Java Math functions take doubles, so some casts etc. get removed. I will work through the other models and look at potential improvements to explain etc. here too, for consistency.
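Why computing in double and casting once at the end helps can be shown with a small sketch. The length-normalization function below is an assumed H2-style example (tf * log2(1 + c * avgdl / dl)) chosen for illustration; the class and method names are hypothetical and this is not SimilarityBase's actual code.

```java
/**
 * Sketch: float has a 24-bit significand, double has 53 bits. Computing in double
 * and narrowing once at the end avoids intermediate float rounding. The formula
 * below is an assumed H2-style normalization, not copied from Lucene.
 */
public class DoubleInternally {
    /** At 2^24, adding 1 is rounded away in float. */
    static boolean floatLosesIncrement() {
        float x = 16777216f; // 2^24
        return x + 1f == x;
    }

    /** The same increment survives in double. */
    static boolean doubleKeepsIncrement() {
        double x = 16777216d;
        return x + 1d != x;
    }

    /**
     * Hypothetical H2-style normalization in double: literals and Math.log need no
     * casts, and the single narrowing cast happens at the very end.
     */
    static float normalizedTf(double tf, double c, double avgdl, double dl) {
        double tfn = tf * (Math.log(1 + c * avgdl / dl) / Math.log(2));
        return (float) tfn;
    }

    public static void main(String[] args) {
        System.out.println("float loses +1 at 2^24:  " + floatLosesIncrement());
        System.out.println("double keeps +1 at 2^24: " + doubleKeepsIncrement());
        System.out.println("normalizedTf(3, 1, 100, 100) = " + normalizedTf(3, 1, 100, 100));
    }
}
```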
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Added ClassicSimilarity and BooleanSimilarity to testing, and randomized BM25 parameters and boosts. ClassicSimilarity was fine; it just needed explain() cleaned up to exactly match score(). Note that query boosts and BM25's k1 parameter are only tested within a "reasonable" range (0..Integer.MAX_VALUE), so the test can fail if the sim has unexpected internal overflows; this is just trying to kick out the sim bugs.
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Updated patch with more cleanups around explain. I tried to add descriptions for parts of the formula and also use standard nomenclature. I think it's better now; here is typical output:

{noformat}
20.629753 = score(doc=0,freq=979.0), product of:
  2.2 = scaling factor, k1 + 1
  9.388654 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
    1.0 = n, number of documents containing term
    17927.0 = N, total number of documents with field
  0.9987758 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
    979.0 = freq, occurrences of term within document
    1.2 = k1, term saturation parameter
    0.75 = b, length normalization parameter
    1.0 = dl, length of field
    1.0 = avgdl, average length of field
{noformat}

You can more easily see term frequency saturation, including extreme cases such as a tf of 1.0, where more occurrences can no longer help. You can kinda visualize how it can work for maxScore now :)

{noformat}
...
  1.0 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
    5.9470048E8 = freq, occurrences of term within document
    1.2 = k1, term saturation parameter
    0.75 = b, length normalization parameter
    40.0 = dl, length of field (approximate)
    3.72180768E8 = avgdl, average length of field
...
{noformat}
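The explain output above can be recomputed from its listed inputs as scaling * idf * tf. This is a sketch in double precision with hypothetical names (`Bm25Breakdown`), not Lucene's BM25Similarity; it reproduces the printed values only approximately, since Lucene works in float.

```java
/**
 * Sketch: recompute the BM25 explain above as scaling * idf * tf from the listed
 * inputs, in double. Hypothetical names; not Lucene's BM25Similarity.
 */
public class Bm25Breakdown {
    static double scaling(double k1) {
        return k1 + 1;
    }

    /** idf = log(1 + (N - n + 0.5) / (n + 0.5)) */
    static double idf(double n, double bigN) {
        return Math.log(1 + (bigN - n + 0.5) / (n + 0.5));
    }

    /** tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)) */
    static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    static double score(double freq, double k1, double b, double dl, double avgdl,
                        double n, double bigN) {
        return scaling(k1) * idf(n, bigN) * tf(freq, k1, b, dl, avgdl);
    }

    public static void main(String[] args) {
        // Inputs from the first explain: freq=979, n=1, N=17927, dl=avgdl=1.
        System.out.println("idf   = " + idf(1, 17927));
        System.out.println("tf    = " + tf(979, 1.2, 0.75, 1, 1));
        System.out.println("score = " + score(979, 1.2, 0.75, 1, 1, 1, 17927));
        // Saturation case from the second explain: tf is within 1e-9 of 1, so it rounds to 1.0f.
        System.out.println("saturated tf as float = "
            + (float) tf(5.9470048E8, 1.2, 0.75, 40, 3.72180768E8));
    }
}
```

In the saturated case the denominator exceeds freq by only about 0.3, so the ratio is far closer to 1 than the float spacing just below 1.0.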
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Updated patch, also enforcing that explain == score (exactly, with no floating point differences). I cleaned up the BM25 explain to be transparent and reflect how the calculation is done. Most importantly, the explanation is now broken out as {{scaling * idf * tf}}, like how we compute it and as described in http://kak.tx0.org/Information-Retrieval/TFxIDF, rather than displaying the "re-arranged formula" with tf including the {{k1 + 1}} scaling factor. Maybe it's an improvement for debugging, too, since it pulls out the independent scaling factor, making it easier to see the specifics of term frequency saturation and IDF across docs/terms?
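One way to make explain match score exactly is to have both evaluate the same float subexpressions in the same order. This is a sketch under that assumption; `Node` is a hypothetical stand-in for an explanation tree, not Lucene's `Explanation` class, and the helpers are not Lucene's BM25 code.

```java
import java.util.List;

/**
 * Sketch: explain() and score() share helpers and evaluate scaling * idf * tf in the
 * same left-to-right float order, so the results are bit-identical. Node is a
 * hypothetical stand-in, not org.apache.lucene.search.Explanation.
 */
public class ExplainMatchesScore {
    static final float K1 = 1.2f;
    static final float B = 0.75f;

    static final class Node {
        final float value;
        final String description;
        final List<Node> details;

        Node(float value, String description, List<Node> details) {
            this.value = value;
            this.description = description;
            this.details = details;
        }

        void print(StringBuilder out, int depth) {
            out.append("  ".repeat(depth)).append(value).append(" = ").append(description).append('\n');
            for (Node d : details) {
                d.print(out, depth + 1);
            }
        }
    }

    static float idf(long n, long bigN) {
        return (float) Math.log(1 + (bigN - n + 0.5) / (n + 0.5));
    }

    static float tf(float freq, float dl, float avgdl) {
        return freq / (freq + K1 * (1 - B + B * dl / avgdl));
    }

    static float score(float freq, float dl, float avgdl, long n, long bigN) {
        return (K1 + 1) * idf(n, bigN) * tf(freq, dl, avgdl);
    }

    static Node explain(float freq, float dl, float avgdl, long n, long bigN) {
        float scaling = K1 + 1;
        float idf = idf(n, bigN);
        float tf = tf(freq, dl, avgdl);
        // Same operands in the same order as score(), so the float result is identical.
        float value = scaling * idf * tf;
        return new Node(value, "score(freq=" + freq + "), product of:", List.of(
            new Node(scaling, "scaling factor, k1 + 1", List.of()),
            new Node(idf, "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))", List.of()),
            new Node(tf, "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))", List.of())));
    }

    public static void main(String[] args) {
        Node e = explain(979f, 1f, 1f, 1, 17927);
        StringBuilder out = new StringBuilder();
        e.print(out, 0);
        System.out.print(out);
        System.out.println("explain == score: " + (e.value == score(979f, 1f, 1f, 1, 17927)));
    }
}
```

The design choice is that any reordering (for example folding k1 + 1 into tf, as the old "re-arranged formula" did) can change the last bit of a float result, so sharing the exact evaluation order is what makes an exact-equality assertion safe.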
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

I updated the patch with a possible fix for the monotonicity issue, at least so that tests pass for now and we can add other checks (like trying to fix explain) and understand all the issues.
[jira] [Updated] (LUCENE-7997) More sanity testing of similarities
[ https://issues.apache.org/jira/browse/LUCENE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-7997:
Attachment: LUCENE-7997_wip.patch

Hacky patch with my current state. I spent a lot of time looking at the reasonable state space, which is really hard since we have no limit on the number of documents, no bounds on boosts, etc. I tried really hard (maybe too much) to be super-fair to the similarity, e.g. the test shouldn't generate scenarios that are impossible to create with IndexWriter. But some things (like huge tf values but tiny norm values) are fair game, because we don't limit stacking terms/synonyms and so forth. This stuff may still have interesting test bugs if beasted enough.

Currently the test fails: it seems our BM25 may "go backwards" for largish term freqs, which looks like floating point issues to me. I haven't tried to debug that yet; there are other crabs to chase down first. I don't think we can really debug anything about this test until we first force explain() to *exactly* match score() for a sim. I realize this is a PITA, but I think we need it and will look into that next.

Here is an example of test output for the "going backwards" case, where it fails the pairwise test but the explanation doesn't show it. I still need to improve this so it's really easy to write a one-line test method for any scenario, and so on.

{noformat}
[junit4:pickseed] Seed property 'tests.seed' already defined: CA6EF971C3E23AAF
[junit4] says ciao! Master seed: CA6EF971C3E23AAF
[junit4] Executing 1 suite with 1 JVM.
[junit4]
[junit4] Started J0 PID(16127@localhost).
[junit4] Suite: org.apache.lucene.search.similarities.TestBM25Similarity
[junit4] 1> 0.03627357 = score(doc=0,freq=113659.0 = prevFreq=113658
[junit4] 1> ), product of:
[junit4] 1>   0.016547536 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
[junit4] 1>     449.0 = docFreq
[junit4] 1>     456.0 = docCount
[junit4] 1>   2.1920826 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
[junit4] 1>     113659.0 = prevFreq=113658
[junit4] 1>     1.2 = parameter k1
[junit4] 1>     0.75 = parameter b
[junit4] 1>     2300.5593 = avgFieldLength
[junit4] 1>     1048600.0 = fieldLength
[junit4] 1>
[junit4] 1> 0.03627357 = score(doc=0,freq=113659.0 = currentFreq=113659
[junit4] 1> ), product of:
[junit4] 1>   0.016547536 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
[junit4] 1>     449.0 = docFreq
[junit4] 1>     456.0 = docCount
[junit4] 1>   2.1920826 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
[junit4] 1>     113659.0 = currentFreq=113659
[junit4] 1>     1.2 = parameter k1
[junit4] 1>     0.75 = parameter b
[junit4] 1>     2300.5593 = avgFieldLength
[junit4] 1>     1048600.0 = fieldLength
[junit4] 1>
[junit4] 1> BM25(k1=1.2,b=0.75)
[junit4] 1> field="field",maxDoc=13938,docCount=456,sumTotalTermFreq=1049055,sumDocFreq=456
[junit4] 1> term="term",docFreq=449,totalTermFreq=196765
[junit4] 1> norm=168 (doc length ~ 1048600)
[junit4] 1> freq=113659
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBM25Similarity -Dtests.method=testRandomScoring -Dtests.seed=CA6EF971C3E23AAF -Dtests.locale=el -Dtests.timezone=Etc/GMT-13 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 0.12s | TestBM25Similarity.testRandomScoring <<<
[junit4]> Throwable #1: java.lang.AssertionError: score(113658)=0.036273565 > score(113659)=0.03627356
[junit4]>    at __randomizedtesting.SeedInfo.seed([CA6EF971C3E23AAF:41F1A0C3D995DCA5]:0)
[junit4]>    at org.apache.lucene.search.similarities.BaseSimilarityTestCase.doTestScoring(BaseSimilarityTestCase.java:324)
[junit4]>    at org.apache.lucene.search.similarities.BaseSimilarityTestCase.testRandomScoring(BaseSimilarityTestCase.java:296)
[junit4]>    at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=CheapBastard, sim=RandomSimilarity(queryNorm=true): {field=DFR I(ne)3(800.0)}, locale=el, timezone=Etc/GMT-13
[junit4] 2> NOTE: Linux 4.4.0-92-generic amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=8,threads=1,free=134724456,total=189267968
[junit4] 2> NOTE: All tests run in this JVM: [TestBM25Similarity]
[junit4] Completed [1/1 (1!)] in 1.14s, 1 test, 1 failure <<< FAILURES!
{noformat}
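The pairwise check that fails above (score(freq) must not exceed score(freq + 1)) can be sketched as a scan over consecutive frequencies. The helper names and the 113000..114000 window are hypothetical; k1, b, fieldLength and avgFieldLength come from the failure output. The sketch does not reproduce Lucene's lossy norm encoding or its exact float pipeline, so the float variant may or may not report a decrease in this window.

```java
import java.util.function.DoubleUnaryOperator;

/**
 * Sketch of a pairwise monotonicity check in the spirit of testRandomScoring:
 * f(freq) must not exceed f(freq + 1). Hypothetical helpers, not BaseSimilarityTestCase.
 */
public class PairwiseMonotonicity {
    static final double K1 = 1.2, B = 0.75;
    static final double DL = 1048600.0, AVGDL = 2300.5593; // from the failure output

    /** BM25 tfNorm, (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * dl / avgdl)), in double. */
    static double tfNormDouble(double freq) {
        return freq * (K1 + 1) / (freq + K1 * (1 - B + B * DL / AVGDL));
    }

    /** The same formula evaluated in float, where rounding steps are much coarser. */
    static double tfNormFloat(double freqAsDouble) {
        float freq = (float) freqAsDouble;
        float k1 = 1.2f, b = 0.75f, dl = 1048600f, avgdl = 2300.5593f;
        return freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    /** Returns the first freq in [from, to) with f(freq) > f(freq + 1), or -1 if none. */
    static long firstDecrease(DoubleUnaryOperator f, long from, long to) {
        for (long freq = from; freq < to; freq++) {
            if (f.applyAsDouble(freq) > f.applyAsDouble(freq + 1)) {
                return freq;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long lo = 113000, hi = 114000;
        System.out.println("double: first decrease = "
            + firstDecrease(PairwiseMonotonicity::tfNormDouble, lo, hi));
        System.out.println("float:  first decrease = "
            + firstDecrease(PairwiseMonotonicity::tfNormFloat, lo, hi));
    }
}
```

At freq near 113000 the true relative gain per extra occurrence is a few times 1e-8, comparable to float's relative rounding step (about 6e-8) but far above double's (about 1e-16), which is consistent with the floating point diagnosis in the comment.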