[jira] [Commented] (LUCENE-7377) Remove ClassicSimilarity?
[ https://issues.apache.org/jira/browse/LUCENE-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375008#comment-15375008 ]

Adrien Grand commented on LUCENE-7377:
--------------------------------------

Agreed with the simplification. It cannot be quite that simple, however, since we want to pre-compute as much as possible for efficiency reasons. For instance, in your example the tf can be different for every document while the idf is constant for all docs, which is why we have this SimScorer abstraction. But we can certainly do better than what we have today.

> Remove ClassicSimilarity?
> -------------------------
>
>         Key: LUCENE-7377
>         URL: https://issues.apache.org/jira/browse/LUCENE-7377
>     Project: Lucene - Core
>  Issue Type: Task
>    Reporter: Adrien Grand
>    Priority: Minor
>
> ClassicSimilarity was relying on coordination factors in order to produce
> good scores. Now that coords are gone, it is quite a bad option compared to
> e.g. BM25Similarity.
> Maybe we should remove ClassicSimilarity entirely in master and deprecate it in
> 6.x in order to encourage users to move to BM25Similarity rather than stay on
> a Similarity impl of lesser quality?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
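The pre-computation point in Adrien's comment can be illustrated with a minimal standalone sketch. It does not use Lucene's actual SimScorer API; the class name `PrecomputedIdfScorer` and its methods are hypothetical, and the idf formula is assumed to be the `log2(N / df + 1)` form from the comment being replied to. The idf is computed once per term, and only the multiplication by tf happens per document.

```java
// Hypothetical sketch, not Lucene code: one scorer instance per query term.
final class PrecomputedIdfScorer {
    // Constant for every document that contains the term, so it is computed once.
    private final double idf;

    PrecomputedIdfScorer(long numberOfDocuments, long docFreq) {
        if (docFreq <= 0 || numberOfDocuments < docFreq) {
            throw new IllegalArgumentException("need 0 < docFreq <= numberOfDocuments");
        }
        // log2(N / df + 1); Math has no log2, so divide natural logs.
        this.idf = Math.log((double) numberOfDocuments / docFreq + 1.0) / Math.log(2.0);
    }

    // Called once per matching document; tf is the only per-document input.
    double score(double tf) {
        return tf * idf;
    }

    public static void main(String[] args) {
        PrecomputedIdfScorer scorer = new PrecomputedIdfScorer(1000, 10);
        for (double tf : new double[] {1.0, 2.0, 5.0}) {
            System.out.println("tf=" + tf + " score=" + scorer.score(tf));
        }
    }
}
```

The per-document work here is a single multiplication, which is the kind of saving the comment describes; how Lucene actually splits the work between per-term and per-document steps is not shown.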
[jira] [Commented] (LUCENE-7377) Remove ClassicSimilarity?
[ https://issues.apache.org/jira/browse/LUCENE-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374882#comment-15374882 ]

Ahmet Arslan commented on LUCENE-7377:
--------------------------------------

I think an implementation of TFIDF should stay in Lucene, but it should extend SimilarityBase and have a simple, single-line implementation in the org.apache.lucene.search.similarities.SimilarityBase#score method, e.g.:

{code}
return tf * log2(((double) stats.getNumberOfDocuments() / (double) stats.getDocFreq()) + 1);
{code}

The current TFIDFSimilarity and ClassicSimilarity are hard to understand.
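For readers who want to try the formula outside Lucene, here is a minimal standalone sketch of the one-liner proposed in the comment. It is an assumption-laden illustration, not a SimilarityBase subclass: the class `TfIdfSketch`, its `score` signature, and the plain `long` parameters standing in for `stats.getNumberOfDocuments()` and `stats.getDocFreq()` are all hypothetical.

```java
// Hypothetical sketch, not Lucene code: the proposed formula as a plain function.
final class TfIdfSketch {
    // Base-2 logarithm; java.lang.Math has no log2, so derive it from the natural log.
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // tf * log2(N / df + 1), mirroring the single-line formula in the comment.
    static double score(double tf, long numberOfDocuments, long docFreq) {
        if (docFreq <= 0 || numberOfDocuments < docFreq) {
            throw new IllegalArgumentException("need 0 < docFreq <= numberOfDocuments");
        }
        return tf * log2((double) numberOfDocuments / (double) docFreq + 1.0);
    }

    public static void main(String[] args) {
        // A term occurring 3 times in a document, found in 10 of 1000 documents.
        System.out.println(score(3.0, 1000, 10));
    }
}
```

With 7 documents and a term that appears in only one of them, the idf part is log2(8) = 3, so a document containing the term twice scores about 6.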