[jira] [Commented] (LUCENE-7377) Remove ClassicSimilarity?

2016-07-13 Thread Adrien Grand (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375008#comment-15375008 ]

Adrien Grand commented on LUCENE-7377:
--

Agreed with the goal of simplifying. It cannot be quite that simple, though,
since we want to pre-compute as much as possible for efficiency reasons. For
instance, in your example the tf can differ for every document while the idf
is constant for all docs, which is why we have the SimScorer abstraction. But
we can certainly do better than what we have today.
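The precompute-once, score-per-document split can be sketched as follows. This is a self-contained illustration, not Lucene's SimScorer API: the class name `PrecomputedTfIdf`, the `boost` parameter and the reuse of the `tf * log2(numDocs / docFreq + 1)` formula from Ahmet's comment are assumptions made for the example. The idf and any other query-level factor are folded into one weight once per query term, so the per-document work is a single multiplication by the tf.

```java
import java.util.function.DoubleUnaryOperator;

/**
 * Hypothetical sketch (not Lucene's API): scoring split into a per-query
 * part, computed once, and a per-document part, evaluated for every hit.
 */
final class PrecomputedTfIdf {

    // Per-query step: depends only on collection statistics, so it is computed once.
    // Assumes 1 <= docFreq <= numDocs.
    static double idf(long numDocs, long docFreq) {
        return Math.log((double) numDocs / docFreq + 1) / Math.log(2);
    }

    // Returns a per-document scorer with idf and the (hypothetical) boost already folded in.
    static DoubleUnaryOperator scorer(long numDocs, long docFreq, double boost) {
        final double weight = boost * idf(numDocs, docFreq); // computed once per query term
        // Per-document step: only tf varies, so this is one multiplication per hit
        return tf -> tf * weight;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator perDoc = scorer(1000, 10, 1.0);
        double[] termFreqs = {1, 3, 7}; // tf of the term in three matching docs
        for (double tf : termFreqs) {
            System.out.printf("tf=%.0f score=%.4f%n", tf, perDoc.applyAsDouble(tf));
        }
    }
}
```

A real scorer also has to deal with things like length normalization and score explanations, which this sketch leaves out on purpose.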



> Remove ClassicSimilarity?
> -
>
> Key: LUCENE-7377
> URL: https://issues.apache.org/jira/browse/LUCENE-7377
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>
> ClassicSimilarity was relying on coordination factors in order to produce 
> good scores. Now that coords are gone, it is quite a bad option compared to, 
> e.g., BM25Similarity.
> Maybe we should remove ClassicSimilarity entirely in master and deprecate it in 
> 6.x in order to encourage users to move to BM25Similarity rather than stay on 
> a Similarity impl of lesser quality?
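For context on the coordination factor mentioned above: the classic coord(overlap, maxOverlap) was overlap / maxOverlap, the fraction of query terms a document matches, and it was multiplied into the summed per-term scores. The hypothetical `CoordSketch` class below uses made-up per-term scores for a two-term query to show the effect in a toy setting: without coord, a document matching a single high-scoring term outranks one that matches both terms; with coord, the ordering flips.

```java
/**
 * Hypothetical sketch of how a coordination factor rescales a summed
 * per-term score. Not Lucene code; per-term scores are illustrative only.
 */
final class CoordSketch {

    // Fraction of query terms that matched the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Sums per-term scores, treating 0 as "term did not match" (a simplification),
    // and optionally multiplies the sum by coord.
    static float score(float[] termScores, boolean useCoord) {
        float sum = 0f;
        int overlap = 0;
        for (float s : termScores) {
            if (s > 0f) {
                sum += s;
                overlap++;
            }
        }
        return useCoord ? sum * coord(overlap, termScores.length) : sum;
    }

    public static void main(String[] args) {
        float[] docA = {3.0f, 0f};   // matches only one term, with a high score
        float[] docB = {1.5f, 1.0f}; // matches both query terms
        System.out.println("without coord: A=" + score(docA, false) + " B=" + score(docB, false));
        System.out.println("with coord:    A=" + score(docA, true) + " B=" + score(docB, true));
    }
}
```

Running `main` prints `A=3.0 B=2.5` without coord and `A=1.5 B=2.5` with it.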



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7377) Remove ClassicSimilarity?

2016-07-13 Thread Ahmet Arslan (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374882#comment-15374882 ]

Ahmet Arslan commented on LUCENE-7377:
--

I think an implementation of TF-IDF should stay in Lucene, but it should
extend SimilarityBase and have a simple, single-line implementation of the
org.apache.lucene.search.similarities.SimilarityBase#score method, e.g.,
{code}
return tf * log2(((double) stats.getNumberOfDocuments() / (double) stats.getDocFreq()) + 1);
{code}

The current TFIDFSimilarity and ClassicSimilarity are hard to understand.
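To see what the proposed one-liner computes, here is a standalone Java version with the `BasicStats` getters replaced by plain parameters. The `SimpleTfIdf` class and its local `log2` helper are hypothetical, not Lucene code. Because numDocs / docFreq is at least 1, the `+ 1` keeps the idf factor at 1 or above, so a term that occurs in every document still contributes its tf; tf itself enters linearly, with no saturation.

```java
/**
 * Standalone sketch of the proposed one-line TF-IDF score:
 *   score = tf * log2(numDocs / docFreq + 1)
 * Plain parameters stand in for the BasicStats getters.
 */
final class SimpleTfIdf {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Assumes 1 <= docFreq <= numDocs, so the idf part is always >= 1.
    static double score(double tf, long numDocs, long docFreq) {
        return tf * log2((double) numDocs / docFreq + 1);
    }

    public static void main(String[] args) {
        long numDocs = 1024;
        System.out.println("rare term   (df=1):    " + score(1, numDocs, 1));
        System.out.println("common term (df=1024): " + score(1, numDocs, 1024));
        System.out.println("rare term, tf=4:       " + score(4, numDocs, 1));
    }
}
```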
