Dear Koji,

Thank you very much. Do you know what the range of the score is under this formula? And what would be a reasonable threshold for considering two documents similar enough?

Regards.
On Tue, Feb 3, 2015 at 1:35 PM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:
> Lucene uses the TFIDFSimilarity class to calculate the similarity.
> It is based on the idea of cosine similarity, but it modifies the
> cosine formula.
> Please take a look at "Lucene Practical Scoring Function" in the following
> Javadoc:
>
> http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
> Koji
> --
> http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
>
> On 2015/02/03 5:39, Ali Nazemian wrote:
>
>> Dear Erik,
>> Thank you for your response. Would you please tell me why this score can
>> be higher than 1, when cosine similarity cannot be higher than 1?
>> On Feb 2, 2015 7:32 PM, "Erik Hatcher" <erik.hatc...@gmail.com> wrote:
>>
>>> The scoring is the same as Lucene's. To get deeper insight into how a
>>> score is computed, use Solr’s debug=true mode to see the explain
>>> details in the response.
>>>
>>> Erik
>>>
>>> On Feb 2, 2015, at 10:49 AM, Ali Nazemian <alinazem...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I was wondering what the range of the score returned by a MoreLikeThis
>>>> query in Solr is. I know that Lucene uses cosine similarity in the
>>>> vector space model to calculate the similarity between two documents.
>>>> I also know that cosine similarity is between -1 and 1, but what I
>>>> don't understand is why the score returned by a MoreLikeThis query
>>>> can be "12", for example. Would you please explain how the score is
>>>> calculated in Solr?
>>>> Thank you very much.
>>>>
>>>> Best regards.
>>>>
>>>> --
>>>> A.Nazemian

--
A.Nazemian
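To see why the score Koji points to is not bounded by 1, here is a minimal Python sketch of the classic "practical scoring function" from the Lucene 4.x TFIDFSimilarity Javadoc, using DefaultSimilarity's definitions: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sumOfSquaredWeights), coord = overlap / maxOverlap and lengthNorm = 1 / sqrt(numTerms). It assumes a single field, all boosts equal to 1, a flat OR query over distinct terms, and norms kept as exact floats (Lucene stores them in one lossy byte). The function names are hypothetical, and this is not Solr's MoreLikeThis code, which builds its query from terms it selects from the source document, so real scores will differ; debug=true shows the actual explain output.

```python
import math
from collections import Counter

# Hypothetical sketch of Lucene 4.x classic (DefaultSimilarity) scoring.
# Assumptions: one field, all boosts = 1, non-empty documents, norms kept
# as exact floats (Lucene encodes them into a single byte, losing precision).


def idf(doc_freq, num_docs):
    # DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def doc_freq(term, corpus):
    # Number of documents containing the term at least once.
    return sum(1 for doc in corpus if term in doc)


def practical_score(query_terms, doc_terms, corpus):
    """coord * queryNorm * sum(tf * idf^2 * lengthNorm) over query terms."""
    terms = list(dict.fromkeys(query_terms))  # distinct terms, order kept
    idfs = {t: idf(doc_freq(t, corpus), len(corpus)) for t in terms}
    # queryNorm only normalizes the query side of the dot product.
    query_norm = 1.0 / math.sqrt(sum(w * w for w in idfs.values()))
    # lengthNorm stands in for the document vector's Euclidean norm.
    length_norm = 1.0 / math.sqrt(len(doc_terms))
    tf = Counter(doc_terms)
    matched = [t for t in terms if tf[t] > 0]
    coord = len(matched) / len(terms)
    total = sum(math.sqrt(tf[t]) * idfs[t] ** 2 * length_norm for t in matched)
    return coord * query_norm * total


def cosine(query_terms, doc_terms, corpus):
    """Plain cosine similarity of sqrt(tf) * idf vectors, within [0, 1]."""
    vocab = set(query_terms) | set(doc_terms)
    weights = {t: idf(doc_freq(t, corpus), len(corpus)) for t in vocab}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: math.sqrt(tf[t]) * weights[t] for t in vocab}

    q, d = vec(query_terms), vec(doc_terms)
    dot = sum(q[t] * d[t] for t in vocab)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy corpus: "solr" is rare (1 of 10 docs) and repeated in that doc.
    corpus = [["solr"] * 4] + [["lucene", "index"] for _ in range(9)]
    print(round(practical_score(["solr"], corpus[0], corpus), 3))  # 2.609
    print(round(cosine(["solr"], corpus[0], corpus), 6))  # 1.0
```

In this toy corpus the document matches the one-term query perfectly, so cosine similarity is 1.0, yet the practical score is about 2.6: for a single term it reduces to sqrt(tf) * idf / sqrt(numTerms), and idf grows with the size of the corpus. Because the document side is never normalized to unit length, the score has no fixed upper bound and depends on the query and on corpus statistics, so a fixed threshold chosen the way one would for cosine similarity does not carry over directly.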