Re: Judging the MoreLikeThis results for relevancy

2018-02-14 Thread Alessandro Benedetti
So let me answer point by point: 1) Similarity is misleading here if you interpret it as a probabilistic measure. Given a query, there is no "Ideal Document". With both TF-IDF and BM25 (which solves the problem better) you are scoring the document. The higher the score, the higher the

Re: Judging the MoreLikeThis results for relevancy

2018-02-13 Thread Arnold Bronley
Thanks for the reply, Alessandro. Can you please elaborate on the point "a document which has a score 50% of the original doc score, it doesn't mean it is 50% similar"? I did not understand this for two reasons: 1. In the end, we are calculating a similarity score between documents when we are

Re: Judging the MoreLikeThis results for relevancy

2018-02-08 Thread Alessandro Benedetti
Hi, I have personally worked a lot with MoreLikeThis and I am close to contributing a refactoring of that module (mostly to break up the monolithic giant facade class). First of all, the MoreLikeThis handler will return the original document (not scored) plus the similar documents (scored).

Judging the MoreLikeThis results for relevancy

2018-02-07 Thread Arnold Bronley
Hi, I am using the MoreLikeThis handler to get related documents for a given document. To determine whether I am getting good results, here is what I do: the same original document should be returned as the top match. If it is not, then there is some problem with the relevancy. Then, as same input
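
The first check in this message (the source document should come back as the top match) can be sketched as below. This is only an illustration: it assumes the results are already available as (document id, score) pairs, for example from an ordinary query built from the document's text, since the 2018-02-08 reply notes that the MoreLikeThis handler itself returns the original document unscored. The function name and the sample ids and scores are hypothetical, and nothing here calls Solr.

```python
from typing import List, Tuple


def source_is_top_match(source_id: str, ranked: List[Tuple[str, float]]) -> bool:
    """Return True if the highest-scoring result is the source document itself.

    `ranked` is a hypothetical list of (document id, score) pairs; the scores
    are raw relevance scores, not percentages or probabilities.
    """
    if not ranked:
        return False
    # Pick the pair with the largest score rather than trusting list order.
    top_id, _ = max(ranked, key=lambda pair: pair[1])
    return top_id == source_id


# Made-up ids and scores, for illustration only.
results = [("doc-7", 12.4), ("doc-3", 9.1), ("doc-9", 4.0)]
print(source_is_top_match("doc-7", results))
```

The check only compares ranks. It deliberately does not turn the ratio of two scores into a "percent similar" figure, which is the interpretation the replies in this thread caution against.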