The cool thing about cosine similarity is that it is (roughly) what Lucene's default TF-IDF scoring computes. This means that once you have tuned your recommender, it is possible to transform it into a Lucene index.
How? I don't know. Ted did this at Veoh.

On Sun, Jan 8, 2012 at 5:14 AM, Robert Giacinto wrote:
> Hi Raphael,
>
> Cosine Similarity is always a good choice.
>
> You can find an evaluation of different distance measures for text
> clustering problems in Similarity Measures for Text Document Clustering by
> Anne Huang, 2008.
> http://nzcsrsc08.canterbury.ac.nz/site/proceedings/Individual_Papers/pg049_Similarity_Measures_for_Text_Document_Clustering.pdf
>
> -- Robert
>
>
> 2012/1/8 Raphael Cendrillon
>
>> Thanks Yue!
>>
>> On Jan 7, 2012, at 6:17 PM, Yue Guan wrote:
>>
>> > Hi, Raphael
>> >
>> > Cosine distance is good for text. You may try it.
>> >
>> > --Yue
>> >
>> > On Sat, Jan 7, 2012 at 9:05 PM, Raphael Cendrillon wrote:
>> >> Hi everyone,
>> >>
>> >> I'm working on a problem clustering news articles around common themes.
>> >> There seem to be quite a few different distance measures that can be
>> >> applied.
>> >>
>> >> Does anyone have any suggestions on a good general purpose measure to
>> >> start out with?
>> >>
>> >> Thanks!
>>

--
Lance Norskog
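For anyone trying the suggestion in this thread, here is a minimal sketch of cosine similarity (and cosine distance, 1 minus the similarity, which is what a clustering job would minimize) over sparse TF-IDF vectors. Assumptions: documents are already tokenized, the weighting is raw term frequency times log(N / df), and the helper names (`tfidf_vectors`, `cosine_similarity`, `cosine_distance`) are hypothetical. This is not Mahout's or Lucene's code; Lucene's classic scoring starts from the same vector-space idea but adds further factors such as boosts and length normalization.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dict term -> weight) for tokenized docs.

    Assumed weighting: raw term count * log(N / df). A term that occurs in
    every document gets weight 0.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Dot product of two sparse vectors divided by the product of their norms."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Distance suitable for clustering: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Toy "news headlines" (made-up examples, already tokenized on whitespace).
docs = [
    "stocks fall as markets react to rate hike".split(),
    "markets rally after rate cut".split(),
    "team wins championship final".split(),
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # small positive: shares "markets", "rate"
print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: no terms in common
```

Because cosine similarity ignores vector length, a long article and a short one about the same topic can still score as close, which is one reason it is a common default for text.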
