Hi Raphael,

Cosine similarity is usually a good general-purpose choice to start with for text.
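
In case it helps, here is a rough sketch in Python of what cosine similarity computes on two documents. It assumes plain whitespace tokenization and raw term counts. The function names are only illustrative and are not taken from any particular library. A real pipeline would likely use better tokenization and some term weighting.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product only needs the terms that both documents share.
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared_terms)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty document has no direction; treat it as similar to nothing.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(text_a, text_b):
    """Distance form for clustering: 0 for same direction, 1 for no shared terms."""
    return 1.0 - cosine_similarity(text_a, text_b)


if __name__ == "__main__":
    doc1 = "central bank raises interest rates"
    doc2 = "bank raises rates again"
    doc3 = "local team wins championship"
    print(cosine_distance(doc1, doc2))  # shares several terms with doc1
    print(cosine_distance(doc1, doc3))  # shares no terms with doc1
```

Because the measure depends only on the direction of the count vectors and not their length, a long article and a short one on the same theme can still come out close.
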
You can find an evaluation of different distance measures for text clustering problems in "Similarity Measures for Text Document Clustering" by Anna Huang, 2008:

http://nzcsrsc08.canterbury.ac.nz/site/proceedings/Individual_Papers/pg049_Similarity_Measures_for_Text_Document_Clustering.pdf

-- Robert

2012/1/8 Raphael Cendrillon <[email protected]>

> Thanks Yue!
>
> On Jan 7, 2012, at 6:17 PM, Yue Guan <[email protected]> wrote:
>
> > Hi, Raphael
> >
> > Cosine distance is good for text. You may try it.
> >
> > --Yue
> >
> > On Sat, Jan 7, 2012 at 9:05 PM, Raphael Cendrillon
> > <[email protected]> wrote:
> >> Hi everyone,
> >>
> >> I'm working on a problem clustering news articles around common themes.
> >> There seem to be quite a few different distance measures that can be
> >> applied.
> >>
> >> Does anyone have any suggestions on a good general purpose measure to
> >> start out with?
> >>
> >> Thanks!
