Hi Raphael,

Cosine similarity is usually a good choice for text.
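
For reference, here is a minimal Python sketch of what cosine similarity computes for two articles: the dot product of their term-count vectors divided by the product of the vectors' lengths. It assumes raw word counts from a naive regex tokenizer, and the helper names `term_frequencies`, `cosine_similarity` and `cosine_distance` are hypothetical. A real pipeline would more likely use TF-IDF weights and stop-word removal.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Lowercase the text and count word tokens (a deliberately naive tokenizer)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Dot product of two term-count vectors divided by the product of their norms."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty document as dissimilar to everything
    return dot / (norm_a * norm_b)


def cosine_distance(a: Counter, b: Counter) -> float:
    """Cosine distance as commonly defined: 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(a, b)
```

For example, "Stocks rallied as markets rose" and "Markets rose and stocks rallied again" share four terms, so their similarity is 4/sqrt(30), roughly 0.73, while two articles with no words in common score 0.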

You can find an evaluation of different distance measures for text
clustering in "Similarity Measures for Text Document Clustering" by
Anna Huang (2008):
http://nzcsrsc08.canterbury.ac.nz/site/proceedings/Individual_Papers/pg049_Similarity_Measures_for_Text_Document_Clustering.pdf
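
One property worth keeping in mind when comparing measures like those in the paper above is that cosine ignores document length, while Euclidean distance does not. The sketch below is a minimal illustration under stated assumptions: raw word counts, hypothetical helper names, and a "long" article modelled simply as the same term counts doubled, as if a longer piece repeated the same story.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a: Counter, b: Counter) -> float:
    """Straight-line distance between two term-count vectors."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))


short_article = Counter("markets rose as stocks rallied".split())
# Hypothetical longer article: every term count doubled.
long_article = Counter({term: 2 * count for term, count in short_article.items()})
```

Here the cosine similarity between the two is 1.0 (same direction, so same topic mix), while the Euclidean distance is sqrt(5), about 2.24, purely because one article is longer.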

-- Robert


2012/1/8 Raphael Cendrillon <[email protected]>

> Thanks Yue!
>
> On Jan 7, 2012, at 6:17 PM, Yue Guan <[email protected]> wrote:
>
> > Hi, Raphael
> >
> > Cosine distance is good for text. You may try it.
> >
> > --Yue
> >
> > On Sat, Jan 7, 2012 at 9:05 PM, Raphael Cendrillon
> > <[email protected]> wrote:
> >> Hi everyone,
> >>
> >> I'm working on a problem clustering news articles around common themes.
> >> There seem to be quite a few different distance measures that can be
> >> applied.
> >>
> >> Does anyone have any suggestions on a good general-purpose measure to
> >> start out with?
> >>
> >> Thanks!
>
