Re: Suggestions on distance measures for clustering news articles

Lance Norskog Sun, 08 Jan 2012 18:45:08 -0800

The cool thing about cosine similarity is that it is (roughly) what
Lucene uses for scoring. This means that once you have tuned your
recommender, it should be possible to turn it into a Lucene index.


How? I don't know. Ted did this at Veoh.
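
For anyone following along, here is roughly what cosine distance between
two documents looks like. This is only a minimal Python sketch, not what
Mahout or Lucene actually run: the function name cosine_distance and the
whitespace tokenization are assumptions made for illustration, and it uses
raw term counts with no TF-IDF weighting.

```python
import math
from collections import Counter


def cosine_distance(doc_a, doc_b):
    """Return 1 - cosine similarity of two documents.

    Illustrative helper (hypothetical name): documents are lowercased,
    split on whitespace, and compared as raw term-count vectors.
    """
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())

    # Dot product over the terms the two documents share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())

    # Euclidean norms of the two term-count vectors.
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))

    # Treat an empty document as maximally distant from everything.
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

Because the vectors are normalized by length, a long article and a short
one on the same theme can still come out close, which is part of why this
measure tends to suit text.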

On Sun, Jan 8, 2012 at 5:14 AM, Robert Giacinto
<[email protected]> wrote:
> Hi Raphael,
>
> Cosine Similarity is always a good choice.
>
> You can find an evaluation of different distance measures for text
> clustering problems in Similarity Measures for Text Document Clustering by
> Anna Huang, 2008.
> http://nzcsrsc08.canterbury.ac.nz/site/proceedings/Individual_Papers/pg049_Similarity_Measures_for_Text_Document_Clustering.pdf
>
> -- Robert
>
>
> 2012/1/8 Raphael Cendrillon <[email protected]>
>
>> Thanks Yue!
>>
>> On Jan 7, 2012, at 6:17 PM, Yue Guan <[email protected]> wrote:
>>
>> > Hi, Raphael
>> >
>> > Cosine distance is good for text. You may try it.
>> >
>> > --Yue
>> >
>> > On Sat, Jan 7, 2012 at 9:05 PM, Raphael Cendrillon
>> > <[email protected]> wrote:
>> >> Hi everyone,
>> >>
>> >> I'm working on a problem clustering news articles around common themes.
>> >> There seem to be quite a few different distance measures that can be
>> >> applied.
>> >>
>> >> Does anyone have any suggestions on a good general purpose measure to
>> >> start out with?
>> >>
>> >> Thanks!
>>



-- 
Lance Norskog
[email protected]
