Thanks! By the way, did anyone else participate in the codesprint this year?
It was nice to see a few machine learning problems show up, like clustering and classification.

On 8 Jan, 2012, at 6:44 PM, Lance Norskog wrote:

> The cool thing about Cosine Similarity is that it is (roughly) what
> Lucene uses. This means that once you tune your recommender, it is
> possible to transform it into a Lucene index.
>
> How? I don't know. Ted did this at Veoh.
>
> On Sun, Jan 8, 2012 at 5:14 AM, Robert Giacinto
> <[email protected]> wrote:
>> Hi Raphael,
>>
>> Cosine Similarity is always a good choice.
>>
>> You can find an evaluation of different distance measures for text
>> clustering problems in Similarity Measures for Text Document Clustering by
>> Anne Huang, 2008.
>> http://nzcsrsc08.canterbury.ac.nz/site/proceedings/Individual_Papers/pg049_Similarity_Measures_for_Text_Document_Clustering.pdf
>>
>> -- Robert
>>
>>
>> 2012/1/8 Raphael Cendrillon <[email protected]>
>>
>>> Thanks Yue!
>>>
>>> On Jan 7, 2012, at 6:17 PM, Yue Guan <[email protected]> wrote:
>>>
>>>> Hi, Raphael
>>>>
>>>> Cosine distance is good for text. You may try it.
>>>>
>>>> --Yue
>>>>
>>>> On Sat, Jan 7, 2012 at 9:05 PM, Raphael Cendrillon
>>>> <[email protected]> wrote:
>>>>> Hi everyone,
>>>>>
>>>>> I'm working on a problem clustering news articles around common themes.
>>>>> There seem to be quite a few different distance measures that can be
>>>>> applied.
>>>>>
>>>>> Does anyone have any suggestions on a good general purpose measure to
>>>>> start out with?
>>>>>
>>>>> Thanks!
>>>
>
>
>
> --
> Lance Norskog
> [email protected]
