Weighted cosine distance with selected interaction components, to be pedantic.
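To make the weighting concrete, here is a minimal sketch in Python of a tf-idf weighted cosine similarity; it uses the sublinear tf (sqrt) and the smoothed idf log((N+1)/(df+1)) mentioned in the thread below. The term-count dicts and the doc_freq / n_docs names are just for illustration, not from any particular library.

import math

def idf(df, n_docs):
    # Smoothed inverse document frequency: log((N + 1) / (df + 1))
    return math.log((n_docs + 1.0) / (df + 1.0))

def weight(tf, df, n_docs):
    # Sublinear term frequency (sqrt) times idf; log(1 + tf) works similarly
    return math.sqrt(tf) * idf(df, n_docs)

def weighted_cosine(doc_a, doc_b, doc_freq, n_docs):
    # Cosine similarity between two term-count dicts, each term
    # weighted by sqrt(tf) * log((N + 1) / (df + 1)).
    a = {t: weight(tf, doc_freq.get(t, 0), n_docs) for t, tf in doc_a.items()}
    b = {t: weight(tf, doc_freq.get(t, 0), n_docs) for t, tf in doc_b.items()}
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

With doc_a built from the seed documents and doc_b a candidate document, a higher score means a closer match.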
On Wed, Jul 20, 2011 at 2:59 PM, Marco Turchi <[email protected]> wrote:

> Sorry, I completely misunderstood; I guess you were talking about the
> weighted cosine distance.
>
> Great, I'll try.
>
> Thanks again for your useful suggestions
> Marco
>
> On 20 Jul 2011, at 23:38, Ted Dunning wrote:
>
>> Actually, I would suggest weighting words by something like tf-idf
>> weighting.
>>
>> http://en.wikipedia.org/wiki/Tf%E2%80%93idf
>>
>> log or sqrt(tf) is often good instead of linear tf. The standard
>> log((N+1) / (df+1)) definition is usually good.
>>
>> On Wed, Jul 20, 2011 at 2:29 PM, Marco Turchi <[email protected]> wrote:
>>
>>> Whoa, thanks a lot, it seems very interesting. What you suggested means
>>> weighting each single word differently when I apply the cosine similarity.
>>> Each weight is the frequency of the word in the seed documents. It is not
>>> clear to me how to compute and use the anomalously common cooccurrences,
>>> but I'll investigate.
>>>
>>> Thanks a lot
>>> Marco
>>>
>>> On 20 Jul 2011, at 20:36, Ted Dunning wrote:
>>>
>>>> frequency weighted cosine distance
