Re: Problem with method Plus in the Vector class

Marco Turchi Wed, 20 Jul 2011 15:06:56 -0700

Sorry, I completely misunderstood. I guess you were talking about the weighted cosine distance.


Great, I'll try.


Thanks again for your useful suggestions
Marco

On 20 Jul 2011, at 23:38, Ted Dunning wrote:

Actually, I would suggest weighting words by something like tf-idf.

http://en.wikipedia.org/wiki/Tf%E2%80%93idf

log(tf) or sqrt(tf) is often good instead of linear tf. The standard
log((N+1) / (df+1)) definition of idf is usually good.
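
A minimal sketch of the weighting described above, in plain Python rather than Mahout code. The function name `tfidf_weights`, the `tf_scale` parameter and the toy documents are made up for illustration; sqrt(tf) is used as the default damping, and N is taken to be the number of documents and df the number of documents containing the term.

```python
import math
from collections import Counter

def tfidf_weights(docs, tf_scale=math.sqrt):
    """docs is a list of token lists; returns one {term: weight} dict per document.

    Hypothetical helper, not part of any library.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weighted.append({
            # damped tf times the log((N+1) / (df+1)) idf from the message
            term: tf_scale(count) * math.log((n_docs + 1) / (df[term] + 1))
            for term, count in tf.items()
        })
    return weighted

# Toy corpus (assumed data): "banana" occurs in every document.
docs = [["apple", "apple", "banana"], ["banana", "cherry"], ["banana"]]
weights = tfidf_weights(docs)
```

With this idf, a term that occurs in every document gets weight zero, so "banana" drops out of the toy corpus entirely. If log damping is preferred, a common variant is `tf_scale=lambda c: 1 + math.log(c)`, so that a single occurrence is not zeroed.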

On Wed, Jul 20, 2011 at 2:29 PM, Marco Turchi <[email protected]> wrote:

Whoa, thanks a lot, it seems very interesting. What you suggested means weighting each single word differently when I apply the cosine similarity. Each weight is the frequency of the word in the seed documents. It is not clear to me how to compute and use the anomalously common cooccurrences, but I'll investigate.
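
One possible reading of per-word weighting in cosine similarity, sketched in plain Python: each component is multiplied by its word weight, and the ordinary cosine is taken on the scaled vectors. The function name `weighted_cosine`, the seed weights and the two toy documents are assumptions for illustration only, and words without a weight are treated as weight zero.

```python
import math

def weighted_cosine(a, b, weights):
    """Cosine similarity of two {term: count} dicts after per-term scaling.

    Hypothetical helper; terms missing from `weights` get weight 0.0.
    """
    def scale(vec):
        return {t: v * weights.get(t, 0.0) for t, v in vec.items()}

    wa, wb = scale(a), scale(b)
    dot = sum(v * wb.get(t, 0.0) for t, v in wa.items())
    norm_a = math.sqrt(sum(v * v for v in wa.values()))
    norm_b = math.sqrt(sum(v * v for v in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no weighted terms in one of the vectors
    return dot / (norm_a * norm_b)

# Assumed seed-document frequencies used as weights.
seed_weights = {"mahout": 3.0, "vector": 1.0}
doc_a = {"mahout": 1, "vector": 1, "the": 5}
doc_b = {"mahout": 1, "the": 5}
sim = weighted_cosine(doc_a, doc_b, seed_weights)
```

The corresponding distance is `1 - sim`. Because "the" has no seed weight, its high raw count does not affect the result.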

Thanks a lot
Marco



On 20 Jul 2011, at 20:36, Ted Dunning wrote:

frequency weighted cosine distance
