Wow, thanks a lot, this looks very interesting. What you suggested amounts to weighting each word differently when I apply the cosine similarity, where each weight is the frequency of the word in the seed documents. It is not yet clear to me how to compute and use the anomalously common cooccurrences, but I'll investigate.
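
To check my understanding, here is a minimal Python sketch of how I read the weighting idea. It is only my interpretation and may not be what you had in mind: the function names (seed_weights, weighted_cosine) are made up for illustration, tokenization is a plain whitespace split, and I assume that words which never appear in the seed documents get weight 0.

```python
import math
from collections import Counter


def seed_weights(seed_docs):
    """Relative frequency of each word across all seed documents."""
    counts = Counter(word for doc in seed_docs for word in doc.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def weighted_cosine(doc_a, doc_b, weights):
    """Cosine similarity with each word count scaled by its seed weight.

    Assumption: words missing from `weights` get weight 0 and are ignored.
    """
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    va = {w: c * weights.get(w, 0.0) for w, c in a.items()}
    vb = {w: c * weights.get(w, 0.0) for w, c in b.items()}
    dot = sum(x * vb.get(w, 0.0) for w, x in va.items())
    norm_a = math.sqrt(sum(x * x for x in va.values()))
    norm_b = math.sqrt(sum(x * x for x in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no seed vocabulary in at least one document
    return dot / (norm_a * norm_b)
```

With this, two documents only look similar to the extent that they share words that are frequent in the seed set; words outside the seed vocabulary do not contribute at all.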

Thanks a lot
Marco



On 20 Jul 2011, at 20:36, Ted Dunning wrote:

frequency weighted cosine distance
