Weighted cosine distance with selected interaction components, to be
pedantic.
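
For concreteness, a minimal Python sketch of that idea, using the
sqrt(tf) * log((N+1) / (df+1)) weighting from the message below. This is only
an illustration: the function and parameter names (tfidf_weights,
weighted_cosine, df, n_docs, selected_terms) are made up here, and restricting
the similarity to a selected subset of terms is just one possible reading of
"selected interaction components".

import math
from collections import Counter

def tfidf_weights(doc_tokens, df, n_docs):
    # sqrt(tf) damping with the log((N+1) / (df+1)) idf definition
    # suggested in the message below.
    tf = Counter(doc_tokens)
    return {t: math.sqrt(c) * math.log((n_docs + 1) / (df.get(t, 0) + 1))
            for t, c in tf.items()}

def weighted_cosine(w1, w2, selected_terms=None):
    # Cosine similarity over weighted term vectors; optionally restrict
    # it to a chosen subset of terms.
    terms = set(w1) | set(w2)
    if selected_terms is not None:
        terms &= set(selected_terms)
    dot = sum(w1.get(t, 0.0) * w2.get(t, 0.0) for t in terms)
    n1 = math.sqrt(sum(w1.get(t, 0.0) ** 2 for t in terms))
    n2 = math.sqrt(sum(w2.get(t, 0.0) ** 2 for t in terms))
    return dot / (n1 * n2) if n1 and n2 else 0.0

Distance is then simply 1 minus this similarity.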

On Wed, Jul 20, 2011 at 2:59 PM, Marco Turchi <[email protected]> wrote:

> Sorry I completely misunderstood, I guess you were talking about the
> weighted cosine distance.
>
> Great, I'll try.
>
> Thanks again for your useful suggestions
> Marco
>
>
> On 20 Jul 2011, at 23:38, Ted Dunning wrote:
>
>> Actually, I would suggest weighting words by something like tf-idf
>> weighting.
>>
>> http://en.wikipedia.org/wiki/Tf%E2%80%93idf
>>
>> log(tf) or sqrt(tf) is often good instead of linear tf.  The standard idf
>> definition, log((N+1) / (df+1)), is usually good.
>>
>> On Wed, Jul 20, 2011 at 2:29 PM, Marco Turchi <[email protected]> wrote:
>>
>>> Whoa, thanks a lot, it seems very interesting. What you suggested means to
>>> weight each single word differently when I apply the cosine similarity.
>>> Each weight is the frequency of the word in the seed documents. It is not
>>> clear to me how to compute and use the anomalously common cooccurrences,
>>> but I'll investigate.
>>>
>>> Thanks a lot
>>> Marco
>>>
>>>
>>>
>>> On 20 Jul 2011, at 20:36, Ted Dunning wrote:
>>>
>>>> frequency weighted cosine distance
