That's roughly what it does now, though it weights everything as "1". Not
so smart, but for sparse-ish data it's not far off from a smarter answer.


On Thu, Nov 15, 2012 at 6:47 PM, Ted Dunning <[email protected]> wrote:

> My own preference (pun intended) is to use log-likelihood score for
> determining which similarities are non-zero and then use simple frequency
> weighting such as IDF for weighting the similarities.   This doesn't make
> direct use of cooccurrence frequencies, but it works really well.  One
> reason that it seems to work well is that using only general occurrence
> frequencies makes it *really* hard to overfit.
>
>
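
For anyone following along, here is a minimal Python sketch of the approach
described in the quoted message: a log-likelihood ratio (LLR) test decides
which item pairs get a non-zero similarity, and an IDF-style weight replaces
the cooccurrence count. This is not Mahout code. The names (llr,
sparse_similarities, item_users), the 3.84 cutoff, and the exact IDF formula
log(n_users / item frequency) are all assumptions made for illustration.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a list of counts: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio (G^2) for a 2x2 contingency table:
    #   k11 = users with both items, k12 = users with A only,
    #   k21 = users with B only,     k22 = users with neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def sparse_similarities(item_users, n_users, threshold=3.84):
    # item_users: hypothetical dict mapping item -> set of user ids.
    # threshold: illustrative cutoff only (assumption, not from the thread).
    # Note that LLR is two-sided, so this sketch does not distinguish
    # positive from negative association.
    sims = {}
    for a, users_a in item_users.items():
        for b, users_b in item_users.items():
            if a == b:
                continue
            k11 = len(users_a & users_b)
            k12 = len(users_a) - k11
            k21 = len(users_b) - k11
            k22 = n_users - k11 - k12 - k21
            # LLR only selects which pairs are kept.
            if llr(k11, k12, k21, k22) > threshold:
                # The weight uses only the general frequency of item b
                # (an IDF-style score), not the cooccurrence count k11.
                sims[(a, b)] = math.log(n_users / len(users_b))
    return sims
```

With item_users = {"a": {1, 2, 3, 4, 5}, "b": {1, 2, 3, 4, 5}, "c": {6}} and
n_users = 20, the pair ("a", "b") passes the LLR test and gets weight
log(20 / 5), while ("a", "c") is dropped. Weighting every kept pair as 1
instead corresponds to replacing the math.log(...) expression with 1.0.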
