That's kind of what it does now, though it weights everything as "1". Not so smart, but for sparse-ish data it's not far off from a smarter answer.
On Thu, Nov 15, 2012 at 6:47 PM, Ted Dunning wrote:
> My own preference (pun intended) is to use log-likelihood score for
> determining which similarities are non-zero and then use simple frequency
> weighting such as IDF for weighting the similarities. This doesn't make
> direct use of cooccurrence frequencies, but it works really well. One
> reason that it seems to work well is that by using only general occurrence
> frequencies makes it *really* hard to overfit.
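A minimal Python sketch of the scheme in the quoted message, assuming binary user-item data. A Dunning-style G^2 log-likelihood ratio over each item pair's 2x2 contingency table decides which similarities are non-zero, and each kept pair is then weighted by the IDF of the target item. The function names, the `threshold` parameter, the input format and the choice of IDF = ln(n_users / item_count) are illustrative assumptions, not Mahout's API. Passing `use_idf=False` gives the all-"1" weighting described above.

```python
import math
from collections import Counter
from itertools import combinations


def x_log_x(x):
    # x * ln(x), with the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy: N*ln(N) - sum(k*ln(k)), where N = sum of counts
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    # G^2 statistic for a 2x2 contingency table:
    #   k11 = users with both items, k12 = first item only,
    #   k21 = second item only,      k22 = neither item
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))


def item_similarities(user_items, threshold, use_idf=True):
    """Sparse item-item similarities from binary data (hypothetical helper).

    user_items: dict mapping user -> set of items.
    A pair is kept only if its LLR score reaches `threshold`; kept pairs are
    weighted by the IDF of the target item, or by 1.0 when use_idf is False.
    """
    n_users = len(user_items)
    item_count = Counter()  # number of users who touched each item
    pair_count = Counter()  # number of users who touched both items of a pair
    for items in user_items.values():
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))

    sims = {}
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11
        k21 = item_count[b] - k11
        k22 = n_users - k11 - k12 - k21
        if log_likelihood_ratio(k11, k12, k21, k22) < threshold:
            continue  # not significant: treated as a zero similarity
        for x, y in ((a, b), (b, a)):
            # IDF of the target item y; rarer items get larger weights
            sims[(x, y)] = math.log(n_users / item_count[y]) if use_idf else 1.0
    return sims
```

In this sketch the cooccurrence count only enters through the LLR test; the final weight depends on overall item frequency alone, which is the property the quoted message credits for making overfitting hard.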
