Thanks for all the comments. They give us an idea of which direction to take. We have been zeroing in on the idea of Random Indexing, but R.I. seems to be missing from Mahout currently. Are there future plans to implement R.I. in Mahout? Are there any libraries out there that would be useful for R.I.?
On Sun, May 20, 2012 9:47 am, Ted Dunning wrote:
> The basic reasoning here is that any cooccurrence measure without
> smoothing will have zero overlap whenever all the others have zero
> overlap. This seems to be the root of your problem. The solution is to
> increase overlap or increase data.
>
> The problem with correlation based approaches is that they overstate
> coincidental overlaps. Fixing that can't fix the problem of no overlap.
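
As a small illustration of the quoted point about zero overlap, here is a minimal Python sketch (not Mahout code). The function names and the toy item sets are made up for this example, and each user is assumed to be represented simply as a set of item IDs. It shows that several common unsmoothed cooccurrence measures all return zero as soon as two users share no items.

```python
import math


def cooccurrence(a, b):
    """Raw cooccurrence: number of items both users have."""
    return len(a & b)


def jaccard(a, b):
    """Jaccard similarity: shared items divided by all distinct items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity for binary (set-valued) preference vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical toy data: users as sets of item IDs.
user_a = {"i1", "i2"}
user_b = {"i3", "i4"}  # shares nothing with user_a
user_c = {"i2", "i3"}  # shares one item with user_a

for name, other in (("b", user_b), ("c", user_c)):
    print(name, cooccurrence(user_a, other),
          jaccard(user_a, other), cosine(user_a, other))
```

For user_a and user_b every measure is zero, so switching between these measures cannot help; only more overlap (or more data) changes the result, as with user_c.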
