RI, per se, probably won't help that much with the coincidence problem. The Mahout math libraries would help a lot with a random indexing implementation.
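For anyone who wants to see how little is involved, here is a minimal sketch of random indexing in plain Python. It is not Mahout code and does not use the Mahout math API. The names (index_vector, random_indexing, cosine) and the parameters (dim, nonzeros, window, seed) are made up for illustration. Each term gets a fixed sparse random index vector with a few +1/-1 entries, and a term's context vector is the sum of the index vectors of the terms that appear near it.

```python
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector stored as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzeros)
    return {p: rng.choice((-1, 1)) for p in positions}


def random_indexing(docs, dim=512, nonzeros=8, window=2, seed=42):
    """Build context vectors for every term in docs (lists of tokens).

    All parameter values are illustrative assumptions, not recommendations.
    """
    rng = random.Random(seed)
    index = {}                                   # term -> fixed random index vector
    context = defaultdict(lambda: [0.0] * dim)   # term -> accumulated context vector

    for tokens in docs:
        # Assign an index vector the first time a term is seen.
        for tok in tokens:
            if tok not in index:
                index[tok] = index_vector(dim, nonzeros, rng)

        # Add the index vectors of neighbouring terms to each term's context.
        for i, tok in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in index[tokens[j]].items():
                    context[tok][pos] += sign

    return dict(context)


def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Calling random_indexing([["cat", "sat", "mat"], ["dog", "sat", "mat"]]) gives "cat" and "dog" the same context vector, since they share the same neighbours. In a real implementation the dense lists would be replaced by sparse vectors from a math library.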
Kitenga has some very nice random indexing support. See http://www.kitenga.com/
They offer commercial software, but you get what you pay for.

On Wed, May 23, 2012 at 12:18 AM, Mugoma Joseph Okomba <[email protected]> wrote:

> Thanks for all the comments. They give us an idea of what direction to take.
>
> We have been zeroing in on the idea of Random Indexing, but R.I. seems to be
> missing from Mahout currently. Are there future plans for implementing R.I.
> in Mahout? Are there any libraries out there that would be useful for R.I.?
>
> On Sun, May 20, 2012 9:47 am, Ted Dunning wrote:
> > The basic reasoning here is that any cooccurrence measure without
> > smoothing will have zero overlap whenever all the others have zero
> > overlap. This seems to be the root of your problem. The solution is to
> > increase overlap or increase data.
> >
> > The problem with correlation-based approaches is that they overstate
> > coincidental overlaps. Fixing that can't fix the problem of no overlap.
