RI, per se, probably won't help that much with the coincidence problem.

The Mahout math libraries would help a lot with a random indexing
implementation.
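
For reference, here is a minimal sketch of the random indexing idea in
plain Python. It does not use Mahout, and the function names
(index_vector, random_indexing, cosine) and the parameters (dim,
nonzeros, seed) are made up for illustration. It assumes the input is a
list of (context, item) pairs such as (user, item): each context gets a
fixed sparse random vector of +1/-1 entries, and each item's vector is
the sum of the vectors of the contexts it appears in.

```python
import math
import random

def index_vector(dim, nonzeros, rng):
    """Sparse ternary random vector: a few +1/-1 entries, the rest 0."""
    vec = [0] * dim
    for k, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1 if k % 2 == 0 else -1
    return vec

def random_indexing(observations, dim=64, nonzeros=4, seed=42):
    """observations: iterable of (context, item) pairs (hypothetical format)."""
    rng = random.Random(seed)
    index = {}      # context -> fixed random index vector
    item_vecs = {}  # item -> sum of the index vectors of its contexts
    for ctx, item in observations:
        if ctx not in index:
            index[ctx] = index_vector(dim, nonzeros, rng)
        acc = item_vecs.setdefault(item, [0] * dim)
        for i, v in enumerate(index[ctx]):
            acc[i] += v
    return item_vecs

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy data: A and B share both of their contexts, C shares none.
obs = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"), ("u3", "C")]
vecs = random_indexing(obs)
```

In this toy data A and B end up with identical vectors because they
occur in exactly the same contexts. C shares no context with either, so
any similarity it shows to them is only noise from random collisions
between index vectors, not real overlap.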

Kitenga has some very nice random indexing support.  See
http://www.kitenga.com/

They offer commercial software, but you get what you pay for.

On Wed, May 23, 2012 at 12:18 AM, Mugoma Joseph Okomba <[email protected]> wrote:

>
> Thanks for all the comments. They give us an idea of what direction to take.
>
> We have been zeroing in on the idea of Random Indexing, but R.I. seems to
> be missing from Mahout currently. Are there future plans for implementing
> R.I. in Mahout? Are there any libraries out there that would be useful
> for R.I.?
>
> On Sun, May 20, 2012 9:47 am, Ted Dunning wrote:
> > The basic reasoning here is that any cooccurrence measure without
> > smoothing will have zero overlap whenever all the others have zero
> > overlap.  This seems to be the root of your problem.  The solution is
> > to increase overlap or increase data.
> >
> > The problem with correlation-based approaches is that they overstate
> > coincidental overlaps.  Fixing that can't fix the problem of no overlap.
> >
>
>
>
