Ted, RI seems pretty interesting. Do you have any reference paper or system showing how people have used it to improve recommendation systems, and how they define context vectors using extra information?
A quick idea I had was to use LDA to build topic vectors and use them as context vectors. Any thoughts on that? RI seems to be a good candidate for contribution to Mahout.

On Wed, May 23, 2012 at 12:11 PM, Ted Dunning <[email protected]> wrote:

> RI, per se, probably won't help that much with the coincidence problem.
>
> The Mahout math libraries would help a lot with a random indexing
> implementation.
>
> Kitenga has some very nice random indexing support. See
> http://www.kitenga.com/
>
> They offer commercial software, but you get what you pay for.
>
> On Wed, May 23, 2012 at 12:18 AM, Mugoma Joseph Okomba
> <[email protected]> wrote:
>
>> Thanks for all the comments. They give us an idea of what direction to take.
>>
>> We have been zeroing in on the idea of Random Indexing, but R.I. seems to be
>> missing in Mahout currently. Are there future plans for implementing R.I. in
>> Mahout? Any libraries out there that would be useful for R.I.?
>>
>> On Sun, May 20, 2012 9:47 am, Ted Dunning wrote:
>> > The basic reasoning here is that any cooccurrence measure without
>> > smoothing will have zero overlap whenever all the others have zero
>> > overlap. This seems to be the root of your problem. The solution is
>> > to increase overlap or increase data.
>> >
>> > The problem with correlation-based approaches is that they overstate
>> > coincidental overlaps. Fixing that can't fix the problem of no overlap.
