I think even a single interaction in a thread is an excellent indicator of interest. This can give thread recommendations as well as related-person recommendations.
I think it will be interesting to see just how sparse it is.

On Mon, Aug 22, 2011 at 7:48 AM, Grant Ingersoll <[email protected]> wrote:

> I'm working on an example (well, examples) of using Mahout with the ASF
> Public Data Set up on Amazon
> (http://aws.amazon.com/datasets/7791434387204566) and I wanted to show how
> to use the 3 "C's" (collab filtering, clustering, classification) with the
> data set. Clustering and classification are pretty straight forward, but
> I'm wondering about the setup around collaborative filtering.
>
> The motivation for recommendations is pretty straightforward: provide
> people recs on emails that they might find useful based on what other people
> have interacted with. The tricky part is I am not totally sure on a valid
> setup of the problem. My current thinking is that I build up the rec.
> matrix based on whether someone has interacted with (initiated/replied) a
> thread or not. Thus, the columns are the thread ids and the rows are the
> users. Each cell contains the count of the number of times user X has
> interacted with thread Y. This feels to me like it is a stand in for that
> user's preference in that if they are replying multiple times, they have an
> interest in that topic. I have no idea if this will be effective or not,
> but it seems like it could be interesting. Does it sound reasonable? I
> worry that even in a really large data set as above it will simply be too
> sparse.
>
> Is there a better way to think about this from a strict collaborative
> filtering context? In other words, I know I could do content-based
> recommendations but that is not what I am after here.
>
> -Grant
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
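
For what it's worth, here is a rough sketch of the setup discussed above: users as rows, thread ids as columns, each cell a count of that user's messages in that thread. It is plain Python rather than Mahout, and everything in it is an assumption for illustration: the sample events, the helper names (build_counts, density, recommend_threads), and the simple overlap score, which treats any interaction as a binary signal of interest. It is not how Mahout's recommenders are implemented.

```python
from collections import Counter, defaultdict

# Hypothetical (user, thread_id) events: one entry per message a user
# sent to a thread (either starting it or replying to it).
events = [
    ("alice", "t1"), ("alice", "t1"), ("alice", "t2"),
    ("bob", "t1"), ("bob", "t3"),
    ("carol", "t2"), ("carol", "t3"), ("carol", "t3"),
]

def build_counts(events):
    """Sparse user-by-thread matrix: counts[user][thread] = number of messages."""
    counts = defaultdict(Counter)
    for user, thread in events:
        counts[user][thread] += 1
    return counts

def density(counts):
    """Fraction of user/thread cells that are non-zero (1.0 means fully dense)."""
    users = len(counts)
    threads = len({t for row in counts.values() for t in row})
    filled = sum(len(row) for row in counts.values())
    return filled / (users * threads) if users and threads else 0.0

def recommend_threads(counts, user, top_n=3):
    """Score unseen threads by how many threads each other user shares with `user`.

    Counts are ignored here; any interaction at all is treated as interest.
    """
    seen = set(counts.get(user, {}))
    scores = Counter()
    for other, row in counts.items():
        if other == user:
            continue
        overlap = len(seen & set(row))
        if overlap == 0:
            continue
        for thread in row:
            if thread not in seen:
                scores[thread] += overlap
    return [thread for thread, _ in scores.most_common(top_n)]

counts = build_counts(events)
print(density(counts))
print(recommend_threads(counts, "alice"))
```

Running density() over the real archive would answer the sparsity question directly, and swapping the overlap score for the raw counts would be one way to compare the binary and count-based variants.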
