I think even a single interaction in a thread is an excellent indicator of
interest.  That data can drive thread recommendations as well as
related-person recommendations.

I think it will be interesting to see just how sparse the matrix is.
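
As a rough way to check that, here is a minimal sketch in plain Python (not
Mahout code) that builds the user-by-thread interaction counts described in
the quoted message, reduces them to single-interaction (boolean) preferences,
and measures how much of the matrix is filled.  The events list and the
function names are made up for illustration; real input would come from
parsing the mail archives.

```python
from collections import Counter

# Hypothetical (user, thread_id) events: one entry per initiation or reply.
events = [
    ("alice", "t1"), ("alice", "t1"), ("bob", "t1"),
    ("bob", "t2"), ("carol", "t3"),
]

def build_counts(events):
    """Sparse user x thread matrix: maps (user, thread) to interaction count."""
    return Counter(events)

def binarize(counts):
    """Boolean preferences: keep only whether a user touched a thread at all."""
    return {pair for pair, n in counts.items() if n > 0}

def density(counts):
    """Fraction of user x thread cells that are non-zero (1.0 means fully dense)."""
    users = {user for user, _ in counts}
    threads = {thread for _, thread in counts}
    if not users or not threads:
        return 0.0
    return len(counts) / (len(users) * len(threads))

counts = build_counts(events)
prefs = binarize(counts)
print(f"non-zero cells: {len(counts)}, density: {density(counts):.3f}")
```

On a real archive the density will presumably be far lower than in this toy
data, which is exactly the number worth looking at first.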

On Mon, Aug 22, 2011 at 7:48 AM, Grant Ingersoll <[email protected]> wrote:

> I'm working on an example (well, examples) of using Mahout with the ASF
> Public Data Set up on Amazon (
> http://aws.amazon.com/datasets/7791434387204566) and I wanted to show how
> to use the 3 "C's" (collab filtering, clustering, classification) with the
> data set.  Clustering and classification are pretty straightforward, but
> I'm wondering about the setup around collaborative filtering.
>
> The motivation for recommendations is pretty straightforward:  provide
> people recs on emails that they might find useful based on what other people
> have interacted with.  The tricky part is I am not totally sure on a valid
> setup of the problem.  My current thinking is that I build up the rec.
> matrix based on whether someone has interacted with (initiated/replied) a
> thread or not.  Thus, the columns are the thread ids and the rows are the
> users.  Each cell contains the count of the number of times user X has
> interacted with thread Y.  This feels to me like it is a stand-in for that
> user's preference in that if they are replying multiple times, they have an
> interest in that topic.  I have no idea if this will be effective or not,
> but it seems like it could be interesting.  Does it sound reasonable?  I
> worry that even in a really large data set as above it will simply be too
> sparse.
>
> Is there a better way to think about this from a strict collaborative
> filtering context?  In other words, I know I could do content-based
> recommendations but that is not what I am after here.
>
> -Grant
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
>
