I think this is reasonable. Some suggestions:

1. Instead of using the raw interaction count as the cell value, map the 
count to a 1-5 score based on the histogram of counts.
2. Use an item-item algorithm, which is supposed to work well on sparse data.
3. I think the best algorithm for handling sparse data is SVD.
4. Research shows that "accuracy" is not the only way to evaluate a CF use 
scenario, so you might also want to show "explanations" (i.e., why the 
recommendations are made), etc.
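
To make suggestion 1 concrete, here is a rough sketch of one way to do the 
mapping, using quantile cut points taken from the observed counts. The 
function names and the sample counts are made up for illustration, and 
equal-frequency bins are only one possible reading of "based on histogram"; 
none of this is Mahout API.

```python
from bisect import bisect_left


def quantile_edges(counts, n_bins=5):
    """Return n_bins - 1 cut points taken from the sorted counts.

    Assumes counts is non-empty. With heavily skewed data several
    cut points may be equal, which is fine for the lookup below.
    """
    ordered = sorted(counts)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def count_to_score(count, edges):
    """Map a raw count to a score in 1..len(edges) + 1.

    The score is one plus the number of cut points strictly below the
    count, so the most common (lowest) counts land on score 1.
    """
    return bisect_left(edges, count) + 1


# Hypothetical interaction counts, one per (user, thread) cell.
sample_counts = [1, 1, 1, 1, 2, 2, 3, 5, 8, 20]
edges = quantile_edges(sample_counts)
scores = [count_to_score(c, edges) for c in sample_counts]
```

The scores could then replace the raw counts as the preference values in the 
user x thread matrix. Whether five bins is the right granularity for this 
data set is something you would have to check against the real histogram.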

Just my 2 cents.

-daniel

--
Daniel Xiaodan Zhou
PhD student
School of Information
University of Michigan
http://michiza.com

On Aug 22, 2011, at 10:48 AM, Grant Ingersoll wrote:

> I'm working on an example (well, examples) of using Mahout with the ASF 
> Public Data Set up on Amazon 
> (http://aws.amazon.com/datasets/7791434387204566) and I wanted to show how to 
> use the 3 "C's" (collab filtering, clustering, classification) with the data 
> set.  Clustering and classification are pretty straightforward, but I'm 
> wondering about the setup around collaborative filtering.
> 
> The motivation for recommendations is pretty straightforward:  provide people 
> recs on emails that they might find useful based on what other people have 
> interacted with.  The tricky part is I am not totally sure on a valid setup 
> of the problem.  My current thinking is that I build up the rec. matrix based 
> on whether someone has interacted with (initiated/replied) a thread or not.  
> Thus, the columns are the thread ids and the rows are the users.  Each cell 
> contains the count of the number of times user X has interacted with thread 
> Y.  This feels to me like it is a stand in for that user's preference in that 
> if they are replying multiple times, they have an interest in that topic.  I 
> have no idea if this will be effective or not, but it seems like it could be 
> interesting.  Does it sound reasonable?  I worry that even in a really large 
> data set as above it will simply be too sparse.
> 
> Is there a better way to think about this from a strict collaborative 
> filtering context?  In other words, I know I could do content-based 
> recommendations but that is not what I am after here.
> 
> -Grant
> 
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
> 
