Very true, good catch. I was interpreting the results the wrong way. I only wanted the top 5, so I changed the parameter from "10" to "5", and the results are now as expected.
Thanks.

On Wed, Sep 12, 2012 at 11:36 PM, Sean Owen <[email protected]> wrote:
> Well there are only 7 products in the universe! If you ask for 10
> recommendations, you will always get all unrated items back in the
> recommendations. That's always true unless the algorithm can't
> actually establish a value for some items.
>
> What result were you expecting, less than 10 recs? less than 7?
>
> On Thu, Sep 13, 2012 at 6:55 AM, Gokul Pillai <[email protected]> wrote:
> > I am trying out Mahout to come up with product recommendations for users
> > based on data that show what products they use today.
> > The data is not web-scale, just about 300,000 users and 7 products. Few
> > comments about the data here:
> > 1. Since users either have or not have a particular product, the value in
> > the matrix is either "1" or "0" for all the columns (rows being the userids)
> > 2. All the users have one basic product, so I discounted this from the
> > data-model passed to the Mahout recommender since I assume that if everyone
> > has the same product, its effect on the recommendations are trivial.
> > 3. The matrix itself is sparse, the total counts of users having each
> > product is :
> > A=31847, 54754,1897 | 23154 | 2201 | 2766 | 33585
> >
> > Steps followed:
> > 1. Created a data-source from the user-product table in the database
> >     File ratingsFile = new File("datasets/products.csv");
> >     DataModel model = new FileDataModel(ratingsFile);
> > 2. Created a recommender on this data
> >     CachingRecommender recommender = new CachingRecommender(new SlopeOneRecommender(model));
> > 3. Loop through all users and get the top ten recommendations:
> >     List<RecommendedItem> recommendations = recommender.recommend(userId, 10);
> >
> > Issue faced:
> > The problem I am facing is that the recommendations that come out are way
> > too simple - meaning that all that it seems like what is being recommended
> > is "if a user does not have product A, then recommend it, if they dont have
> > product B, then recommend it and so on." Basically a simple inverse of
> > their ownership status.
> >
> > Obviously, I am not doing something right here. How can I do the modeling
> > better to get the right recommendations. Or is it that my dataset (300000
> > users times 7 products) is too small for Mahout to work with?
> >
> > Look forward to your comments. Thanks.
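
For anyone who hits the same thing, here is a rough plain-Java sketch of the effect Sean describes. It does not use Mahout at all: the class `TopNSketch`, the helper `topN`, the product names and the scores are all made up for illustration. It only assumes that a top-N recommender ranks the items a user does not already have and returns at most N of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only; this is NOT Mahout code or Mahout's API.
final class TopNSketch {

    // Returns up to howMany item IDs the user does not own, highest score first.
    static List<String> topN(Map<String, Double> scores, Set<String> owned, int howMany) {
        List<String> candidates = new ArrayList<>();
        for (String item : scores.keySet()) {
            if (!owned.contains(item)) {
                candidates.add(item);
            }
        }
        candidates.sort(Comparator.comparing((String item) -> scores.get(item)).reversed());
        // The result can never be longer than the number of unowned items.
        return new ArrayList<>(candidates.subList(0, Math.min(howMany, candidates.size())));
    }

    public static void main(String[] args) {
        // Hypothetical estimated preferences for 7 made-up products.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("A", 0.9);
        scores.put("B", 0.4);
        scores.put("C", 0.7);
        scores.put("D", 0.2);
        scores.put("E", 0.6);
        scores.put("F", 0.1);
        scores.put("G", 0.5);
        Set<String> owned = new HashSet<>(Arrays.asList("A", "D"));

        // Asking for 10 with only 5 unowned items returns all 5 of them.
        System.out.println(topN(scores, owned, 10)); // [C, E, G, B, F]
        // Asking for fewer than the number of unowned items actually filters.
        System.out.println(topN(scores, owned, 3));  // [C, E, G]
    }
}
```

With a request size at or above the number of unowned items, the output is just "everything the user lacks" in ranked order, which is why the results looked like an inverse of ownership. Only a smaller request size makes the ranking visible.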
