1. You can use org.apache.mahout.cf.taste.hadoop.als.RecommenderJob to compute top-N recommendations from the factorization in batch. For each user, this means computing the product of the item feature matrix and that user's feature vector, which gives a predicted score for every item. You then pick the highest-scoring items the user has not interacted with yet.
2. The semantics of the empty cells depend on the type of data you have. For explicit feedback (ratings), you cannot fill the empty cells, because you simply don't know what rating the user would have given. The factorization is therefore computed from the observed ratings only. For implicit feedback, a cell usually holds the count of some observed behavior, such as clicks. Here empty cells are by definition 0 (no clicks observed). However, the factorization has to be modified to give 'lower confidence' to these data points.

3. There are two 'flavors' of the ALS factorization implemented in Mahout, one for implicit feedback data and one for explicit feedback data. I suggest you look into the papers they are based on:

"Large-scale Parallel Collaborative Filtering for the Netflix Prize"
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf

"Collaborative Filtering for Implicit Feedback Datasets"
http://research.yahoo.com/pub/2433

I also uploaded the slides from a lecture I gave in a scalable data mining class at our department. They might also help in understanding the topic:
http://www.slideshare.net/sscdotopen/latent-factor-models-for-collaborative-filtering

Best,
Sebastian

2012/7/4 Razon, Oren <[email protected]>:
> Hi,
> I'm exploring Mahout's parallel SVD implementation over Hadoop (ALS), and I
> would like to clarify a few things:
> 1. How do you recommend the top K items with this job? Does the job
> factorize the ranking matrix, then compute a predicted ranking for each cell
> in the matrix, so that when you need a recommendation you only need to retrieve
> the top K items according to prediction value for the user? Or does it
> factorize the matrix and require some online logic when the recommendation is
> requested?
> 2. From my knowledge, applying an SVD technique requires first filling in
> all empty cells in the ranking matrix (with the average ranking, for example).
> Is it something done during the ALS job (and if so, how are the cells
> filled), or should it be done as a preprocessing step?
> 3. From my understanding, SVD recommenders are used to predict users'
> implicit preferences. By doing so you can recommend the top K items (the top K
> items in descending order of the prediction). I wonder, could it be
> applied to a binary dataset (explicit), where my ranking matrix contains only
> 1/0?
> 4. From doing some reading I found that timeSVD++, developed by
> Yehuda Koren, is considered the superior SVD implementation for SVD
> recommenders. Is there any kind of parallel implementation of
> it on top of Hadoop? I found this proposal:
> https://issues.apache.org/jira/browse/MAHOUT-371
> What is its status? Has it been checked already? Is it
> stable? Does anyone have experience with it?
>
> Thanks,
> Oren
>
> ---------------------------------------------------------------------
> Intel Electronics Ltd.
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
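Point 1 of the reply describes the batch top-N step: multiply the item feature matrix by a user's feature vector and keep the best items the user doesn't know yet. Below is a minimal sketch of that step in plain Python, not Mahout's Java code. The function name, the dict-based layout of the item feature matrix and the tie-breaking rule are illustrative assumptions, not RecommenderJob's actual API.

```python
from typing import Dict, List, Sequence, Set, Tuple


def top_n_recommendations(
    user_vector: Sequence[float],
    item_features: Dict[int, Sequence[float]],
    known_items: Set[int],
    n: int,
) -> List[Tuple[int, float]]:
    """Return the n highest-scoring (item_id, score) pairs for one user.

    The predicted score of an item is the dot product of its feature
    vector with the user's feature vector, i.e. one entry of the
    product of the item feature matrix with the user vector.
    """
    scored = []
    for item_id, features in item_features.items():
        if item_id in known_items:
            continue  # skip items the user has already interacted with
        score = sum(u * f for u, f in zip(user_vector, features))
        scored.append((item_id, score))
    # Highest predicted score first; ties broken by item id so the
    # output is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]
```

For example, with the user vector [1.0, 0.5], the items {1: [0.9, 0.1], 2: [0.2, 0.8], 3: [1.0, 1.0], 4: [0.0, 0.1]}, item 3 already known and n=2, the function returns items 1 and 2 (scores of about 0.95 and 0.6). For large catalogs, heapq.nlargest avoids sorting every score.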
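Point 2 says the empty cells of an explicit ratings matrix are not filled. The sketch below shows why ALS doesn't need them. It computes one half-step of ALS with the weighted-lambda regularization described in the Netflix Prize paper by Zhou et al.: with the item feature vectors m_i held fixed, a user's vector u solves (sum over rated items i of m_i m_i^T + lambda * n_u * I) u = sum over rated items i of r_ui m_i, where n_u is the number of ratings that user gave. Only the items the user actually rated enter the sums, so nothing is imputed. The function names and the plain-list linear algebra are hypothetical, chosen to keep the example dependency-free.

```python
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's data is left untouched.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def als_user_update(
    ratings: Dict[int, float],
    item_features: Dict[int, Sequence[float]],
    lam: float,
) -> List[float]:
    """Recompute one user's feature vector with the item features fixed.

    Uses only the items in `ratings` (the observed cells). The
    regularization term is lam * n_u, scaled by the number of ratings
    n_u; with lam > 0 the system is positive definite and solvable.
    """
    k = len(next(iter(item_features.values())))
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for item_id, r in ratings.items():
        feat = item_features[item_id]
        for p in range(k):
            b[p] += feat[p] * r
            for q in range(k):
                a[p][q] += feat[p] * feat[q]
    n_u = len(ratings)
    for p in range(k):
        a[p][p] += lam * n_u
    return solve_linear_system(a, b)
```

The item update has the same form with the roles of users and items swapped, and alternating the two until convergence is ALS. Because each user's (or item's) solve is independent of the others, the computation is easy to parallelize.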
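For the implicit-feedback flavor (point 2 and the second paper), Hu, Koren and Volinsky turn each raw count r_ui into a binary preference p_ui = 1 if r_ui > 0 else 0 and a confidence c_ui = 1 + alpha * r_ui. Unobserved cells therefore become preference 0 with the lowest confidence, 1. A binary 1/0 matrix is the special case where every observed count is 1. Below is a minimal sketch of that transform. The function name and the dense output are illustrative assumptions. The default alpha = 40 is the value the paper reports working well in its experiments and should be tuned for your data.

```python
from typing import Dict, List, Tuple


def implicit_preference_and_confidence(
    counts: Dict[Tuple[int, int], float],
    num_users: int,
    num_items: int,
    alpha: float = 40.0,  # illustrative default; tune for your data
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build dense preference (P) and confidence (C) matrices from sparse
    observation counts such as clicks, following Hu, Koren and Volinsky:

        p_ui = 1 if r_ui > 0 else 0
        c_ui = 1 + alpha * r_ui

    Cells missing from `counts` have r_ui = 0, so they end up with
    preference 0 and the minimum confidence 1.
    """
    prefs = [[0.0] * num_items for _ in range(num_users)]
    conf = [[1.0] * num_items for _ in range(num_users)]
    for (u, i), r in counts.items():
        prefs[u][i] = 1.0 if r > 0 else 0.0
        conf[u][i] = 1.0 + alpha * r
    return prefs, conf
```

The dense matrices are only for readability. A real implementation keeps the data sparse, since c_ui - 1 is zero for every unobserved cell, which is what makes the implicit variant tractable at scale.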
