There is only one implementation, because both 'flavors' of ALS have the same computation shape. By default the job factorizes explicit feedback data; if you specify the option '--implicitFeedback', it switches to the algorithm that works on implicit feedback data. Internally, different solvers from org.apache.mahout.math.als are used, if you want to have a deeper look.
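To illustrate what "the same computation shape" means, here is a minimal sketch in plain Python of the per-user least-squares step described in the two papers cited later in this thread. It is not Mahout's code, and the function names are hypothetical. In both flavors the step builds a small k x k system and solves it. Only the terms that go into the system differ. The explicit variant follows the weighted-lambda regularization of "Large-scale Parallel Collaborative Filtering for the Netflix Prize" and sums only over the items the user rated. The implicit variant follows "Collaborative Filtering for Implicit Feedback Datasets": every item participates, with preference p = 1 if the observed count is positive (else 0) and confidence c = 1 + alpha * count. The lambda and alpha values are assumptions for illustration.

```python
def solve(a, b):
    """Solve the k x k system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def explicit_user_update(item_factors, ratings, lam):
    """Explicit flavor (hypothetical helper): least squares over rated items only.

    ratings maps item index -> rating; unknown cells are simply absent,
    so nothing has to be filled in.
    """
    k = len(item_factors[0])
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for j, r in ratings.items():
        y = item_factors[j]
        for p in range(k):
            b[p] += r * y[p]
            for q in range(k):
                a[p][q] += y[p] * y[q]
    for p in range(k):
        a[p][p] += lam * len(ratings)  # weighted-lambda: scale by number of ratings
    return solve(a, b)


def implicit_user_update(item_factors, counts, lam, alpha):
    """Implicit flavor (hypothetical helper): all items take part.

    counts maps item index -> observed count; missing items count as 0,
    i.e. preference 0 with the lowest confidence 1.
    """
    k = len(item_factors[0])
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for j, y in enumerate(item_factors):
        count = counts.get(j, 0)
        conf = 1.0 + alpha * count              # confidence c_uj
        pref = 1.0 if count > 0 else 0.0        # binary preference p_uj
        for p in range(k):
            b[p] += conf * pref * y[p]
            for q in range(k):
                a[p][q] += conf * y[p] * y[q]
    for p in range(k):
        a[p][p] += lam                          # plain lambda regularization
    return solve(a, b)
```

For example, with item vectors [1.0, 0.0] and [0.0, 1.0], ratings {0: 4.0, 1: 2.0} and lam = 0.5, `explicit_user_update` returns [2.0, 1.0]. The naive loop in the implicit variant touches every item for every user; Hu et al. avoid this by precomputing Y^T Y once and adding only the correction terms for items with a non-zero count.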
Best,
Sebastian

On 05.07.2012 10:38, Razon, Oren wrote:
> Thanks for the answer, Sebastian!
> You said Mahout has two 'flavors' of the ALS factorization, one for implicit
> and the other for explicit feedback.
> Can you point me to the code that does each?
> On the Hadoop side I can see only one ALS implementation...
>
> -----Original Message-----
> From: Sebastian Schelter [mailto:[email protected]]
> Sent: Thursday, July 05, 2012 11:12
> To: [email protected]
> Subject: Re: A bunch of SVD questions...
>
> 1. You can use org.apache.mahout.cf.taste.hadoop.als.RecommenderJob to
> compute top-N recommendations from the factorization in batch. For
> each user, you have to compute the product of the item feature matrix
> and the user's feature vector, and then pick the highest-scoring
> unknown items.
>
> 2. The semantics of the empty cells depend on the type of data you
> have. For explicit feedback (ratings), you cannot fill the empty cells
> because you simply don't know what rating the user would have given.
> For implicit feedback, a cell usually holds the count of some observed
> behavior, e.g. clicks. Here empty cells are by definition 0 (no
> clicks observed); however, the factorization has to be modified to give
> 'lower confidence' to these datapoints.
>
> 3. There are two 'flavors' of the ALS factorization implemented in
> Mahout, one for implicit feedback data, the other for explicit
> feedback data. I suggest you look into the papers they are based on:
>
> "Large-scale Parallel Collaborative Filtering for the Netflix Prize"
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
> "Collaborative Filtering for Implicit Feedback Datasets"
> http://research.yahoo.com/pub/2433
>
> I also uploaded the slides from a lecture I gave in a scalable data
> mining class at our department; they might also be helpful in
> understanding the topic:
>
> http://www.slideshare.net/sscdotopen/latent-factor-models-for-collaborative-filtering
>
> Best,
> Sebastian
>
> 2012/7/4 Razon, Oren <[email protected]>:
>> Hi,
>> I'm exploring Mahout's parallel SVD implementation on Hadoop (ALS), and I
>> would like to clarify a few things:
>> 1. How do you recommend the top K items with this job? Does the job
>> factorize the ranking matrix, then compute a predicted ranking for each cell
>> in the matrix, so that when you need a recommendation you only need to
>> retrieve the top K items by predicted value for the user? Or does it only
>> factorize the matrix and require some online logic when a recommendation
>> is requested?
>> 2. As far as I know, applying an SVD technique requires first filling in
>> all empty cells in the ranking matrix (with the average ranking, for
>> example). Is this done during the ALS job (and if so, how are the cells
>> filled), or should it be done as a preprocessing step?
>> 3. From my understanding, SVD recommenders are used to predict user
>> implicit preference. By doing so you can recommend the top K items (sorted
>> in descending order of prediction). I wonder, could it be applied to a
>> binary dataset (explicit), where my ranking matrix contains only 1/0?
>> 4. From doing some reading I found that timeSVD++, developed by
>> Yehuda Koren, is considered the superior SVD implementation for SVD
>> recommenders. Is there any kind of parallel implementation of it
>> on top of Hadoop? I found this proposal:
>> https://issues.apache.org/jira/browse/MAHOUT-371
>> What is its status? Has it been reviewed already? Is it stable?
>> Does anyone have experience with it?
>>
>> Thanks,
>> Oren
>>
>> ---------------------------------------------------------------------
>> Intel Electronics Ltd.
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
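Point 1 of Sebastian's first reply describes the batch recommendation step: multiply the item feature matrix by the user's feature vector and keep the highest-scoring items the user does not already know. A minimal Python sketch of that step for a single user follows. It is a hypothetical helper, not the code of RecommenderJob, and it assumes the factors fit in memory as plain lists of floats.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recommend_top_n(user_vector, item_factors, known_items, n):
    """Return up to n (item, score) pairs for items the user has not seen.

    The predicted score of item j is the dot product of the user's feature
    vector with row j of the item feature matrix, i.e. one entry of the
    product item_factors x user_vector.
    """
    candidates = (
        (item, dot(user_vector, vec))
        for item, vec in enumerate(item_factors)
        if item not in known_items  # skip items the user already interacted with
    )
    # heapq.nlargest keeps only the n best instead of sorting every candidate
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
    print(recommend_top_n([1.0, 0.5], items, known_items={2}, n=2))
```

With the example data, item 2 would score highest but is excluded as already known, so the call returns items 3 and 0 with scores 1.5 and 1.0. Because only the factor matrices are stored, the predicted rating for any cell can be recomputed on demand, which is why no dense matrix of predictions has to be materialized.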
