These guys show one way to combine content info with dyadic data factorization, which is pretty close to what I used. Unfortunately I don't have a free download link for the paper (it is in the ACM library, or Ted knows a cheaper arrangement to get it).
Agarwal, Chen: "Regression-based Latent Factor Models"

I am not sure if one can construct something similar in Mahout though, but I am sure it can be prototyped very easily in Java. (I hacked a lot of the Mahout framework in my time to achieve a similar effect.)

On Fri, Jul 6, 2012 at 8:07 AM, Razon, Oren <[email protected]> wrote:
> Hi Dmitriy,
> Thank you for the answer.
> I will be happy to read such a paper.
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:[email protected]]
> Sent: Thursday, July 05, 2012 19:18
> To: [email protected]
> Subject: RE: A bunch of SVD questions...
>
> Cold start problem is usually best attacked if there is also content
> information about users initially -- demographics, user profile or
> something. Otherwise, yes, you are pretty much limited to an average user
> profile to start trials.
>
> There are various ways to combine factorization and content side techniques
> into a single model. I have a paper reference somewhere around if you think
> user content info is your case.
> On Jul 5, 2012 5:22 AM, "Razon, Oren" <[email protected]> wrote:
>
>> Thanks.
>> I had some other questions in mind so I will use this post...
>>
>> 1. Cold start for items problem - The user cold start problem I can
>> handle by trying new items for the user based on popularity \ randomly.
>> But what options do I have when using the ALS \ co-occurrence matrix to
>> overcome cold start for items?
>>
>> 2. What about applying a matrix factorization technique (ALS \ SVD) as
>> preprocessing?
>> Meaning, after doing the factorization, use the new lower-rank item matrix,
>> for example, to compute similarity between items? Will it be a good idea?
>>
>> 3. I'm looking for a huge data set to try my recommender on. I'm searching
>> for something which is even bigger than last.fm \ libimseti. Can anyone
>> recommend such a dataset?
>>
>> Thanks,
>> Oren
>>
>>
>> -----Original Message-----
>> From: Sebastian Schelter [mailto:[email protected]]
>> Sent: Thursday, July 05, 2012 12:46
>> To: [email protected]
>> Subject: Re: A bunch of SVD questions...
>>
>> There is only one implementation, because both 'flavors' of ALS have the
>> same computation shape. The default mode is to factorize explicit
>> feedback data and if you specify the option '--implicitFeedback', it
>> will switch to the algorithm that works on implicit feedback data.
>> Internally the different solvers from org.apache.mahout.math.als are used
>> if you want to have a deeper look.
>>
>> Best,
>> Sebastian
>>
>> On 05.07.2012 10:38, Razon, Oren wrote:
>> > Thanks for the answer Sebastian!
>> > You said Mahout has two 'flavors' of the ALS factorization, one for
>> > implicit and the other for explicit.
>> > Can you direct me to which code does what?
>> > Cause on the Hadoop part I can see only one ALS implementation...
>> >
>> > -----Original Message-----
>> > From: Sebastian Schelter [mailto:[email protected]]
>> > Sent: Thursday, July 05, 2012 11:12
>> > To: [email protected]
>> > Subject: Re: A bunch of SVD questions...
>> >
>> > 1. You can use org.apache.mahout.cf.taste.hadoop.als.RecommenderJob to
>> > compute top-N recommendations from the factorization in batch. For
>> > each user, you have to compute the product of the item feature matrix
>> > and his feature vector and pick the highest ranking unknown items
>> > after that.
>> >
>> > 2. The semantics of the empty cells depend on the type of data you
>> > have. For explicit feedback (ratings), you cannot fill the empty cells
>> > because you simply don't know what rating the user would have given.
>> > For implicit feedback a cell usually holds the count of some observed
>> > behavior, e.g. clicks. Here empty cells are by definition 0 (no
>> > clicks observed), however the factorization has to be modified to give
>> > 'lower confidence' to these datapoints.
>> >
>> > 3. There are two 'flavors' of the ALS factorization implemented in
>> > Mahout, one for implicit feedback data, the other for explicit
>> > feedback data. I suggest you look into the papers they are based on:
>> >
>> > "Large-scale Parallel Collaborative Filtering for the Netflix Prize"
>> > http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
>> > "Collaborative Filtering for Implicit Feedback Datasets"
>> > http://research.yahoo.com/pub/2433
>> >
>> > I also uploaded the slides from a lecture I gave at a scalable data
>> > mining class at our department, they might also be helpful in
>> > understanding the topic:
>> >
>> > http://www.slideshare.net/sscdotopen/latent-factor-models-for-collaborative-filtering
>> >
>> > Best,
>> > Sebastian
>> > 2012/7/4 Razon, Oren <[email protected]>:
>> >> Hi,
>> >> I'm exploring Mahout SVD parallel implementation over Hadoop (ALS), and
>> >> I would like to clarify a few things:
>> >> 1. How do you recommend top K items with this job? Does the job
>> >> factorize the ranking matrix, then compute a predicted ranking for each
>> >> cell in the matrix, so when you need a recommendation you only need to
>> >> retrieve the top K items according to prediction value for the user? Or
>> >> does it factorize the matrix and require some online logic when the
>> >> recommendation is being asked for?
>> >> 2. From my knowledge, applying an SVD technique requires first
>> >> filling in all empty cells in the ranking matrix (with average ranking for
>> >> example). Is it something done during the ALS job (and if so, what is the
>> >> way it's being filled), or should it be done as a preprocessing step?
>> >> 3. From my understanding SVD recommenders are used to predict user
>> >> implicit preference. By doing so you can recommend top K items (top K items
>> >> in descending order according to the prediction). I wonder, could it be
>> >> applied on a binary dataset (explicit), where my ranking matrix contains
>> >> only 1\0?
>> >>
>> >> 4. From doing some reading I found that the timeSVD++ developed
>> >> by Yehuda Koren is considered the superior SVD implementation for SVD
>> >> recommenders. I wondered if there is any kind of parallel implementation
>> >> of it on top of Hadoop? I found this proposal:
>> >> https://issues.apache.org/jira/browse/MAHOUT-371
>> >> I wonder, what is the status of it? Was it checked already?
>> >> Is it stable? Does anyone have experience with it?
>> >>
>> >> Thanks,
>> >> Oren
>> >>
>> >> ---------------------------------------------------------------------
>> >> Intel Electronics Ltd.
>> >>
>> >> This e-mail and any attachments may contain confidential material for
>> >> the sole use of the intended recipient(s). Any review or distribution
>> >> by others is strictly prohibited. If you are not the intended
>> >> recipient, please contact the sender and delete all copies.
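
A minimal Java sketch of the top-N step Sebastian describes in point 1 of the quoted thread: score every item by the dot product of its feature vector with the user's feature vector, then keep the highest ranking items the user has not seen yet. This is not Mahout's RecommenderJob code; the class name TopNFromFactors, the methods score and topN, and the toy factor values are all made up for illustration, and the factors are assumed to fit in memory as plain arrays.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of Mahout): ranks items for one user from ALS factors. */
final class TopNFromFactors {

    /** Predicted preference = dot product of user and item feature vectors. */
    static double score(double[] userFeatures, double[] itemFeatures) {
        double sum = 0.0;
        for (int k = 0; k < userFeatures.length; k++) {
            sum += userFeatures[k] * itemFeatures[k];
        }
        return sum;
    }

    /** Returns indices of the n highest-scoring items that are not in knownItems. */
    static List<Integer> topN(double[] userFeatures, double[][] itemFeatures,
                              Set<Integer> knownItems, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (int item = 0; item < itemFeatures.length; item++) {
            if (!knownItems.contains(item)) {
                candidates.add(item); // only items the user has not interacted with
            }
        }
        // Sort candidates by predicted preference, highest first.
        candidates.sort(Comparator.comparingDouble(
                (Integer item) -> score(userFeatures, itemFeatures[item])).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) {
        // Toy factors with 2 features; the values are invented for the example.
        double[][] items = { {1.0, 0.0}, {0.0, 1.0}, {0.5, 0.5}, {0.9, 0.1} };
        double[] user = {1.0, 0.2};
        // Item 0 is already known to the user, so it is excluded.
        System.out.println(topN(user, items, Collections.singleton(0), 2)); // prints [3, 2]
    }
}
```

Sorting all candidates is fine for a prototype; with many items a bounded priority queue of size n would avoid the full sort.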

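
On point 2 of Sebastian's reply (empty cells count as 0 for implicit feedback but should get 'lower confidence'), here is a small Java sketch of the preference/confidence transform from the "Collaborative Filtering for Implicit Feedback Datasets" paper linked in the thread: preference is 1 if anything was observed and 0 otherwise, and confidence is 1 + alpha * count. The class ImplicitFeedbackWeights and the alpha value are assumptions for illustration only; this does not claim to mirror what the solvers in org.apache.mahout.math.als do internally.

```java
/** Hypothetical helper (not Mahout code): implicit-feedback preference and confidence. */
final class ImplicitFeedbackWeights {

    /** Binary preference: 1 if any interaction was observed, otherwise 0. */
    static double preference(double observedCount) {
        return observedCount > 0 ? 1.0 : 0.0;
    }

    /** Confidence grows linearly with the count; an empty cell gets the minimum value 1. */
    static double confidence(double observedCount, double alpha) {
        return 1.0 + alpha * observedCount;
    }

    public static void main(String[] args) {
        double alpha = 40.0; // assumed value for illustration; tune it on your own data
        double[] clicks = {0, 1, 5};
        for (double c : clicks) {
            System.out.printf("count=%.0f preference=%.0f confidence=%.0f%n",
                    c, preference(c), confidence(c, alpha));
        }
    }
}
```

With these weights an unobserved cell still contributes to the loss as a weak "0", while a cell with many clicks pulls the factors much harder toward preference 1.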