Re: A bunch of SVD questions...

Dmitriy Lyubimov Fri, 06 Jul 2012 14:26:59 -0700

These guys show one way to combine content info with dyadic data
factorization, which is pretty close to what I used. Unfortunately I
don't have a free download link for the paper (it is in the ACM library,
or Ted knows a cheaper way to get it).


Agarwal, Chen: "Regression-based Latent Factor Models"

I am not sure whether one can construct something similar in Mahout,
but I am sure it can be prototyped very easily in Java. (I have hacked a
lot of the Mahout framework in the past to achieve a similar
effect.)
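
To make the idea concrete, here is a minimal sketch, not code from the paper
or from Mahout. It assumes the regression-based setup in which the prior mean
of a user's latent factor vector is a linear function G * x of the user's
content features x, so a brand-new user with no ratings is scored with that
prior mean. The class name, method names and all numbers are hypothetical.

```java
final class ColdStartFactorSketch {

    /**
     * Prior mean of a latent factor vector: G * x, where G is a
     * (numFactors x numFeatures) regression matrix and x is the
     * content feature vector (demographics, profile, ...).
     */
    static double[] priorFactor(double[][] g, double[] x) {
        double[] factor = new double[g.length];
        for (int row = 0; row < g.length; row++) {
            for (int col = 0; col < x.length; col++) {
                factor[row] += g[row][col] * x[col];
            }
        }
        return factor;
    }

    /** Predicted affinity: dot product of a user factor and an item factor. */
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up regression weights (2 latent factors, 2 content features).
        double[][] g = {{0.5, 0.0}, {0.0, 2.0}};
        // Made-up content features of a user who has no ratings yet.
        double[] newUserFeatures = {1.0, 0.25};
        // Made-up item factor taken from an existing factorization.
        double[] itemFactor = {2.0, 1.0};

        double[] userFactor = priorFactor(g, newUserFeatures);
        System.out.println(dot(userFactor, itemFactor));
    }
}
```

In a full model G would be fitted together with the factors; the sketch only
shows how a cold-start user gets a usable factor vector from content alone.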


On Fri, Jul 6, 2012 at 8:07 AM, Razon, Oren <[email protected]> wrote:
> Hi Dmitriy,
> Thank you for the answer.
> I would be happy to read such a paper.
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:[email protected]]
> Sent: Thursday, July 05, 2012 19:18
> To: [email protected]
> Subject: RE: A bunch of SVD questions...
>
> The cold start problem is usually best attacked if there is also content
> information about users initially -- demographics, a user profile or
> something similar. Otherwise, yes, you are pretty much limited to an
> average user profile to start trials.
>
> There are various ways to combine factorization and content-side techniques
> into a single model. I have a paper reference somewhere if you think
> user content info applies to your case.
> On Jul 5, 2012 5:22 AM, "Razon, Oren" <[email protected]> wrote:
>
>> Thanks.
>> I had some other questions in mind so I will use this post...
>>
>> 1. Cold start for items - I can handle the user cold start problem by
>> trying new items for the user based on popularity \ randomly.
>> But what options do I have to overcome item cold start when using ALS \ a
>> co-occurrence matrix?
>>
>> 2. What about applying a matrix factorization technique (ALS \ SVD) as a
>> preprocessing step?
>> Meaning, after doing the factorization, use the new lower-dimensional item
>> matrix, for example, to compute similarity between items. Would that be a
>> good idea?
>>
>> 3. I'm looking for a huge data set to try my recommender on. I'm searching
>> for something even bigger than last.fm \ libimseti. Can anyone
>> recommend such a dataset?
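
The idea in question 2 can be sketched as follows. This is only an
illustration under assumptions: each row of the item feature matrix produced
by the factorization is treated as that item's vector, and cosine similarity
is used as the similarity measure (one common choice, not something the
thread prescribes). The class and method names are hypothetical.

```java
final class ItemFactorSimilaritySketch {

    /** Cosine similarity between two item factor vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen here for all-zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up item factors: one row per item, one column per latent factor.
        double[][] itemFactors = {
            {1.0, 2.0},
            {2.0, 4.0},
            {2.0, -1.0}
        };
        System.out.println(cosine(itemFactors[0], itemFactors[1]));
        System.out.println(cosine(itemFactors[0], itemFactors[2]));
    }
}
```

Items 0 and 1 point in the same direction in factor space and items 0 and 2
are orthogonal, so the first pair comes out as similar and the second as
unrelated.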
>>
>> Thanks,
>> Oren
>>
>>
>> -----Original Message-----
>> From: Sebastian Schelter [mailto:[email protected]]
>> Sent: Thursday, July 05, 2012 12:46
>> To: [email protected]
>> Subject: Re: A bunch of SVD questions...
>>
>> There is only one implementation, because both 'flavors' of ALS have the
>> same computation shape. The default mode is to factorize explicit
>> feedback data, and if you specify the option '--implicitFeedback', it
>> will switch to the algorithm that works on implicit feedback data.
>> Internally, the different solvers from org.apache.mahout.math.als are used,
>> if you want to have a deeper look.
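
To illustrate what the implicit feedback mode changes, here is a small sketch
of the weighting scheme from the "Collaborative Filtering for Implicit
Feedback Datasets" paper cited in this thread: an observed count r becomes a
binary preference p (1 if r > 0, else 0) with confidence c = 1 + alpha * r.
This is not Mahout's solver code; the class name, method names and the alpha
value used in main are assumptions for illustration.

```java
final class ImplicitFeedbackSketch {

    /** Binary preference: 1 if any interaction was observed, else 0. */
    static double preference(double count) {
        return count > 0 ? 1.0 : 0.0;
    }

    /** Confidence grows linearly with the observed count; empty cells get 1. */
    static double confidence(double count, double alpha) {
        return 1.0 + alpha * count;
    }

    /** Contribution of one cell to the loss: c * (p - prediction)^2. */
    static double weightedSquaredError(double count, double alpha, double predicted) {
        double diff = preference(count) - predicted;
        return confidence(count, alpha) * diff * diff;
    }

    public static void main(String[] args) {
        double alpha = 40.0; // tunable; value assumed here for illustration
        // Same prediction error on an empty cell and on a cell with 3 clicks.
        System.out.println(weightedSquaredError(0.0, alpha, 0.5));
        System.out.println(weightedSquaredError(3.0, alpha, 0.5));
    }
}
```

The same prediction error costs far more on a cell with observed clicks than
on an empty cell, which is how empty cells end up with 'lower confidence'.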
>>
>> Best,
>> Sebastian
>>
>> On 05.07.2012 10:38, Razon, Oren wrote:
>> > Thanks for the answer Sebastian!
>> > You said Mahout has two 'flavors' of the ALS factorization, one for
>> > implicit and the other for explicit.
>> > Can you direct me to which code does what?
>> > Because on the Hadoop part I can see only one ALS implementation...
>> >
>> > -----Original Message-----
>> > From: Sebastian Schelter [mailto:[email protected]]
>> > Sent: Thursday, July 05, 2012 11:12
>> > To: [email protected]
>> > Subject: Re: A bunch of SVD questions...
>> >
>> > 1. You can use org.apache.mahout.cf.taste.hadoop.als.RecommenderJob to
>> > compute top-N recommendations from the factorization in batch. For
>> > each user, you have to compute the product of the item feature matrix
>> > and his feature vector and pick the highest ranking unknown items
>> > after that.
>> >
>> > 2. The semantics of the empty cells depend on the type of data you
>> > have. For explicit feedback (ratings), you cannot fill the empty cells
>> > because you simply don't know what rating the user would have given.
>> > For implicit feedback, a cell usually holds the count of some observed
>> > behavior, such as clicks. Here empty cells are by definition 0 (no
>> > clicks observed); however, the factorization has to be modified to give
>> > 'lower confidence' to these datapoints.
>> >
>> > 3. There are two 'flavors' of the ALS factorization implemented in
>> > Mahout, one for implicit feedback data, the other for explicit
>> > feedback data. I suggest you look into the papers they are based on:
>> >
>> > "Large-scale Parallel Collaborative Filtering for the Netflix Prize"
>> > http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
>> >
>> > "Collaborative Filtering for Implicit Feedback Datasets"
>> > http://research.yahoo.com/pub/2433
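
The per-user step described in point 1 (multiply the item feature matrix by
the user's feature vector, then pick the highest-ranking unknown items) can be
sketched as below. This is a single-machine illustration, not the code of
RecommenderJob; the class name, method name, item indexing by row number and
all numbers are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class TopNFromFactorsSketch {

    /**
     * Scores every item the user has not interacted with as the dot product
     * of the user's factor vector and the item's factor vector, and returns
     * the indices of the n best-scoring items, best first.
     */
    static List<Integer> topN(double[] userFactor, double[][] itemFactors,
                              Set<Integer> knownItems, int n) {
        double[] scores = new double[itemFactors.length];
        List<Integer> candidates = new ArrayList<>();
        for (int item = 0; item < itemFactors.length; item++) {
            if (knownItems.contains(item)) {
                continue; // skip items the user already rated or clicked
            }
            double score = 0.0;
            for (int f = 0; f < userFactor.length; f++) {
                score += userFactor[f] * itemFactors[item][f];
            }
            scores[item] = score;
            candidates.add(item);
        }
        // Sort candidate item indices by descending predicted score.
        candidates.sort(Comparator.comparingDouble((Integer item) -> scores[item]).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) {
        double[] userFactor = {2.0, 1.0};                        // made-up
        double[][] itemFactors = {{1.0, 0.0}, {0.0, 1.0},
                                  {1.0, 1.0}, {0.5, 0.5}};       // made-up
        System.out.println(topN(userFactor, itemFactors, Set.of(2), 2));
    }
}
```

Sorting all candidates keeps the sketch short; for large item sets a bounded
priority queue of size n would avoid the full sort.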
>> >
>> > I also uploaded the slides from a lecture I gave at a scalable data
>> > mining class at our department, they might also be helpful in
>> > understanding the topic:
>> >
>> > http://www.slideshare.net/sscdotopen/latent-factor-models-for-collaborative-filtering
>> >
>> > Best,
>> > Sebastian
>> > 2012/7/4 Razon, Oren <[email protected]>:
>> >> Hi,
>> >> I'm exploring Mahout's parallel SVD implementation over Hadoop (ALS),
>> >> and I would like to clarify a few things:
>> >> 1. How do you recommend the top K items with this job? Does the job
>> >> factorize the ranking matrix, then compute a predicted ranking for each
>> >> cell in the matrix, so that when you need a recommendation you only need
>> >> to retrieve the top K items according to the prediction value for the
>> >> user? Or does it factorize the matrix and require some online logic when
>> >> the recommendation is requested?
>> >> 2. To my knowledge, applying an SVD technique requires first filling in
>> >> all empty cells in the ranking matrix (with the average ranking, for
>> >> example). Is this done during the ALS job (and if so, how are the cells
>> >> filled), or should it be done as a preprocessing step?
>> >> 3. From my understanding, SVD recommenders are used to predict a user's
>> >> implicit preference. By doing so you can recommend the top K items (in
>> >> descending order of the prediction). I wonder, could it be applied to a
>> >> binary dataset (explicit), where my ranking matrix contains only 1\0?
>> >> 4. From some reading I found that timeSVD++, developed by Yehuda Koren,
>> >> is considered the superior SVD implementation for recommenders. I wonder
>> >> whether there is any kind of parallel implementation of it on top of
>> >> Hadoop. I found this proposal:
>> >> https://issues.apache.org/jira/browse/MAHOUT-371
>> >> What is its status? Has it been checked already? Is it stable? Does
>> >> anyone have experience with it?
>> >>
>> >> Thanks,
>> >> Oren
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> Intel Electronics Ltd.
>> >>
>> >> This e-mail and any attachments may contain confidential material for
>> >> the sole use of the intended recipient(s). Any review or distribution
>> >> by others is strictly prohibited. If you are not the intended
>> >> recipient, please contact the sender and delete all copies.
