There is only one implementation, because both 'flavors' of ALS have the
same computation shape. The default mode factorizes explicit feedback
data; if you specify the option '--implicitFeedback', it switches to the
algorithm that works on implicit feedback data. Internally, the different
solvers from org.apache.mahout.math.als are used, if you want to have a
deeper look.
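
To illustrate what "same computation shape" means, here is a minimal
sketch of recomputing one user's feature vector in both modes. It follows
the formulas from the two papers cited further down in this thread
(regularizer scaled by the number of ratings for explicit data; confidence
1 + alpha * r and a binary preference for implicit data). All function
names are hypothetical and this is not Mahout's code; the actual solvers
may differ in detail.

```python
# Sketch only: one ALS half-step (recompute a single user's feature vector).
# Names are illustrative assumptions, not part of Mahout's API.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def weighted_normal_equations(items, weights, targets, ridge):
    """Solve (sum_i w_i y_i y_i^T + ridge * I) x = sum_i w_i t_i y_i."""
    k = len(items[0])
    a = [[ridge if r == c else 0.0 for c in range(k)] for r in range(k)]
    b = [0.0] * k
    for y, w, t in zip(items, weights, targets):
        for r in range(k):
            b[r] += w * t * y[r]
            for c in range(k):
                a[r][c] += w * y[r] * y[c]
    return solve(a, b)


def explicit_user_vector(item_features, ratings, lam):
    """ratings: {item_index: rating}; only rated items take part."""
    rated = sorted(ratings)
    items = [item_features[i] for i in rated]
    targets = [ratings[i] for i in rated]
    # Weighted-lambda regularization: scale lambda by this user's rating count.
    return weighted_normal_equations(items, [1.0] * len(rated), targets,
                                     lam * len(rated))


def implicit_user_vector(item_features, counts, lam, alpha):
    """counts: one observed count per item; 0 means nothing was observed."""
    weights = [1.0 + alpha * r for r in counts]        # confidence per cell
    targets = [1.0 if r > 0 else 0.0 for r in counts]  # binary preference
    return weighted_normal_equations(item_features, weights, targets, lam)
```

Both modes end up in the same small linear solve per user; only the
weights, the targets, and the regularization term differ, which is why a
single job can serve both cases.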

Best,
Sebastian

On 05.07.2012 10:38, Razon, Oren wrote:
> Thanks for the answer Sebastian!
> You said Mahout has two 'flavors' of the ALS factorization, one for implicit
> and the other for explicit feedback.
> Can you point me to which code does what?
> On the Hadoop side I can see only one ALS implementation...
> 
> -----Original Message-----
> From: Sebastian Schelter [mailto:[email protected]] 
> Sent: Thursday, July 05, 2012 11:12
> To: [email protected]
> Subject: Re: A bunch of SVD questions...
> 
> 1. You can use org.apache.mahout.cf.taste.hadoop.als.RecommenderJob to
> compute top-N recommendations from the factorization in batch. For
> each user, you have to compute the product of the item feature matrix
> and the user's feature vector, and then pick the highest-ranking
> unknown items.
> 
> 2. The semantics of the empty cells depend on the type of data you
> have. For explicit feedback (ratings), you cannot fill the empty cells
> because you simply don't know what rating the user would have given.
> For implicit feedback, a cell usually holds the count of some observed
> behavior, e.g. clicks. Here empty cells are by definition 0 (no
> clicks observed); however, the factorization has to be modified to give
> 'lower confidence' to these data points.
> 
> 3. There are two 'flavors' of the ALS factorization implemented in
> Mahout, one for implicit feedback data, the other for explicit
> feedback data. I suggest you look into the papers they are based on:
> 
> "Large-scale Parallel Collaborative Filtering for the Netflix Prize"
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
> "Collaborative Filtering for Implicit Feedback Datasets"
> http://research.yahoo.com/pub/2433
> 
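Point 1 above (score every item against the user's feature vector, then
keep the best items the user has not interacted with yet) can be sketched
as follows. This is only an illustration in plain Python with hypothetical
names; it is not how RecommenderJob is implemented.

```python
import heapq


def top_n(user_vector, item_features, known_items, n):
    """Return the n best (item_index, score) pairs among unknown items.

    user_vector   -- the user's feature vector (list of floats)
    item_features -- one feature vector per item, indexed by item
    known_items   -- set of item indices the user already interacted with
    """
    scored = []
    for item, features in enumerate(item_features):
        if item in known_items:
            continue  # only recommend items the user has not seen
        # Predicted preference = dot product of user and item features.
        score = sum(u * f for u, f in zip(user_vector, features))
        scored.append((score, item))
    best = heapq.nlargest(n, scored)
    return [(item, score) for score, item in best]
```

For example, `top_n([1.0, 0.5], [[1, 0], [0, 1], [1, 1]], {2}, 2)` scores
items 0 and 1 and skips item 2 because it is already known.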
> I also uploaded the slides from a lecture I gave at a scalable data
> mining class at our department; they might also be helpful in
> understanding the topic:
> 
> http://www.slideshare.net/sscdotopen/latent-factor-models-for-collaborative-filtering
> 
> Best,
> Sebastian
> 2012/7/4 Razon, Oren <[email protected]>:
>> Hi,
>> I'm exploring Mahout's parallel SVD implementation on Hadoop (ALS), and I
>> would like to clarify a few things:
>> 1. How do you recommend the top K items with this job? Does the job
>> factorize the ranking matrix and then compute a predicted ranking for each
>> cell in the matrix, so that when you need a recommendation you only need to
>> retrieve the top K items by predicted value for the user? Or does it only
>> factorize the matrix and require some online logic when a recommendation
>> is requested?
>> 2. To my knowledge, applying an SVD technique first requires filling in
>> all empty cells in the ranking matrix (with the average ranking, for
>> example). Is this done during the ALS job (and if so, how are the cells
>> filled), or should it be done as a preprocessing step?
>> 3. From my understanding, SVD recommenders are used to predict a user's
>> implicit preference. By doing so you can recommend the top K items (in
>> descending order of predicted value). I wonder, could it be applied to a
>> binary dataset (explicit), where my ranking matrix contains only 1/0?
>> 4. From some reading I found that timeSVD++, developed by Yehuda Koren,
>> is considered the superior SVD implementation for recommenders. I wondered
>> whether there is any parallel implementation of it on top of Hadoop. I
>> found this proposal:
>> https://issues.apache.org/jira/browse/MAHOUT-371
>>       I wonder, what is its status? Has it been reviewed already? Is it
>> stable? Does anyone have experience with it?
>>
>> Thanks,
>> Oren
>>
>> ---------------------------------------------------------------------
>> Intel Electronics Ltd.
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.

