Regarding the factorization (I am using ALSWRFactorizer), is there a limit to
how large a data set can be factorized?

I am trying to apply it to the 100K ratings data set from GroupLens
(approximately 1,000 users by 1,600 movies).

It's been running for at least 10 minutes now, and I am getting the feeling it
might not be wise to apply the factorizer to some of GroupLens's larger data
sets...
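To show where the run time goes, here is a minimal pure-Python sketch of alternating least squares with weighted-λ regularization (the "WR" in the class name). It assumes ratings are held as a `{(user, item): rating}` dict. This is not Mahout's ALSWRFactorizer code, and `als_wr`, `update`, `solve`, `predict`, `k`, `lam`, and `iterations` are hypothetical names. Each sweep solves one k x k linear system per user and one per item, so the work per iteration grows roughly as (number of ratings) x k² plus (users + items) x k³, and it is repeated for every iteration.

```python
# Hypothetical sketch of ALS with weighted-lambda regularization (ALS-WR).
# Not Mahout's ALSWRFactorizer; all names and defaults are assumptions.
import random

def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def update(fixed, rated, k, lam):
    """Recompute one factor vector from the factors it is paired with.

    rated: list of (other_index, rating) for this row's observed ratings only.
    """
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for j, r in rated:
        y = fixed[j]
        for p in range(k):
            b[p] += r * y[p]
            for q in range(k):
                a[p][q] += y[p] * y[q]
    for p in range(k):
        a[p][p] += lam * len(rated)  # regularization weighted by the number of ratings
    return solve(a, b)

def als_wr(ratings, num_users, num_items, k=2, lam=0.05, iterations=10, seed=0):
    """ratings: dict {(user, item): rating}. Returns (user_factors, item_factors)."""
    rng = random.Random(seed)
    by_user = [[] for _ in range(num_users)]
    by_item = [[] for _ in range(num_items)]
    for (u, i), r in ratings.items():
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    users = [[rng.random() for _ in range(k)] for _ in range(num_users)]
    items = [[rng.random() for _ in range(k)] for _ in range(num_items)]
    for _ in range(iterations):
        # One k x k solve per user (items fixed), then one per item (users fixed).
        users = [update(items, by_user[u], k, lam) if by_user[u] else users[u]
                 for u in range(num_users)]
        items = [update(users, by_item[i], k, lam) if by_item[i] else items[i]
                 for i in range(num_items)]
    return users, items

def predict(users, items, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(x * y for x, y in zip(users[u], items[i]))
```

Only observed ratings enter each solve, which keeps memory proportional to the data; the cost that grows is the number of solves and the k³ work inside each one.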

On Apr 18, 2012, at 1:09 PM, Sean Owen wrote:

> This paper doesn't address how to compute the SVD. There are two
> approaches implemented for SVDRecommender. One computes an SVD, one
> doesn't :) Really it ought to be called something like
> MatrixFactorizationRecommender. The SVD factorizer uses a fairly
> simple expectation-maximization approach; I don't know how well it
> scales. The other factorizer uses alternating least squares.
> 
> What you come out with is not 3 matrices, as from an SVD, but 2. The
> "S" matrix of singular values from the SVD is mashed into the
> left/right singular vectors.
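A small sketch of this point, assuming factors stored as plain Python lists of rows (U is users x k, V is items x k, and s holds the k singular values); the function names are hypothetical. Multiplying sqrt(S) into each side turns the three-matrix product into a two-matrix product with exactly the same predictions.

```python
# Illustration only: folding sqrt(S) into both factors leaves predictions unchanged.
import math

def fold_sqrt_s(u, s, v):
    """Return (U * sqrt(S), V * sqrt(S)); each column p is scaled by sqrt(s[p])."""
    root = [math.sqrt(x) for x in s]
    left = [[row[p] * root[p] for p in range(len(s))] for row in u]
    right = [[row[p] * root[p] for p in range(len(s))] for row in v]
    return left, right

def predict_three(u, s, v, user, item):
    """Prediction from the three SVD pieces: (U S V^T)[user][item]."""
    return sum(u[user][p] * s[p] * v[item][p] for p in range(len(s)))

def predict_two(left, right, user, item):
    """Same prediction from the two folded factors: a plain dot product."""
    return sum(a * b for a, b in zip(left[user], right[item]))
```

For example, with `U = [[0.6, 0.8], [0.8, -0.6]]`, `s = [4.0, 1.0]`, and `V = [[1.0, 0.0], [0.0, 1.0]]`, both `predict_three(U, s, V, 0, 0)` and `predict_two(*fold_sqrt_s(U, s, V), 0, 0)` give 2.4.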
> 
> So to answer your question now, the prediction expression is
> essentially the same, with two caveats:
> 
> 1. The paper shows it as the product of U, sqrt(S), sqrt(S), and V.
> What you get out of the factorizer is really more like "U" and "V"
> with the two sqrt(S) bits already multiplied in. The product comes out
> the same; there is a conceptual difference, I suppose, but not a
> practical one. In both cases you're really just multiplying the matrix
> factors back together to make the predictions.
> 
> 2. The paper's model subtracts each customer's average rating at the
> beginning and adds it back at the end. SVDRecommender doesn't do that
> because, quite crucially, doing so turns sparse data into dense data
> (all the zeroes become non-zero), and this crushes scalability.
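A rough sketch of why the mean subtraction hurts, assuming ratings held sparsely as a `{(user, item): rating}` dict and unrated cells treated as 0 (both assumptions for illustration; the function names are hypothetical). Subtracting each user's mean from every cell makes almost every cell non-zero, whereas centering only the observed ratings keeps storage proportional to the number of ratings.

```python
# Illustration only: dense mean-centering versus centering observed ratings.

def user_means(ratings, num_users):
    """Average observed rating per user (0.0 for users with no ratings)."""
    sums = [0.0] * num_users
    counts = [0] * num_users
    for (u, _), r in ratings.items():
        sums[u] += r
        counts[u] += 1
    return [sums[u] / counts[u] if counts[u] else 0.0 for u in range(num_users)]

def center_observed(ratings, num_users):
    """Sparse centering: subtract the user's mean only from ratings that exist."""
    means = user_means(ratings, num_users)
    return {(u, i): r - means[u] for (u, i), r in ratings.items()}

def dense_centered_nonzeros(ratings, num_users, num_items):
    """Dense centering: treat missing cells as 0, subtract the user's mean from
    every cell, and count how many cells end up non-zero (and must be stored)."""
    means = user_means(ratings, num_users)
    nonzero = 0
    for u in range(num_users):
        for i in range(num_items):
            if ratings.get((u, i), 0.0) - means[u] != 0.0:
                nonzero += 1
    return nonzero
```

With the four ratings `{(0, 0): 5.0, (0, 3): 3.0, (1, 1): 4.0, (2, 2): 2.0}` in a 3 x 1,000 matrix, the sparse version still holds four values, while the dense version has 2,998 non-zero cells.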
> 
> So the answer is yes, "mostly the same thing." In fact, this is broadly
> how all matrix factorization approaches work.
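Putting both caveats together, the prediction for user u and item i can be written as below. This is a sketch in this thread's U, S, V notation rather than a formula copied from the paper; it assumes V has one row per item, k retained factors, and \(\bar{r}_u\) as user u's average rating, a term that appears only in the mean-centered model and is omitted by SVDRecommender.

$$
\hat{r}_{ui} \;=\; \bar{r}_u + \sum_{p=1}^{k} \bigl(U\sqrt{S}\bigr)_{up}\,\bigl(V\sqrt{S}\bigr)_{ip}
\;=\; \bar{r}_u + \sum_{p=1}^{k} U_{up}\, S_{pp}\, V_{ip}
$$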
> 
> On Wed, Apr 18, 2012 at 2:49 PM, Daniel Quach wrote:
>> I am basing my understanding on this paper:
>> http://www.grouplens.org/papers/pdf/webKDD00.pdf
>> 
>> Your book provides algorithms for user-based, item-based, and slope-one
>> recommendation, but none for the SVDRecommender (I'm guessing because it
>> was experimental).
>> 
>> Does the SVDRecommender just compute the resulting matrices and follow a
>> formula similar to the one at the top of page 5 of the linked paper? I
>> think I understand the process of SVD, but I'm wondering how exactly it's
>> applied to obtain recommendations in Mahout's case.
>> 
>> 
>> On Apr 18, 2012, at 12:13 PM, Sean Owen wrote:
>> 
>>> Yes, you could call it a model-based approach. I suppose I was thinking
>>> more of Bayesian implementations when I wrote that sentence.
>>> 
>>> SVD is the Singular Value Decomposition -- are you asking what the SVD
>>> is, or what matrix factorization is, or something about specific code
>>> here? You can look up the SVD online.
>>> 
>>> On Wed, Apr 18, 2012 at 12:49 PM, Daniel Quach wrote:
>>>> I had originally thought the experimental SVDRecommender in Mahout was a
>>>> model-based collaborative filtering technique. The book "Mahout in
>>>> Action" mentions that model-based recommenders are a future goal for
>>>> Mahout, which implies to me that the SVDRecommender is not considered
>>>> model-based.
>>>> 
>>>> How exactly does the SVDRecommender work in Mahout? I can't seem to find
>>>> any description of the underlying algorithm.
>> 
