Re: How does SVDRecommender work in mahout?

Sean Owen Wed, 25 Apr 2012 16:26:57 -0700

I don't know what the particular issue is; I imagine there's something
that needs some optimization in there.


If you're definitely interested in ALS and recommenders, I don't feel
bad promoting our attempts to commercialize Mahout: Myrrix
(http://myrrix.com) is exactly an ALS-based recommender, and I know it
will crunch this data set into a model in 16 seconds on my laptop.
This part of it is also free / open source.

Sean

On Wed, Apr 25, 2012 at 9:28 PM, Daniel Quach <[email protected]> wrote:
> I tried it again with 30 features and 3 iterations on the same data set. It
> has been running for 10+ minutes just to factorize for the SVDRecommender and
> has yet to complete. Perhaps it is my machine?
>
> I am running on a MacBook Air with 4GB of RAM and an Intel i5 processor, and I
> specified 2GB of memory for Java (-Xmx2048M).
>
>
>
> On Apr 25, 2012, at 12:25 PM, Sean Owen wrote:
>
>> There's not a hard limit; the hard limit you would run into is memory,
>> if anything.
>>
>> This sounds slow. It may be that this implementation could use some
>> optimization somewhere. Are you running many iterations or using a
>> large number of features?
>>
>> I have a different ALS implementation that finishes this data set (3
>> iterations, 30 features -- quick and dirty) in more like 20 seconds.
>> Here's some info on a run on a much larger data set, using ALS, for
>> comparison: http://myrrix.com/example-performance/
>>
>> On Wed, Apr 25, 2012 at 8:17 PM, Daniel Quach <[email protected]> wrote:
>>> Regarding the factorization (I am using ALSWRFactorizer), is there a limit
>>> to how large a data set can be factorized?
>>>
>>> I am trying to apply it to the 100K-rating data set from GroupLens
>>> (approximately 1000 users by 1600 movies).
>>>
>>> It's been running for at least 10 minutes now. I am getting the feeling it
>>> might not be wise to apply the factorizer to some of GroupLens's larger
>>> data sets...
>>>
>>> On Apr 18, 2012, at 1:09 PM, Sean Owen wrote:
>>>
>>>> This paper doesn't address how to compute the SVD. There are two
>>>> approaches implemented with SVDRecommender. One computes an SVD, one
>>>> doesn't :) Really it ought to be called something like
>>>> MatrixFactorizationRecommender. The SVD factorizer uses a fairly
>>>> simple expectation maximization approach. I don't know how well this
>>>> scales. The other factorizer uses alternating-least-squares.
>>>>
>>>> What you come out with is not the 3 matrices of an SVD, but 2. The "S"
>>>> matrix of singular values in the SVD is mashed into the left/right
>>>> singular vectors.
>>>>
>>>> So to answer your question now, the prediction expression is
>>>> essentially the same, with two caveats:
>>>>
>>>> 1. It shows it as the product of U, sqrt(S), sqrt(S), and V. What you
>>>> get out of the factorizer are really more like the "U" and "V" with
>>>> the two sqrt(S) bits already multiplied in. The product comes out the
>>>> same; there is a conceptual difference, I suppose, but not a practical
>>>> one. In both cases you're really just multiplying the matrix factors
>>>> all back together to make the predictions.
>>>>
>>>> 2. This model subtracts the customer average rating in the beginning,
>>>> and adds it back at the end here. The SVDRecommender doesn't do that
>>>> because, quite crucially, subtracting the average turns sparse data
>>>> into dense data (all the zeroes become non-zero), and this crushes
>>>> scalability.
>>>>
>>>> So the answer is yes, "mostly the same thing". In fact this is broadly
>>>> how all matrix factorization approaches work.
>>>>
>>>> On Wed, Apr 18, 2012 at 2:49 PM, Daniel Quach <[email protected]> wrote:
>>>>> I am basing my knowledge off this paper: 
>>>>> http://www.grouplens.org/papers/pdf/webKDD00.pdf
>>>>>
>>>>> Your book provided algorithms for the user-based, item-based, and slope 
>>>>> one recommendation, but none for the SVDRecommender (I'm guessing because 
>>>>> it was experimental)
>>>>>
>>>>> Does the SVDRecommender just compute the resultant matrices and follow a 
>>>>> formula similar to the one at the top of page 5 in the linked paper? I 
>>>>> think I understand the process of SVD but I'm just wondering how it's 
>>>>> exactly applied to obtain recommendations in mahout's case.
>>>>>
>>>>>
>>>>> On Apr 18, 2012, at 12:13 PM, Sean Owen wrote:
>>>>>
>>>>>> Yes you could call it a model-based approach. I suppose I was thinking
>>>>>> more of Bayesian implementations when I wrote that sentence.
>>>>>>
>>>>>> SVD is the Singular Value Decomposition -- are you asking what the SVD
>>>>>> is, or what matrix factorization is, or something about specific code
>>>>>> here? You can look up the SVD online.
>>>>>>
>>>>>> On Wed, Apr 18, 2012 at 12:49 PM, Daniel Quach <[email protected]> 
>>>>>> wrote:
>>>>>>> I had originally thought the experimental SVDRecommender in Mahout was 
>>>>>>> a model-based collaborative filtering technique. Looking at the book 
>>>>>>> "Mahout in Action", it mentions that model-based recommenders are a 
>>>>>>> future goal for mahout, which implies to me that the SVDRecommender is 
>>>>>>> not considered model-based.
>>>>>>>
>>>>>>> How exactly does the SVDRecommender work in Mahout? I can't seem to
>>>>>>> find any description of the algorithm underneath it.
>>>>>
>>>
>
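
The two caveats in the April 18 reply above can be illustrated with a small
NumPy sketch. This is not Mahout code and does not use either of Mahout's
factorizers; it runs a plain SVD on a tiny made-up rating matrix (treating
unrated cells as 0, which is a simplification) purely to show that folding
sqrt(S) into "U" and "V" gives the same product, and that subtracting user
averages from every cell makes a sparse matrix dense. The matrix R, the
feature count k, and the predict helper are all hypothetical.

```python
import numpy as np

# Hypothetical 4-user x 3-item rating matrix (0 = unrated); values are made up.
R = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 1.0],
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
])

k = 2  # number of latent features kept (assumption for this example)

# Textbook truncated SVD: three pieces U, S, V^T.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Caveat 1: fold sqrt(S) into each side, leaving only two factor matrices.
user_factors = U_k * np.sqrt(s_k)                 # shape (users, k)
item_factors = (np.sqrt(s_k)[:, None] * Vt_k).T   # shape (items, k)

# Both forms multiply back to the same rank-k approximation of R.
three_matrix = U_k @ np.diag(s_k) @ Vt_k
two_matrix = user_factors @ item_factors.T
same_product = np.allclose(three_matrix, two_matrix)

def predict(user, item):
    """Predicted rating: dot product of one user row and one item row."""
    return float(user_factors[user] @ item_factors[item])

# Caveat 2: subtracting each user's average from every cell fills in the zeroes.
observed = R != 0
user_means = R.sum(axis=1) / observed.sum(axis=1)  # mean over rated cells only
centered_all = R - user_means[:, None]             # unrated cells become -mean
nonzeros_before = int(np.count_nonzero(R))
nonzeros_after = int(np.count_nonzero(centered_all))

print(same_product, nonzeros_before, nonzeros_after)
```

On a matrix this small the densification is harmless, but on a real rating
matrix, where most cells are unrated, it means every user-item cell has to be
stored and processed, which is the scalability concern raised in the reply.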
