I tried it again with 30 features and 3 iterations on the same data set. It
has been running for 10+ minutes just to factorize for the SVDRecommender and
has yet to complete. Perhaps it is my machine?

I am running on a MacBook Air with 4GB of RAM and an Intel i5 processor, and I
specified 2GB of memory for Java (-Xmx2048M).



On Apr 25, 2012, at 12:25 PM, Sean Owen wrote:

> There's not a hard limit; the hard limit you would run into is memory,
> if anything.
> 
> This sounds slow. It may be that this implementation could use some
> optimization somewhere. Are you running many iterations or using a
> large number of features?
> 
> I have a different ALS implementation that finishes this data set (3
> iterations, 30 features -- quick and dirty) in more like 20 seconds.
> Here's some info on a run on a much larger data set, using ALS, for
> comparison: http://myrrix.com/example-performance/
> 
> On Wed, Apr 25, 2012 at 8:17 PM, Daniel Quach <[email protected]> wrote:
>> Regarding the factorization (I am using ALSWRFactorizer), is there a limit 
>> to how large a data set can be factorized?
>> 
>> I am trying to apply it to the 100K rating data set from GroupLens 
>> (approximately 1000 users by 1600 movies).
>> 
>> It's been running for at least 10 minutes now. I am getting the feeling it 
>> might not be wise to apply the factorizer to some of GroupLens's larger 
>> data sets...
>> 
>> On Apr 18, 2012, at 1:09 PM, Sean Owen wrote:
>> 
>>> This paper doesn't address how to compute the SVD. There are two
>>> approaches implemented with SVDRecommender. One computes an SVD, one
>>> doesn't :) Really it ought to be called something like
>>> MatrixFactorizationRecommender. The SVD factorizer uses a fairly
>>> simple expectation maximization approach. I don't know how well this
>>> scales. The other factorizer uses alternating-least-squares.
>>> 
>>> What you come out with are not 3 matrices, from an SVD, but 2. The "S"
>>> matrix in the SVD of singular values is mashed into the left/right
>>> singular vectors.
>>> 
>>> So to answer your question now, the prediction expression is
>>> essentially the same, with two caveats:
>>> 
>>> 1. It shows it as the product of U, sqrt(S), sqrt(S), and V. What you
>>> get out of the factorizer are really more like the "U" and "V" with
>>> the two sqrt(S) bits already multiplied in. The product comes out the
>>> same; there is a conceptual difference I suppose, but not a practical
>>> one. In both cases you're really just multiplying the matrix factors
>>> all back together to make the predictions.
>>> 
>>> 2. This model subtracts the customer average rating in the beginning,
>>> and adds it back at the end here. The SVDRecommender doesn't do that,
>>> because, quite crucially, subtracting the average turns sparse data into
>>> dense data (all the zeroes become non-zero) and this crushes scalability.
>>> 
>>> The answer is "mostly the same thing" yes. In fact this is broadly how
>>> all matrix factorization approaches work.
>>> 
>>> On Wed, Apr 18, 2012 at 2:49 PM, Daniel Quach <[email protected]> wrote:
>>>> I am basing my knowledge off this paper: 
>>>> http://www.grouplens.org/papers/pdf/webKDD00.pdf
>>>> 
>>>> Your book provided algorithms for the user-based, item-based, and slope 
>>>> one recommendation, but none for the SVDRecommender (I'm guessing because 
>>>> it was experimental).
>>>> 
>>>> Does the SVDRecommender just compute the resultant matrices and follow a 
>>>> formula similar to the one at the top of page 5 in the linked paper? I 
>>>> think I understand the process of SVD but I'm just wondering how it's 
>>>> exactly applied to obtain recommendations in Mahout's case.
>>>> 
>>>> 
>>>> On Apr 18, 2012, at 12:13 PM, Sean Owen wrote:
>>>> 
>>>>> Yes you could call it a model-based approach. I suppose I was thinking
>>>>> more of Bayesian implementations when I wrote that sentence.
>>>>> 
>>>>> SVD is the Singular Value Decomposition -- are you asking what the SVD
>>>>> is, or what matrix factorization is, or something about specific code
>>>>> here? You can look up the SVD online.
>>>>> 
>>>>> On Wed, Apr 18, 2012 at 12:49 PM, Daniel Quach <[email protected]> 
>>>>> wrote:
>>>>>> I had originally thought the experimental SVDRecommender in Mahout was a 
>>>>>> model-based collaborative filtering technique. Looking at the book 
>>>>>> "Mahout in Action", it mentions that model-based recommenders are a 
>>>>>> future goal for Mahout, which implies to me that the SVDRecommender is 
>>>>>> not considered model-based.
>>>>>> 
>>>>>> How exactly does the SVDRecommender work in Mahout? I can't seem to find 
>>>>>> any description of the algorithm underneath it.
>>>> 
>> 
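
For anyone following the thread, the point about getting two matrices rather
than three can be sketched in a few lines of plain Java. This is only an
illustration, not Mahout's code: the class FoldedFactors, its method names, and
every number in it are made up for the example. It multiplies sqrt(S) into both
U and V, then compares a prediction computed as a user-row/item-row dot product
against the three-matrix U * S * V^T form.

```java
class FoldedFactors {

    // Scale column k of a (rows x features) matrix by sqrt(s[k]).
    static double[][] foldSqrtS(double[][] m, double[] s) {
        double[][] out = new double[m.length][s.length];
        for (int r = 0; r < m.length; r++) {
            for (int k = 0; k < s.length; k++) {
                out[r][k] = m[r][k] * Math.sqrt(s[k]);
            }
        }
        return out;
    }

    // Three-matrix form: sum over k of U[user][k] * S[k] * V[item][k].
    static double predictUSV(double[][] u, double[] s, double[][] v,
                             int user, int item) {
        double sum = 0.0;
        for (int k = 0; k < s.length; k++) {
            sum += u[user][k] * s[k] * v[item][k];
        }
        return sum;
    }

    // Two-matrix form: dot product of one user row and one item row.
    static double predictFolded(double[][] userFeatures,
                                double[][] itemFeatures,
                                int user, int item) {
        double sum = 0.0;
        for (int k = 0; k < userFeatures[user].length; k++) {
            sum += userFeatures[user][k] * itemFeatures[item][k];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical values: 2 users, 3 items, 2 features.
        double[][] u = {{0.6, 0.8}, {0.8, -0.6}};
        double[] s = {4.0, 1.0};
        double[][] v = {{1.0, 0.0}, {0.0, 1.0}, {0.6, 0.8}};

        // Fold sqrt(S) into each side once, up front.
        double[][] userFeatures = foldSqrtS(u, s);
        double[][] itemFeatures = foldSqrtS(v, s);

        for (int user = 0; user < u.length; user++) {
            for (int item = 0; item < v.length; item++) {
                System.out.printf("user %d, item %d: USV=%.4f folded=%.4f%n",
                        user, item,
                        predictUSV(u, s, v, user, item),
                        predictFolded(userFeatures, itemFeatures, user, item));
            }
        }
    }
}
```

The two columns printed by main should agree up to floating-point rounding,
which is the "conceptual but not practical" difference described above. An
ALS-style factorizer would presumably learn the two folded matrices directly
and never form S on its own.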
