I tried it again with 30 features and 3 iterations on the same data set. It has been running for 10+ minutes just to factorize for the SVDRecommender and has yet to complete. Perhaps it is my machine?
I am running on a MacBook Air with 4GB of RAM and an Intel i5 processor, and I specified 2GB of memory for Java (-Xmx2048M).

On Apr 25, 2012, at 12:25 PM, Sean Owen wrote:

> There's not a hard limit; the hard limit you would run into is memory,
> if anything.
>
> This sounds slow. It may be that this implementation could use some
> optimization somewhere. Are you running many iterations or using a
> large number of features?
>
> I have a different ALS implementation that finishes this data set (3
> iterations, 30 features -- quick and dirty) in more like 20 seconds.
> Here's some info on a run on a much larger data set, using ALS, for
> comparison: http://myrrix.com/example-performance/
>
> On Wed, Apr 25, 2012 at 8:17 PM, Daniel Quach <[email protected]> wrote:
>> Regarding the factorization (I am using ALSWRFactorizer), is there a limit
>> to how large a data set can be factorized?
>>
>> I am trying to apply it to the 100K rating data set from GroupLens
>> (approximately 1000 users by 1600 movies).
>>
>> It's been running for at least 10 minutes now, and I am getting the
>> feeling it might not be wise to apply the factorizer to some of
>> GroupLens's larger data sets...
>>
>> On Apr 18, 2012, at 1:09 PM, Sean Owen wrote:
>>
>>> This paper doesn't address how to compute the SVD. There are two
>>> approaches implemented with SVDRecommender. One computes an SVD, one
>>> doesn't :) Really it ought to be called something like
>>> MatrixFactorizationRecommender. The SVD factorizer uses a fairly
>>> simple expectation maximization approach. I don't know how well this
>>> scales. The other factorizer uses alternating least squares.
>>>
>>> What you come out with is not 3 matrices, as from an SVD, but 2. The
>>> "S" matrix of singular values from the SVD is mashed into the
>>> left/right singular vectors.
>>>
>>> So to answer your question now, the prediction expression is
>>> essentially the same, with two caveats:
>>>
>>> 1. The paper shows it as the product of U, sqrt(S), sqrt(S), and V.
>>> What you get out of the factorizer are really more like the "U" and
>>> "V" with the two sqrt(S) bits already multiplied in. The product comes
>>> out the same; there is a conceptual difference, I suppose, but not a
>>> practical one. In both cases you're really just multiplying the matrix
>>> factors all back together to make the predictions.
>>>
>>> 2. This model subtracts the customer average rating at the beginning
>>> and adds it back at the end. The SVDRecommender doesn't do that
>>> because, quite crucially, subtracting the average turns sparse data
>>> into dense data (all the zeroes become non-zero), and this crushes
>>> scalability.
>>>
>>> The answer is "mostly the same thing", yes. In fact, this is broadly
>>> how all matrix factorization approaches work.
>>>
>>> On Wed, Apr 18, 2012 at 2:49 PM, Daniel Quach <[email protected]> wrote:
>>>> I am basing my knowledge on this paper:
>>>> http://www.grouplens.org/papers/pdf/webKDD00.pdf
>>>>
>>>> Your book provided algorithms for the user-based, item-based, and
>>>> slope-one recommenders, but none for the SVDRecommender (I'm guessing
>>>> because it was experimental).
>>>>
>>>> Does the SVDRecommender just compute the resultant matrices and follow
>>>> a formula similar to the one at the top of page 5 in the linked paper?
>>>> I think I understand the process of SVD, but I'm wondering how exactly
>>>> it's applied to obtain recommendations in Mahout's case.
>>>>
>>>>
>>>> On Apr 18, 2012, at 12:13 PM, Sean Owen wrote:
>>>>
>>>>> Yes, you could call it a model-based approach. I suppose I was
>>>>> thinking more of Bayesian implementations when I wrote that sentence.
>>>>>
>>>>> SVD is the Singular Value Decomposition -- are you asking what the SVD
>>>>> is, or what matrix factorization is, or something about specific code
>>>>> here? You can look up the SVD online.
>>>>>
>>>>> On Wed, Apr 18, 2012 at 12:49 PM, Daniel Quach <[email protected]>
>>>>> wrote:
>>>>>> I had originally thought the experimental SVDRecommender in Mahout was
>>>>>> a model-based collaborative filtering technique. The book "Mahout in
>>>>>> Action" mentions that model-based recommenders are a future goal for
>>>>>> Mahout, which implies to me that the SVDRecommender is not considered
>>>>>> model-based.
>>>>>>
>>>>>> How exactly does the SVDRecommender work in Mahout? I can't seem to
>>>>>> find any description of the underlying algorithm.
>>>>
>>
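Sean's first caveat (the factorizer hands back two matrices with sqrt(S) already folded into each, and the product comes out the same) can be checked numerically. The sketch below is plain Java with no Mahout dependency; the class and method names are hypothetical, and the 2x2 U, S and V are hand-picked (U and V have orthonormal columns, S is diagonal) rather than output from any real factorizer. It compares the three-factor product U * S * V^T with the dot product of rows of U * sqrt(S) and V * sqrt(S).

```java
// Sketch: folding sqrt(S) into both factors leaves U * S * V^T unchanged.
final class FoldedSvd {

    // Dense product of a (n x k) and b (k x m).
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] out = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                for (int t = 0; t < k; t++)
                    out[i][j] += a[i][t] * b[t][j];
        return out;
    }

    static double[][] transpose(double[][] a) {
        double[][] out = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                out[j][i] = a[i][j];
        return out;
    }

    // Three-factor form from the SVD: U * diag(s) * V^T.
    static double[][] reconstructThreeFactors(double[][] u, double[] s, double[][] v) {
        double[][] diag = new double[s.length][s.length];
        for (int t = 0; t < s.length; t++) diag[t][t] = s[t];
        return multiply(multiply(u, diag), transpose(v));
    }

    // Multiplies column t of m by sqrt(s[t]), i.e. returns m * sqrt(S).
    static double[][] foldSqrt(double[][] m, double[] s) {
        double[][] out = new double[m.length][s.length];
        for (int i = 0; i < m.length; i++)
            for (int t = 0; t < s.length; t++)
                out[i][t] = m[i][t] * Math.sqrt(s[t]);
        return out;
    }

    // Two-factor prediction: dot product of one user row and one item row.
    static double predict(double[][] userFeatures, double[][] itemFeatures, int user, int item) {
        double sum = 0.0;
        for (int t = 0; t < userFeatures[user].length; t++)
            sum += userFeatures[user][t] * itemFeatures[item][t];
        return sum;
    }

    public static void main(String[] args) {
        double[][] u = {{0.6, -0.8}, {0.8, 0.6}}; // orthonormal columns
        double[] s = {4.0, 1.0};                  // singular values
        double[][] v = {{0.8, -0.6}, {0.6, 0.8}}; // orthonormal columns

        double[][] full = reconstructThreeFactors(u, s, v);
        double[][] userFeatures = foldSqrt(u, s); // U * sqrt(S)
        double[][] itemFeatures = foldSqrt(v, s); // V * sqrt(S)
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                System.out.printf("r[%d][%d]: three factors %.2f, two factors %.2f%n",
                        i, j, full[i][j], predict(userFeatures, itemFeatures, i, j));
    }
}
```

Each printed line shows the same value twice (2.40, 0.80, 2.20 and 2.40 for the four cells), which is the "conceptual difference but not a practical one" in concrete form.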
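Sean's second caveat, that subtracting each customer's average "turns sparse data into dense data", is easy to see on a toy matrix. The sketch below assumes, as a plain SVD of the whole rating matrix would, that unrated cells are stored as 0 and that the user's average is subtracted from every cell; the class name, method names and ratings are hypothetical, and this is not Mahout code.

```java
// Sketch: subtracting each user's mean from every cell of a zero-filled
// rating matrix turns the unrated zeros into non-zero values.
final class MeanCentering {

    // Mean of the observed (non-zero) ratings in one user's row; 0 if none.
    static double userMean(double[] row) {
        double sum = 0.0;
        int count = 0;
        for (double r : row) {
            if (r != 0.0) {
                sum += r;
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    // Subtracts the user's mean from every cell, rated or not.
    static double[][] centerAllCells(double[][] ratings) {
        double[][] out = new double[ratings.length][];
        for (int u = 0; u < ratings.length; u++) {
            double mean = userMean(ratings[u]);
            out[u] = new double[ratings[u].length];
            for (int i = 0; i < ratings[u].length; i++) out[u][i] = ratings[u][i] - mean;
        }
        return out;
    }

    static int countNonZeros(double[][] m) {
        int n = 0;
        for (double[] row : m)
            for (double x : row)
                if (x != 0.0) n++;
        return n;
    }

    // Prediction in the centered model: factor score plus the user's mean added back.
    static double predict(double userMean, double factorScore) {
        return userMean + factorScore;
    }

    public static void main(String[] args) {
        double[][] ratings = { // 0 = not rated
            {5, 0, 0, 3},
            {0, 4, 2, 0},
            {1, 0, 0, 5},
        };
        System.out.println("non-zero entries before: " + countNonZeros(ratings));                 // 6
        System.out.println("non-zero entries after:  " + countNonZeros(centerAllCells(ratings))); // 12
    }
}
```

With the rough figures from the thread (about 1000 users by 1600 movies, 100K ratings), the same step would turn about 100,000 non-zero entries into about 1,600,000, a 16-fold increase before any factorization work starts.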
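For the alternating-least-squares factorizer Sean mentions, the core loop is: hold the item factors fixed and solve a small regularized least-squares problem for each user, then hold the user factors fixed and do the same for each item, and repeat. The sketch below is a from-scratch textbook ALS with weighted-lambda regularization (each row's penalty is scaled by its number of ratings); it is not the source of Mahout's ALSWRFactorizer, and the class name, lambda value, seed and toy data are assumptions. Unrated cells are stored as 0 and skipped, so only observed ratings enter each solve.

```java
import java.util.Random;

// Textbook ALS with weighted-lambda regularization; a sketch, not Mahout's code.
// ratings[u][i] == 0 means "not rated", and such cells are skipped.
final class TinyAls {
    // Solves a * x = b (a is k x k) by Gaussian elimination with partial pivoting.
    static double[] solve(double[][] a, double[] b) {
        int k = b.length;
        double[][] m = new double[k][k + 1];
        for (int r = 0; r < k; r++) {
            System.arraycopy(a[r], 0, m[r], 0, k);
            m[r][k] = b[r];
        }
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int r = col + 1; r < k; r++)
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            for (int r = col + 1; r < k; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= k; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[] x = new double[k];
        for (int r = k - 1; r >= 0; r--) {
            double sum = m[r][k];
            for (int c = r + 1; c < k; c++) sum -= m[r][c] * x[c];
            x[r] = sum / m[r][r];
        }
        return x;
    }
    // Re-solves each row of `target` with `fixed` held constant:
    // (sum of f f^T + lambda * n * I) x = sum of r * f over the row's n ratings.
    static void updateSide(double[][] ratings, double[][] target, double[][] fixed, double lambda) {
        int k = fixed[0].length;
        for (int a = 0; a < target.length; a++) {
            double[][] lhs = new double[k][k];
            double[] rhs = new double[k];
            int n = 0;
            for (int b = 0; b < fixed.length; b++) {
                double r = ratings[a][b];
                if (r == 0.0) continue; // unrated cells never enter the solve
                n++;
                for (int p = 0; p < k; p++) {
                    rhs[p] += r * fixed[b][p];
                    for (int q = 0; q < k; q++) lhs[p][q] += fixed[b][p] * fixed[b][q];
                }
            }
            if (n == 0) continue; // no ratings: keep the current row
            for (int p = 0; p < k; p++) lhs[p][p] += lambda * n; // weighted-lambda penalty
            target[a] = solve(lhs, rhs);
        }
    }
    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++) t[j][i] = m[i][j];
        return t;
    }
    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int t = 0; t < x.length; t++) s += x[t] * y[t];
        return s;
    }
    // Returns {userFeatures, itemFeatures}, starting from seeded random values.
    static double[][][] factorize(double[][] ratings, int k, double lambda, int iterations, long seed) {
        Random rnd = new Random(seed);
        double[][] users = new double[ratings.length][k];
        double[][] items = new double[ratings[0].length][k];
        for (double[] row : users) for (int t = 0; t < k; t++) row[t] = rnd.nextDouble();
        for (double[] row : items) for (int t = 0; t < k; t++) row[t] = rnd.nextDouble();
        double[][] byItem = transpose(ratings);
        for (int it = 0; it < iterations; it++) {
            updateSide(ratings, users, items, lambda); // items fixed, solve users
            updateSide(byItem, items, users, lambda);  // users fixed, solve items
        }
        return new double[][][] {users, items};
    }
    // Squared error on observed ratings plus the weighted-lambda penalty; each
    // observed rating adds one unit of penalty to its user and to its item.
    static double objective(double[][] ratings, double[][] users, double[][] items, double lambda) {
        double loss = 0.0;
        for (int u = 0; u < users.length; u++)
            for (int i = 0; i < items.length; i++) {
                if (ratings[u][i] == 0.0) continue;
                double e = ratings[u][i] - dot(users[u], items[i]);
                loss += e * e + lambda * (dot(users[u], users[u]) + dot(items[i], items[i]));
            }
        return loss;
    }
    public static void main(String[] args) {
        double[][] ratings = {{5, 3, 0, 1}, {4, 0, 0, 1}, {1, 1, 0, 5}, {1, 0, 0, 4}, {0, 1, 5, 4}};
        double[][][] f = factorize(ratings, 2, 0.05, 10, 42L);
        System.out.println("objective after 10 sweeps: " + objective(ratings, f[0], f[1], 0.05));
        System.out.println("predicted rating, user 1 / item 1: " + dot(f[0][1], f[1][1]));
    }
}
```

Because each half-sweep solves its side's subproblem exactly, the regularized objective cannot increase from one sweep to the next; the per-sweep cost is dominated by building the k x k systems and solving one per user and per item, which is why the number of features matters so much.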
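Sean's question about iterations and features can be made concrete with a back-of-the-envelope operation count. The sketch below assumes a textbook ALS cost model and is not a profile of ALSWRFactorizer: building the per-user and per-item systems costs about k^2 multiply-adds per observed rating, once on each side, and solving each k x k system is charged k^3, deliberately on the high side. The inputs are the approximate figures from the thread (100,000 ratings, about 1000 users, about 1600 movies, 30 features, 3 iterations); the class and method names are hypothetical.

```java
// Rough multiply-add count for ALS under a textbook cost model (an assumption).
final class AlsCostEstimate {

    // Per sweep: 2 * ratings * k^2 to build the normal equations on both sides,
    // plus (users + items) * k^3 to solve one k x k system per row.
    static long opsPerIteration(long ratings, long users, long items, long k) {
        return 2 * ratings * k * k + (users + items) * k * k * k;
    }

    public static void main(String[] args) {
        // Approximate GroupLens 100K figures quoted in the thread.
        long perIteration = opsPerIteration(100_000, 1_000, 1_600, 30);
        long total = 3 * perIteration;
        System.out.println("per iteration: " + perIteration); // 250200000
        System.out.println("3 iterations:  " + total);        // 750600000
    }
}
```

Under this model the whole run is roughly 750 million multiply-adds; even at a pessimistic 100 million per second, that is about 7.5 seconds, which is in line with Sean's 20-second figure for a quick ALS and far from 10+ minutes.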
