Re: A bunch of SVD questions...

Sean Owen Wed, 04 Jul 2012 08:39:46 -0700

SVD is not the same thing as ALS, though both are factoring matrices.
There is not a distributed SVD-based recommender, though there is a
distributed SVD, and you could use it as part of a recommender system.
I assume you are talking about ALS.

The version of ALS in Mahout operates on the sparse rating matrix. You
make recommendations by multiplying the two factored matrices back
together, which gives you a dense, approximate version of the original
sparse rating matrix, with blanks filled in. The k largest new entries
in a row are your recs for one user. Of course you never actually
compute that complete product -- it's way too big. You just recreate
one row to make recs for one user.
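
To make the "one row at a time" point concrete, here is a minimal sketch in plain Python. It is not Mahout's code: the function name, the dict-of-lists layout for the item factors, and the toy numbers are all assumptions made up for illustration. It scores every item the user has not already rated by taking a dot product with that user's feature vector, then keeps the k largest.

```python
import heapq

def recommend_for_user(user_factors, item_factors, already_rated, k):
    """Return the k best (item, score) pairs for a single user.

    user_factors:  list of floats, this user's row of the user-feature matrix
    item_factors:  dict item_id -> list of floats (rows of the item-feature matrix)
    already_rated: set of item_ids the user already has a value for
    """
    # Each score is one cell of the reconstructed row: dot(user, item).
    scores = (
        (item, sum(u * v for u, v in zip(user_factors, vec)))
        for item, vec in item_factors.items()
        if item not in already_rated
    )
    # Only one row of the product is ever materialized, never the full matrix.
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])

# Toy data (hypothetical): 2 features, 4 items.
item_factors = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [1.0, 1.0],
    "D": [0.25, 0.0],
}
user = [2.0, 1.0]
recs = recommend_for_user(user, item_factors, already_rated={"C"}, k=2)
print(recs)
```

The cost per user is one pass over the item factors, which is why this is usually done on demand rather than precomputed for every cell.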

You definitely can't fill in average ratings in the rating matrix, at
scale -- it makes it dense and too big. It is conceptually a good idea
and that's why the literature talks about this, but I do not think it
is practical. At best you can subtract the mean of the existing
entries, *from the existing entries only*. After that, treating empty
cells as 0 makes more sense, because 0 now stands for "average".
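
A small sketch of that centering step, assuming the ratings are held as a sparse dict keyed by (user, item); the names and numbers are made up for illustration. Only the cells that exist are touched, so the matrix stays exactly as sparse as it was.

```python
def center_observed(ratings):
    """Subtract the global mean from the observed ratings only.

    ratings: dict (user, item) -> rating, holding only the cells that exist.
    Returns the centered sparse dict and the mean that was removed.
    """
    mean = sum(ratings.values()) / len(ratings)
    centered = {cell: r - mean for cell, r in ratings.items()}
    return centered, mean

# Toy data (hypothetical): three observed ratings out of four possible cells.
ratings = {("u1", "A"): 5.0, ("u1", "B"): 3.0, ("u2", "A"): 4.0}
centered, mean = center_observed(ratings)
print(mean, centered)
```

The mean has to be added back to any predicted value before it is read as a rating.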

(This is also why I like Yehuda Koren's formulation of ALS, where
you are not predicting ratings, but a positive interaction score.
There the fact that empty means 0 is just fine. No bias terms needed.)

SVD/ALS are used to factor matrices and reconstruct an approximation
of the original *that is more complete*. The input values can be from
whatever you want -- implicit etc. 1/0 data actually makes more sense
for something like ALS as input. The formulation I mention above is a
slightly better generalization of that.
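
As a sketch of what that generalization could look like, assuming the formulation meant here is the implicit-feedback one from Hu, Koren and Volinsky: each observed interaction becomes a 1/0 preference plus a confidence weight of 1 + alpha * r. The function name, the dict layout, and the alpha value are assumptions for illustration only.

```python
def to_preference_and_confidence(interactions, alpha):
    """Turn raw interaction strengths into preference and confidence.

    interactions: dict (user, item) -> non-negative strength (clicks, plays, ...)
    alpha:        tunable scaling factor for confidence (an assumption here)

    Cells that are absent are implicitly preference 0 with confidence 1,
    so nothing needs to be stored for them.
    """
    preference = {}
    confidence = {}
    for cell, r in interactions.items():
        if r > 0:
            preference[cell] = 1.0
            confidence[cell] = 1.0 + alpha * r
    return preference, confidence

# Toy data (hypothetical): u1 interacted with A three times, u2 with B once.
interactions = {("u1", "A"): 3, ("u2", "B"): 1}
preference, confidence = to_preference_and_confidence(interactions, alpha=2.0)
print(preference, confidence)
```

Plain 1/0 input is the special case where every observed cell gets the same confidence.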

The MAHOUT-371 code wasn't finished and it almost surely will not be.
It is not so much a different SVD as massaging the input to incorporate
stuff like time info. I personally am not sure that the SVD is the best
approach for recommenders, mostly because it is hard to scale: it is
doing something more complicated.

On Wed, Jul 4, 2012 at 6:08 PM, Razon, Oren <[email protected]> wrote:
> Hi,
> I'm exploring Mahout's parallel SVD implementation over Hadoop (ALS), and I
> would like to clarify a few things:
> 1.      How do you recommend top K items with this job? Does the job
> factorize the ranking matrix, then compute a predicted ranking for each cell
> in the matrix, so that when you need a recommendation you only need to
> retrieve the top K items by predicted value for the user? Or does it only
> factorize the matrix and require some online logic when the recommendation
> is requested?
> 2.      To my knowledge, applying an SVD technique requires first filling in
> all empty cells in the ranking matrix (with the average ranking, for example).
> Is that done during the ALS job (and if so, how are the cells filled),
> or should it be done as a preprocessing step?
> 3.      From my understanding, SVD recommenders are used to predict user
> implicit preference. By doing so you can recommend top K items (top K items
> in descending order of predicted value). I wonder, could it be
> applied to a binary dataset (explicit), where my ranking matrix contains only
> 1/0?
> 4.      From some reading I found that timeSVD++, developed by
> Yehuda Koren, is considered the superior SVD implementation for SVD
> recommenders. Is there any kind of parallel implementation of
> it on top of Hadoop? I found this proposal:
> https://issues.apache.org/jira/browse/MAHOUT-371
>       What is its status? Has it been reviewed already? Is it
> stable? Does anyone have experience with it?
>
> Thanks,
> Oren
>
>
>
>
>
> ---------------------------------------------------------------------
> Intel Electronics Ltd.
