Thank you. Maybe it should be called FactorizedRecommender instead, since there is no real SVD?
On Sun, Nov 6, 2011 at 11:58 PM, Sebastian Schelter <[email protected]> wrote:
> I think it's also important to note that in the recommendation world most
> of the time something which only looks like an SVD is used.
>
> In most recommendation use cases, only a tiny fraction of the
> user-item matrix is observed. Therefore "standard" algorithms for
> obtaining the SVD cannot be used (at least that's what the papers say).
>
> The usual approach is to decompose the rating matrix A (users x items)
> into the product of two other matrices X (users x features) and Y
> (items x features). A is approximated by XY'.
>
> This decomposition is obtained by selecting X and Y in a way that
> minimizes the overall squared error with regard to the observed ratings
> (plus some regularization to avoid overfitting).
>
> For all observed user-item pairs (u,i) the squared error of the
> prediction is obtained by multiplying the corresponding user and item
> vectors from the latent feature space:
>
> f(X,Y) = sum_{u,i} (r_{u,i} - x_u' y_i)^2 + some regularization term
>
> There are several ways to find the X and Y that minimize this function.
>
> One approach is to use Stochastic Gradient Descent. This approach was
> made popular by Simon Funk's famous article during the Netflix Prize:
>
> "Netflix Update: Try this at home"
> http://sifter.org/~simon/journal/20061211.html
>
> I think that's also what's implemented in our
> ExpectationMaximizationSVDFactorizer.
>
> Other approaches use a technique called "Alternating Least Squares"
> where you iteratively fix either X or Y and solve the equation for the
> other.
>
> This approach is described in "Large-scale Parallel Collaborative
> Filtering for the Netflix Prize":
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
>
> Our sequential ALSWRFactorizer is an implementation of this, as is the
> recently refactored ParallelALSFactorizationJob which takes this
> algorithm onto Hadoop.
>
> There is a second very interesting paper that shows how to use the ALS
> approach with implicit feedback data: "Collaborative Filtering for
> Implicit Feedback Datasets", http://research.yahoo.com/pub/2433
>
> I think Tamas's sequential factorizer in MAHOUT-737 is a direct
> implementation of this and I also plan to include it in
> ParallelALSFactorizationJob.
>
> --sebastian
>
>
> On 07.11.2011 07:10, Sean Owen wrote:
> > It's embedded in both as far as I can tell, though I don't know enough
> > about the implementation to say what the split is. Ted remarked once
> > that it's usual to split sqrt(S) across both.
> >
> > Dotting two vectors to get one rating is really just the process of
> > matrix multiplication to recover a value from the approximate
> > user-item matrix. A = U S V', and we have some truncated versions of
> > those right matrices. Uk Sk V'k gives Ak, which is some approximation
> > of the original A (the input) but with many new values filled in from
> > which to make recommendations. The Uk and V'k actually already have Sk
> > embedded. So to get one value in Ak is just a dot of a row of Uk with
> > a column of V'k, per usual matrix multiplication.
> >
> > On Sun, Nov 6, 2011 at 11:53 PM, Lance Norskog <[email protected]> wrote:
> >> This thread is about the class SVDRecommender, which uses an externally
> >> created factorization to do recommendation.
> >>
> >> A: The Factorization classes do not extract the scaling diagonal matrix.
> >> Is this embedded in the left or right matrix? Or spread across both?
> >> B: Is there an explanation of why the dot product of two feature vectors
> >> creates the preference? A paper or blog post? Or a paragraph in a reply?
> >>
> >> --
> >> Lance Norskog
> >> [email protected]
> >

--
Lance Norskog
[email protected]
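
To make that concrete, here is a rough, self-contained sketch of the SGD-style training Sebastian describes above, ending with the dot product of a user and an item feature vector as the estimated preference. The class and variable names are made up for illustration; this is plain Java arrays, not Mahout's ExpectationMaximizationSVDFactorizer:

// Rough sketch of SGD matrix factorization (Funk-style), not Mahout code.
// Observed ratings are (user, item, value) triples; X and Y are the latent
// feature matrices (users x k and items x k). Hypothetical names throughout.
public class SgdFactorizationSketch {

  static class Rating {
    final int user;
    final int item;
    final double value;
    Rating(int user, int item, double value) {
      this.user = user;
      this.item = item;
      this.value = value;
    }
  }

  public static void main(String[] args) {
    Rating[] observed = {
        new Rating(0, 0, 5.0), new Rating(0, 1, 3.0),
        new Rating(1, 0, 4.0), new Rating(2, 1, 1.0)
    };
    int numUsers = 3, numItems = 2, k = 2;
    double learningRate = 0.01, lambda = 0.05;

    double[][] X = randomMatrix(numUsers, k);
    double[][] Y = randomMatrix(numItems, k);

    // Minimize sum over observed (u,i) of (r_ui - x_u . y_i)^2
    // plus lambda * (|x_u|^2 + |y_i|^2), one rating at a time.
    for (int epoch = 0; epoch < 100; epoch++) {
      for (Rating r : observed) {
        double err = r.value - dot(X[r.user], Y[r.item]);
        for (int f = 0; f < k; f++) {
          double xu = X[r.user][f];
          double yi = Y[r.item][f];
          X[r.user][f] += learningRate * (err * yi - lambda * xu);
          Y[r.item][f] += learningRate * (err * xu - lambda * yi);
        }
      }
    }

    // An estimated preference is just the dot product of the two feature vectors.
    System.out.println("estimate(user 2, item 0) = " + dot(X[2], Y[0]));
  }

  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int f = 0; f < a.length; f++) {
      sum += a[f] * b[f];
    }
    return sum;
  }

  static double[][] randomMatrix(int rows, int cols) {
    java.util.Random random = new java.util.Random(42);
    double[][] m = new double[rows][cols];
    for (int i = 0; i < rows; i++) {
      for (int j = 0; j < cols; j++) {
        m[i][j] = random.nextDouble() * 0.1;
      }
    }
    return m;
  }
}

Real implementations usually also add user/item bias terms, shuffle the training triples and decay the learning rate, but the core update is just that pair of lines inside the inner loop.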
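
And a similarly rough sketch of one ALS half-step in the weighted-lambda-regularization form from the paper above: hold Y fixed and recompute each user's feature vector by solving a small regularized least-squares system (the item half-step is symmetric). Again the names are hypothetical and this is not the actual ALSWRFactorizer code:

// Rough sketch of one ALS half-step (ALS-WR style), not Mahout's ALSWRFactorizer.
// With the item matrix Y held fixed, each user's feature vector x_u is the
// solution of the regularized normal equations
//   (Y_u' Y_u + lambda * n_u * I) x_u = Y_u' r_u
// where Y_u holds the feature vectors of the items user u has rated and n_u is
// the number of those ratings. Hypothetical names throughout.
public class AlsHalfStepSketch {

  /** Recomputes all user vectors in X for a fixed Y. */
  static void updateUsers(double[][] X, double[][] Y, int[][] ratedItems,
                          double[][] ratings, double lambda) {
    int k = Y[0].length;
    for (int u = 0; u < X.length; u++) {
      int n = ratedItems[u].length;
      if (n == 0) {
        continue; // nothing observed for this user, leave x_u as it is
      }
      double[][] A = new double[k][k]; // Y_u' Y_u + lambda * n * I
      double[] b = new double[k];      // Y_u' r_u
      for (int idx = 0; idx < n; idx++) {
        double[] y = Y[ratedItems[u][idx]];
        double r = ratings[u][idx];
        for (int i = 0; i < k; i++) {
          b[i] += r * y[i];
          for (int j = 0; j < k; j++) {
            A[i][j] += y[i] * y[j];
          }
        }
      }
      for (int i = 0; i < k; i++) {
        A[i][i] += lambda * n;
      }
      X[u] = solve(A, b);
    }
  }

  /** Naive Gaussian elimination; fine here because A is small and positive definite. */
  static double[] solve(double[][] A, double[] b) {
    int k = b.length;
    for (int p = 0; p < k; p++) {
      for (int r = p + 1; r < k; r++) {
        double factor = A[r][p] / A[p][p];
        for (int c = p; c < k; c++) {
          A[r][c] -= factor * A[p][c];
        }
        b[r] -= factor * b[p];
      }
    }
    double[] x = new double[k];
    for (int r = k - 1; r >= 0; r--) {
      double sum = b[r];
      for (int c = r + 1; c < k; c++) {
        sum -= A[r][c] * x[c];
      }
      x[r] = sum / A[r][r];
    }
    return x;
  }

  public static void main(String[] args) {
    double[][] Y = { {0.1, 0.2}, {0.3, 0.1}, {0.2, 0.4} }; // 3 items, k = 2
    double[][] X = new double[2][2];                        // 2 users
    int[][] ratedItems = { {0, 1}, {1, 2} };
    double[][] ratings = { {5.0, 3.0}, {4.0, 1.0} };
    updateUsers(X, Y, ratedItems, ratings, 0.05);
    System.out.println("x_0 = " + java.util.Arrays.toString(X[0]));
  }
}

Because each user's little system is independent of the others, this step parallelizes naturally, which is essentially what ParallelALSFactorizationJob exploits on Hadoop.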
