Thank you. Maybe it should be called FactorizedRecommender instead, since there is no real SVD?
On Sun, Nov 6, 2011 at 11:58 PM, Sebastian Schelter <[email protected]> wrote:
> I think it's also important to note that in the recommendation world most
> of the time something which only looks like an SVD is used.
>
> In most recommendation use cases, only a tiny fraction of the
> user-item matrix is observed. Therefore "standard" algorithms for
> obtaining the SVD cannot be used (at least that's what the papers say).
>
> The usual approach is to decompose the rating matrix A (users x items)
> into the product of two other matrices X (users x features) and Y
> (items x features). A is approximated by XY'.
>
> This decomposition is obtained by selecting X and Y in a way that
> minimizes the overall squared error with regard to the observed ratings
> (plus some regularization to avoid overfitting).
>
> For all observed user-item pairs (u,i) the squared error of the
> prediction is obtained by multiplying the corresponding user and item
> vectors from the latent feature space:
>
> f(X,Y) = sum_{u,i} (r_{u,i} - x_u' y_i)^2 + some regularization term
>
> There are several ways to find the X and Y that minimize this function.
>
> One approach is to use Stochastic Gradient Descent. This approach was
> made popular by Simon Funk's famous article during the Netflix Prize:
>
> "Netflix Update: Try this at home"
> http://sifter.org/~simon/journal/20061211.html
>
> I think that's also what's implemented in our
> ExpectationMaximizationSVDFactorizer.
>
> Other approaches use a technique called "Alternating Least Squares"
> where you iteratively fix either X or Y and solve the equation for the
> other.
>
> This approach is described in "Large-scale Parallel Collaborative
> Filtering for the Netflix Prize":
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
>
> Our sequential ALSWRFactorizer is an implementation of this, as is the
> recently refactored ParallelALSFactorizationJob which takes this
> algorithm onto Hadoop.
>
> There is a second very interesting paper that shows how to use the ALS
> approach with implicit feedback data: "Collaborative Filtering for
> Implicit Feedback Datasets", http://research.yahoo.com/pub/2433
>
> I think Tamas's sequential factorizer in MAHOUT-737 is a direct
> implementation of this and I also plan to include it in
> ParallelALSFactorizationJob.
>
> --sebastian
>
>
> On 07.11.2011 07:10, Sean Owen wrote:
> > It's embedded in both as far as I can tell, though I don't know enough
> > about the implementation to say what the split is. Ted remarked once
> > that it's usual to split sqrt(S) across both.
> >
> > Dotting two vectors to get one rating is really just the process of
> > matrix multiplication to recover a value from the approximate
> > user-item matrix. A = U S V', and we have some truncated versions of
> > those right matrices. Uk Sk V'k gives Ak, which is some approximation
> > of the original A (the input) but with many new values filled in from
> > which to make recommendations. The Uk and V'k actually already have Sk
> > embedded. So to get one value in Ak is just a dot of a row of Uk with
> > a column of V'k, per usual matrix multiplication.
> >
> > On Sun, Nov 6, 2011 at 11:53 PM, Lance Norskog <[email protected]> wrote:
> >> This thread is about the class SVDRecommender, which uses an externally
> >> created factorization to do recommendation.
> >>
> >> A: The Factorization classes do not extract the scaling diagonal matrix.
> >> Is this embedded in the left or right matrix? Or spread across both?
> >> B: Is there an explanation of why the dot product of two feature vectors
> >> creates the preference? A paper or blog post? Or a paragraph in a reply?
> >>
> >> --
> >> Lance Norskog
> >> [email protected]
> >

--
Lance Norskog
[email protected]
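
To make that concrete, here is a rough, self-contained sketch of the SGD-style training Sebastian describes above, ending with the dot product of a user and an item feature vector as the estimated preference. The class and variable names are made up for illustration; this is plain Java arrays, not Mahout's ExpectationMaximizationSVDFactorizer:

// Rough sketch of SGD matrix factorization (Funk-style), not Mahout code.
// Observed ratings are (user, item, value) triples; X and Y are the latent
// feature matrices (users x k and items x k). Hypothetical names throughout.
public class SgdFactorizationSketch {

  static class Rating {
    final int user;
    final int item;
    final double value;
    Rating(int user, int item, double value) {
      this.user = user;
      this.item = item;
      this.value = value;
    }
  }

  public static void main(String[] args) {
    Rating[] observed = {
        new Rating(0, 0, 5.0), new Rating(0, 1, 3.0),
        new Rating(1, 0, 4.0), new Rating(2, 1, 1.0)
    };
    int numUsers = 3, numItems = 2, k = 2;
    double learningRate = 0.01, lambda = 0.05;

    double[][] X = randomMatrix(numUsers, k);
    double[][] Y = randomMatrix(numItems, k);

    // Minimize sum over observed (u,i) of (r_ui - x_u . y_i)^2
    // plus lambda * (|x_u|^2 + |y_i|^2), one rating at a time.
    for (int epoch = 0; epoch < 100; epoch++) {
      for (Rating r : observed) {
        double err = r.value - dot(X[r.user], Y[r.item]);
        for (int f = 0; f < k; f++) {
          double xu = X[r.user][f];
          double yi = Y[r.item][f];
          X[r.user][f] += learningRate * (err * yi - lambda * xu);
          Y[r.item][f] += learningRate * (err * xu - lambda * yi);
        }
      }
    }

    // An estimated preference is just the dot product of the two feature vectors.
    System.out.println("estimate(user 2, item 0) = " + dot(X[2], Y[0]));
  }

  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int f = 0; f < a.length; f++) {
      sum += a[f] * b[f];
    }
    return sum;
  }

  static double[][] randomMatrix(int rows, int cols) {
    java.util.Random random = new java.util.Random(42);
    double[][] m = new double[rows][cols];
    for (int i = 0; i < rows; i++) {
      for (int j = 0; j < cols; j++) {
        m[i][j] = random.nextDouble() * 0.1;
      }
    }
    return m;
  }
}

Real implementations usually also add user/item bias terms, shuffle the training triples and decay the learning rate, but the core update is just that pair of lines inside the inner loop.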
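
And a similarly rough sketch of one ALS half-step in the weighted-lambda-regularization form from the paper above: hold Y fixed and recompute each user's feature vector by solving a small regularized least-squares system (the item half-step is symmetric). Again the names are hypothetical and this is not the actual ALSWRFactorizer code:

// Rough sketch of one ALS half-step (ALS-WR style), not Mahout's ALSWRFactorizer.
// With the item matrix Y held fixed, each user's feature vector x_u is the
// solution of the regularized normal equations
//   (Y_u' Y_u + lambda * n_u * I) x_u = Y_u' r_u
// where Y_u holds the feature vectors of the items user u has rated and n_u is
// the number of those ratings. Hypothetical names throughout.
public class AlsHalfStepSketch {

  /** Recomputes all user vectors in X for a fixed Y. */
  static void updateUsers(double[][] X, double[][] Y, int[][] ratedItems,
                          double[][] ratings, double lambda) {
    int k = Y[0].length;
    for (int u = 0; u < X.length; u++) {
      int n = ratedItems[u].length;
      if (n == 0) {
        continue; // nothing observed for this user, leave x_u as it is
      }
      double[][] A = new double[k][k]; // Y_u' Y_u + lambda * n * I
      double[] b = new double[k];      // Y_u' r_u
      for (int idx = 0; idx < n; idx++) {
        double[] y = Y[ratedItems[u][idx]];
        double r = ratings[u][idx];
        for (int i = 0; i < k; i++) {
          b[i] += r * y[i];
          for (int j = 0; j < k; j++) {
            A[i][j] += y[i] * y[j];
          }
        }
      }
      for (int i = 0; i < k; i++) {
        A[i][i] += lambda * n;
      }
      X[u] = solve(A, b);
    }
  }

  /** Naive Gaussian elimination; fine here because A is small and positive definite. */
  static double[] solve(double[][] A, double[] b) {
    int k = b.length;
    for (int p = 0; p < k; p++) {
      for (int r = p + 1; r < k; r++) {
        double factor = A[r][p] / A[p][p];
        for (int c = p; c < k; c++) {
          A[r][c] -= factor * A[p][c];
        }
        b[r] -= factor * b[p];
      }
    }
    double[] x = new double[k];
    for (int r = k - 1; r >= 0; r--) {
      double sum = b[r];
      for (int c = r + 1; c < k; c++) {
        sum -= A[r][c] * x[c];
      }
      x[r] = sum / A[r][r];
    }
    return x;
  }

  public static void main(String[] args) {
    double[][] Y = { {0.1, 0.2}, {0.3, 0.1}, {0.2, 0.4} }; // 3 items, k = 2
    double[][] X = new double[2][2];                        // 2 users
    int[][] ratedItems = { {0, 1}, {1, 2} };
    double[][] ratings = { {5.0, 3.0}, {4.0, 1.0} };
    updateUsers(X, Y, ratedItems, ratings, 0.05);
    System.out.println("x_0 = " + java.util.Arrays.toString(X[0]));
  }
}

Because each user's little system is independent of the others, this step parallelizes naturally, which is essentially what ParallelALSFactorizationJob exploits on Hadoop.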
