I think Dmitriy's description of the SGD and ALS-WR approaches hits the nail on the head.
However, there is a third way to factorize the rating matrix that we haven't talked about yet. It's described in "Collaborative Filtering for Implicit Feedback Datasets" by Yifan Hu, Yehuda Koren and Chris Volinsky (http://research.yahoo.com/pub/2433), and I recently added it to ParallelALSFactorizationJob. This approach works on implicit feedback data (like the number of times a user watched a television series), where all unobserved interactions are by definition 0. Using a standard SVD on such data would result in the problems Dmitriy described. The paper introduces a very interesting alternative: the user-item matrix holds only 0s and 1s (0 in a cell if there have been no interactions, 1 if there have been one or more). This matrix is decomposed into two other matrices, X and Y (the user and item features), by minimizing the regularized squared error, as in ALS-WR. Unlike ALS-WR, however, the sum runs over all user-item pairs, including the 0 cells, and each error term is weighted by a confidence value. The confidence is very low if the user never interacted with the item (he might simply not be aware that the item exists) and very high if the user interacted with the item very often (a good indication of preference). That should help to avoid the problems Dmitriy described.

--sebastian

2011/11/17 Dmitriy Lyubimov:
> On Thu, Nov 17, 2011 at 11:30 AM, Dmitriy Lyubimov wrote:
>> I will finish adding an option with the Cholesky decomposition route to
>> SSVD some time early in Q1 2012.
>>
>
> PPS I already put some jobs in (they are in the trunk) for the Cholesky
> route.
> I thought it would be an easy mod, but then I saw that it would
> require a few more modifications to also support power
> iterations the same way they are supported today (and I also still
> couldn't quite finish my thought process on what it would take
> to modify the U-job to produce U without Q in this case; it seems this
> route will require 100% special handling and I wouldn't be able to
> reuse any of the current U-job for this option).
>
> For these reasons, I decided to wait until I figure out all of the
> remaining issues architecturally before I proceed. And that had
> better be one longer chunk of time rather than several little
> chunks, which makes it depend more on my schedule to figure out where
> that chunk might be.
>
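The weighted factorization described at the top of this message can be sketched in a few lines. This is a minimal, single-machine Python sketch of the paper's update rules under stated assumptions, not the code in ParallelALSFactorizationJob; the function names (`implicit_als`, `solve_factor`, `predict`) and the parameter values (alpha = 40, lambda = 0.1, k = 2 features) are hypothetical choices for illustration. Each cell gets a binary preference p = 1 if the count r > 0 (else 0) and a confidence c = 1 + alpha * r. With Y fixed, each user vector solves (Y^T C^u Y + lambda I) x_u = Y^T C^u p(u), where C^u is the diagonal matrix of that user's confidences, and the item vectors are solved symmetrically with X fixed.

```python
import random


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_factor(fixed, counts, alpha, lam, k):
    """Weighted ridge regression for one user (or item):
    x = (F^T C F + lam*I)^-1 F^T C p, summing over ALL cells, including the zeros."""
    a = [[lam if s == t else 0.0 for t in range(k)] for s in range(k)]
    b = [0.0] * k
    for f, r in zip(fixed, counts):
        c = 1.0 + alpha * r          # confidence: 1 for unobserved cells, grows with the count
        p = 1.0 if r > 0 else 0.0    # binary preference
        for s in range(k):
            b[s] += c * p * f[s]
            for t in range(k):
                a[s][t] += c * f[s] * f[t]
    return solve(a, b)


def implicit_als(counts, k=2, alpha=40.0, lam=0.1, iterations=20, seed=0):
    """Alternate between solving all user vectors X and all item vectors Y."""
    rng = random.Random(seed)
    n_users, n_items = len(counts), len(counts[0])
    y = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    columns = [[counts[u][i] for u in range(n_users)] for i in range(n_items)]
    x = []
    for _ in range(iterations):
        x = [solve_factor(y, counts[u], alpha, lam, k) for u in range(n_users)]
        y = [solve_factor(x, columns[i], alpha, lam, k) for i in range(n_items)]
    return x, y


def predict(x, y, u, i):
    """Predicted preference of user u for item i: the dot product x_u . y_i."""
    return sum(xu * yi for xu, yi in zip(x[u], y[i]))


if __name__ == "__main__":
    # Rows are users, columns are items, cells are interaction counts (0 = never watched).
    counts = [
        [5, 3, 0, 0],
        [4, 0, 0, 0],
        [0, 0, 2, 6],
        [0, 0, 3, 0],
    ]
    x, y = implicit_als(counts)
    for u in range(len(counts)):
        print([round(predict(x, y, u, i), 2) for i in range(len(counts[0]))])
```

This naive version loops over every item for every user. The paper avoids that cost by computing Y^T Y once per sweep and writing Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y, where C^u - I is nonzero only for the items the user actually interacted with.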
