I think it's also important to note that in the recommendation world,
most of the time something that only looks like an SVD is used.
In most recommendation use cases, only a tiny fraction of the
user-item matrix is observed. Therefore "standard" algorithms for
computing the SVD cannot be used (at least that's what the papers say).
The usual approach is to decompose the rating matrix A (users x items)
into the product of two other matrices X (users x features) and Y (items
x features). A is approximated by XY'.
This decomposition is obtained by choosing X and Y such that the
overall squared error over the observed ratings is minimized
(plus some regularization to avoid overfitting).
For each observed user-item pair (u,i), the prediction x_u' y_i is the
dot product of the corresponding user and item vectors from the latent
feature space, and X and Y are chosen to minimize the sum of the squared
prediction errors:

f(X,Y) = sum_{observed (u,i)} (r_{u,i} - x_u' y_i)^2 + regularization term
         (typically lambda * (sum_u ||x_u||^2 + sum_i ||y_i||^2))
There are several ways to find the X and Y that minimize this function.
One approach is to use Stochastic Gradient Descent. This approach was
made popular by Simon Funk's famous article during the Netflix Prize:
"Netflix Update: Try this at home"
http://sifter.org/~simon/journal/20061211.html
I think that's also what's implemented in our
ExpectationMaximizationSVDFactorizer.
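To make the SGD idea concrete, here is a rough sketch of the update rule
(this is not the actual Mahout code, just the textbook version; class and
parameter names are made up for illustration):

import java.util.List;
import java.util.Random;

class Rating {
  final int user;
  final int item;
  final double value;
  Rating(int user, int item, double value) {
    this.user = user; this.item = item; this.value = value;
  }
}

class SGDFactorizationSketch {
  // X: numUsers x numFeatures, Y: numItems x numFeatures, both pre-allocated
  static void factorize(List<Rating> ratings, double[][] X, double[][] Y,
                        int numFeatures, int numEpochs,
                        double learningRate, double lambda) {
    Random random = new Random(42);
    // small random initialization of both feature matrices
    for (double[] row : X) {
      for (int k = 0; k < numFeatures; k++) row[k] = 0.1 * random.nextGaussian();
    }
    for (double[] row : Y) {
      for (int k = 0; k < numFeatures; k++) row[k] = 0.1 * random.nextGaussian();
    }
    for (int epoch = 0; epoch < numEpochs; epoch++) {
      for (Rating r : ratings) {
        double[] xu = X[r.user];
        double[] yi = Y[r.item];
        // the prediction is the dot product of the user and item feature vectors
        double prediction = 0;
        for (int k = 0; k < numFeatures; k++) prediction += xu[k] * yi[k];
        double error = r.value - prediction;
        // gradient step on both vectors, with L2 regularization
        for (int k = 0; k < numFeatures; k++) {
          double xuk = xu[k];
          xu[k] += learningRate * (error * yi[k] - lambda * xuk);
          yi[k] += learningRate * (error * xuk - lambda * yi[k]);
        }
      }
    }
  }
}

Funk's article has a few more tweaks (e.g. training one feature at a
time), but the core update is the same.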
Other approaches use a technique called "Alternating Least Squares"
(ALS), where you alternately fix one of X and Y and solve a
least-squares problem for the other.
This approach is described in: "Large-scale Parallel Collaborative
Filtering for the Netflix Prize",
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
Our sequential ALSWRFactorizer is an implementation of this, as is the
recently refactored ParallelALSFactorizationJob, which brings this
algorithm to Hadoop.
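To illustrate what one ALS half-step looks like, here is a sketch of the
user-side update with Y held fixed (again illustrative only, not the
actual ALSWRFactorizer code; it reuses the Rating class from the sketch
above and uses the weighted-lambda regularization from the paper, where
each user's regularizer is scaled by the number of that user's ratings):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ALSSketch {
  // recompute all user vectors while the item vectors Y are held fixed
  static void updateUsers(List<Rating> ratings, double[][] X, double[][] Y,
                          int numFeatures, double lambda) {
    // group the observed ratings by user
    Map<Integer, List<Rating>> byUser = new HashMap<Integer, List<Rating>>();
    for (Rating r : ratings) {
      List<Rating> userRatings = byUser.get(r.user);
      if (userRatings == null) {
        userRatings = new ArrayList<Rating>();
        byUser.put(r.user, userRatings);
      }
      userRatings.add(r);
    }
    for (Map.Entry<Integer, List<Rating>> entry : byUser.entrySet()) {
      List<Rating> userRatings = entry.getValue();
      int numRatings = userRatings.size();
      // normal equations: (Y_u' Y_u + lambda * n_u * I) x_u = Y_u' r_u
      double[][] A = new double[numFeatures][numFeatures];
      double[] b = new double[numFeatures];
      for (Rating r : userRatings) {
        double[] yi = Y[r.item];
        for (int j = 0; j < numFeatures; j++) {
          b[j] += r.value * yi[j];
          for (int k = 0; k < numFeatures; k++) A[j][k] += yi[j] * yi[k];
        }
      }
      for (int j = 0; j < numFeatures; j++) A[j][j] += lambda * numRatings;
      X[entry.getKey()] = solve(A, b);
    }
  }

  // tiny Gaussian elimination with partial pivoting, fine for the small
  // numFeatures x numFeatures systems that occur here
  static double[] solve(double[][] A, double[] b) {
    int n = b.length;
    for (int p = 0; p < n; p++) {
      int max = p;
      for (int i = p + 1; i < n; i++) {
        if (Math.abs(A[i][p]) > Math.abs(A[max][p])) max = i;
      }
      double[] rowTmp = A[p]; A[p] = A[max]; A[max] = rowTmp;
      double valTmp = b[p]; b[p] = b[max]; b[max] = valTmp;
      for (int i = p + 1; i < n; i++) {
        double factor = A[i][p] / A[p][p];
        b[i] -= factor * b[p];
        for (int j = p; j < n; j++) A[i][j] -= factor * A[p][j];
      }
    }
    double[] x = new double[n];
    for (int i = n - 1; i >= 0; i--) {
      double sum = 0;
      for (int j = i + 1; j < n; j++) sum += A[i][j] * x[j];
      x[i] = (b[i] - sum) / A[i][i];
    }
    return x;
  }
}

The item update is symmetric (just swap the roles of X and Y), and a
full iteration alternates between the two until the objective stops
improving. The parallel job can distribute the per-user and per-item
solves, since they are independent of each other.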
There is a second very interesting paper that shows how to use the ALS
approach with implicit feedback data: "Collaborative Filtering for
Implicit Feedback Datasets", http://research.yahoo.com/pub/2433
I think Tamas's sequential factorizer in MAHOUT-737 is a direct
implementation of this and I also plan to include it in
ParallelALSFactorizationJob.
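For reference, the objective in that paper looks roughly like this (in
the notation from above): every user-item pair gets a binary preference
p_{u,i} (1 if any feedback was observed, 0 otherwise) and a confidence
weight c_{u,i} = 1 + alpha * r_{u,i}, and one minimizes

f(X,Y) = sum_{all (u,i)} c_{u,i} * (p_{u,i} - x_u' y_i)^2
         + lambda * (sum_u ||x_u||^2 + sum_i ||y_i||^2)

Note that the sum now runs over all user-item pairs, not only the
observed ones, which is why the paper derives a special ALS update that
exploits the sparsity of the feedback to keep each solve cheap.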
--sebastian
On 07.11.2011 07:10, Sean Owen wrote:
> It's embedded in both as far as I can tell, though I don't know enough
> about the implementation to say what the split is. Ted remarked once
> that it's usual to split sqrt(S) across both.
>
> Dotting two vectors to get one rating is really just the process of
> matrix multiplication to recover a value from the approximate
> user-item matrix. A = U S V', and we have some truncated versions of
> those right matrices. Uk Sk V'k gives Ak, which is some approximation
> of the original A (the input) but with many new values filled in from
> which to make recommendations. The Uk and V'k actually already have Sk
> embedded. So to get one value in Ak is just a dot of a row of Uk with
> a column of V'k, per usual matrix multiplication.
>
> On Sun, Nov 6, 2011 at 11:53 PM, Lance Norskog <[email protected]> wrote:
>> This thread is about the class SVDRecommender, which uses an externally
>> created factorization to do recommendation.
>>
>> A: The Factorization classes do not extract the scaling diagonal matrix. Is
>> this embedded in the left or right matrix? Or spread across both?
>> B: Is there an explanation of why the dot product of two feature vectors
>> creates the preference? A paper or blog post? Or a paragraph in a reply?
>>
>> --
>> Lance Norskog
>> [email protected]
>>