Let me rephrase. Suppose I did an ALS decomposition of a matrix. Suppose I don't want to produce recommendations (by calculating XY'). Suppose instead I want to find users with similar preferences (by calculating XX'). Should the correlation of a user with himself be 1.0?
If the answer is "yes", that means that the user-feature vectors in X should be normalized, i.e., scaled to have the length of 1.0. If the answer is "no" then a user can possibly correlate stronger with another user than himself. Which should it be? Which one is the case in Mahout? On Wed, Sep 4, 2013 at 1:59 PM, Dmitriy Lyubimov <[email protected]> wrote: > On Wed, Sep 4, 2013 at 10:07 AM, Koobas <[email protected]> wrote: > > In ALS the coincidence matrix is approximated by XY', > > where X is user-feature, Y is item-feature. > > Now, here is the question: > > are/should the feature vectors be normalized before computing > > recommendations? > > if it is a coincidence matrix in a sense that there are just 0's and > 1's no it shouldn't (imo). However, if there's a case of > no-observations then things are a little bit more complicated (in a > sense that preference is still 0 and 1 but there're confidence > weights. Determining weights (no-observation weight vs. degree of > consumption) is usually advised to be determined via > (cross)validation. However at this point Mahout does not support > crossvalidation of those parameters, so usually people use some > guesswork (see Zhou-Koren-Volinsky paper about implicit feedback > datasets). > > > > Now, what happens in the case of SVD? > > The vectors are normal by definition. > > Are singular values used at all, or just left and right singular vectors? > > SVD does not take weights so it cannot ignore or weigh out a > non-observation, which is why it is not well suited for matrix > completion problem per se. >
