On Fri, Jun 4, 2010 at 12:38 AM, Sean Owen <[email protected]> wrote: > Yes thanks a lot. Makes sense to me: we're just changing basis and V > is the change-of-basis transformation. Glad to see that is all there > is to it; not sure what the rest is about. >
Exactly. > I had thought of U S as "user preferences for features" and V as > "expression of features in items". The paper breaks S in half by > taking the square root S = B* B and putting B* with U and B with V. > This is pretty common usage actually. It allows some degree of normalization of the vectors, but isn't strictly necessary. Note that since this is an SVD, S is diagonal and all elements are real (and positive, actually). Thus B* = B. > > But am I right that both are equivalent? Ish. But not customary. The old LSI article makes a better case than I can off the cuff just before bed. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.7546&rep=rep1&type=pdf > Because I'd rather think of > maintaining and updating U S. Because conceptually S is just full of > multipliers -- making users 3x more keen on feature 1 is the same as > reducing the item express 3x less of feature 1. Certainly in the > recommendation computation they show, which makes sense, it doesn't > matter since the dot product is the same. > Actually the elements of S aren't item or user specific. Remember there are only k non-zeros there. They represent the strength of each of the singular vectors. A very useful way to look at it is to consider the SVD as a sum of rank-1 outer products u_i * v'_i. These rank-1 products are summed with weights s_i. This way of looking at matters makes a number of lemmas about SVD's pretty trivial. > They also add on the "row" average to make a prediction, which is the > average rating by the user, I'm guessing -- "row" is a row of A? I would guess so, but that would only make sense if they subtracted it ahead of time. In general, I don't see the point for that. I would rather cosine normalize each user row. > Just working backwards, I'd assume this is because the generated > predictions are otherwise "centered" in the sense that 0 will be > predicted for an item that the user might be neutral on. But I guess I > hadn't seen the intuitive reason this is the result. Is there any easy > way to see it? > For a new user with no history, h = 0 so the corresponding k-dimensional representation of this user will be s^-(1/2) h v = 0. The dot product with any item vector will be identically 0. I don't know that it would make any useful difference, but it would make me happier to reduce the ratings to binary, normalize rows and then decompose. In general, I think that the Belkor team had a much better approach with SVD++ and their time dynamics trick. That is much the same as mean removal. > > > On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[email protected]> wrote: > > You are correct. The paper has an appalling treatment of the folding in > > approach. > > > > In fact, the procedure is dead simple. > > > > The basic idea is to leave the coordinate system derived in the original > SVD > > intact and simply project the new users into that space. > > > > The easiest way to see what is happening is to start again with the > original > > rating matrix A as decomposed: > > > > A = U S V' > > > > where A is users x items. If we multiply on the right by V, we get > > > > A V = U S V' V = U S > > > > (because V' V = I, by definition). This result is (users x items) x > (items > > x k) = users x k, that is, it gives a k dimensional vector for each user. > > Similarly, multiplication on the left by U' gives a k x items matrix > which, > > when transposed gives a k dimensional vector for each item. > > > > This implies that if we augment U with new user row vectors U_new, we > should > > be able to simply compute new k-dimensional vectors for the new users and > > adjoin these new vectors to the previous vectors. Concisely put, > > > > ( A ) ( A V ) > > ( ) V = ( ) > > ( A_new ) ( A_new V ) > > > > This isn't really magical. It just says that we can compute new user > > vectors at any time by multiplying the new users' ratings by V. > > > > The diagram in figure one is hideously confusing because it looks like a > > picture of some kind of multiplication whereas it is really depicting > some > > odd kind of flow diagram. > > > > Does this solve the problem? > > > > On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <[email protected]> wrote: >
