Wow! Thanks everyone, this really helps.

2010/11/22 Fernando Fernández:
> Lance,
>
> Columns of U are in some contexts called "latent factors". For example, if
> we are applying SVD to a Document(User)-Term(Items) matrix, columns of U
> could be interpreted as a representation of groups of terms (words that
> have similar meaning or tend to appear together in documents of the same
> kind), so in this case these "latent" factors are, in some way, "topics".
> Another example of this is when we apply the SVD factorization to the
> famous movie recommendation problem. The "latent" factors (columns of the
> U matrix) represent some kind of "movie topics" (drama, horror, comedy,
> and possible combinations of these...). Note that if we are trying to
> recommend movies, we will recommend movies that have a similar topic,
> i.e. we will probably recommend a whole topic, not a specific movie...
> but SVD helps us find which movies fall into that topic. Note that this
> "topic" could in fact be something more abstract than "drama" or
> "comedy".
>
> The interpretation of V is more or less the "transpose" of this. In the
> movie example, the columns of V could be seen as a representation of
> users that have seen (or rated) the same movie. So if two movies have a
> similar topic, they have probably been rated or seen by the same people,
> so both movies will have similar values in the V column representing
> that group of people...
>
> Actually, rows of U can be used to find distances between users
> (according to what they have rated), and rows of V (i.e. columns of Vt)
> can be used to find distances between movies (according to how people
> have rated them).
>
> Lastly, as some other users pointed out, the values of S can be seen as
> a "weight" of the importance of these "latent" factors when I'm trying
> to see the differences between movies or between users.
>
> Hope this helps.
> Anyone, please correct me if you see something wrong in my examples.
>
> Best,
> Fernando.
>
> 2010/11/22 Ted Dunning:
>
>> Commonly the square root of S is applied to both U and V. S is a set of
>> importance weightings for the otherwise normalized columns of U and V.
>>
>> On Mon, Nov 22, 2010 at 10:10 AM, Sean Owen wrote:
>>
>> > Hmm. I think I need to fix the second half of my analogy.
>> >
>> > It's really U x S that could be said to be users' preferences for
>> > pseudo-items, and S x VT that could be said to be pseudo-users'
>> > preferences for real items. S itself is a diagonal matrix, of course,
>> > and those values are kind of like "scaling factors" ... but I actually
>> > struggle to come up with a good intuitive explanation of what S itself
>> > is (or really, U and V by themselves).
>> >
>> > Does anyone smarter have a nice pithy analogy?
>> >
>> > On Mon, Nov 22, 2010 at 11:06 AM, Sean Owen wrote:
>> > >
>> > > In more CF-oriented terms, S is an expression of pseudo-users'
>> > > preferences for pseudo-items. And then U expresses how much each real
>> > > user corresponds to each pseudo-user, and likewise for V and items.
>> > >
>> > > To put out a speculative analogy -- let's say we're looking at users'
>> > > preferences for songs. The "pseudo-items" that the SVD comes up with
>> > > might correspond to something like genres, or logical groupings of
>> > > songs. "Pseudo-users" are something like types of listeners, perhaps
>> > > corresponding to demographics.
>> > >
>> > > Whereas an entry in the original matrix makes a statement like "Tommy
>> > > likes the band Filter", an entry in S makes a statement like "Teenage
>> > > boys in moderately affluent households like industrial metal". And U
>> > > says how much Tommy is part of this demographic, and V tells how much
>> > > Filter is industrial metal.
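A minimal NumPy sketch of the ideas in this thread, under stated assumptions: the ratings matrix is a made-up toy example (users as rows, movies as columns), 0 stands in for "unrated" (a simplification, since plain SVD cannot tell a missing rating from a rating of zero), and `latent_factors`, `row_distance` and `users_in_pseudo_item_space` are hypothetical helper names, not any library's API. It folds the square root of S into both U and V as Ted describes, compares users by rows of the scaled U and movies by rows of the scaled V, and checks Sean's point that U x S is each user's ratings projected onto the pseudo-items (ratings times V).

```python
import numpy as np

# Hypothetical toy users x movies ratings (0 = unrated), made up for illustration.
# Users 0-1 lean toward movies 0-1; users 2-3 lean toward movies 3-4.
RATINGS = np.array([
    [5, 5, 3, 0, 0],
    [4, 5, 3, 0, 1],
    [0, 1, 3, 5, 4],
    [0, 0, 3, 4, 5],
], dtype=float)


def latent_factors(ratings, k):
    """Rank-k SVD with sqrt(S) folded into both U and V.

    Row i of user_factors describes user i, row j of item_factors describes
    movie j, and user_factors @ item_factors.T equals U_k S_k V_k^T.
    """
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    root_s = np.sqrt(s[:k])
    user_factors = U[:, :k] * root_s     # scale column c of U_k by sqrt(s_c)
    item_factors = Vt[:k, :].T * root_s  # rows of V_k, scaled the same way
    return user_factors, item_factors, s[:k]


def row_distance(factors, i, j):
    """Euclidean distance between rows i and j of a factor matrix."""
    return float(np.linalg.norm(factors[i] - factors[j]))


def users_in_pseudo_item_space(ratings):
    """Return (U * s, ratings @ V); the two are equal because A V = U S."""
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    return U * s, ratings @ Vt.T


if __name__ == "__main__":
    users, items, s = latent_factors(RATINGS, k=2)
    print("singular values (importance weights):", np.round(s, 2))
    print("user 0 vs user 1:", round(row_distance(users, 0, 1), 2))
    print("user 0 vs user 2:", round(row_distance(users, 0, 2), 2))
    print("movie 0 vs movie 1:", round(row_distance(items, 0, 1), 2))
    print("movie 0 vs movie 3:", round(row_distance(items, 0, 3), 2))
    print("rank-2 approximation:\n", np.round(users @ items.T, 1))
```

Each singular vector is only determined up to sign, so individual factor values may differ across NumPy/LAPACK builds, but row distances and the rank-k reconstruction do not.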
--
Lance Norskog
