Re: Interpreting the output of SVD

Lance Norskog Mon, 22 Nov 2010 20:56:13 -0800

Wow!  Thanks everyone, this really helps.

2010/11/22 Fernando Fernández <[email protected]>:
> Lance,
>
> Columns of U are in some contexts called "latent factors". For example, if
> we are applying SVD to a Document(User)-Term(Item) matrix, the columns of U
> can be interpreted as a representation of groups of terms (words that have
> similar meanings or tend to appear together in documents of the same kind),
> so in this case these "latent" factors are "topics" in some sense. Another
> example is the famous movie recommendation problem. There the "latent"
> factors (columns of the U matrix) loosely represent "movie topics"
> (drama, horror, comedy, and possible combinations of these...). Note that
> if we are trying to recommend movies, we will recommend movies that have a
> similar topic, i.e. we will probably recommend a whole topic, not a
> specific movie... but SVD helps us find which movies fall into that topic.
> Note that this "topic" could in fact be something more abstract than
> "drama" or "comedy".
>
> The interpretation of V is more or less the "transpose" of this. In the
> movie example, the columns of V can be seen as a representation of users
> who have seen (or rated) the same movie. So if two movies have a similar
> topic, they have probably been rated or seen by the same people, so both
> movies will have similar values in the V column representing that group of
> people...
>
> Actually, rows of U can be used to find distances between users (according
> to what they have rated), and rows of V (the columns of Vt) can be used to
> find distances between movies (according to how people have rated them).
>
> Last, the values of S, as some other users pointed out, can be seen as
> "weights" of the importance of these "latent" factors when I'm trying to
> see the differences between movies or between users.
>
> Hope this helps. Please, any other user correct me if you see something
> wrong in my examples.
>
> Best,
> Fernando.
>
>
>
> 2010/11/22 Ted Dunning <[email protected]>
>
>> Commonly the square root of S is applied to both U and V.  S is a set of
>> importance weightings for the otherwise normalized columns of U and V.
>>
>> On Mon, Nov 22, 2010 at 10:10 AM, Sean Owen <[email protected]> wrote:
>>
>> > Hmm. I think I need to fix the second half of my analogy.
>> >
>> > It's really U x S that could be said to be users' preferences for
>> > pseudo-items, and S x VT could be said to be pseudo-users' preferences
>> > for real items. S itself is a diagonal matrix of course and those values
>> > are kind of like "scaling factors" ... but I actually struggle to come up
>> > with a good intuitive explanation of what S itself is (or really, U and V
>> > by themselves).
>> >
>> > Anyone smarter have a nice pithy analogy?
>> >
>> > On Mon, Nov 22, 2010 at 11:06 AM, Sean Owen <[email protected]> wrote:
>> > >
>> > > In more CF-oriented terms, S is an expression of pseudo-users'
>> > > preferences for pseudo-items. And then U expresses how much each real
>> > > user corresponds to each pseudo-user, and likewise for V and items.
>> > >
>> > > To put out a speculative analogy -- let's say we're looking at users'
>> > > preferences for songs. The "pseudo-items" that the SVD comes up with
>> > > might correspond to something like genres, or logical groupings of
>> > > songs. "Pseudo-users" are something like types of listeners, perhaps
>> > > corresponding to demographics.
>> > >
>> > > Whereas an entry in the original matrix makes a statement like "Tommy
>> > > likes the band Filter", an entry in S makes a statement like "Teenage
>> > > boys in moderately affluent households like industrial metal". And U
>> > > says how much Tommy is part of this demographic, and V tells how much
>> > > Filter is industrial metal.
>> > >
>> > >
>> >
>>
>
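
The point about using rows of U and V to measure distances between users
and between movies can be sketched in a few lines of NumPy. This is only an
illustration under stated assumptions: the ratings matrix is invented, users
are taken to be rows and movies columns, and truncating to k = 2 factors and
weighting the coordinates by S are choices made for this sketch, not
something prescribed in the thread.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies (0 = not rated).
ratings = np.array([
    [5.0, 4.0, 0.0, 0.0],   # user 0: likes movies 0 and 1
    [4.0, 5.0, 0.0, 1.0],   # user 1: taste close to user 0
    [0.0, 0.0, 5.0, 4.0],   # user 2: likes movies 2 and 3
    [0.0, 1.0, 4.0, 5.0],   # user 3: taste close to user 2
])

# ratings = U @ diag(s) @ Vt; s holds the singular values, largest first.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

k = 2  # number of latent factors kept (an assumption of this sketch)

# One row per user / per movie, each factor weighted by its singular value.
user_coords = U[:, :k] * s[:k]
movie_coords = Vt[:k, :].T * s[:k]   # rows of V are the columns of Vt


def distance(a, b):
    """Euclidean distance between two points in latent-factor space."""
    return float(np.linalg.norm(a - b))


d_users_similar = distance(user_coords[0], user_coords[1])
d_users_different = distance(user_coords[0], user_coords[2])
d_movies_similar = distance(movie_coords[0], movie_coords[1])
d_movies_different = distance(movie_coords[0], movie_coords[2])

print("users 0-1:", d_users_similar, " users 0-2:", d_users_different)
print("movies 0-1:", d_movies_similar, " movies 0-2:", d_movies_different)
```

With this toy data, users (and movies) that share a "topic" land much closer
together in the latent space than those that do not.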



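The different ways of attaching S to U and V mentioned in the thread (U x S,
S x VT, and the square root of S on both sides) can be checked numerically.
The sketch below assumes an invented 3-user by 4-item ratings matrix, and
the variable names are purely illustrative. The only library routine used is
numpy.linalg.svd, which returns S as a vector of singular values and V
already transposed.

```python
import numpy as np

# Hypothetical ratings: 3 users (rows) by 4 items (columns).
ratings = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
])

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# The columns of U and of V (rows of Vt) have unit length; S carries the scale.
col_norms_U = np.linalg.norm(U, axis=0)
row_norms_Vt = np.linalg.norm(Vt, axis=1)

# U x S: real users' preferences for pseudo-items.
users_by_pseudo_items = U * s              # same as U @ np.diag(s)

# S x VT: pseudo-users' preferences for real items.
pseudo_users_by_items = s[:, None] * Vt    # same as np.diag(s) @ Vt

# Square root of S applied to both sides.
root = np.sqrt(s)
user_factors = U * root                    # U x sqrt(S)
item_factors = Vt.T * root                 # V x sqrt(S)

# Every split multiplies back to the original matrix.
rebuilt_from_us = users_by_pseudo_items @ Vt
rebuilt_from_svt = U @ pseudo_users_by_items
rebuilt_from_sqrt = user_factors @ item_factors.T

print(np.allclose(rebuilt_from_us, ratings),
      np.allclose(rebuilt_from_svt, ratings),
      np.allclose(rebuilt_from_sqrt, ratings))
```

The square-root split has the convenient property that a user's predicted
preference for an item is just the dot product of one row of user_factors
with one row of item_factors.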

-- 
Lance Norskog
[email protected]
