In matrix terms, the binary user x item matrix A maps a set of items to
users (A h = the users who interacted with the items in h).  Similarly, A'
maps users to items.  Thus A' (A h) is the classic "users who x'ed this
also x'ed that" sort of operation.  By associativity, this can be
rearranged to (A' A) h.
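
To make that concrete, here is a toy sketch in R (the matrix and history
vector are made-up data, just to show the shapes):

    library(Matrix)

    # toy binary user x item matrix: 4 users, 5 items (made-up data)
    A <- sparseMatrix(i = c(1, 1, 2, 2, 3, 3, 4),
                      j = c(1, 2, 2, 3, 1, 4, 5),
                      x = 1, dims = c(4, 5))

    h <- c(1, 0, 0, 0, 0)    # history: this user interacted with item 1

    # A %*% h finds the users who touched the items in h, and t(A) maps
    # those users back to items.  Same result as (t(A) %*% A) %*% h.
    scores <- t(A) %*% (A %*% h)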

This is where the singular values come in.  If

     A \approx U S V'

then, since U'U = V'V = I by the definition of the SVD, it follows that

    A' A \approx (U S V')' (U S V') = V S U' U S V' = V S^2 V'

But V' transforms an item vector into the reduced-dimensional space, so we
can view this expression as projecting the item vector into that space,
scaling it element-wise by S^2, and then transforming the result back into
item space.
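
Continuing the toy sketch above, that project / scale / project-back view
can be written directly (k = 3 here is arbitrary):

    s <- svd(as.matrix(A), nu = 3, nv = 3)   # truncated SVD, keep k = 3

    # project h into the reduced space with t(V), scale element-wise by
    # the squared singular values, then map back to item space with V
    scores_svd <- s$v %*% (s$d[1:3]^2 * (t(s$v) %*% h))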

As you noted, the first right singular vector corresponds to popularity.
It is often not appropriate to simply recommend popular things (since users
have other means of discovering them), so you might want to drop that
dimension from the analysis.  This can be done in the SVD results, or it
can be done using something like a log-likelihood ratio test so that we
only model anomalously large values in A'A.
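
In the SVD results, dropping that dimension is just a matter of leaving
the first singular vector/value out of the reconstruction (continuing the
same toy sketch; the log-likelihood ratio route is not shown here):

    keep <- 2:3              # drop the first (popularity) dimension
    scores_nopop <- s$v[, keep] %*%
                    (s$d[keep]^2 * (t(s$v[, keep]) %*% h))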

Btw, if you continue to use R for experimentation, you should be able to
dramatically increase the size of the analysis you can do by using sparse
matrices and a random projection algorithm.  I was able to compute SVDs of
matrices with millions of rows and columns and a few million non-zero
elements in a few seconds.  Holler if you would like details about how to
do this.
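
As a rough idea of the shape of that computation (this sketch uses the
irlba package for the truncated SVD; the rsvd package is one random
projection alternative -- not necessarily the exact setup described above):

    library(Matrix)
    library(irlba)

    # random sparse matrix: 1e6 x 1e6 with about 3e6 non-zero entries
    # (duplicate positions get summed, which is fine for a timing test)
    nnz <- 3e6
    big <- sparseMatrix(i = sample(1e6, nnz, replace = TRUE),
                        j = sample(1e6, nnz, replace = TRUE),
                        x = 1, dims = c(1e6, 1e6))

    fit <- irlba(big, nv = 10)   # top 10 singular values/vectors
    fit$d                        # the singular values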

On Thu, Aug 25, 2011 at 4:07 PM, Jeff Hansen <[email protected]> wrote:

> One thing I found interesting (but not particularly surprising) is that
> the biggest singular value/vector was pretty much tied directly to
> volume.  That makes sense because the best predictor of whether a given
> field's value was 1 was whether it belonged to a row with lots of 1s or a
> column with lots of 1s (I haven't quite figured out the best method to
> normalize the values yet).  When I plot the values of the largest
> singular vectors against the sum of values in the corresponding row or
> column, the relationship is very nearly linear.  That's not the case for
> any of the other singular vectors (which actually makes me wonder if
> removing that first singular vector from the prediction might be one of
> the best ways to normalize the data).
>
> I understand what you're saying about each singular vector corresponding
> to a feature though.  Each left singular vector represents some abstract
> aspect of a movie and each right singular vector represents users'
> leanings or inclinations with regard to that aspect of the movie.  The
> singular value itself just seems to indicate how good a predictor the
> combination of one user's inclination toward that aspect of a movie is
> for coming up with the actual value.  The issue I mentioned above is that
> the popularity of a movie and how often a user watches movies tend to be
> the best predictors of whether a user has seen or will see a movie.
>
> I had been picturing this with the idea of one k-dimensional space -- one
> where a user's location corresponds to the location of their ideal
> prototypical movie.  Not that there would be a movie right there, but
> there would be ones nearby, and the nearer they were, the more enjoyable
> they would be.  That's a naive model, but that doesn't mean it wouldn't
> work well enough.  My problem is that I don't quite get how to map the
> user space over to the item space.
>
> I think that may be what Lance is trying to describe in his last response,
> but I'm falling short on reconstructing the math from his description.
>
> I get the following
>
> U S V* = A
> U S = A V            (right-multiplying by V, since V* V = I)
> U = A V S^-1
>
> I think that last line is what Lance was describing.  Of course my problem
> was bootstrapping in a user for whom I don't know A or V.
>
> I also think I may have missed a big step of the puzzle.  For some reason
> I thought that just by reducing the rank, you could recompose the matrix
> A from the truncated SVD values/vectors and use the recomposed values
> themselves as the recommendation.  I thought one of the ideas was that
> the recomposed matrix had less "noise" and could be a better
> representation of the underlying nature of the matrix than the original
> matrix itself.  But that may have just been wishful thinking...
>
> On Thu, Aug 25, 2011 at 4:50 PM, Sean Owen <[email protected]> wrote:
>
> > The 200x10 matrix is indeed a matrix of 10 singular vectors, which are
> > eigenvectors of AA'. It's the columns, not rows, that are
> > eigenvectors.
> >
> > The rows do mean something. I think it's fair to interpret the 10
> > singular values / vectors as corresponding to some underlying features
> > of tastes. The rows say how much each user expresses those 10 tastes.
> > The matrix of right singular vectors on the other side tells you the
> > same thing about items. The diagonal matrix of singular values in the
> > middle also comes into play -- it's like a set of multipliers that say
> > how important each feature is. (This is why we cut out the singular
> > vectors / values that have the smallest singular values; it's like
> > removing the least-important features.) So really you'd have to stick
> > those values somewhere; Ted says it's conventional to put "half" of
> > each (their square roots) with each side if anything.
> >
> > I don't have as good a grasp on an intuition for the columns as
> > eigenvectors. They're also a set of basis vectors, and I had
> > understood them as like the "real" bases of the reduced feature space
> > expressed in user-item space. But I'd have to go back and think that
> > intuition through again since I can't really justify it from scratch
> > again in my head just now.
> >
> >
> > On Thu, Aug 25, 2011 at 10:21 PM, Jeff Hansen <[email protected]> wrote:
> > > Well, I think my problem may have had more to do with what I was
> > > calling the eigenvector...  I was referring to the rows rather than
> > > the columns of U and V.  While the columns may be characteristic of
> > > the overall matrix, the rows are characteristic of the user or item
> > > (in that they are a rank-reduced representation of that person or
> > > thing).  I guess you could say I just had to tilt my head to the side
> > > and change my perspective 90 degrees =)
> > >
> >
>
