http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html
Do not fear demographics. Yes, some people rent movies with all-black casts, and other people rent movies with all-white casts. And the Walmarts in the SF East Bay have pallets full of Tyler Perry videos, while most of the SF Peninsula doesn't know his name. And sometimes a bunch of Army Rangers have a great time watching "Confessions of a Shopaholic" in a tent, at 120 degrees Fahrenheit, in Qatar.

On Fri, Aug 26, 2011 at 8:29 AM, Jeff Hansen <[email protected]> wrote:
> Thanks for the math, Ted -- that was very helpful.
>
> I've been using sparseMatrix() from library(Matrix) -- largely based on your response to somebody else's email. I've been playing with smaller matrices, mainly for my own learning purposes -- it's much easier to read through 200 movies (most of which I've heard of) and get a gut feel than through 10,000 movies. It also means I don't have to shut down my session quite as often (I've been using RStudio) when I run something over the wrong dimension (column vs. row). I was running into some limitations as well, but I think some of those may have had more to do with my own typos and misunderstanding of the language.
>
> There's a second reason I've been avoiding the tail -- while working on this as a possible project to propose, I had a bit of a "duh" moment. Sometimes it's important to recognize the real-world constraints. Picture a company with very physical locations and very limited shelf space (say a convenience store or even a vending machine...). Netflix and Amazon can make a lot of money in the long tail because they have centralized and/or digital inventories that service a large customer base. Imagine a situation where you have a small, physically distributed inventory with an equally distributed customer base. In that situation it doesn't pay to chase after the tail -- you just need to reduce your costs and target the head, where you get more volume with fewer items. So you really end up with a three-tier market segmentation -- one strategy works best for the head, another for the body, and a third for the tail.
>
> As far as clusters go -- I really wasn't finding any clusters at the edges of the data, but that could have more to do with not including the tail (and not normalizing appropriately for popularity).
>
> Incidentally, there was one amusingly cautionary anecdote I'd share -- I don't remember the exact spread, but at one point I had two extremes that included something like "The Johnson Family Vacation", "My Baby's Daddy", and "Barbershop 2" on the one side, and then titles like "Mean Girls", "50 First Dates" and "Along Came Polly" on the other side. You may have to look up those titles to see what I'm talking about, but I'd say the distinction will be pretty clear when you do -- and if you were simply automating all of this and not reviewing it with common sense, you could end up offending some of your users... All I'm saying is there may be trends in the real world that some people aren't comfortable having pointed out.
>
> On Fri, Aug 26, 2011 at 4:08 AM, Lance Norskog <[email protected]> wrote:
> >
> > This is a little meditation on user vs. item matrix density. The heavy users and heavy items can be subsampled, once they are identified. Hadoop's built-in sort does give a very simple "map-increase" way to do this sort.
> >
> > http://ultrawhizbang.blogspot.com/2011/08/sorted-recommender-data.html
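A minimal R sketch of the setup described above -- a sparse, binary user x item matrix built with sparseMatrix(), with row and column sums used to spot the heavy users and items that Lance suggests subsampling. The data and the cutoff are made up for illustration:

    library(Matrix)

    # Toy interaction log: one (user, movie) pair per rental (invented data).
    users  <- c(1, 1, 2, 3, 3, 3, 4)
    movies <- c(2, 5, 5, 1, 2, 5, 3)

    # Sparse binary user x item matrix: A[i, j] = 1 if user i rented movie j.
    A <- sparseMatrix(i = users, j = movies, x = 1)
    A@x[] <- 1                          # collapse repeat rentals to 0/1

    # Row and column sums identify heavy users and popular items.
    user_activity   <- rowSums(A)
    item_popularity <- colSums(A)

    # Example policy: flag users above the 99th percentile of activity for
    # subsampling (the threshold is arbitrary here).
    heavy_users <- which(user_activity > quantile(user_activity, 0.99))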
> >
> > On Thu, Aug 25, 2011 at 5:57 PM, Ted Dunning <[email protected]> wrote:
> > > In matrix terms, the binary user x item matrix maps a set of items to users (A h = users who interacted with items in h). Similarly, A' maps users to items. Thus A' (A h) is the classic "users who x'ed this also x'ed that" sort of operation. This can be rearranged to be (A'A) h.
> > >
> > > This is where the singular values come in. If
> > >
> > >     A \approx U S V'
> > >
> > > and we know that U'U = V'V = I from the definition of the SVD, this means that
> > >
> > >     A'A \approx (U S V')' (U S V') = V S U' U S V' = V S^2 V'
> > >
> > > But V' transforms an item vector into the reduced dimensional space, so we can view this as transforming the item vector into the reduced dimensional space, scaling element-wise using S^2, and then transforming back to item space.
> > >
> > > As you noted, the first right singular vector corresponds to popularity. It is often not appropriate to just recommend popular things (since users have other means of discovering them), so you might want to drop that dimension from the analysis. This can be done in the SVD results, or it can be done using something like a log-likelihood ratio test so that we only model anomalously large values in A'A.
> > >
> > > Btw, if you continue to use R for experimentation, you should be able to dramatically increase the size of the analysis you do by using sparse matrices and a random projection algorithm. I was able to compute SVDs of matrices with millions of rows and columns and a few million non-zero elements in a few seconds. Holler if you would like details about how to do this.
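A sketch of the two operations Ted describes, run on an invented sparse matrix: the (A'A) h co-occurrence scoring, and a small hand-rolled random-projection (randomized) SVD. Ted doesn't spell out his exact method, so treat this as one plausible version rather than his recipe; the sizes, rank, and oversampling are arbitrary:

    library(Matrix)
    set.seed(1)

    # Invented sparse binary user x item matrix standing in for real data.
    A <- rsparsematrix(2000, 500, density = 0.01)
    A@x[] <- 1

    # "Users who x'ed this also x'ed that": h indicates a set of items, and
    # (A'A) h scores every item by co-occurrence with the items in h.
    h <- numeric(ncol(A)); h[c(3, 7)] <- 1
    item_scores <- t(A) %*% (A %*% h)            # same as (A'A) h

    # Randomized SVD of rank k: project A onto a few random directions,
    # orthonormalize, then take a small dense SVD.
    k <- 10
    Omega <- matrix(rnorm(ncol(A) * (k + 5)), ncol = k + 5)  # random projection
    Q <- qr.Q(qr(as.matrix(A %*% Omega)))        # approximate basis for range of A
    B <- as.matrix(t(Q) %*% A)                   # small (k+5) x n_items matrix
    s <- svd(B, nu = k, nv = k)
    U <- Q %*% s$u                               # approximate left singular vectors
    d <- s$d[1:k]                                # approximate singular values
    V <- s$v                                     # approximate right singular vectors

    # Ted's popularity point: optionally drop the first singular dimension
    # if it mostly encodes overall volume.
    keep <- 2:k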
> > > On Thu, Aug 25, 2011 at 4:07 PM, Jeff Hansen <[email protected]> wrote:
> > >> One thing I found interesting (but not particularly surprising) is that the biggest singular value/vector was pretty much tied directly to volume. That makes sense, because the best predictor of whether a given field's value was 1 was whether it belonged to a row with lots of 1s or a column with lots of 1s (I haven't quite figured out the best method to normalize the values with yet). When I plot the values of the largest singular vectors against the sum of values in the corresponding row or column, the correlation is very linear. That's not the case for any of the other singular vectors (which actually makes me wonder if just removing that first singular vector from the prediction might not be one of the best ways to normalize the data).
> > >>
> > >> I understand what you're saying about each singular vector corresponding to a feature, though. Each left singular vector represents some abstract aspect of a movie, and each right singular vector represents users' leanings or inclinations with regard to that aspect of the movie. The singular value itself just seems to indicate how good a predictor the combination of one user's inclination toward that aspect of a movie is for coming up with the actual value. The issue I mentioned above is that the popularity of a movie, as well as how often a user watches movies, tend to be the best predictors of whether a user has seen or will see a movie.
> > >>
> > >> I had been picturing this with the idea of one k-dimensional space -- one where a user's location corresponds to their ideal prototypical movie's location. Not that there would be a movie right there, but there would be ones nearby, and the nearer they were, the more enjoyable they would be. That's a naive model, but that doesn't mean it wouldn't work well enough. My problem is I don't quite get how to map the user space over to the item space.
> > >>
> > >> I think that may be what Lance is trying to describe in his last response, but I'm falling short on reconstructing the math from his description.
> > >>
> > >> I get the following:
> > >>
> > >>     U S V* = A
> > >>     U S = A V
> > >>     U = A V 1/S
> > >>
> > >> I think that last line is what Lance was describing. Of course my problem was bootstrapping in a user for whom I don't know A or V.
> > >>
> > >> I also think I may have missed a big step of the puzzle. For some reason I thought that just by loosening the rank, you could recompose the matrix A from the truncated SVD values/vectors and use the recomposed values themselves as the recommendation. I thought one of the ideas was that the recomposed matrix had less "noise" and could be a better representation of the underlying nature of the matrix than the original matrix itself. But that may have just been wishful thinking...
> > >>
> > >> On Thu, Aug 25, 2011 at 4:50 PM, Sean Owen <[email protected]> wrote:
> > >> > The 200x10 matrix is indeed a matrix of 10 singular vectors, which are eigenvectors of AA'. It's the columns, not the rows, that are eigenvectors.
> > >> >
> > >> > The rows do mean something. I think it's fair to interpret the 10 singular values / vectors as corresponding to some underlying features of tastes. The rows say how much each user expresses those 10 tastes. The matrix of right singular vectors on the other side tells you the same thing about items. The diagonal matrix of singular values in the middle also comes into play -- it's like a set of multipliers that say how important each feature is. (This is why we cut out the singular vectors / values that have the smallest singular values; it's like removing the least-important features.) So really you'd have to stick those values somewhere; Ted says it's conventional to put "half" of each (their square roots) with each side, if anything.
> > >> >
> > >> > I don't have as good a grasp on an intuition for the columns as eigenvectors. They're also a set of basis vectors, and I had understood them as like the "real" bases of the reduced feature space expressed in user-item space. But I'd have to go back and think that intuition through again, since I can't really justify it from scratch again in my head just now.
> > >> >
> > >> > On Thu, Aug 25, 2011 at 10:21 PM, Jeff Hansen <[email protected]> wrote:
> > >> > > Well, I think my problem may have had more to do with what I was calling the eigenvector...
> > >> > > I was referring to the rows rather than the columns of U and V. While the columns may be characteristic of the overall matrix, the rows are characteristic of the user or item (in that they are a rank-reduced representation of that person or thing). I guess you could say I just had to tilt my head to the side and change my perspective 90 degrees =)
> >
> > --
> > Lance Norskog
> > [email protected]


--
Lance Norskog
[email protected]
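A small R sketch of the fold-in Jeff derives above (U = A V 1/S) and of the "recomposed matrix as recommendation" idea, continuing from rank-k factors U, d, V like the ones computed in the earlier sketch. The new user's history is invented for illustration, and this is only one common way to do the fold-in:

    # a_new: binary item vector for a user who was not in the original A.
    a_new <- numeric(nrow(V)); a_new[c(3, 7, 42)] <- 1

    # Fold the new user into the reduced space: u_new = a_new V S^-1.
    u_new <- (a_new %*% V) %*% diag(1 / d)        # 1 x k coordinates

    # Score every item by reconstructing this user's row of the rank-k
    # matrix, scores = u_new S V'; a higher reconstructed value means a
    # stronger recommendation (the "less noise" version of the raw row).
    scores <- (u_new %*% diag(d)) %*% t(V)

    # Recommend unseen items with the highest scores.
    scores[as.logical(a_new)] <- -Inf             # don't re-recommend seen items
    top10 <- order(as.numeric(scores), decreasing = TRUE)[1:10]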

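And to make Sean's "half of each" convention and Jeff's head-tilt concrete, a last sketch on the same hypothetical U, d, V factors: scale each side by sqrt(S), then treat the rows as the rank-reduced representations of users and items. The cosine-similarity choice at the end is just one option:

    # Put "half" of the singular values on each side.
    user_feats <- U %*% diag(sqrt(d))   # row i = user i in the reduced space
    item_feats <- V %*% diag(sqrt(d))   # row j = item j in the reduced space

    # Dot products of these rows reproduce the rank-k approximation of A:
    # user_feats %*% t(item_feats) = U S V'.
    approx_A <- user_feats %*% t(item_feats)

    # "Tilt your head": compare rows, not columns. Cosine similarity between
    # two items in the reduced space.
    cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
    sim_3_7 <- cos_sim(item_feats[3, ], item_feats[7, ])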