Thanks for the math, Ted -- that was very helpful.

I've been using sparseMatrix() from library(Matrix) -- largely based on your
response to somebody else's email.  I've been playing with smaller matrices
mainly for my own learning purposes -- it's much easier to read through 200
movies (most of which I've heard of) and get a gut feel than it is with
10,000 movies.  It also means I don't have to shut down my session quite as
often (I've been using RStudio) when I run something over the wrong
dimension (column vs. row).  I was running into some limitations as well,
but I think some of those may have had more to do with my own typos and
misunderstanding of the language.
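
In case it helps anyone reading along, the kind of thing I've been doing to
build the small test matrix is roughly the sketch below (the triples here
are made up -- the real ones obviously come from the ratings data):

  library(Matrix)

  # toy (user, movie) pairs standing in for the real ratings data
  ratings <- data.frame(user  = c(1, 1, 2, 3, 3, 3),
                        movie = c(1, 2, 2, 1, 3, 4))

  # binary user x movie matrix, stored sparsely
  A <- sparseMatrix(i = ratings$user,
                    j = ratings$movie,
                    x = 1,
                    dims = c(max(ratings$user), max(ratings$movie)))

  rowSums(A)   # movies seen per user
  colSums(A)   # views per movie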

There's a second reason I've been avoiding the tail -- while working on this
as a possible project to propose, I had a bit of a "duh" moment.  Sometimes
it's important to recognize real-world constraints.  Picture a company with
very physical locations and very limited shelf space (say a convenience
store, or even a vending machine).  Netflix and Amazon can make a lot of
money in the long tail because they have centralized and/or digital
inventories that serve a large customer base.  Now imagine a situation where
you have a small, physically distributed inventory with an equally
distributed customer base.  In that situation it doesn't pay to chase after
the tail -- you just need to reduce your costs and target the head, where
you get more volume from fewer items.  So you really end up with a
three-tier market segmentation -- one strategy works best for the head,
another for the body, and a third for the tail.
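
To make that concrete, the split I have in mind is nothing fancier than
cutting items by cumulative share of volume -- something like this, where A
is the binary user x movie matrix from the sketch above and the 50%/90% cut
points are completely made up:

  views <- sort(colSums(A), decreasing = TRUE)
  share <- cumsum(views) / sum(views)

  tier <- cut(share, breaks = c(0, 0.5, 0.9, 1),
              labels = c("head", "body", "tail"))
  table(tier)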

As far as clusters go -- I really wasn't finding any clusters at the edges
of the data, but that could have more to do with not including the tail (and
not normalizing appropriately for popularity).

Incidentally, there was one amusingly cautionary anecdote I'll share -- I
don't remember the exact spread, but at one point I had two extremes that
included something like "The Johnson Family Vacation", "My Baby's Daddy",
and "Barbershop 2" on the one side, and then titles like "Mean Girls", "50
First Dates" and "Along Came Polly" on the other side.  You may have to look
up those titles to see what I'm talking about, but I'd say the distinction
will be pretty clear when you do -- and if you were simply automating all of
this and not reviewing it with common sense, you could end up offending some
of your users...  All I'm saying is there may be trends in the real world
that some people aren't comfortable having pointed out.

On Fri, Aug 26, 2011 at 4:08 AM, Lance Norskog <[email protected]> wrote:

> This is a little meditation on user vs. item matrix density. The
> heavy users and heavy items can be subsampled, once they are
> identified. Hadoop's built-in sort does give a very simple
> "map-increase" way to do this sort.
>
> http://ultrawhizbang.blogspot.com/2011/08/sorted-recommender-data.html
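
[Just to restate the subsampling idea for myself in plain R on the small
data -- this is only a sketch of the concept, not the Hadoop-scale version
Lance is describing; ratings is a data.frame of (user, movie) pairs like
the toy one above, and the cap of 500 is an arbitrary number:]

  cap <- 500   # arbitrary per-user cap

  # keep at most `cap` interactions per user, sampling the rest away
  capped <- do.call(rbind, lapply(split(ratings, ratings$user), function(r) {
    if (nrow(r) > cap) r[sample(nrow(r), cap), ] else r
  }))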
>
> On Thu, Aug 25, 2011 at 5:57 PM, Ted Dunning <[email protected]> wrote:
> > In matrix terms the binary user x item matrix maps a set of items to
> > users (A h = users who interacted with items in h).  Similarly A' maps
> > users to items.  Thus A' (A h) is the classic "users who x'ed this also
> > x'ed that" sort of operation.  This can be rearranged to be (A'A) h.
> >
> > This is where the singular values come in.  If
> >
> >     A \approx U S V'
> >
> > then, since we know that U'U = V'V = I from the definition of the SVD,
> >
> >    A' A \approx (U S V')' (U S V') = V S U' U S V' = V S^2 V'
> >
> > But V' transforms an item vector into the reduced dimensional space, so
> > we can view this as transforming the item vector into the reduced
> > dimensional space, scaling element-wise using S^2, and then transforming
> > back to item space.
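
[Restating that in R for my own benefit -- with A the binary user x item
matrix and a plain svd() on a dense copy of the small matrix, the
"transform, scale by S^2, transform back" view looks something like this:]

  h <- as.numeric(1:ncol(A) == 1)   # indicator vector for item 1
  s <- svd(as.matrix(A))

  # direct counts: users who x'ed this also x'ed that
  direct  <- as.numeric(t(A) %*% (A %*% h))

  # the same thing through the reduced space: V (S^2 (V' h))
  reduced <- as.numeric(s$v %*% (s$d^2 * (t(s$v) %*% h)))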
> >
> > As you noted, the first right singular vector corresponds to popularity.
> > It is often not appropriate to just recommend popular things (since
> > users have other means of discovering them) so you might want to drop
> > that dimension from the analysis.  This can be done in the SVD results
> > or it can be done using something like a log-likelihood ratio test so
> > that we only model anomalously large values in A'A.
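
[A cheap way to try the "drop the popularity dimension" idea on a small
matrix is to zero out the first singular value before recomposing -- again
just a sketch with base svd() on a dense copy of A:]

  s <- svd(as.matrix(A))
  d2 <- s$d
  d2[1] <- 0    # drop the dimension that tracks popularity

  # item-item co-occurrence with that first component removed
  AtA_nopop <- s$v %*% diag(d2^2) %*% t(s$v)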
> >
> > Btw, if you continue to use R for experimentation, you should be able
> > to dramatically increase the size of the analysis you do by using sparse
> > matrices and a random projection algorithm.  I was able to compute SVDs
> > of matrices with millions of rows and columns and a few million non-zero
> > elements in a few seconds.  Holler if you would like details about how
> > to do this.
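
[For my own notes, my rough reading of the random projection trick is the
sketch below -- k, the 10 columns of padding, and the use of base qr() and
svd() are all my own guesses, not necessarily what Ted has in mind; A is
the sparse user x item matrix:]

  set.seed(1)
  k <- 10                                  # target rank (made up)
  p <- 10                                  # oversampling (made up)

  Omega <- matrix(rnorm(ncol(A) * (k + p)), ncol = k + p)
  Y <- as.matrix(A %*% Omega)              # project the sparse matrix down
  Q <- qr.Q(qr(Y))                         # orthonormal basis for its range

  B  <- as.matrix(t(Q) %*% A)              # small dense problem
  sv <- svd(B)

  U <- Q %*% sv$u[, 1:k]
  d <- sv$d[1:k]
  V <- sv$v[, 1:k]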
> >
> > On Thu, Aug 25, 2011 at 4:07 PM, Jeff Hansen <[email protected]> wrote:
> >
> >> One thing I found interesting (but not particularly surprising) is that
> >> the biggest singular value/vector was pretty much tied directly to
> >> volume.  That makes sense because the best predictor of whether a given
> >> field's value was 1 was whether it belonged to a row with lots of 1s or
> >> a column with lots of 1s (I haven't quite figured out the best method to
> >> normalize the values with yet).  When I plot the values of the largest
> >> singular vectors against the sum of values in the corresponding row or
> >> column, the correlation is very linear.  That's not the case for any of
> >> the other singular vectors (which actually makes me wonder if just
> >> removing that first singular vector from the prediction might not be one
> >> of the best ways to normalize the data).
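
[The plot I'm describing there is nothing more than this, for what it's
worth, with A the binary user x movie matrix:]

  s <- svd(as.matrix(A))

  plot(rowSums(A), s$u[, 1], xlab = "movies seen per user", ylab = "u[, 1]")
  plot(colSums(A), s$v[, 1], xlab = "views per movie",      ylab = "v[, 1]")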
> >>
> >> I understand what you're saying about each singular vector
> >> corresponding to a feature though.  Each left singular vector represents
> >> some abstract aspect of a movie and each right singular vector
> >> represents users' leanings or inclinations with regard to that aspect of
> >> the movie.  The singular value itself just seems to indicate how good a
> >> predictor the combination of one user's inclination toward that aspect
> >> of a movie is for coming up with the actual value.  The issue I
> >> mentioned above is that the popularity of a movie, as well as how often
> >> a user watches movies, tend to be the best predictors of whether a user
> >> has seen or will see a movie.
> >>
> >> I had been picturing this with the idea of one k-dimensional space --
> >> one where a user's location corresponds to their ideal prototypical
> >> movie's location.  Not that there would be a movie right there, but
> >> there would be ones nearby, and the nearer they were, the more enjoyable
> >> they would be.  That's a naive model, but that doesn't mean it wouldn't
> >> work well enough.  My problem is I don't quite get how to map the user
> >> space over to the item space.
> >>
> >> I think that may be what Lance is trying to describe in his last
> >> response, but I'm falling short on reconstructing the math from his
> >> description.
> >>
> >> I get the following
> >>
> >> U S V* = A
> >> U S = A V
> >> U = A V 1/S
> >>
> >> I think that last line is what Lance was describing.  Of course my
> >> problem was bootstrapping in a user for whom I don't know A or V.
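
[In R terms, what I take from that last line is the fold-in below --
assuming a_new is the new user's 0/1 row over the items, which is exactly
the piece I don't have for a genuinely new user; here I just pretend an
existing row of A is the new one:]

  s  <- svd(as.matrix(A))
  k  <- 10
  Vk <- s$v[, 1:k]
  dk <- s$d[1:k]

  a_new <- as.numeric(A[1, ])              # stand-in for the new user's row
  u_new <- as.numeric(a_new %*% Vk) / dk   # u = a V (1/S)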
> >>
> >> I also think I may have missed a big step of the puzzle.  For some
> >> reason I thought that just by loosening the rank, you could recompose
> >> the matrix A from the truncated SVD values/vectors and use the
> >> recomposed values themselves as the recommendation.  I thought one of
> >> the ideas was that the recomposed matrix had less "noise" and could be a
> >> better representation of the underlying nature of the matrix than the
> >> original matrix itself.  But that may have just been wishful thinking...
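
[Concretely, what I had in mind was the usual truncated recomposition --
keep the top k singular triples, rebuild the matrix, and read
recommendations off the rows; k = 10 is an arbitrary choice:]

  s <- svd(as.matrix(A))
  k <- 10

  # rank-k "denoised" reconstruction
  A_hat <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])

  user <- 1
  scores <- A_hat[user, ]
  scores[as.numeric(A[user, ]) > 0] <- NA   # skip movies already seen
  head(order(scores, decreasing = TRUE))    # top unseen movies by score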
> >>
> >> On Thu, Aug 25, 2011 at 4:50 PM, Sean Owen <[email protected]> wrote:
> >>
> >> > The 200x10 matrix is indeed a matrix of 10 singular vectors, which are
> >> > eigenvectors of AA'. It's the columns, not rows, that are
> >> > eigenvectors.
> >> >
> >> > The rows do mean something. I think it's fair to interpret the 10
> >> > singular values / vectors as corresponding to some underlying features
> >> > of tastes. The rows say how much each user expresses those 10 tastes.
> >> > The matrix of right singular vectors on the other side tells you the
> >> > same thing about items. The diagonal matrix of singular values in the
> >> > middle also comes into play -- it's like a set of multipliers that say
> >> > how important each feature is. (This is why we cut out the singular
> >> > vectors / values that have the smallest singular values; it's like
> >> > removing the least-important features.) So really you'd have to stick
> >> > those values somewhere; Ted says it's conventional to put "half" of
> >> > each (their square roots) with each side if anything.
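
[If I'm reading that right, the "half of each" convention is just the
following, so that the two factor matrices multiply back to the same rank-k
reconstruction; again k = 10 is arbitrary and A is the user x item matrix:]

  s <- svd(as.matrix(A))
  k <- 10

  user_factors <- s$u[, 1:k] %*% diag(sqrt(s$d[1:k]))
  item_factors <- s$v[, 1:k] %*% diag(sqrt(s$d[1:k]))

  # user_factors %*% t(item_factors) equals the rank-k reconstruction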
> >> >
> >> > I don't have as good a grasp on an intuition for the columns as
> >> > eigenvectors. They're also a set of basis vectors, and I had
> >> > understood them as like the "real" bases of the reduced feature space
> >> > expressed in user-item space. But I'd have to go back and think that
> >> > intuition through again since I can't really justify it from scratch
> >> > again in my head just now.
> >> >
> >> >
> >> > On Thu, Aug 25, 2011 at 10:21 PM, Jeff Hansen <[email protected]> wrote:
> >> > > Well, I think my problem may have had more to do with what I was
> >> > > calling the eigenvector...  I was referring to the rows rather than
> >> > > the columns of U and V.  While the columns may be characteristic of
> >> > > the overall matrix, the rows are characteristic of the user or item
> >> > > (in that they are a rank-reduced representation of that person or
> >> > > thing).  I guess you could say I just had to tilt my head to the
> >> > > side and change my perspective 90 degrees =)
> >> > >
> >> >
> >>
> >
>
>
>
> --
> Lance Norskog
> [email protected]
>
