http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html
Do not fear demographics. Yes, some people rent movies with all-black casts, and other people rent movies with all-white casts. And the Walmarts in the SF East Bay have pallets full of Tyler Perry videos, while most of the SF Peninsula doesn't know his name. And sometimes a bunch of Army Rangers have a great time watching "Confessions of a Shopaholic" in a tent, at 120 degrees Fahrenheit, in Qatar.

On Fri, Aug 26, 2011 at 8:29 AM, Jeff Hansen <[email protected]> wrote:
> Thanks for the math, Ted -- that was very helpful.
>
> I've been using sparseMatrix() from library(Matrix) -- largely based on your response to somebody else's email. I've been playing with smaller matrices, mainly for my own learning purposes -- it's much easier to read through 200 movies (most of which I've heard of) and get a gut feel than through 10,000 movies. It also means I don't have to shut down my session quite as often (I've been using RStudio) when I run something over the wrong dimension (column vs. row). I was running into some limitations as well, but I think some of those may have had more to do with my own typos and misunderstanding of the language.
>
> There's a second reason I've been avoiding the tail -- while working on this as a possible project to propose, I had a bit of a "duh" moment. Sometimes it's important to recognize the real-world constraints. Picture a company with very physical locations and very limited shelf space (say a convenience store or even a vending machine...). Netflix and Amazon can make a lot of money in the long tail because they have centralized and/or digital inventories that service a large customer base. Imagine a situation where you have a small, physically distributed inventory with an equally distributed customer base. In that situation it doesn't pay to chase after the tail -- you just need to reduce your costs and target the head, where you get more volume with fewer items. So you really end up with a three-tier market segmentation -- one strategy works best for the head, another for the body, and a third for the tail.
>
> As far as clusters go -- I really wasn't finding any clusters at the edges of the data, but that could have more to do with not including the tail (and not normalizing appropriately for popularity).
>
> Incidentally, there was one amusingly cautionary anecdote I'd share -- I don't remember the exact spread, but at one point I had two extremes that included something like "The Johnson Family Vacation", "My Baby's Daddy", and "Barbershop 2" on the one side, and then titles like "Mean Girls", "50 First Dates" and "Along Came Polly" on the other side. You may have to look up those titles to see what I'm talking about, but I'd say the distinction will be pretty clear when you do -- and if you were simply automating all of this and not reviewing it with common sense, you could end up offending some of your users... All I'm saying is there may be trends in the real world that some people aren't comfortable having pointed out.
>
> On Fri, Aug 26, 2011 at 4:08 AM, Lance Norskog <[email protected]> wrote:
> >
> > This is a little meditation on user vs. item matrix density. The heavy users and heavy items can be subsampled, once they are identified. Hadoop's built-in sort does give a very simple "map-increase" way to do this sort.
> >
> > http://ultrawhizbang.blogspot.com/2011/08/sorted-recommender-data.html
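A minimal R sketch of the setup described above -- a sparse, binary user x item matrix built with sparseMatrix(), with row and column sums used to spot the heavy users and items that Lance suggests subsampling. The data and the cutoff are made up for illustration:

    library(Matrix)

    # Toy interaction log: one (user, movie) pair per rental (invented data).
    users  <- c(1, 1, 2, 3, 3, 3, 4)
    movies <- c(2, 5, 5, 1, 2, 5, 3)

    # Sparse binary user x item matrix: A[i, j] = 1 if user i rented movie j.
    A <- sparseMatrix(i = users, j = movies, x = 1)
    A@x[] <- 1                          # collapse repeat rentals to 0/1

    # Row and column sums identify heavy users and popular items.
    user_activity   <- rowSums(A)
    item_popularity <- colSums(A)

    # Example policy: flag users above the 99th percentile of activity for
    # subsampling (the threshold is arbitrary here).
    heavy_users <- which(user_activity > quantile(user_activity, 0.99))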
> >
> > On Thu, Aug 25, 2011 at 5:57 PM, Ted Dunning <[email protected]> wrote:
> > > In matrix terms, the binary user x item matrix maps a set of items to users (A h = users who interacted with items in h). Similarly, A' maps users to items. Thus A' (A h) is the classic "users who x'ed this also x'ed that" sort of operation. This can be rearranged to be (A'A) h.
> > >
> > > This is where the singular values come in. If
> > >
> > >     A \approx U S V'
> > >
> > > and we know that U'U = V'V = I from the definition of the SVD, this means that
> > >
> > >     A'A \approx (U S V')' (U S V') = V S U' U S V' = V S^2 V'
> > >
> > > But V' transforms an item vector into the reduced dimensional space, so we can view this as transforming the item vector into the reduced dimensional space, scaling element-wise using S^2, and then transforming back to item space.
> > >
> > > As you noted, the first right singular vector corresponds to popularity. It is often not appropriate to just recommend popular things (since users have other means of discovering them), so you might want to drop that dimension from the analysis. This can be done in the SVD results, or it can be done using something like a log-likelihood ratio test so that we only model anomalously large values in A'A.
> > >
> > > Btw, if you continue to use R for experimentation, you should be able to dramatically increase the size of the analysis you do by using sparse matrices and a random projection algorithm. I was able to compute SVDs of matrices with millions of rows and columns and a few million non-zero elements in a few seconds. Holler if you would like details about how to do this.
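A sketch of the two operations Ted describes, run on an invented sparse matrix: the (A'A) h co-occurrence scoring, and a small hand-rolled random-projection (randomized) SVD. Ted doesn't spell out his exact method, so treat this as one plausible version rather than his recipe; the sizes, rank, and oversampling are arbitrary:

    library(Matrix)
    set.seed(1)

    # Invented sparse binary user x item matrix standing in for real data.
    A <- rsparsematrix(2000, 500, density = 0.01)
    A@x[] <- 1

    # "Users who x'ed this also x'ed that": h indicates a set of items, and
    # (A'A) h scores every item by co-occurrence with the items in h.
    h <- numeric(ncol(A)); h[c(3, 7)] <- 1
    item_scores <- t(A) %*% (A %*% h)            # same as (A'A) h

    # Randomized SVD of rank k: project A onto a few random directions,
    # orthonormalize, then take a small dense SVD.
    k <- 10
    Omega <- matrix(rnorm(ncol(A) * (k + 5)), ncol = k + 5)  # random projection
    Q <- qr.Q(qr(as.matrix(A %*% Omega)))        # approximate basis for range of A
    B <- as.matrix(t(Q) %*% A)                   # small (k+5) x n_items matrix
    s <- svd(B, nu = k, nv = k)
    U <- Q %*% s$u                               # approximate left singular vectors
    d <- s$d[1:k]                                # approximate singular values
    V <- s$v                                     # approximate right singular vectors

    # Ted's popularity point: optionally drop the first singular dimension
    # if it mostly encodes overall volume.
    keep <- 2:k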
> > > On Thu, Aug 25, 2011 at 4:07 PM, Jeff Hansen <[email protected]> wrote:
> > >> One thing I found interesting (but not particularly surprising) is that the biggest singular value/vector was pretty much tied directly to volume. That makes sense, because the best predictor of whether a given field's value was 1 was whether it belonged to a row with lots of 1s or a column with lots of 1s (I haven't quite figured out the best method to normalize the values with yet). When I plot the values of the largest singular vectors against the sum of values in the corresponding row or column, the correlation is very linear. That's not the case for any of the other singular vectors (which actually makes me wonder if just removing that first singular vector from the prediction might not be one of the best ways to normalize the data).
> > >>
> > >> I understand what you're saying about each singular vector corresponding to a feature, though. Each left singular vector represents some abstract aspect of a movie, and each right singular vector represents users' leanings or inclinations with regard to that aspect of the movie. The singular value itself just seems to indicate how good a predictor the combination of one user's inclination toward that aspect of a movie is for coming up with the actual value. The issue I mentioned above is that the popularity of a movie, as well as how often a user watches movies, tend to be the best predictors of whether a user has seen or will see a movie.
> > >>
> > >> I had been picturing this with the idea of one k-dimensional space -- one where a user's location corresponds to their ideal prototypical movie's location. Not that there would be a movie right there, but there would be ones nearby, and the nearer they were, the more enjoyable they would be. That's a naive model, but that doesn't mean it wouldn't work well enough. My problem is I don't quite get how to map the user space over to the item space.
> > >>
> > >> I think that may be what Lance is trying to describe in his last response, but I'm falling short on reconstructing the math from his description.
> > >>
> > >> I get the following:
> > >>
> > >>     U S V* = A
> > >>     U S = A V
> > >>     U = A V 1/S
> > >>
> > >> I think that last line is what Lance was describing. Of course my problem was bootstrapping in a user for whom I don't know A or V.
> > >>
> > >> I also think I may have missed a big step of the puzzle. For some reason I thought that just by loosening the rank, you could recompose the matrix A from the truncated SVD values/vectors and use the recomposed values themselves as the recommendation. I thought one of the ideas was that the recomposed matrix had less "noise" and could be a better representation of the underlying nature of the matrix than the original matrix itself. But that may have just been wishful thinking...
> > >>
> > >> On Thu, Aug 25, 2011 at 4:50 PM, Sean Owen <[email protected]> wrote:
> > >> > The 200x10 matrix is indeed a matrix of 10 singular vectors, which are eigenvectors of AA'. It's the columns, not the rows, that are eigenvectors.
> > >> >
> > >> > The rows do mean something. I think it's fair to interpret the 10 singular values / vectors as corresponding to some underlying features of tastes. The rows say how much each user expresses those 10 tastes. The matrix of right singular vectors on the other side tells you the same thing about items. The diagonal matrix of singular values in the middle also comes into play -- it's like a set of multipliers that say how important each feature is. (This is why we cut out the singular vectors / values that have the smallest singular values; it's like removing the least-important features.) So really you'd have to stick those values somewhere; Ted says it's conventional to put "half" of each (their square roots) with each side, if anything.
> > >> >
> > >> > I don't have as good a grasp on an intuition for the columns as eigenvectors. They're also a set of basis vectors, and I had understood them as like the "real" bases of the reduced feature space expressed in user-item space. But I'd have to go back and think that intuition through again, since I can't really justify it from scratch again in my head just now.
> > >> >
> > >> > On Thu, Aug 25, 2011 at 10:21 PM, Jeff Hansen <[email protected]> wrote:
> > >> > > Well, I think my problem may have had more to do with what I was calling the eigenvector...
> > >> > > I was referring to the rows rather than the columns of U and V. While the columns may be characteristic of the overall matrix, the rows are characteristic of the user or item (in that they are a rank-reduced representation of that person or thing). I guess you could say I just had to tilt my head to the side and change my perspective 90 degrees =)
> >
> > --
> > Lance Norskog
> > [email protected]


--
Lance Norskog
[email protected]
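A small R sketch of the fold-in Jeff derives above (U = A V 1/S) and of the "recomposed matrix as recommendation" idea, continuing from rank-k factors U, d, V like the ones computed in the earlier sketch. The new user's history is invented for illustration, and this is only one common way to do the fold-in:

    # a_new: binary item vector for a user who was not in the original A.
    a_new <- numeric(nrow(V)); a_new[c(3, 7, 42)] <- 1

    # Fold the new user into the reduced space: u_new = a_new V S^-1.
    u_new <- (a_new %*% V) %*% diag(1 / d)        # 1 x k coordinates

    # Score every item by reconstructing this user's row of the rank-k
    # matrix, scores = u_new S V'; a higher reconstructed value means a
    # stronger recommendation (the "less noise" version of the raw row).
    scores <- (u_new %*% diag(d)) %*% t(V)

    # Recommend unseen items with the highest scores.
    scores[as.logical(a_new)] <- -Inf             # don't re-recommend seen items
    top10 <- order(as.numeric(scores), decreasing = TRUE)[1:10]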

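And to make Sean's "half of each" convention and Jeff's head-tilt concrete, a last sketch on the same hypothetical U, d, V factors: scale each side by sqrt(S), then treat the rows as the rank-reduced representations of users and items. The cosine-similarity choice at the end is just one option:

    # Put "half" of the singular values on each side.
    user_feats <- U %*% diag(sqrt(d))   # row i = user i in the reduced space
    item_feats <- V %*% diag(sqrt(d))   # row j = item j in the reduced space

    # Dot products of these rows reproduce the rank-k approximation of A:
    # user_feats %*% t(item_feats) = U S V'.
    approx_A <- user_feats %*% t(item_feats)

    # "Tilt your head": compare rows, not columns. Cosine similarity between
    # two items in the reduced space.
    cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
    sim_3_7 <- cos_sim(item_feats[3, ], item_feats[7, ])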