This is a little meditation on user vs. item matrix density. The heavy users and heavy items can be subsampled, once they are identified. Hadoop's built-in sort gives a very simple "map-reduce" way to do this.

http://ultrawhizbang.blogspot.com/2011/08/sorted-recommender-data.html
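A rough single-machine sketch of that sort-then-cap idea, in Python rather than a Hadoop job (the caps, the (user, item) tuple layout, and the helper name are made-up for illustration; on Hadoop the shuffle/sort would do the grouping for you):

    # Sort the interaction pairs by key, then keep at most `cap` pairs per key.
    # This is the single-machine analogue of "sort, then trim the heavy keys".
    from itertools import groupby
    from operator import itemgetter

    def cap_per_key(pairs, key_index, cap):
        """Keep at most `cap` (user, item) pairs per key (user if key_index=0, item if 1)."""
        ordered = sorted(pairs, key=itemgetter(key_index))   # the built-in sort step
        kept = []
        for _, group in groupby(ordered, key=itemgetter(key_index)):
            kept.extend(list(group)[:cap])                   # subsample the heavy keys
        return kept

    # interactions = [(user_id, item_id), ...]
    # interactions = cap_per_key(interactions, 0, 500)    # cap heavy users (assumed cap)
    # interactions = cap_per_key(interactions, 1, 5000)   # cap heavy items (assumed cap)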
On Thu, Aug 25, 2011 at 5:57 PM, Ted Dunning <[email protected]> wrote:

> In matrix terms the binary user x item matrix maps a set of items to users (A h = users who interacted with items in h). Similarly A' maps users to items. Thus A' (A h) is the classic "users who x'ed this also x'ed that" sort of operation. This can be rearranged to be (A'A) h.
>
> This is where the singular values come in. If
>
> A \approx U S V'
>
> and we know that U'U = V'V = I from the definition of the SVD. This means that
>
> A' A \approx (U S V')' (U S V') = V S U' U S V' = V S^2 V'
>
> But V' transforms an item vector into the reduced dimensional space, so we can view this as transforming the item vector into the reduced dimensional space, scaling element-wise using S^2, and then transforming back to item space.
>
> As you noted, the first right singular vector corresponds to popularity. It is often not appropriate to just recommend popular things (since users have other means of discovering them), so you might want to drop that dimension from the analysis. This can be done in the SVD results, or it can be done using something like a log-likelihood ratio test so that we only model anomalously large values in A'A.
>
> Btw, if you continue to use R for experimentation, you should be able to dramatically increase the size of the analysis you do by using sparse matrices and a random projection algorithm. I was able to compute SVDs of matrices with millions of rows and columns and a few million non-zero elements in a few seconds. Holler if you would like details about how to do this.
>
> On Thu, Aug 25, 2011 at 4:07 PM, Jeff Hansen <[email protected]> wrote:
>
>> One thing I found interesting (but not particularly surprising) is that the biggest singular value/vector was pretty much tied directly to volume. That makes sense because the best predictor of whether a given field's value was 1 was whether it belonged to a row with lots of 1s or a column with lots of 1s (I haven't quite figured out the best method to normalize the values with yet). When I plot the values of the largest singular vectors against the sum of values in the corresponding row or column, the correlation is very linear. That's not the case for any of the other singular vectors (which actually makes me wonder if just removing that first singular vector from the prediction might not be one of the best ways to normalize the data).
>>
>> I understand what you're saying about each singular vector corresponding to a feature though. Each left singular vector represents some abstract aspect of a movie and each right singular vector represents users' leanings or inclinations with regard to that aspect of the movie. The singular value itself just seems to indicate how good a predictor the combination of one user's inclination toward that aspect of a movie is for coming up with the actual value. The issue I mentioned above is that popularity of a movie, as well as how often a user watches movies, tend to be the best predictors of whether a user has seen or will see a movie.
>>
>> I had been picturing this with the idea of one k-dimensional space -- one where a user's location corresponds to their ideal prototypical movie's location. Not that there would be a movie right there, but there would be ones nearby, and the nearer they were, the more enjoyable they would be.
>> That's a naive model, but that doesn't mean it wouldn't work well enough. My problem is I don't quite get how to map the user space over to the item space.
>>
>> I think that may be what Lance is trying to describe in his last response, but I'm falling short on reconstructing the math from his description.
>>
>> I get the following:
>>
>> U S V* = A
>> U S = A V
>> U = A V (1/S)
>>
>> I think that last line is what Lance was describing. Of course my problem was bootstrapping in a user for whom I don't know A or V.
>>
>> I also think I may have missed a big step of the puzzle. For some reason I thought that just by loosening the rank, you could recompose the matrix A from the truncated SVD values/vectors and use the recomposed values themselves as the recommendation. I thought one of the ideas was that the recomposed matrix had less "noise" and could be a better representation of the underlying nature of the matrix than the original matrix itself. But that may have just been wishful thinking...
>>
>> On Thu, Aug 25, 2011 at 4:50 PM, Sean Owen <[email protected]> wrote:
>>
>> > The 200x10 matrix is indeed a matrix of 10 singular vectors, which are eigenvectors of AA'. It's the columns, not the rows, that are eigenvectors.
>> >
>> > The rows do mean something. I think it's fair to interpret the 10 singular values / vectors as corresponding to some underlying features of tastes. The rows say how much each user expresses those 10 tastes. The matrix of right singular vectors on the other side tells you the same thing about items. The diagonal matrix of singular values in the middle also comes into play -- it's like a set of multipliers that say how important each feature is. (This is why we cut out the singular vectors / values that have the smallest singular values; it's like removing the least-important features.) So really you'd have to stick those values somewhere; Ted says it's conventional to put "half" of each (their square roots) with each side, if anything.
>> >
>> > I don't have as good a grasp on an intuition for the columns as eigenvectors. They're also a set of basis vectors, and I had understood them as like the "real" bases of the reduced feature space expressed in user-item space. But I'd have to go back and think that intuition through again, since I can't really justify it from scratch in my head just now.
>> >
>> > On Thu, Aug 25, 2011 at 10:21 PM, Jeff Hansen <[email protected]> wrote:
>> > > Well, I think my problem may have had more to do with what I was calling the eigenvector... I was referring to the rows rather than the columns of U and V. While the columns may be characteristic of the overall matrix, the rows are characteristic of the user or item (in that they are a rank-reduced representation of that person or thing). I guess you could say I just had to tilt my head to the side and change my perspective 90 degrees =)

--
Lance Norskog
[email protected]
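For anyone who wants to try Ted's (A'A) h = V S^2 V' view of item-to-item recommendation on a laptop, here is a rough sketch in Python rather than the R he mentions. scipy's svds (a Lanczos solver) stands in for the random-projection SVD he describes; the toy matrix, k=10, and the choice to drop the largest (popularity) component are illustrative assumptions, not a recipe from the thread.

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.linalg import svds

    # Toy binary user x item matrix A (200 users, 500 items, ~2% dense).
    rng = np.random.default_rng(0)
    A = csr_matrix((rng.random((200, 500)) < 0.02).astype(float))

    k = 10
    U, s, Vt = svds(A, k=k)              # A ~ U diag(s) Vt

    # Drop the largest singular component, which tracks raw popularity.
    keep = np.argsort(s)[:-1]
    V, s2 = Vt[keep].T, s[keep] ** 2

    def item_scores(h):
        """Score all items for an item-history vector h via (A'A) h ~ V S^2 V' h."""
        return V @ (s2 * (V.T @ h))

    h = np.zeros(A.shape[1])
    h[[3, 17]] = 1.0                                 # items the user interacted with (made-up ids)
    recommendations = np.argsort(-item_scores(h))    # best-scoring items first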
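And a companion sketch of the folding-in step Jeff writes out (U = A V 1/S), together with his "use the recomposed row as the recommendation" idea, reusing A, s, and Vt from the sketch above; the new user's history and the scoring details are illustrative assumptions.

    # Fold a new user into the k-dimensional space: u_new = a_new V S^(-1),
    # then recompose that row, u_new S V', and treat it as predicted preferences.
    a_new = np.zeros(A.shape[1])
    a_new[[3, 17, 42]] = 1.0            # the new user's item history (made-up ids)

    u_new = (a_new @ Vt.T) / s          # project the row into the reduced space
    scores = (u_new * s) @ Vt           # recomposed row of the low-rank matrix
    scores[a_new > 0] = -np.inf         # don't recommend what the user already has
    top_items = np.argsort(-scores)[:10]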
