By the way, please ignore my use of the term eigenvector -- I have a feeling I completely misused it. I've never quite understood the concept, but to me that truncated 10-value vector that corresponds to a movie seems to be "characteristic" of it (which is what the prefix "eigen" was always intended to convey).
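A minimal sketch of the terminology point, in base R on a tiny made-up 0/1 matrix (the data, the value k = 3, and the helper name fold_in are illustrative assumptions, not the Netflix sample or code from this thread). It checks that the columns of U are eigenvectors of M %*% t(M) rather than of M, that each row of U is the k-value vector for one movie, and that the same kind of vector can be computed for a user from that user's column of M:

```r
# Tiny made-up "liked" matrix: 5 movies (rows) x 6 users (columns).
# Illustrative data only; the thread uses 200 movies x 10,000 users.
M <- rbind(c(1, 1, 0, 0, 1, 0),
           c(1, 1, 0, 0, 0, 0),
           c(0, 0, 1, 1, 0, 1),
           c(0, 0, 1, 1, 0, 0),
           c(1, 0, 0, 1, 1, 1))

k <- 3                          # dimensions kept (10 in the thread)
s_m <- svd(M, nu = k, nv = k)
U <- s_m$u                      # 5 x k: row i is the k-value vector of movie i
S <- diag(s_m$d[1:k])           # k x k diagonal matrix of singular values
V <- s_m$v                      # 6 x k: row j is the k-value vector of user j

# The columns of U are left singular vectors of M. They are eigenvectors
# of M %*% t(M), with eigenvalues equal to the squared singular values.
e <- eigen(M %*% t(M), symmetric = TRUE)
eig_ok <- isTRUE(all.equal(e$values[1:k], s_m$d[1:k]^2))

# Largest residual of the eigen equation (M M^T) U = U S^2.
resid <- max(abs(M %*% t(M) %*% U - U %*% S %*% S))

# Hypothetical helper: map a column of 0/1 likes (one entry per movie)
# into the reduced space, using v = S^-1 U^T m.
fold_in <- function(m, U, S) solve(S) %*% t(U) %*% m

# For an existing user this reproduces that user's row of V.
user <- 2
v_user <- fold_in(M[, user], U, S)
fold_ok <- isTRUE(all.equal(as.vector(v_user), V[user, ]))

# Scores for every movie from the reduced vector, largest first.
scores <- U %*% S %*% v_user
ranking <- order(scores, decreasing = TRUE)
```

Under these assumptions, the same fold_in call applied to a new user's column of likes would give a vector comparable to the rows of V, so U %*% S %*% v could be used for both existing and new users.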
On Thu, Aug 25, 2011 at 3:40 PM, Jeff Hansen <[email protected]> wrote:

> I've been playing around with this problem for the last week or so (or at
> least this problem as I understood it based on your initial commentary,
> Lance) -- but purely in R using smaller data so I can 1. get my head
> wrapped around the problem, and 2. get more familiar with R.
>
> To make the problem a little more tractable I limited my sample to 200
> movies and 10,000 users (taking the most rated movies from 2004 and 2005
> based on NF's dataset -- I know, I should really switch back to the
> grouplens dataset...). I'm also only looking at binary data at the moment
> -- I treat any rating above 3 as a movie you liked and anything 3 or below
> as the same as not having rated the movie.
>
> So I take this 200 x 10,000 matrix of 1s and 0s and I run a truncated SVD
> on it so that I can project it onto a 10-dimensional space.
>
> M <- initial data
> s_m <- svd(M,10,10)
> U <- s_m$u
> S <- diag(s_m$d[1:10])
> V <- s_m$v
>
> So U is a 200 row by 10 column matrix -- each row represents the
> eigenvector of a given movie, and each column represents one of Lance's
> so-called axes of interest. So what I did next was spit out the top and
> bottom n movie titles for each of these 10 dimensions. I found it was
> important to show more than one movie title for each side of the
> dimensions, otherwise the results might be somewhat misleading.
>
> I then went through the 10 dimensions and qualitatively answered for
> myself whether I was strongly or weakly aligned in one direction, or not
> aligned in any way on this dimension. Personally I usually found I only
> felt strongly aligned on 2 of the ten, and weakly aligned on another 2.
>
> I then normalized U across each of the ten dimensions and for each movie
> added up its z score in each dimension weighted by my alignment in that
> dimension. I then sorted the results and displayed the movie titles -- it
> was a pretty accurate ranking of movies as I like them.
>
> scaled <- apply(U,2,scale)
> me <- c(0,2,1,0,-1,1,0,0,0,0)
> dim(me) <- c(10,1)
> recommendations <- scaled %*% me
>
> I imagine few users would want to bother, but I can see where it would be
> a relatively quick way to train a recommender. Here's the problem though
> -- I can get it to work using the method I've described above, but I can't
> quite figure out how to use it to generate an eigenvector for the user.
> For existing users I can always generate predictions by matrix multiplying
> U %*% S %*% t(V)[,user] and then sorting by the results. It would be nice
> to use a consistent model. I can't quite see the math to generate an
> equivalent equation though.
>
> On Wed, Aug 17, 2011 at 3:52 AM, Lance Norskog <[email protected]> wrote:
>
>> Sharpened:
>>
>> http://ultrawhizbang.blogspot.com/2011/08/singular-vectors-for-recommendations.html
>>
>> On Wed, Aug 10, 2011 at 11:53 PM, Sean Owen <[email protected]> wrote:
>> > You may need to sharpen your terms / problem statement here:
>> >
>> > What is a geometric value -- do you just mean a continuous real value?
>> > So these are item-feature vectors?
>> >
>> > The middle bit of the output of an SVD is not a singular vector -- it's
>> > a diagonal matrix containing singular values on the diagonal.
>> > The left matrix contains singular vectors, which are not eigenvectors
>> > except in very specific cases of the original matrix.
>> >
>> > Singular vectors are the columns of the left matrix, not rows, whereas
>> > items correspond to its rows. What do you mean about relating them?
>> > What do you mean by the "hot spot" you are trying to find?
>> > A vector does not express two end-points, no. You could think of (X,Y)
>> > as corresponding to a point in 2-space, or could think of it as a ray
>> > from (0,0) to (X,Y), but you could think of it as (100,200) to
>> > (100+X,200+Y) just as well. There are not two points implied by
>> > anything here.
>> >
>> > How do you get points from the original item-feature space into the
>> > transformed, reduced space? While I think this is an imprecise answer:
>> > if A = U Sigma V^T then you can think of (Sigma V^T) as like the
>> > change-of-basis transformation that does this.
>> >
>> > On Wed, Aug 10, 2011 at 10:54 AM, Lance Norskog <[email protected]> wrote:
>> >
>> >> Zeroing in on the topic:
>> >>
>> >> I have:
>> >> 1) a set of raw input vectors of a given length, one for each item.
>> >> Each value in the vectors is geometric, not bag-of-words or other.
>> >> The matrix is [# items, # dimensions].
>> >> 2) An SVD of same:
>> >> left matrix of [# items, # features per item] * singular
>> >> vector[# features] * right matrix of [# features per
>> >> dimension, # dimensions].
>> >> 3) The first few columns of the left matrix are interesting singular
>> >> eigenvectors.
>> >>
>> >> I would like to:
>> >> 1) relate the singular vectors to the item vectors, such that they
>> >> create points in the "hot spots" of the item vectors.
>> >> 2) find the inverses: a singular vector has two endpoints, and both
>> >> represent "hot spots" in the item space.
>> >>
>> >> Given the first 3 singular vectors, there are 6 "hot spots" in the
>> >> item vectors, one for each end of the vector. What transforms are
>> >> needed to get the item vectors and the singular vector endpoints in
>> >> the same space? I'm not finding the exact sequence.
>> >>
>> >> A use case for this is a new user. It gives a quick assessment by
>> >> asking where the user is on the few common axes of items:
>> >> "Transformers 3: The Stupiding" vs. "Crazy Bride Wedding Love
>> >> Planner"?
>> >>
>> >> --
>> >> Lance Norskog
>> >> [email protected]
>> >>
>> >
>>
>> --
>> Lance Norskog
>> [email protected]
>>
>
