Yeah OK, that makes sense. That's a pretty helpful paper. I can get through the Lanczos algorithm without much trouble, I think. The piece I'm looking for pointers on at the moment is the eigen decomposition of the tridiagonal matrix.
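To make sure I'm picturing the right step, here's a toy sketch of what I assume that eigendecomposition amounts to when done densely in memory: alpha and beta are the diagonal and off-diagonal entries a Lanczos pass would have produced (made-up values here), T gets expanded to a dense symmetric matrix, and a plain cyclic-Jacobi sweep stands in for whatever dense eigensolver the code actually calls -- the class and method names below are mine, not the real Mahout ones.

import java.util.Arrays;

/** Toy sketch: expand the Lanczos tridiagonal T (alpha/beta) into a dense
 *  symmetric matrix and eigendecompose it in memory with cyclic Jacobi
 *  rotations. Stand-in for whatever dense eigensolver is really used. */
public class TridiagEigenSketch {

  public static void main(String[] args) {
    // alpha = diagonal of T, beta = off-diagonal; these would come from
    // the Lanczos iterations (hypothetical example values here).
    double[] alpha = {4.0, 3.0, 2.5, 1.0};
    double[] beta  = {1.0, 0.5, 0.2};
    int k = alpha.length;

    // Build T as a dense k x k matrix -- the "treat it as dense" step.
    double[][] t = new double[k][k];
    for (int i = 0; i < k; i++) {
      t[i][i] = alpha[i];
      if (i + 1 < k) {
        t[i][i + 1] = beta[i];
        t[i + 1][i] = beta[i];
      }
    }

    double[][] v = new double[k][k];   // accumulates eigenvectors (columns)
    double[] eig = jacobiEigen(t, v);  // eigenvalues of T (t is overwritten)
    System.out.println("eigenvalues: " + Arrays.toString(eig));
  }

  /** Cyclic Jacobi rotations on a dense symmetric matrix a (destroyed);
   *  eigenvectors are accumulated as the columns of v. */
  static double[] jacobiEigen(double[][] a, double[][] v) {
    int n = a.length;
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        v[i][j] = (i == j) ? 1.0 : 0.0;
      }
    }
    for (int sweep = 0; sweep < 50; sweep++) {
      double off = 0.0;
      for (int p = 0; p < n; p++) {
        for (int q = p + 1; q < n; q++) {
          off += a[p][q] * a[p][q];
        }
      }
      if (off < 1e-20) {
        break;                         // off-diagonal mass is essentially gone
      }
      for (int p = 0; p < n; p++) {
        for (int q = p + 1; q < n; q++) {
          if (a[p][q] == 0.0) {
            continue;
          }
          // Rotation angle chosen to zero out a[p][q].
          double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
          double sign = theta >= 0.0 ? 1.0 : -1.0;
          double tRot = sign / (Math.abs(theta) + Math.sqrt(theta * theta + 1.0));
          double c = 1.0 / Math.sqrt(tRot * tRot + 1.0);
          double s = tRot * c;
          for (int r = 0; r < n; r++) {  // A <- A * R (rotate columns p, q)
            double arp = a[r][p], arq = a[r][q];
            a[r][p] = c * arp - s * arq;
            a[r][q] = s * arp + c * arq;
          }
          for (int r = 0; r < n; r++) {  // A <- R' * A (rotate rows p, q)
            double apr = a[p][r], aqr = a[q][r];
            a[p][r] = c * apr - s * aqr;
            a[q][r] = s * apr + c * aqr;
          }
          for (int r = 0; r < n; r++) {  // V <- V * R (track eigenvectors)
            double vrp = v[r][p], vrq = v[r][q];
            v[r][p] = c * vrp - s * vrq;
            v[r][q] = s * vrp + c * vrq;
          }
        }
      }
    }
    double[] eigenvalues = new double[n];
    for (int i = 0; i < n; i++) {
      eigenvalues[i] = a[i][i];
    }
    return eigenvalues;
  }
}

If the tridiagonal really is on the order of m x m, a dense step like this is obviously the part that won't scale, which is what prompts the question below.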
Jake, am I right that this is done in memory right now? I am sure your previous
comment actually answers this next question, but I haven't connected the dots
yet. If we're getting eigenvectors of AT * A, and that matrix is huge, then
isn't the tridiagonal matrix still huge -- albeit with only 3m entries, not
m^2? It seems like the code just calls EigenDecomposer, which treats it as a
dense m x m matrix. (Is it worth me figuring out the QR decomposition enough
to build a distributed version of it?)

On Sun, Nov 6, 2011 at 7:25 PM, Sebastian Schelter <[email protected]> wrote:
> On 06.11.2011 18:52, Sean Owen wrote:
>> Following up on this very old thread.
>>
>> I understood all this bit, thanks, that greatly clarified.
>>
>> You multiply a new user vector by V to project it into the new
>> "pseudo-item", reduced-dimension space.
>> And to get back to real items, you multiply by V's inverse, which is
>> its transpose.
>> And so you are really multiplying the user vector by V VT, which is
>> not a no-op, since those are truncated matrices and aren't actually
>> exact inverses (?)
>>
>> The original paper talks about cosine similarities between users or
>> items in the reduced-dimension space, but can anyone shine light on
>> the point of that?
>
> I think this interpretation is coming from LSI, where you project a query
> onto the document feature space and use the cosine to find the most
> similar documents (which can be done by simple matrix-vector
> multiplication, as the singular vectors are orthonormal and computing dot
> products with the projected query is therefore equal to computing cosines).
>
> Something similar could be done to find the most similar items or users; the
> bad thing is that AFAIK you have to look at every other user or item, as
> U and V are dense.
>
> Maybe this paper can help to shine light on that:
>
> "Using Linear Algebra for Intelligent Information Retrieval"
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3138&rep=rep1&type=pdf
>
>> From the paper also, it seems like they say the
>> predictions are just computed as vector products as above.
>>
>> Finally, separately, I'm trying to understand the Lanczos method as
>> part of computing an SVD. Lanczos operates on a real symmetric matrix,
>> right? And am I right that it comes into play when you are computing
>> an SVD...
>>
>> A = U * S * VT
>>
>> ... because U is actually the eigenvectors of (symmetric) A*AT and V
>> is the eigenvectors of AT*A? And so Lanczos is used to answer those
>> questions to complete the SVD?
>>
>> On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[email protected]> wrote:
>>> You are correct. The paper has an appalling treatment of the folding-in
>>> approach.
>>>
>>> In fact, the procedure is dead simple.
>>>
>>> The basic idea is to leave the coordinate system derived in the original SVD
>>> intact and simply project the new users into that space.
>>>
>>> The easiest way to see what is happening is to start again with the original
>>> rating matrix A as decomposed:
>>>
>>> A = U S V'
>>>
>>> where A is users x items. If we multiply on the right by V, we get
>>>
>>> A V = U S V' V = U S
>>>
>>> (because V' V = I, by definition). This result is (users x items) x (items
>>> x k) = users x k; that is, it gives a k-dimensional vector for each user.
>>> Similarly, multiplication on the left by U' gives a k x items matrix which,
>>> when transposed, gives a k-dimensional vector for each item.
>>>
>>> This implies that if we augment U with new user row vectors U_new, we should
>>> be able to simply compute new k-dimensional vectors for the new users and
>>> adjoin these new vectors to the previous vectors. Concisely put,
>>>
>>>     ( A     )        ( A V     )
>>>     (       )  V  =  (         )
>>>     ( A_new )        ( A_new V )
>>>
>>> This isn't really magical. It just says that we can compute new user
>>> vectors at any time by multiplying the new users' ratings by V.
>>>
>>> The diagram in figure one is hideously confusing because it looks like a
>>> picture of some kind of multiplication whereas it is really depicting some
>>> odd kind of flow diagram.
>>>
>>> Does this solve the problem?
>>>
>>> On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <[email protected]> wrote:
>>>
>>>> Section 3 is hard to understand.
>>>>
>>>> - Ak and P are defined, but not used later
>>>> - Definition of P has UTk x Nu as a computation. UTk is a k x m
>>>> matrix, and Nu is "t" x 1. t is not defined.
>>>> - This only makes sense if t = m. But m is the number of users, and Nu
>>>> is a user vector, so it should have a number of elements equal to n, the
>>>> number of items
>>>> - Sk * VTk is described as a k x "d" matrix but d is undefined
>>>> - The diagram suggests that VTk is multiplied by all the Nu, which
>>>> makes more sense -- but only if Nu is multiplied by VTk, not the
>>>> other way around. And the diagram depicts neither of those.
>>>> - Conceptually I would understand Nu x VTk, but then P is defined by
>>>> an additional product with Uk
>>>>
>>>> In short... what?
>>>>
>>>>
>>>> On Thu, Jun 3, 2010 at 4:15 PM, Ted Dunning <[email protected]> wrote:
>>>>> Fire away.
>>>>>
>>>>> On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen <[email protected]> wrote:
>>>>>
>>>>>> Is anyone out there familiar enough with this to a) discuss this paper
>>>>>> with me or b) point me to another writeup on the approach?
>>>>>>
>>>>>
>>>>
>>>
>
>
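
P.S. For my own notes, here is how I'm reading the folding-in recipe above, as a tiny self-contained sketch -- toy numbers and plain arrays rather than anything from the actual code; V_k, the new user's ratings, and the existing user's reduced vector below are all made up. The new user's k-dimensional vector is just the ratings times V_k (the (A_new V) block above), and the LSI-style reading is that you then compare users by cosine in that reduced space.

import java.util.Arrays;

/** Sketch of folding-in: project a new user's rating vector into the
 *  k-dimensional space via V_k (n items x k), then compare to an existing
 *  user's reduced vector with cosine similarity. Toy numbers throughout. */
public class FoldInSketch {

  public static void main(String[] args) {
    // V_k from a previously computed truncated SVD: n = 4 items, k = 2.
    double[][] vk = {
        { 0.5,  0.1},
        { 0.5, -0.2},
        { 0.5,  0.6},
        { 0.5, -0.7}
    };

    // Ratings of a new user over the same 4 items (0 = unrated).
    double[] newUser = {5.0, 0.0, 4.0, 1.0};

    // Fold in: a_new * V_k gives the new user's k-dimensional vector,
    // i.e. the (A_new V) block from the derivation above.
    double[] folded = multiply(newUser, vk);
    System.out.println("reduced vector: " + Arrays.toString(folded));

    // An existing user's reduced vector (a row of U*S, again made up).
    double[] existing = {6.1, -0.4};
    System.out.println("cosine to existing user: " + cosine(folded, existing));
  }

  /** Row vector (1 x n) times matrix (n x k) -> 1 x k. */
  static double[] multiply(double[] row, double[][] m) {
    int n = m.length;
    int k = m[0].length;
    double[] out = new double[k];
    for (int j = 0; j < k; j++) {
      for (int i = 0; i < n; i++) {
        out[j] += row[i] * m[i][j];
      }
    }
    return out;
  }

  /** Cosine similarity between two dense vectors of equal length. */
  static double cosine(double[] x, double[] y) {
    double dot = 0.0, nx = 0.0, ny = 0.0;
    for (int i = 0; i < x.length; i++) {
      dot += x[i] * y[i];
      nx += x[i] * x[i];
      ny += y[i] * y[i];
    }
    return dot / (Math.sqrt(nx) * Math.sqrt(ny));
  }
}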
