The tridiagonal is much smaller than you would need if you wanted all the eigenvalues. Since you only want a small number, you only have a tridiagonal matrix that is some small multiple of that size. In-memory decomposition makes total sense.
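For concreteness, here is a rough sketch of why that in-memory step is cheap. This is NumPy/SciPy rather than the Mahout code, and the sizes are made up:

    import numpy as np
    from scipy.linalg import eigh_tridiagonal

    k = 50                             # number of eigenvalues we actually want
    steps = 3 * k                      # Lanczos iterations, a small multiple of k

    # Stand-ins for the diagonal and off-diagonal that Lanczos would produce;
    # the point is only that T is steps x steps, not rows(A) x rows(A).
    alpha = np.random.rand(steps)      # main diagonal of T
    beta = np.random.rand(steps - 1)   # off-diagonal of T

    w, v = eigh_tridiagonal(alpha, beta)   # a 150 x 150 in-memory problem
    print(w[-k:])                      # the top-k Ritz values

Even a few thousand Lanczos steps would still be a trivial in-memory eigenproblem.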
Regarding QR decomposition, Dmitriy has already built a parallel version. A large QR is required at one point in the SSVD algorithm. I have shown a way to avoid this large QR using a much smaller Cholesky decomposition (there is a rough sketch of that trick at the bottom of this mail), but it isn't entirely clear that this is a net win. It is a huge win with the sequential version.

On Sun, Nov 6, 2011 at 3:17 PM, Sean Owen <[email protected]> wrote:

> Yeah OK, that makes sense. That's a pretty helpful paper.
>
> I can get through the Lanczos algorithm without much trouble, I think. The piece I'm looking for pointers on at the moment is the eigendecomposition of the tridiagonal matrix.
>
> Jake, am I right that this is done in memory right now? I am sure your previous comment actually answers this next question, but I haven't connected the dots yet. If we're getting eigenvectors of AT * A, and that matrix is huge, then isn't the tridiagonal matrix still huge -- albeit with only 3m entries, not m^2? It seems like the code just calls EigenDecomposer, which treats it as a dense m x m matrix.
>
> (Is it worth me figuring out the QR decomposition enough to build a distributed version of it?)
>
> On Sun, Nov 6, 2011 at 7:25 PM, Sebastian Schelter <[email protected]> wrote:
> > On 06.11.2011 18:52, Sean Owen wrote:
> >> Following up on this very old thread.
> >>
> >> I understood all this bit, thanks, that greatly clarified.
> >>
> >> You multiply a new user vector by V to project it into the new "pseudo-item", reduced-dimension space. And to get back to real items, you multiply by V's inverse, which is its transpose. And so you are really multiplying the user vector by V VT, which is not a no-op, since those are truncated matrices and aren't actually exact inverses (?)
> >>
> >> The original paper talks about cosine similarities between users or items in the reduced-dimension space, but can anyone shine light on the point of that?
> >
> > I think this interpretation is coming from LSI, where you project a query onto the document feature space and use the cosine to find the most similar documents (which can be done by simple matrix-vector multiplication, as the singular vectors are orthonormal and computing dot products with the projected query is therefore equal to computing cosines).
> >
> > Something similar could be done to find the most similar items or users; the bad thing is that, AFAIK, you have to look at every other user or item, as U and V are dense.
> >
> > Maybe this paper can help to shine light on that:
> >
> > "Using Linear Algebra for Intelligent Information Retrieval"
> > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3138&rep=rep1&type=pdf
> >
> >> From the paper also, it seems like they say the predictions are just computed as vector products as above.
> >>
> >> Finally, separately, I'm trying to understand the Lanczos method as part of computing an SVD. Lanczos operates on a real symmetric matrix, right? And am I right that it comes into play when you are computing an SVD...
> >>
> >>   A = U * S * VT
> >>
> >> ... because U is actually the eigenvectors of (symmetric) A*AT and V is the eigenvectors of AT*A? And so Lanczos is used to answer those questions to complete the SVD?
> >>
> >> On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[email protected]> wrote:
> >>> You are correct. The paper has an appalling treatment of the folding-in approach.
> >>>
> >>> In fact, the procedure is dead simple.
> >>>
> >>> The basic idea is to leave the coordinate system derived in the original SVD intact and simply project the new users into that space.
> >>>
> >>> The easiest way to see what is happening is to start again with the original rating matrix A as decomposed:
> >>>
> >>>   A = U S V'
> >>>
> >>> where A is users x items. If we multiply on the right by V, we get
> >>>
> >>>   A V = U S V' V = U S
> >>>
> >>> (because V' V = I, by definition). This result is (users x items) x (items x k) = users x k, that is, it gives a k-dimensional vector for each user. Similarly, multiplication on the left by U' gives a k x items matrix which, when transposed, gives a k-dimensional vector for each item.
> >>>
> >>> This implies that if we augment U with new user row vectors U_new, we should be able to simply compute new k-dimensional vectors for the new users and adjoin these new vectors to the previous vectors. Concisely put,
> >>>
> >>>   ( A     )       ( A V     )
> >>>   (       )  V  = (         )
> >>>   ( A_new )       ( A_new V )
> >>>
> >>> This isn't really magical. It just says that we can compute new user vectors at any time by multiplying the new users' ratings by V.
> >>>
> >>> The diagram in figure one is hideously confusing because it looks like a picture of some kind of multiplication, whereas it is really depicting some odd kind of flow diagram.
> >>>
> >>> Does this solve the problem?
> >>>
> >>> On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <[email protected]> wrote:
> >>>
> >>>> Section 3 is hard to understand.
> >>>>
> >>>> - Ak and P are defined, but not used later
> >>>> - Definition of P has UTk x Nu as a computation. UTk is a k x m matrix, and Nu is "t" x 1. t is not defined.
> >>>> - This only makes sense if t = m. But m is the number of users, and Nu is a user vector, so it should have a number of elements equal to n, the number of items
> >>>> - Sk * VTk is described as a k x "d" matrix but d is undefined
> >>>> - The diagram suggests that VTk is multiplied by all the Nu, which makes more sense -- but only if Nu are multiplied by VTk, not the other way. And the diagram depicts neither of those.
> >>>> - Conceptually I would understand Nu x VTk, but then P is defined by an additional product with Uk
> >>>>
> >>>> In short... what?
> >>>>
> >>>> On Thu, Jun 3, 2010 at 4:15 PM, Ted Dunning <[email protected]> wrote:
> >>>>> Fire away.
> >>>>>
> >>>>> On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen <[email protected]> wrote:
> >>>>>
> >>>>>> Is anyone out there familiar enough with this to a) discuss this paper with me or b) point me to another writeup on the approach?
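To make the folding-in step and the reduced-space cosine discussion quoted above concrete, here is a small sketch on toy data (NumPy, not Mahout code; the matrix sizes and rank are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((100, 40))          # users x items rating matrix (toy data)
    k = 5

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :].T                   # items x k, the truncated V

    # Existing users' k-dimensional vectors: A V = U S
    user_vectors = A @ Vk              # users x k

    # Fold in a new user: just multiply the new ratings row by V
    a_new = rng.random(40)             # new user's ratings over the same items
    u_new = a_new @ Vk                 # k-dimensional vector for the new user

    # Cosine similarity to existing users in the reduced space (the LSI-style use)
    norms = np.linalg.norm(user_vectors, axis=1) * np.linalg.norm(u_new)
    cosines = (user_vectors @ u_new) / norms
    print(np.argsort(-cosines)[:10])   # indices of the 10 most similar users

The point is that u_new lives in the same k-dimensional coordinate system as the existing rows of A V = U S, so no re-decomposition is needed to compare or score the new user.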
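And the Cholesky trick mentioned at the top: the general idea (sketched here in NumPy/SciPy, not the actual SSVD code) is that the QR of a tall, thin matrix Y can be recovered from a Cholesky factorization of the small k x k Gram matrix Y'Y, which is just a sum of row outer products and so parallelizes trivially:

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    n, k = 10_000, 20                     # Y is tall and thin
    Y = np.random.standard_normal((n, k))

    G = Y.T @ Y                           # k x k Gram matrix (a sum over row blocks)
    R = cholesky(G, lower=False)          # upper-triangular R with R'R = Y'Y
    Q = solve_triangular(R, Y.T, trans='T', lower=False).T   # Q = Y inv(R)

    # Same factorization a thin QR would give (up to signs of R's rows)
    print(np.allclose(Q @ R, Y), np.allclose(Q.T @ Q, np.eye(k)))

One caveat with this kind of substitution is that forming Y'Y squares the condition number, so R can lose accuracy relative to a true QR when Y is ill-conditioned.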
