On Thu, Nov 17, 2011 at 5:26 AM, Sean Owen <[email protected]> wrote:
> One more question. OK, so I use Lanczos to find V_k by finding the top k
> eigenvectors of AT * A. A is sparse. But isn't AT * A dense, then? Is that
> just how it is?
>
A'A and AA' are both dense, yes, but you never compute them. There are
several ways to avoid computing them:
1) You do two matrix-vector multiplications per Lanczos iteration:
first take (sparse) A and multiply it by a vector v, then take (also
sparse) A' and multiply it by the now-computed u = Av. This gives
A'u = A'(Av) = (A'A)v.
2) You rewrite the double summation in the definition of (A'A)v to
notice that it can be done in one pass over A:
  (A'A)v = vectorSum_i( a_i' * (a_i . v) )
where a_i is the i'th row of A, and the Map operation is everything
inside the outer parentheses: take the vector v (which is living in
memory), dot it with a_i, then emit a_i' scaled by that dot product.
Reduce is just vector summation.
We take approach 2), which is why we have the horrible method
"timesSquared(Vector)" as a special optimization method in
the VectorIterable interface.
This trick is essentially how we compute sparse matrix multiplication
as well.
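To make approach 2) concrete, here is a minimal, self-contained Java
sketch of that one-pass (A'A)v trick. The SparseRow class and method
name below are made up for this email (they are not the actual Mahout
implementation), but the loop structure is exactly the Map/Reduce
split described above:

  import java.util.List;

  // Toy sparse row: parallel arrays of column indices and values.
  class SparseRow {
    final int[] indices;
    final double[] values;
    SparseRow(int[] indices, double[] values) {
      this.indices = indices;
      this.values = values;
    }
  }

  class TimesSquaredSketch {
    // Computes (A'A)v in a single pass over the rows of A:
    //   (A'A)v = vectorSum_i( a_i' * (a_i . v) )
    // "Map": dot each row a_i with v, then scale a_i' by that value.
    // "Reduce": sum the scaled rows into the result vector.
    static double[] timesSquared(List<SparseRow> rows, double[] v) {
      double[] result = new double[v.length];
      for (SparseRow row : rows) {
        // dot product a_i . v over the non-zero entries of a_i
        double dot = 0.0;
        for (int k = 0; k < row.indices.length; k++) {
          dot += row.values[k] * v[row.indices[k]];
        }
        // accumulate a_i' scaled by the dot product into the result
        for (int k = 0; k < row.indices.length; k++) {
          result[row.indices[k]] += row.values[k] * dot;
        }
      }
      return result;
    }
  }

In the distributed version each mapper emits its scaled rows and the
combiner/reducer does the vector summation.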
> This will be my last basic question for the week:
>
> I understand that A ~= U_k * S_k * V_kT. Let's call the product on the
> right A_k.
>
> A_k = A * V_k * V_kT right?
>
yeah, I guess so.
> And then A_k is just my complete prediction matrix right? It's dense so
> it's not formed all at once. But all I need are V_k and its transpose to do
> the work.
>
Correct. With V_k and V_k' in memory (if they're small enough), you can
compute a row of A_k on the fly by doing two matrix-vector multiplications.
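If it helps, here is that row-at-a-time computation as a small Java
sketch. It assumes V_k is held in memory as a dense numItems x k
array called vk; the predictRow name and layout are just for
illustration:

  class PredictRowSketch {
    // Given one row a of A (a user's ratings, length numItems) and
    // V_k as a dense numItems x k array, compute the matching row of
    //   A_k = A * V_k * V_k'
    // in two steps:
    //   1) project: p = a * V_k        (length k)
    //   2) expand:  rowOfAk = p * V_k' (length numItems)
    static double[] predictRow(double[] a, double[][] vk) {
      int numItems = vk.length;
      int k = vk[0].length;
      // step 1: p = a * V_k, skipping the zero entries of a
      double[] p = new double[k];
      for (int i = 0; i < numItems; i++) {
        if (a[i] == 0.0) continue;
        for (int j = 0; j < k; j++) {
          p[j] += a[i] * vk[i][j];
        }
      }
      // step 2: rowOfAk = p * V_k'
      double[] rowOfAk = new double[numItems];
      for (int i = 0; i < numItems; i++) {
        for (int j = 0; j < k; j++) {
          rowOfAk[i] += p[j] * vk[i][j];
        }
      }
      return rowOfAk;
    }
  }

The first loop is the projection into the k-dimensional space, the
second multiplies back by V_k' to produce the dense predicted row.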
> I somehow thought it was more complicated than this -- having come back to
> this I keep wondering if I forget something here.
>
No, I think this is it. You might need S^-1 somewhere in there that I'm
forgetting, but this looks right.
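A quick sanity check on the S^-1 question: since A * V_k = U_k * S_k,
we have U_k = A * V_k * S_k^-1, and so
  U_k * S_k * V_kT = A * V_k * S_k^-1 * S_k * V_kT = A * V_k * V_kT.
The S_k factors cancel, so A_k = A * V_k * V_kT holds without any
explicit S^-1.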
-jake
>
>
> On Sun, Nov 6, 2011 at 6:08 PM, Jake Mannix <[email protected]> wrote:
>
> > Re: Lanczos, yes, it operates by finding V as you describe. The user is
> > required to do more work to recapture U. Practical reason is that the
> > assumption is numCols(A) = numFeatures, which is much less than
> > numRows(A) = numTrainingSamples.
> >
> > On Nov 6, 2011 9:52 AM, "Sean Owen" <[email protected]> wrote:
> >
> > Following up on this very old thread.
> >
> > I understood all this bit, thanks, that greatly clarified.
> >
> > You multiply a new user vector by V to project it into the new
> > "pseudo-item", reduced-dimension space.
> > And to get back to real items, you multiply by V's inverse, which is
> > its transpose.
> > And so you are really multiplying the user vector by V VT, which is
> > not a no-op, since those are truncated matrices and aren't actually
> > exact inverses (?)
> >
> > The original paper talks about cosine similarities between users or
> > items in the reduced-dimension space, but, can anyone shine light on
> > the point of that? From the paper also, it seems like they say the
> > predictions are just computed as vector products as above.
> >
> >
> > Finally, separately, I'm trying to understand the Lanczos method as
> > part of computing an SVD. Lanczos operates on a real symmetric matrix
> > right? And am I right that it comes into play when you are computing
> > an SVD...
> >
> > A = U * S * VT
> >
> > ... because U is actually the eigenvectors of (symmetric) A*AT and V
> > is the eigenvectors of AT*A? And so Lanczos is used to answer those
> > questions to complete the SVD?
> >
> > On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[email protected]>
> wrote:
> > > You are correct. The...
> >
>