For reference, look in DistributedRowMatrix#timesSquared(Vector), which is
contained on lines 227-247.  JobClient.runJob() is called only one time,
running
the TimesSquaredJob (a single map-reduce job).'

  -jake

On Mon, Dec 12, 2011 at 9:58 AM, Danny Bickson <[email protected]>wrote:

> K passes over the data - where in each pass you multiply once by A and once
> by A' I call 2K passes over the data.
>
> On Mon, Dec 12, 2011 at 7:48 PM, Jake Mannix <[email protected]>
> wrote:
>
> > On Mon, Dec 12, 2011 at 9:10 AM, Danny Bickson <[email protected]
> > >wrote:
> >
> > > I meant to write: twice in case of a rectangular matrix.
> > > By the way, if you want to have the two sides matrices [U,D,V]=svd(A)
> > > You will need to run Lanczos twice: once with A and another time with
> A'.
> > > So run time should be doubled.
> > >
> >
> > Neither of these statements are actually correct: for a rectangular
> matrix,
> > if you want the
> > top K singular vectors and values, you will make K passes over the data
> > (not 2K) each
> > one being an operation of (A'A)*v without ever computing A'A itself.
>  This
> > operation
> > "timesSquared(Vector)", for a matrix with row i having d_i nonzero
> entries,
> > will scale like
> > sum_i(d_i^2), but still only one pass over the data.
> >
> > Also, once you have run Lanczos over A, and gotten the matrix V out, you
> > can recover
> > U in O(1) map-reduce operations, by use of the identity: U = A * V * D^-1
> >
> >  -jake
> >
> >
> > >
> > > On Mon, Dec 12, 2011 at 7:08 PM, Danny Bickson <
> [email protected]
> > > >wrote:
> > >
> > > > In each Lanczos iteration you multiply by the matrix A (in case of a
> > > > square matrix)
> > > > or twice, by the matrix A' and A. Multiplication is linear in the
> > number
> > > > of non zero edges.
> > > > See http://en.wikipedia.org/wiki/Lanczos_algorithm
> > > > Finally a decomposition of a tridiagonal matrix T for extracting the
> > > > eigenvalues.
> > > > I think it is also linear in the number of iterations (since the size
> > of
> > > T
> > > > is number of iterations+1). Note that this code is not distributed
> > since
> > > it
> > > > can be efficiently done on a single node.
> > > > The last step is the Ritz transformation - a product of the
> > intermediate
> > > > vectors v
> > > > with the eigenvectors. This step may be heavy since those matrices
> are
> > > > typically dense.
> > > >
> > > > Best,
> > > >
> > > > DB
> > > >
> > > >
> > > > 2011/12/12 Fernando Fernández <[email protected]
> >
> > > >
> > > >> Hi all,
> > > >>
> > > >> This is a question for everybody, though it may be better answered
> by
> > > Jake
> > > >> Mannix. Do you guys know what is the complexity of the algorithm
> > > >> implemented in mahout for Lancos SVD? Linear, quadratic, etc..
> > > >>
> > > >>
> > > >> Thanks in advance!!
> > > >> Fernando.
> > > >>
> > > >
> > > >
> > >
> >
>

Reply via email to