On Mon, Dec 12, 2011 at 10:46 AM, Danny Bickson <[email protected]>wrote:
>
>
> Regarding the Ritz transformation - it depends on the sparsity of A and the
> number of iterations whether the final product for computing U should be
> done using your technique of
> U ~= A V D^-1 or not.
>

Almost everything in DistributedRowMatrix assumes that matrices are sparse.
If they are dense, many of these operations will blow up with OOM in
unexpected
places if the dimensions are at all large, but I don't know, I don't ever
run on
completely dense matrices.

Mahout SVD is optimized for input matrix being bounded in numCols, small in
truncated rank, and sparse.  numRows can be effectively unbounded, given
enough hardware.  But numCols * truncatedRank must < RAM of the launching
JVM (not mappers/reducers).

  -jake


> On Mon, Dec 12, 2011 at 8:08 PM, Jake Mannix <[email protected]>
> wrote:
>
> > For reference, look in DistributedRowMatrix#timesSquared(Vector), which
> is
> > contained on lines 227-247.  JobClient.runJob() is called only one time,
> > running
> > the TimesSquaredJob (a single map-reduce job).'
> >
> >  -jake
> >
> > On Mon, Dec 12, 2011 at 9:58 AM, Danny Bickson <[email protected]
> > >wrote:
> >
> > > K passes over the data - where in each pass you multiply once by A and
> > once
> > > by A' I call 2K passes over the data.
> > >
> > > On Mon, Dec 12, 2011 at 7:48 PM, Jake Mannix <[email protected]>
> > > wrote:
> > >
> > > > On Mon, Dec 12, 2011 at 9:10 AM, Danny Bickson <
> > [email protected]
> > > > >wrote:
> > > >
> > > > > I meant to write: twice in case of a rectangular matrix.
> > > > > By the way, if you want to have the two sides matrices
> [U,D,V]=svd(A)
> > > > > You will need to run Lanczos twice: once with A and another time
> with
> > > A'.
> > > > > So run time should be doubled.
> > > > >
> > > >
> > > > Neither of these statements are actually correct: for a rectangular
> > > matrix,
> > > > if you want the
> > > > top K singular vectors and values, you will make K passes over the
> data
> > > > (not 2K) each
> > > > one being an operation of (A'A)*v without ever computing A'A itself.
> > >  This
> > > > operation
> > > > "timesSquared(Vector)", for a matrix with row i having d_i nonzero
> > > entries,
> > > > will scale like
> > > > sum_i(d_i^2), but still only one pass over the data.
> > > >
> > > > Also, once you have run Lanczos over A, and gotten the matrix V out,
> > you
> > > > can recover
> > > > U in O(1) map-reduce operations, by use of the identity: U = A * V *
> > D^-1
> > > >
> > > >  -jake
> > > >
> > > >
> > > > >
> > > > > On Mon, Dec 12, 2011 at 7:08 PM, Danny Bickson <
> > > [email protected]
> > > > > >wrote:
> > > > >
> > > > > > In each Lanczos iteration you multiply by the matrix A (in case
> of
> > a
> > > > > > square matrix)
> > > > > > or twice, by the matrix A' and A. Multiplication is linear in the
> > > > number
> > > > > > of non zero edges.
> > > > > > See http://en.wikipedia.org/wiki/Lanczos_algorithm
> > > > > > Finally a decomposition of a tridiagonal matrix T for extracting
> > the
> > > > > > eigenvalues.
> > > > > > I think it is also linear in the number of iterations (since the
> > size
> > > > of
> > > > > T
> > > > > > is number of iterations+1). Note that this code is not
> distributed
> > > > since
> > > > > it
> > > > > > can be efficiently done on a single node.
> > > > > > The last step is the Ritz transformation - a product of the
> > > > intermediate
> > > > > > vectors v
> > > > > > with the eigenvectors. This step may be heavy since those
> matrices
> > > are
> > > > > > typically dense.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > DB
> > > > > >
> > > > > >
> > > > > > 2011/12/12 Fernando Fernández <
> > [email protected]
> > > >
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> This is a question for everybody, though it may be better
> answered
> > > by
> > > > > Jake
> > > > > >> Mannix. Do you guys know what is the complexity of the algorithm
> > > > > >> implemented in mahout for Lancos SVD? Linear, quadratic, etc..
> > > > > >>
> > > > > >>
> > > > > >> Thanks in advance!!
> > > > > >> Fernando.
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to