On Mon, Dec 12, 2011 at 10:00 AM, Danny Bickson <[email protected]> wrote:

> Also don't forget that when you compute timesSquared() it results in two
> map-reduce phases, where in each of them the matrix A is fully read from
> disk. So I would not call it a single pass.

This is not true. timesSquared() reads the data in *one* map-reduce pass.

> On Mon, Dec 12, 2011 at 7:58 PM, Danny Bickson <[email protected]> wrote:
>
> > K passes over the data - where in each pass you multiply once by A and
> > once by A' - I call 2K passes over the data.
> >
> > On Mon, Dec 12, 2011 at 7:48 PM, Jake Mannix <[email protected]> wrote:
> >
> >> On Mon, Dec 12, 2011 at 9:10 AM, Danny Bickson <[email protected]> wrote:
> >>
> >> > I meant to write: twice in the case of a rectangular matrix.
> >> > By the way, if you want both side matrices, [U,D,V] = svd(A),
> >> > you will need to run Lanczos twice: once with A and another time
> >> > with A'. So the run time should be doubled.
> >>
> >> Neither of these statements is actually correct: for a rectangular
> >> matrix, if you want the top K singular vectors and values, you will
> >> make K passes over the data (not 2K), each one being an operation of
> >> (A'A)*v without ever computing A'A itself. This operation,
> >> timesSquared(Vector), for a matrix with row i having d_i nonzero
> >> entries, will scale like sum_i(d_i^2), but is still only one pass
> >> over the data.
> >>
> >> Also, once you have run Lanczos over A and gotten the matrix V out,
> >> you can recover U in O(1) map-reduce operations by use of the
> >> identity: U = A * V * D^-1
> >>
> >>   -jake
> >>
> >> > On Mon, Dec 12, 2011 at 7:08 PM, Danny Bickson <[email protected]> wrote:
> >> >
> >> > > In each Lanczos iteration you multiply by the matrix A (in the case
> >> > > of a square matrix), or twice, by the matrices A' and A.
> >> > > Multiplication is linear in the number of nonzero entries.
> >> > > See http://en.wikipedia.org/wiki/Lanczos_algorithm
> >> > > Finally, there is a decomposition of a tridiagonal matrix T for
> >> > > extracting the eigenvalues. I think it is also linear in the number
> >> > > of iterations (since the size of T is the number of iterations + 1).
> >> > > Note that this code is not distributed, since it can be done
> >> > > efficiently on a single node.
> >> > > The last step is the Ritz transformation - a product of the
> >> > > intermediate vectors v with the eigenvectors. This step may be
> >> > > heavy, since those matrices are typically dense.
> >> > >
> >> > > Best,
> >> > >
> >> > > DB
> >> > >
> >> > > 2011/12/12 Fernando Fernández <[email protected]>
> >> > >
> >> > >> Hi all,
> >> > >>
> >> > >> This is a question for everybody, though it may be better answered
> >> > >> by Jake Mannix. Do you know what the complexity of the Lanczos SVD
> >> > >> algorithm implemented in Mahout is? Linear, quadratic, etc.?
> >> > >>
> >> > >> Thanks in advance!!
> >> > >> Fernando.
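[Editor's note] Jake's one-pass point is easy to make concrete. The sketch below, in NumPy rather than Mahout's Java, shows the arithmetic that timesSquared(Vector) performs: each row r of A contributes r * (r . v) to the accumulator, so (A'A)v is obtained in a single sweep over the rows without ever forming A'A. The map-reduce plumbing and Mahout's Vector types are omitted; this is an illustration of the math, not of Mahout's actual API.

```python
import numpy as np

def times_squared(rows, v):
    """Compute (A' A) v in one pass over the rows of A,
    without materializing A' A itself."""
    result = np.zeros_like(v, dtype=float)
    for r in rows:                  # single sweep over the data
        result += r * np.dot(r, v)  # row's contribution: r * (r . v)
    return result

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 0.0]])
v = np.array([1.0, 1.0])

# Agrees with forming A' A explicitly; both give [49., 34.]
print(times_squared(A, v))
print(A.T @ A @ v)
```

In the distributed version, each mapper holds v and emits its rows' contributions, and the reducer sums them, which is why one Lanczos iteration on a rectangular matrix still costs only one pass over A.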
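[Editor's note] The identity U = A * V * D^-1 that Jake uses to recover the left singular vectors can be checked in a few lines. Here NumPy's own SVD stands in for the output of Lanczos, and the dense matrix products stand in for the O(1) map-reduce jobs; it is a sketch of the identity, not Mahout code.

```python
import numpy as np

def recover_u(A, V, sigma):
    """Recover the left singular vectors via U = A * V * D^-1:
    divide column j of A @ V by the singular value sigma_j."""
    return (A @ V) / sigma

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])

# Reference decomposition (standing in for Lanczos output V, sigma).
U_ref, sigma, Vt = np.linalg.svd(A, full_matrices=False)
U = recover_u(A, Vt.T, sigma)

print(np.allclose(U, U_ref))            # matches the reference U
print(np.allclose(U.T @ U, np.eye(2)))  # columns are orthonormal
```

The identity follows directly from A = U D V': multiplying both sides on the right by V gives A V = U D, and dividing out the singular values yields U.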
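[Editor's note] Danny's description of the serial tail of the algorithm - the small tridiagonal matrix T (size = number of iterations), its cheap single-node eigen-decomposition, and the final Ritz transformation - can be sketched as plain single-node Lanczos. This omits the reorthogonalization and all distribution that a production solver needs, and uses NumPy rather than Mahout's Java, so it only illustrates the structure of the computation:

```python
import numpy as np

def lanczos(matvec, n, k, seed=0):
    """Plain Lanczos sketch (no reorthogonalization): k iterations,
    each needing exactly one multiply by the symmetric operator."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q, alphas, betas = [q], [], []
    beta, q_prev = 0.0, np.zeros(n)
    for j in range(k):
        w = matvec(q) - beta * q_prev   # the one pass over the data
        alpha = np.dot(w, q)
        w -= alpha * q
        alphas.append(alpha)
        if j < k - 1:
            beta = np.linalg.norm(w)
            betas.append(beta)
            q_prev, q = q, w / beta
            Q.append(q)
    # Small k x k tridiagonal T: eigen-decomposed cheaply on one node.
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    theta, S = np.linalg.eigh(T)
    # Ritz transformation: combine the Lanczos basis vectors with the
    # eigenvectors of T to approximate eigenvectors of the operator.
    ritz_vectors = np.column_stack(Q) @ S
    return theta, ritz_vectors

# With k = n and a symmetric matrix, the Ritz values match the spectrum.
M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
theta, _ = lanczos(lambda v: M @ v, 3, 3)
print(np.sort(theta))
print(np.linalg.eigvalsh(M))
```

For SVD of a rectangular A, the `matvec` here is exactly the timesSquared operation discussed above, applied to the implicit symmetric operator A'A.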
