Also, don't forget that when you compute timesSquared() it results in two map-reduce phases, and in each of them the matrix A is fully read from disk. So I would not call it a single pass.
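For reference, the operation being discussed is y = A'(A*v), which timesSquared(Vector) computes without ever forming A'A. The following is only a conceptual single-node sketch in plain Java (dense double[][] arrays are an assumption for illustration; the real job runs over a distributed, row-partitioned matrix), showing how the product can be accumulated row by row:

// Conceptual sketch of the timesSquared(Vector) operation, i.e.
// y = A' * (A * v), accumulated row by row in one sweep over A's rows.
// Dense arrays are used here purely for illustration.
static double[] timesSquaredSketch(double[][] a, double[] v) {
  int numCols = v.length;
  double[] y = new double[numCols];
  for (double[] row : a) {
    // dot product of this row with v ...
    double d = 0.0;
    for (int j = 0; j < numCols; j++) {
      d += row[j] * v[j];
    }
    // ... then accumulate (row . v) * row into y
    for (int j = 0; j < numCols; j++) {
      y[j] += d * row[j];
    }
  }
  return y;
}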
On Mon, Dec 12, 2011 at 7:58 PM, Danny Bickson <[email protected]> wrote:

> K passes over the data - where in each pass you multiply once by A and
> once by A' - I call 2K passes over the data.
>
> On Mon, Dec 12, 2011 at 7:48 PM, Jake Mannix <[email protected]> wrote:
>
>> On Mon, Dec 12, 2011 at 9:10 AM, Danny Bickson <[email protected]> wrote:
>>
>> > I meant to write: twice in the case of a rectangular matrix.
>> > By the way, if you want to have the two side matrices, [U,D,V] = svd(A),
>> > you will need to run Lanczos twice: once with A and another time with A'.
>> > So the run time should be doubled.
>>
>> Neither of these statements is actually correct: for a rectangular matrix,
>> if you want the top K singular vectors and values, you will make K passes
>> over the data (not 2K), each one being an operation of (A'A)*v without ever
>> computing A'A itself. This operation, "timesSquared(Vector)", for a matrix
>> whose row i has d_i nonzero entries, will scale like sum_i(d_i^2), but it is
>> still only one pass over the data.
>>
>> Also, once you have run Lanczos over A and gotten the matrix V out, you can
>> recover U in O(1) map-reduce operations by use of the identity U = A * V * D^-1.
>>
>> -jake
>>
>> > On Mon, Dec 12, 2011 at 7:08 PM, Danny Bickson <[email protected]> wrote:
>> >
>> > > In each Lanczos iteration you multiply by the matrix A (in the case of a
>> > > square matrix), or twice, by the matrices A' and A. Multiplication is
>> > > linear in the number of non-zero entries.
>> > > See http://en.wikipedia.org/wiki/Lanczos_algorithm
>> > > Finally, there is a decomposition of a tridiagonal matrix T for extracting
>> > > the eigenvalues. I think it is also linear in the number of iterations
>> > > (since the size of T is the number of iterations + 1). Note that this code
>> > > is not distributed, since it can be done efficiently on a single node.
>> > > The last step is the Ritz transformation - a product of the intermediate
>> > > vectors v with the eigenvectors. This step may be heavy, since those
>> > > matrices are typically dense.
>> > >
>> > > Best,
>> > >
>> > > DB
>> > >
>> > > 2011/12/12 Fernando Fernández <[email protected]>
>> > >
>> > >> Hi all,
>> > >>
>> > >> This is a question for everybody, though it may be better answered by
>> > >> Jake Mannix. Do you guys know what the complexity of the Lanczos SVD
>> > >> algorithm implemented in Mahout is? Linear, quadratic, etc.?
>> > >>
>> > >> Thanks in advance!!
>> > >> Fernando.
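For completeness, here is a minimal single-node sketch of the U = A * V * D^-1 recovery Jake mentions above, again with plain dense Java arrays rather than the distributed map-reduce job (the method name and dense layout are illustrative assumptions):

// Single-node sketch of U = A * V * D^-1, where V holds the right singular
// vectors as columns and sigma holds the singular values (the diagonal of D).
// Dense double[][] arrays are an assumption for illustration only.
static double[][] recoverU(double[][] a, double[][] v, double[] sigma) {
  int m = a.length;        // rows of A
  int n = a[0].length;     // cols of A = rows of V
  int k = sigma.length;    // number of singular vectors kept
  double[][] u = new double[m][k];
  for (int i = 0; i < m; i++) {
    for (int c = 0; c < k; c++) {
      double s = 0.0;
      for (int j = 0; j < n; j++) {
        s += a[i][j] * v[j][c];   // (A * V)[i][c]
      }
      u[i][c] = s / sigma[c];     // scale by D^-1
    }
  }
  return u;
}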
