For reference, the SSVD runs in a fixed number of map-reduce phases (Dmitriy can say exactly how many, but it is on the order of 3-4 without computing U or V and without power iterations). I think that the cost of each map-reduce is roughly O(N d^2).
On Mon, Dec 12, 2011 at 11:57 AM, Jake Mannix <[email protected]> wrote: > On Mon, Dec 12, 2011 at 10:46 AM, Danny Bickson <[email protected] > >wrote: > > > > > > Regarding the Ritz transformation - it depends on the sparsity of A and > the > > number of iterations whether the final product for computing U should be > > done using your technique of > > U ~= A V D^-1 or not. > > > > Almost everything in DistributedRowMatrix assumes that matrices are sparse. > If they are dense, many of these operations will blow up with OOM in > unexpected > places if the dimensions are at all large, but I don't know, I don't ever > run on > completely dense matrices. > > Mahout SVD is optimized for input matrix being bounded in numCols, small in > truncated rank, and sparse. numRows can be effectively unbounded, given > enough hardware. But numCols * truncatedRank must < RAM of the launching > JVM (not mappers/reducers). > > -jake > > > > On Mon, Dec 12, 2011 at 8:08 PM, Jake Mannix <[email protected]> > > wrote: > > > > > For reference, look in DistributedRowMatrix#timesSquared(Vector), which > > is > > > contained on lines 227-247. JobClient.runJob() is called only one > time, > > > running > > > the TimesSquaredJob (a single map-reduce job).' > > > > > > -jake > > > > > > On Mon, Dec 12, 2011 at 9:58 AM, Danny Bickson < > [email protected] > > > >wrote: > > > > > > > K passes over the data - where in each pass you multiply once by A > and > > > once > > > > by A' I call 2K passes over the data. > > > > > > > > On Mon, Dec 12, 2011 at 7:48 PM, Jake Mannix <[email protected]> > > > > wrote: > > > > > > > > > On Mon, Dec 12, 2011 at 9:10 AM, Danny Bickson < > > > [email protected] > > > > > >wrote: > > > > > > > > > > > I meant to write: twice in case of a rectangular matrix. > > > > > > By the way, if you want to have the two sides matrices > > [U,D,V]=svd(A) > > > > > > You will need to run Lanczos twice: once with A and another time > > with > > > > A'. > > > > > > So run time should be doubled. > > > > > > > > > > > > > > > > Neither of these statements are actually correct: for a rectangular > > > > matrix, > > > > > if you want the > > > > > top K singular vectors and values, you will make K passes over the > > data > > > > > (not 2K) each > > > > > one being an operation of (A'A)*v without ever computing A'A > itself. > > > > This > > > > > operation > > > > > "timesSquared(Vector)", for a matrix with row i having d_i nonzero > > > > entries, > > > > > will scale like > > > > > sum_i(d_i^2), but still only one pass over the data. > > > > > > > > > > Also, once you have run Lanczos over A, and gotten the matrix V > out, > > > you > > > > > can recover > > > > > U in O(1) map-reduce operations, by use of the identity: U = A * V > * > > > D^-1 > > > > > > > > > > -jake > > > > > > > > > > > > > > > > > > > > > > On Mon, Dec 12, 2011 at 7:08 PM, Danny Bickson < > > > > [email protected] > > > > > > >wrote: > > > > > > > > > > > > > In each Lanczos iteration you multiply by the matrix A (in case > > of > > > a > > > > > > > square matrix) > > > > > > > or twice, by the matrix A' and A. Multiplication is linear in > the > > > > > number > > > > > > > of non zero edges. > > > > > > > See http://en.wikipedia.org/wiki/Lanczos_algorithm > > > > > > > Finally a decomposition of a tridiagonal matrix T for > extracting > > > the > > > > > > > eigenvalues. > > > > > > > I think it is also linear in the number of iterations (since > the > > > size > > > > > of > > > > > > T > > > > > > > is number of iterations+1). Note that this code is not > > distributed > > > > > since > > > > > > it > > > > > > > can be efficiently done on a single node. > > > > > > > The last step is the Ritz transformation - a product of the > > > > > intermediate > > > > > > > vectors v > > > > > > > with the eigenvectors. This step may be heavy since those > > matrices > > > > are > > > > > > > typically dense. > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > DB > > > > > > > > > > > > > > > > > > > > > 2011/12/12 Fernando Fernández < > > > [email protected] > > > > > > > > > > > > > > > > > > >> Hi all, > > > > > > >> > > > > > > >> This is a question for everybody, though it may be better > > answered > > > > by > > > > > > Jake > > > > > > >> Mannix. Do you guys know what is the complexity of the > algorithm > > > > > > >> implemented in mahout for Lancos SVD? Linear, quadratic, etc.. > > > > > > >> > > > > > > >> > > > > > > >> Thanks in advance!! > > > > > > >> Fernando. > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
