On Thu, May 5, 2011 at 12:22 PM, Jake Mannix <[email protected]> wrote:
> On Thu, May 5, 2011 at 8:24 AM, Vckay <[email protected]> wrote:
>
> > So I am trying to build PCA. I was recommended in a previous thread that it
> > was better that my data is available at the start as a distributed row
> > matrix. The workflow (already posted in a previous thread) would be:
> > 1. Get the data into distributed row matrix format.
> > 2. Compute empirical mean vector.
> >
>
> Note that as we've mentioned in other threads, this step:
>

I know what you guys were saying in the previous thread. I believe I did mention that I will be working with image data, which is overwhelmingly dense. That means subtracting the mean would not turn a sparse matrix into a dense one: the matrix is essentially dense to begin with. In fact, running SVD separately on the matrix and on the low-rank matrix (e*m') would probably be a bad idea in this case, because you would end up having to run the code on a dense matrix anyway.
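The trade-off can be illustrated with a small sketch. It assumes plain Python lists in place of a distributed row matrix (this is not Mahout's API), takes e to be the all-ones column vector and m the empirical mean row, and the helper names (`column_mean`, `center_explicitly`, `centered_times`, `nonzeros`) are hypothetical. It shows that (A - e*m')x can be computed as Ax - e(m'x) without ever materializing the centered matrix, and that explicit centering fills in a sparse matrix but adds no extra nonzeros when the data is already dense.

```python
from typing import List

# Hypothetical stand-ins: a list of rows plays the role of a row matrix.
Matrix = List[List[float]]
Vector = List[float]


def column_mean(rows: Matrix) -> Vector:
    """Empirical mean vector m: the average of all rows."""
    n, d = len(rows), len(rows[0])
    sums = [0.0] * d
    for row in rows:
        for j, v in enumerate(row):
            sums[j] += v
    return [s / n for s in sums]


def center_explicitly(rows: Matrix, m: Vector) -> Matrix:
    """Materialize A - e*m' by subtracting the mean from every row."""
    return [[v - m[j] for j, v in enumerate(row)] for row in rows]


def centered_times(rows: Matrix, m: Vector, x: Vector) -> Vector:
    """Compute (A - e*m') x as A x - e (m . x) without forming A - e*m'."""
    mx = sum(mj * xj for mj, xj in zip(m, x))
    return [sum(v * xj for v, xj in zip(row, x)) - mx for row in rows]


def nonzeros(rows: Matrix) -> int:
    """Count nonzero entries, a rough proxy for storage and compute cost."""
    return sum(1 for row in rows for v in row if v != 0.0)


if __name__ == "__main__":
    sparse = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
    dense = [[1.0, 2.0, 3.0], [4.0, 6.0, 5.0], [8.0, 7.0, 9.0]]
    for name, a in (("sparse", sparse), ("dense", dense)):
        centered = center_explicitly(a, column_mean(a))
        print(name, nonzeros(a), "->", nonzeros(centered))
    # prints: sparse 3 -> 9
    #         dense 9 -> 9
```

With sparse input, the implicit product is what keeps each iteration of a Lanczos-style solver cheap. With dense image data, as argued above, explicit centering costs no extra nonzeros, so the simpler route is reasonable.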
