On Tue, Jul 2, 2013 at 1:52 PM, Chirag Lakhani <[email protected]> wrote:
> Hello,
>
> I am trying to use the Mahout/Java API to do PCA but I am confused about
> the right order to do things. To start, I have a list of DenseVectors that
> I am reading into the code and turning into a distributed matrix in the
> following form.
>
> DistributedRowMatrix m = new DistributedRowMatrix(input_vec, matrix_path,
> num_rows, num_cols);
>
> When I run this code, I would have thought it would output the result into
> the path called "matrix_path" so that I can then use something like
> MatrixColumnMeansJob.run
> to get the mean. When I run this bit of code I get no output. Is there
> something else I should do, or is there a better way to calculate the mean
> for my file?
>
>
> From what I understand about the SSVD CI code, you need to calculate the
> column mean and then output it into a directory.

No, you don't have to (although you have an _option_ to calculate and
substitute one yourself if for some reason it is already known). By default,
the column mean is calculated for you.

> Is there a good way to do
> this if I am starting from a file which is a sequence file of DenseVectors?

Yes. Just don't specify the --pcaOffset option.

> --
> *Chirag Lakhani*
> Data Scientist
> Zaloni, Inc. | www.zaloni.com
> 633 Davis Dr., Suite 200
> Durham, NC 27713
> e: [email protected]
> p: 919.602.4965 x7020
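
For reference, here is a minimal sketch in plain Java of what "column mean" and mean-centering amount to for a small set of dense rows. It deliberately uses double[] arrays rather than Mahout's Vector or DistributedRowMatrix classes, and the class and method names (ColumnMeanSketch, columnMeans, center) are made up for illustration. It is not how the SSVD job computes the mean internally, only the arithmetic you would otherwise be doing by hand.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: plain arrays stand in for Mahout DenseVectors.
public class ColumnMeanSketch {

    // Mean of each column over all rows; every row must have numCols entries.
    static double[] columnMeans(List<double[]> rows, int numCols) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("need at least one row");
        }
        double[] means = new double[numCols];
        for (double[] row : rows) {
            if (row.length != numCols) {
                throw new IllegalArgumentException("row length != numCols");
            }
            for (int j = 0; j < numCols; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < numCols; j++) {
            means[j] /= rows.size();
        }
        return means;
    }

    // Subtract the column means from one row (the centering step of PCA).
    static double[] center(double[] row, double[] means) {
        double[] centered = new double[row.length];
        for (int j = 0; j < row.length; j++) {
            centered[j] = row[j] - means[j];
        }
        return centered;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
                new double[] {1.0, 2.0},
                new double[] {3.0, 6.0});
        double[] means = columnMeans(rows, 2);
        System.out.println(Arrays.toString(means));                       // [2.0, 4.0]
        System.out.println(Arrays.toString(center(rows.get(0), means))); // [-1.0, -2.0]
    }
}
```

A vector of means like this is what you would supply yourself if you chose to provide the offset; if you leave --pcaOffset out, you do not need to do any of this.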
