So how does the column mean get calculated if the --pcaOffset option is not specified? I would think you are just doing SVD at that point.
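
To make the question concrete, here is a minimal plain-Java sketch of the step I have in mind: compute the column means and subtract them from every row, so that an SVD of the centered rows amounts to PCA. It does not use Mahout's API. The class ColumnMeanSketch and its methods are hypothetical names for illustration only, and it assumes a small dense matrix held in memory rather than a DistributedRowMatrix.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Mahout.
final class ColumnMeanSketch {

    // Returns the mean of each column of a dense, rectangular, non-empty matrix.
    static double[] columnMeans(double[][] rows) {
        int numCols = rows[0].length;
        double[] mean = new double[numCols];
        for (double[] row : rows) {
            for (int j = 0; j < numCols; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < numCols; j++) {
            mean[j] /= rows.length;
        }
        return mean;
    }

    // Returns a new matrix with the given offset subtracted from every row.
    static double[][] center(double[][] rows, double[] offset) {
        double[][] centered = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            centered[i] = new double[rows[i].length];
            for (int j = 0; j < rows[i].length; j++) {
                centered[i][j] = rows[i][j] - offset[j];
            }
        }
        return centered;
    }

    public static void main(String[] args) {
        double[][] rows = {{1.0, 2.0}, {3.0, 6.0}};
        double[] mean = columnMeans(rows);
        System.out.println(Arrays.toString(mean));
        System.out.println(Arrays.deepToString(center(rows, mean)));
    }
}
```

Without the subtraction step, decomposing the rows directly is just SVD, which is why I am asking where the mean comes from when --pcaOffset is omitted.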

On Tue, Jul 2, 2013 at 5:52 PM, Dmitriy Lyubimov <[email protected]> wrote:

> On Tue, Jul 2, 2013 at 1:52 PM, Chirag Lakhani <[email protected]> wrote:
>
> > Hello,
> >
> > I am trying to use the Mahout/Java API to do PCA but I am confused about
> > the right order to do things. To start, I have a list of DenseVectors that
> > I am reading into the code and turning into a distributed matrix in the
> > following form.
> >
> > DistributedRowMatrix m = new DistributedRowMatrix(input_vec, matrix_path,
> > num_rows, num_cols);
> >
> > When I run this code, I would have thought it would output the result into
> > the path called "matrix_path" so that I can then use something like
> > MatrixColumnMeansJob.run to get the mean. When I run this bit of code I get
> > no output. Is there something else I should do, or is there a better way to
> > calculate the mean for my file?
> >
> > From what I understand about the SSVD CI code, you need to calculate the
> > column mean and then output it into a directory.
>
> No, you don't have to (although you have an _option_ to calculate and
> substitute one yourself if for some reason it is already known.) Default
> use assumes it would calculate it for you.
>
> > Is there a good way to do this if I am starting from a file which is a
> > sequence file of DenseVectors?
>
> Yes. just don't specify --pcaOffset option.

--
*Chirag Lakhani*
Data Scientist
Zaloni, Inc. | www.zaloni.com
633 Davis Dr., Suite 200
Durham, NC 27713
e: [email protected]
p: 919.602.4965 x7020
