Thanks, the SSVDCli code helps a great deal. I am still confused about the distributed row matrix format. When I used the Mahout command line, my input was a sequence file of dense vectors, and that seemed to work fine. Can I use that file directly as input, or do I need to convert it into a DistributedRowMatrix first?
On Fri, Apr 12, 2013 at 1:19 PM, Dmitriy Lyubimov <[email protected]> wrote:

> On Fri, Apr 12, 2013 at 8:42 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > No,this is not right.
> >
> > I will explain later when i have a moment.
> > On Apr 12, 2013 8:08 AM, "Chirag Lakhani" <[email protected]> wrote:
> >
> >> I am having trouble understanding whether the following code is sufficient
> >> for running PCA
> >>
> >> I have a sequence file of dense vectors that I am calling and then I am
> >> trying to run the following code
> >>
> >> SSVDSolver pcaFactory = new SSVDSolver(conf, new Path(vectorsFolder), new
> >> Path(pcaOutput),18,5,3,10);
> >>
> >>
> >> pcaFactory.setPcaMeanPath(pcaFactory.getPcaMeanPath());
> >>
>
> ssvd solver doesn't compute pca mean -- it requires it. this line
> therefore achieves nothing
>
> SSVDCli.java computes PCA mean using DistributedRowMatrix and passes it
> over to SSVD Solver. This behavior is switched on by -pca option. See the
> SSVDCli code for details.
>
> -d
>
> >> pcaFactory.run();
> >>
> >>
> >> Is this enough for PCA or does anyone have example code they are willing to
> >> share to see how PCA works using the SSVD solver.
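
For anyone following the thread: the "PCA mean" mentioned above is, as far as I understand it, the per-column average of the input rows, and PCA amounts to running the SVD on the rows with that mean subtracted. Below is a minimal plain-Java sketch of just that arithmetic. It does not use Mahout at all; the class and method names (PcaMeanSketch, columnMeans, center) are made up for illustration, and it says nothing about how SSVDCli or SSVDSolver actually apply the mean internally.

```java
import java.util.Arrays;

/** Plain-Java illustration only; hypothetical names, not part of any Mahout API. */
final class PcaMeanSketch {

    /** Returns the average of each column over all rows (the "mean" row vector). */
    static double[] columnMeans(double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("need at least one row");
        }
        int n = rows[0].length;
        double[] mean = new double[n];
        for (double[] row : rows) {
            if (row.length != n) {
                throw new IllegalArgumentException("all rows must have the same length");
            }
            for (int j = 0; j < n; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < n; j++) {
            mean[j] /= rows.length;
        }
        return mean;
    }

    /** Returns a copy of the rows with the given mean subtracted from each row. */
    static double[][] center(double[][] rows, double[] mean) {
        double[][] centered = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            centered[i] = new double[rows[i].length];
            for (int j = 0; j < rows[i].length; j++) {
                centered[i][j] = rows[i][j] - mean[j];
            }
        }
        return centered;
    }

    public static void main(String[] args) {
        // Three dense rows, standing in for the vectors stored in the sequence file.
        double[][] rows = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 9.0}};
        double[] mean = columnMeans(rows);
        System.out.println("column means: " + Arrays.toString(mean));
        System.out.println("centered rows: " + Arrays.deepToString(center(rows, mean)));
    }
}
```

The point of the sketch is only that the mean has to be computed from the data in a separate pass before the decomposition can use it, which is why setting the solver's mean path to its own (unset) value has no effect.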
