In case someone else is using SSVD+PCA through the SSVDSolver API: unless you want to supply your own mean, you first need to calculate one from the distributed matrix in question, as Dmitriy explains below.

Begin forwarded message:

From: Dmitriy Lyubimov <[email protected]>
Subject: Re: SSVD+PCA
Date: August 31, 2012 3:42:45 PM PDT
To: Pat Ferrel <[email protected]>

Bottom line: external tools need to arrive at the offset, and the solver just accepts any offset (mean or otherwise) as a parameter with this API:

solver.setPcaMeanPath(xiPath)

xi is the mean (usually denoted by mu, but in my working notes I had a conflict with something else, so I called it "xi" there).

On Fri, Aug 31, 2012 at 3:40 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Aha. Not exactly. The mean computation is a property of the
> distributed matrix, so the SSVD CLI piggybacks to compute one. But once
> the mean is computed, its path is passed on to the solver to use. Here
> is the code from SSVDCli:
>
>   // MAHOUT-817
>   if (pca && xiPath == null) {
>     xiPath = new Path(getTempPath(), "xi");
>     MatrixColumnMeansJob.run(conf, inputPaths[0], getTempPath());
>   }
>
>   SSVDSolver solver =
>     new SSVDSolver(conf, inputPaths, getTempPath(), r, k, p, reduceTasks);
>
>   .........
>   solver.setPcaMeanPath(xiPath);
>   .........
>   solver.run();
>
> On Fri, Aug 31, 2012 at 3:35 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> There should be. Let me look at the code.
>>
>> On Fri, Aug 31, 2012 at 3:31 PM, Pat Ferrel <[email protected]> wrote:
>>> Is there a way to tell the SSVDSolver to do a PCA calculation from the API?
>>> I only see a way to pass in the offset, but I am often wrong.
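
For anyone unsure what the offset represents, here is a minimal plain-Java sketch (not Mahout code) of the column mean that PCA conceptually subtracts from every row before the decomposition. The class and method names (ColumnMeanSketch, columnMeans, center) are made up for illustration, and the input is a small in-memory array; on a real distributed matrix the mean would come from a distributed job such as the MatrixColumnMeansJob call quoted in the thread.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Mahout.
final class ColumnMeanSketch {

    // Column-wise mean of a dense row-major matrix: xi[j] is the average of column j.
    static double[] columnMeans(double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("matrix has no rows");
        }
        double[] xi = new double[rows[0].length];
        for (double[] row : rows) {
            for (int j = 0; j < xi.length; j++) {
                xi[j] += row[j];
            }
        }
        for (int j = 0; j < xi.length; j++) {
            xi[j] /= rows.length;
        }
        return xi;
    }

    // Subtract the offset xi from every row, returning a new centered matrix.
    static double[][] center(double[][] rows, double[] xi) {
        double[][] centered = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            centered[i] = new double[xi.length];
            for (int j = 0; j < xi.length; j++) {
                centered[i][j] = rows[i][j] - xi[j];
            }
        }
        return centered;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 2.0}, {3.0, 6.0}};
        double[] xi = columnMeans(a);
        System.out.println(Arrays.toString(xi));
        System.out.println(Arrays.deepToString(center(a, xi)));
    }
}
```

Passing something other than the column mean to center corresponds to the "mean or otherwise" case above: the offset is simply whatever vector is subtracted from each row.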
