In case someone else is using SSVD+PCA through the SSVDSolver API: unless you
want to supply your own mean, you first need to calculate one from the
distributed matrix in question, as Dmitriy explains below.

Begin forwarded message:

From: Dmitriy Lyubimov <[email protected]>
Subject: Re: SSVD+PCA
Date: August 31, 2012 3:42:45 PM PDT
To: Pat Ferrel <[email protected]>

Bottom line: external tools need to arrive at the offset, and the
solver just accepts any offset (mean or otherwise) as a parameter via
this API:  solver.setPcaMeanPath(xiPath)

xi is the mean (usually denoted by mu, but in my working notes I had a
conflict with something else, so I called it "xi" there).
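
To make "xi" concrete, here is a minimal in-memory sketch of what a column-mean
offset is and what subtracting it from each row looks like. It uses plain Java
arrays rather than Mahout; the class and method names (ColumnMeanSketch,
columnMeans, subtractOffset) are made up for illustration, and it makes no
claim about how the solver applies the offset internally.

```java
import java.util.Arrays;

/** Illustration only: the offset vector "xi" as the per-column mean of a small dense matrix. */
final class ColumnMeanSketch {

  /** Returns the mean of each column of a dense, rectangular matrix. */
  static double[] columnMeans(double[][] a) {
    if (a.length == 0) {
      throw new IllegalArgumentException("matrix has no rows");
    }
    int n = a[0].length;
    double[] xi = new double[n];
    for (double[] row : a) {
      if (row.length != n) {
        throw new IllegalArgumentException("ragged matrix");
      }
      for (int j = 0; j < n; j++) {
        xi[j] += row[j];
      }
    }
    for (int j = 0; j < n; j++) {
      xi[j] /= a.length;
    }
    return xi;
  }

  /** Returns a copy of the matrix with the given offset subtracted from every row. */
  static double[][] subtractOffset(double[][] a, double[] offset) {
    double[][] out = new double[a.length][];
    for (int i = 0; i < a.length; i++) {
      out[i] = new double[a[i].length];
      for (int j = 0; j < a[i].length; j++) {
        out[i][j] = a[i][j] - offset[j];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    double[][] a = {{1.0, 2.0}, {3.0, 6.0}, {5.0, 10.0}};
    double[] xi = columnMeans(a);
    System.out.println(Arrays.toString(xi)); // prints [3.0, 6.0]
    System.out.println(Arrays.deepToString(subtractOffset(a, xi)));
  }
}
```

Because the offset is just a vector, any other vector of the same length could
be passed to subtractOffset in place of the mean, which mirrors the "mean or
otherwise" remark above.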


On Fri, Aug 31, 2012 at 3:40 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Aha, not exactly. The mean computation is a property of the
> distributed matrix, so the SSVD CLI piggybacks on it to compute one. But
> once the mean is computed, its path is passed on to the solver to use.
> Here is the code from SSVDCli:
> 
>    // MAHOUT-817
>    if (pca && xiPath == null) {
>      xiPath = new Path(getTempPath(), "xi");
>      MatrixColumnMeansJob.run(conf, inputPaths[0], getTempPath());
>    }
> 
>    SSVDSolver solver =
>      new SSVDSolver(conf, inputPaths, getTempPath(), r, k, p, reduceTasks);
> 
> 
> .........
>    solver.setPcaMeanPath(xiPath);
> .........
>    solver.run();
> 
> 
> On Fri, Aug 31, 2012 at 3:35 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> There should be. Let me look at the code.
>> 
>> On Fri, Aug 31, 2012 at 3:31 PM, Pat Ferrel <[email protected]> wrote:
>>> Is there a way to tell the SSVDSolver to do a PCA calculation from the API? 
>>> I only see a way to pass in the offset, but I am often wrong.
