Re: how to run PCA from Mahout

Dmitriy Lyubimov Tue, 06 Sep 2011 10:51:22 -0700

You need to massage your data to compute (and subtract) the mean of each
column first, as far as I understand. That should be relatively easy to do.
Then you can run a distributed SVD on the centered data (the 'bin/mahout ssvd'
command from trunk should be quite good to try).
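
To illustrate why the centering step matters, here is a minimal in-memory sketch in plain Python. It is not Mahout code and does not use the ssvd job; the function names (center_columns, top_component) and the toy data are made up for illustration. It subtracts the column means, then uses power iteration to find the leading right singular vector of the centered matrix, which is the first principal direction. Without the centering, the leading singular vector would largely point at the data's offset from the origin rather than along the direction of greatest variance.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so that every column averages to zero."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in rows]
    return centered, means


def top_component(centered, iterations=200):
    """Power iteration on A^T A, where A is the centered matrix.

    Converges to the leading right singular vector of A, i.e. the first
    principal direction. The start vector is arbitrary; it only must not
    be orthogonal to the answer.
    """
    dims = len(centered[0])
    v = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        # Compute w = A^T (A v) without forming A^T A explicitly.
        proj = [sum(row[j] * v[j] for j in range(dims)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(len(centered)))
             for j in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all-zero data or an unlucky start vector
        v = [x / norm for x in w]
    return v


# Toy data: four points lying on the line y = x - 1 (hypothetical values).
data = [[2.0, 1.0], [4.0, 3.0], [6.0, 5.0], [8.0, 7.0]]
centered, means = center_columns(data)
pc1 = top_component(centered)
```

For the real data set (~10 million vectors of 400 components) the same two steps apply, but the mean has to be computed in a pass over the distributed data and the decomposition left to the distributed SVD.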


-d


On Tue, Sep 6, 2011 at 5:33 AM, Amr Desoky <[email protected]> wrote:
> Hi,
>   The web site at
> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
>   mentions that the following algorithms are implemented within Mahout:
>     Gaussian Discriminative Analysis
>     Independent Component Analysis
>     Principal Components Analysis
>
> Unfortunately, I could not find any help or documentation on how to use
> these algorithms. In particular, I would like to try PCA on a huge data set
> of ~10 million vectors of 400 components each.
>
> Please give me some help on how to run PCA (and also ICA and GDA), whichever
> are available.
>
> Best regards,
> Amr
>
>
> Amr Ibrahim El-Desoky, Mousa
> PhD Student, Computer Science (i6),
> RWTH-Aachen University,
> Aachen, Germany
> Cel.     : +49 0176 56418470
> Office : +49 241 8021620
> Fax      : +49 241 8022219
