You need to massage your data to compute (and subtract) the mean first,
as far as I understand. That should be relatively easy to do. Then you
can run a distributed SVD on it ('bin/mahout ssvd' command from trunk
should be quite good to try).

-d

On Tue, Sep 6, 2011 at 5:33 AM, Amr Desoky <[email protected]> wrote:
> Hi,
> It is mentioned on the web site :
> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
> That you implement the following algorithms within Mahout :
> Gaussian Discriminative Analysis
> Independent Component Analysis
> Principal Components Analysis
>
> But unfortunately, I could not find any help or documentation on how to use
> these algorithms!!
> specially I would like to try PCA on a huge data set of ~10Million vectors
> of 400 components each.
>
> Please give me some help on how to run PCA (and also ICA, GDA) whatever
> available.
>
> Best regards,
> Amr
>
>
> Amr Ibrahim El-Desoky, Mousa
> PhD Student, Computer Science (i6),
> RWTH-Aachen University,
> Aachen, Germany
> Cel. : +49 0176 56418470
> Office : +49 241 8021620
> Fax : +49 241 8022219
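
To make the recipe at the top of this reply concrete (center the data, then decompose it), here is a rough single-machine sketch in plain Python. It is not Mahout code and does not use the ssvd job. PCA conventionally centers each column by its mean, so that is what the sketch does. The function names and the toy data are made up for illustration, and power iteration stands in for a real SVD, so it only recovers the first principal direction.

```python
import math


def column_means(rows):
    """Accumulate per-column means in a single pass over the rows."""
    sums = None
    count = 0
    for row in rows:
        if sums is None:
            sums = [0.0] * len(row)
        for j, x in enumerate(row):
            sums[j] += x
        count += 1
    if count == 0:
        raise ValueError("no rows given")
    return [s / count for s in sums]


def center(rows, means):
    """Subtract the column means from every row."""
    return [[x - m for x, m in zip(row, means)] for row in rows]


def leading_component(centered, iterations=100):
    """Power iteration on A^T A without forming it: v <- A^T (A v), normalized.

    Assumes the starting vector is not orthogonal to the leading direction.
    """
    n_cols = len(centered[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # proj = A v (one dot product per row)
        proj = [sum(x * vj for x, vj in zip(row, v)) for row in centered]
        # w = A^T proj
        w = [0.0] * n_cols
        for row, p in zip(centered, proj):
            for j, x in enumerate(row):
                w[j] += x * p
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # data has no variance along v; keep the current vector
        v = [x / norm for x in w]
    return v


# Toy data (hypothetical): three 2-component vectors lying on one line.
rows = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
means = column_means(rows)
centered = center(rows, means)
pc1 = leading_component(centered)
```

For ~10 million vectors of 400 components the same two steps apply, but the centering pass and the decomposition would have to run as distributed jobs, which is where the ssvd command comes in.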
