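Note that subtracting a constant (such as the mean) from a sparse matrix normally fills it in: every entry that was zero becomes nonzero, so the result is dense. This data set appears to be a special case, since it has only 400 columns, and might not have this problem.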
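
To make the fill-in concrete, here is a minimal sketch in plain Python. It is a toy illustration, not Mahout code: the dict-per-row storage and the helper names (column_means, center, and so on) are made up for this example.

```python
# Toy illustration (not Mahout code): subtracting column means from a
# sparse matrix turns almost every zero into a nonzero value.

def column_means(rows, n_cols):
    """Mean of each column; rows are stored as {column_index: value} dicts."""
    sums = [0.0] * n_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
    return [s / len(rows) for s in sums]

def center(rows, means):
    """Subtract the column means. The result has to be stored densely."""
    return [[row.get(j, 0.0) - m for j, m in enumerate(means)] for row in rows]

def count_nonzero_sparse(rows):
    """Number of nonzero stored entries in the sparse representation."""
    return sum(1 for row in rows for v in row.values() if v != 0.0)

def count_nonzero_dense(rows):
    """Number of nonzero entries in a dense list-of-lists matrix."""
    return sum(1 for row in rows for v in row if v != 0.0)

# A 4 x 5 matrix with one stored entry per row (column 2 is all zeros).
sparse_rows = [
    {0: 2.0},
    {1: 4.0},
    {3: 8.0},
    {4: 6.0},
]
n_cols = 5

means = column_means(sparse_rows, n_cols)
centered = center(sparse_rows, means)

nnz_before = count_nonzero_sparse(sparse_rows)
nnz_after = count_nonzero_dense(centered)
print("nonzeros before centering:", nnz_before)
print("nonzeros after centering: ", nnz_after)
```

Only columns whose mean is exactly zero stay sparse after centering. With 400 columns per vector the rows may well be dense to begin with, in which case centering costs nothing extra.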

On Tue, Sep 6, 2011 at 5:53 PM, Dmitriy Lyubimov <[email protected]> wrote:

> I am sorry, i meant 'subtract a mean', not median. That's for PCA.
>
> On Tue, Sep 6, 2011 at 10:50 AM, Dmitriy Lyubimov <[email protected]> wrote:
> > You need to massage your data to compute (and subract) a median first,
> > as far as i understand. That should be relatively easy to do. Then you
> > can run a distributed SVD on it ('bin/mahout ssvd' command from trunk
> > should be quite good to try).
> >
> > -d
> >
> > On Tue, Sep 6, 2011 at 5:33 AM, Amr Desoky <[email protected]> wrote:
> >> Hi,
> >> It is mentioned on the web site :
> >> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
> >> That you implement the following algorithms within Mahout :
> >> Gaussian Discriminative Analysis
> >> Independent Component Analysis
> >> Principal Components Analysis
> >>
> >> But unfortunately, I could not find any help or documentation on how to
> >> use these algorithms!!
> >> specially I would like to try PCA on a huge data set of ~10Million
> >> vectors of 400 components each.
> >>
> >> Please give me some help on how to run PCA (and also ICA, GDA) whatever
> >> available.
> >>
> >> Best regards,
> >> Amr
> >>
> >> Amr Ibrahim El-Desoky, Mousa
> >> PhD Student, Computer Science (i6),
> >> RWTH-Aachen University,
> >> Aachen, Germany
> >> Cel. : +49 0176 56418470
> >> Office : +49 241 8021620
> >> Fax : +49 241 8022219
