Le mardi 22 août 2006 à 17:51 -0400, Kai Wang a écrit : 
> I need to do some principal component analysis and I came across the
> PDL package by searching CPAN archive. It is strange that there was a
> PDL::PCA module in the v1 release, but it is gone in the v2 release.
> Where can I find the module, or is there an alternative disguised
> module to use?

A PCA is really simple, it's just an eigen decomposition (or a singular
value decomposition) of a covariance or a correlation matrix. Here is a
perldl session:

We work with a 5x10 array (5 variables)

perldl> $a=random(5,10)

[
[ 0.81459466  0.68912187  0.69473342  0.40310166 0.053549728]
[ 0.49116007  0.51132722  0.59193333  0.13071509  0.23396271]
[  0.7311982  0.32066284  0.58360821  0.93889985  0.31170175]
[ 0.21440786  0.16216256  0.52012476 0.014589886  0.48918438]
[ 0.66915246 0.052257777  0.37053337  0.66329092   0.8158816]
[  0.6775804  0.54029059  0.40736243  0.61037705  0.66184753]
[0.021697479  0.80545599  0.32724047  0.94343127  0.69164962]
[ 0.18851102  0.20058168  0.65451622  0.58169081  0.82885017]
[ 0.64679303  0.40926274  0.89771871  0.44790381  0.02813467]
[ 0.42665488 0.019801185   0.5025285  0.88013821  0.70019348]
]

We will perform a PCA on a correlation matrix so first we center and
scale the data (not scaled for a covariance matrix). The code and the
documentation of this function is available at the end of this mail.

perldl> $astd = $a->stdize(1,1,1)

[
[   1.218811   1.1927081  0.82078193 -0.49363657  -1.4042758]
[0.011145874  0.52592354  0.21681373  -1.3429709 -0.81226258]
[ 0.90741882 -0.18912652  0.16790221    1.177047 -0.55716712]
[  -1.022213 -0.78355132 -0.20507394  -1.7050633 0.025230343]
[ 0.67574764  -1.1957281   -1.083949  0.31766501   1.0972655]
[ 0.70721654  0.63454509 -0.86757196  0.15267318  0.59181299]
[ -1.7417699   1.6289969  -1.3383022   1.1911765  0.68960655]
[ -1.1189086 -0.63946781  0.58449893 0.063225998    1.139821]
[ 0.59226025  0.14315033   2.0133553 -0.35393806  -1.4876736]
[ -0.2297086  -1.3174503 -0.30845497   0.9938211  0.71764267]
]

Now we can compute the correlation matrix (see the Pearson's correlation
formula to understand what's going on (on sample here)). The previous
computation plus the next is just an expensive way to compute this
matrix.
perldl> $cor = pdl(1/($a->dim(1)-1)) * transpose($astd) x $astd

[
[             1 -0.00053138444     0.32878255  -0.0073759656
-0.51271005]
[-0.00053138444              1    0.012026092   0.0083757904
-0.41298933]
[    0.32878255    0.012026092              1    -0.35579867
-0.73372351]
[ -0.0073759656   0.0083757904    -0.35579867              1
0.40637719]
[   -0.51271005    -0.41298933    -0.73372351     0.40637719
1]
]

And now we perform the PCA (beware the eigenvalues/vectors are not in
descending order).
perldl> ($loadings,$var) = eigens($cor)

[
[  0.51277015 -0.023151796   0.26688486   0.71880084  -0.38552379]
[ 0.079766792   0.87374023   0.34570482  -0.26501553  -0.20117209]
[  -0.6612732  -0.24381615   0.45367723  0.002921031  -0.54538049]
[ -0.53818726   0.38881605  -0.16524866   0.64268649   0.34470785]
[ 0.061451273  -0.15942949   0.75903261 0.0061367069   0.62820205]
]
[0.52221377  1.0757427 0.10460516 0.99055987  2.3068785]

It was simple, no ?

Check the results now (non-published code and it's publication is not
planned in a near future). The signs are different but it doesn't matter
(and in fact the code used here is a recuperation of an old version of
this module after a hard drive lost and as I see in this example I need
to change the signs (column = -column if  sum(sign(column)) < 0) in the
code.


perldl> $pca =
PDL::Multivariate::pca->new($a,center=>1,scale=>1,method=>'eigen');

perldl> $pca->loadings(1)

      PC 1     PC 2     PC 3     PC 4     PC 5 

[  0.3855  0.02315   0.7188   0.5128  -0.2669]
[  0.2012  -0.8737   -0.265  0.07977  -0.3457]
[  0.5454   0.2438 0.002921  -0.6613  -0.4537]
[ -0.3447  -0.3888   0.6427  -0.5382   0.1652]
[ -0.6282   0.1594 0.006137  0.06145   -0.759]


perldl> $loadings

[
[  0.51277015 -0.023151796   0.26688486   0.71880084  -0.38552379]
[ 0.079766792   0.87374023   0.34570482  -0.26501553  -0.20117209]
[  -0.6612732  -0.24381615   0.45367723  0.002921031  -0.54538049]
[ -0.53818726   0.38881605  -0.16524866   0.64268649   0.34470785]
[ 0.061451273  -0.15942949   0.75903261 0.0061367069   0.62820205]
]

Standard deviations of the principal components:

perldl> $pca->stdev(1)

                     PC 1    PC 2    PC 3    PC 4    PC 5 

Std.Dev.         [  1.519   1.037  0.9953  0.7226  0.3234]
Variance         [  2.307   1.076  0.9906  0.5222  0.1046]
Proportion Var.  [ 0.4614  0.2151  0.1981  0.1044 0.02092]
Cumulative Prop. [ 0.4614  0.6765  0.8746  0.9791       1]


perldl> $var->sqrt
[ 0.7226436  1.0371802  0.3234272 0.99526874  1.5188412]


And if you want the scores:

perldl> $astd x $loadings  

[
[ 0.35672166  0.84572869  0.12565888  0.23652252   -2.209788]
[ 0.57714863 0.013729798 -0.11145723 -0.99882677  -1.2015417]
[ -0.3285283  0.31929055 -0.36444564   1.4559184 -0.34763135]
[ 0.46814201  -1.2779324 -0.33581793  -1.6233796 0.091660452]
[ 0.86437537 -0.84753944 0.055581472   1.0103408    1.370001]
[ 0.94115815  0.71459163  0.43848936  0.43940211  0.49726102]
[-0.47690319   2.1431498 -0.18226365 -0.91781941   1.9174891]
[-0.97524868 -0.83247209  0.60020031 -0.58546697  0.97906707]
[-0.91720037 -0.27996197 0.050261502  0.15706055  -2.4117378]
[-0.50966528 -0.79858464 -0.27620706   0.8262484   1.3152202]
]



perldl> $pca->scores(1)

      PC 1     PC 2     PC 3     PC 4     PC 5 

[    2.21  -0.8457   0.2365   0.3567  -0.1257]
[   1.202 -0.01373  -0.9988   0.5771   0.1115]
[  0.3476  -0.3193    1.456  -0.3285   0.3644]
[-0.09166    1.278   -1.623   0.4681   0.3358]
[   -1.37   0.8475     1.01   0.8644 -0.05558]
[ -0.4973  -0.7146   0.4394   0.9412  -0.4385]
[  -1.917   -2.143  -0.9178  -0.4769   0.1823]
[ -0.9791   0.8325  -0.5855  -0.9752  -0.6002]
[   2.412     0.28   0.1571  -0.9172 -0.05026]
[  -1.315   0.7986   0.8262  -0.5097   0.2762]




If you want the eigsys from PDL::Slatec returns eigenvalues/vectors in
ascending order but works on array of float. You can also use a singular
value decomposition (svdc in PDL::Slatec) to perform a PCA (it's the
preferred way); the procedure is slightly different (no need to compute
the covariance/correlation matrix and some others things) and I will
maybe explain how to do it in a future mail.

Greg


==============================================================================
=head2 stdize

=for ref

Standardization (possibly weighted) of matrix over specified axis:

                         a - mean
        STANDARDIZED = ------------
                        stdev(n-1)

Uses arithmetic mean and standard deviation estimation if asked.
Can use arbitrary values (PDL vector) and compute inplace.

=for usage

PDL = stdize(PDL, SCALAR(axis), SCALAR|PDL(center), SCALAR|PDL(scale),
SCALAR(weight))
axis   : threading's axis, generally observation, default = 1 unless a
vector is given
center : center data by variable NOCENTER = 0 | CENTER = 1  | PDL,
DEFAULT = 1
scale  : scale data by variable NOSCALE = 0 | SCALE = 1 | PDL, DEFAULT =
0
weight : PDL of weights (size(entry matrix)), DEFAULT = ones(entry
matrix)

=for example

my $a = random(10,10);
my $standardized = stdize($a,1,1,1);

=cut

*stdize = \&PDL::stdize;

sub PDL::stdize{
        my($m, $obs, $center, $scale, $weight) = @_;
        $obs = 1 unless defined($obs) ||  $m->getndims < 2;
        $center = 1 unless defined $center;
        my ($mean, $rms, $mm);

        $m = $m->copy unless $m->is_inplace(0);

        $mm = $m->mv($obs,0);
        if ( !UNIVERSAL::isa($center,'PDL') || !
UNIVERSAL::isa($scale,'PDL')) {
                ($mean, $rms) = $mm->statsover($weight);
                $center = $mean if (!UNIVERSAL::isa($center,'PDL') &&
$center);
                $scale = $rms if (!UNIVERSAL::isa($scale,'PDL') &&
$scale);
        }
        $mm = $mm->mv(0,$m->getndims-1);
        if (UNIVERSAL::isa($center,'PDL')){
                $mm -=  $center;
        }
        if (UNIVERSAL::isa($scale,'PDL')){
                $mm /=  $scale;
        }
        $m;

}



        
        
                
___________________________________________________________________________ 
Découvrez un nouveau moyen de poser toutes vos questions quelque soit le sujet 
! 
Yahoo! Questions/Réponses pour partager vos connaissances, vos opinions et vos 
expériences. 
http://fr.answers.yahoo.com 



_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl

Reply via email to