Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-04-04 Thread Nikos Alexandris
On Sun, 2009-03-22 at 23:53 +0100, Nikos Alexandris wrote:
 I have prepaired a grass-location
 containing the 3 modis bands I've used for all PCA related examples. I
 will eventually upload them in gregis [1]

If anyone would like to try what I tried here are the MODIS bands:

# grass location
https://git.berlios.de/cgi-bin/gitweb.cgi?p=gregis;a=blob_plain;f=peloponnesos/modis_peloponnese_postfire07.zip;hb=peloponnese

# geotiffs
https://git.berlios.de/cgi-bin/gitweb.cgi?p=gregis;a=blob_plain;f=peloponnesos/modis_peloponnese_postfire07_GeoTiff.zip;hb=peloponnese

Cheers, Nikos


___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-04-02 Thread Nikos Alexandris
Nikos:
  If outputs are not identical, either R or grass do some hidden
  modification or there is a bug in either grass or R (all within
  limits, e.g. identical up to the 5th digit in scientific format is
  fine?).
  Some textbooks give a rule of thumb for further analysis to use only
  components with an eigenvalue =1

 I think this depends on what you are trying to achieve. Of course,
 components with small(-er) eigenvalues include more noizzze. In my
 change detection project I used *only* components with eigenvalues 
 1.

Hmm.. Markus, I was too quick yesterday. That's not true. All of the
PC's I've used have eigenvalue  1. Sorry :-p

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


RE: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-04-02 Thread Patton, Eric
 than 1. The reasoning is fairly weak, but goes like this: if a PC has
 eigenvalue  1, it explains more variance than any of the original
 variables, which all have variance 1.
 
 Maybe I should Cc: this to the wiki.
 --
 Edzer

Or even better include it in the docs if there is anything in your post 
that does a better job explaining the module. I'm completely unfamiliar with
these modules, but could you suggest some portions of your explanation that 
would be useful for inclusion in the documentation?

~ Eric.
___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-04-01 Thread Hamish

Markus Neteler wrote:
  Remember that you have to rescale these data first:
  
  https://lpdaac.usgs.gov/lpdaac/products/modis_products_table
  - MOD09Q1  Terra 
 Surface Reflectance Bands 1–2 
 Tile  250m 
 8 Day
  
  https://lpdaac.usgs.gov/lpdaac/products/modis_products_table/surface_reflectance/8_day_l3_global_250m/v5/terra
 - layers
  
  ... MULTIPLY BY SCALE FACTOR ...

Nikos wrote:
 Yep, I know that. But why is it wrong to just use them as they are?


FWIW, the MODIS/Aqua Chlorophyll-a product requires more than simple
linear rescaling:
  r.mapcalc ${map}.chlor_A = 10^(($Slope * $map) + $Intercept)

http://grass.osgeo.org/wiki/MODIS#Processing_2


so you may need to take care if working directly with raw 0-65535 data.


Hamish





___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-04-01 Thread Markus Metz


Edzer Pebesma wrote:

Markus Metz wrote:
  

I'm more familiar with non-spatial PCA, so it's high time I read the
manual of i.pca, and the new wiki page on it...

  

I think there's no such thing as spatial or non-spatial PCA. There's
just PCA.
  
That was a feeble attempt to buy time to go through some statistics 
literature ;-)


So it seems that this thread is about the different values for 
eigenvalues. AFAIKT, the answer is in the very first post of this thread 
[1]. It seems that i.pca output is supposed to be identical to 
prcomp(center=FALSE, scale=FALSE) output in R, because a PCA is 
scale-sensitive and the eigenvalue as reported by i.pca is the variance 
of the raw, unstandardised data. If outputs are not identical, either R 
or grass do some hidden modification or there is a bug in either grass 
or R (all within limits, e.g. identical up to the 5th digit in 
scientific format is fine?).


Some textbooks give a rule of thumb for further analysis to use only 
components with an eigenvalue =1 which obviously only works if the 
eigenvalue is calculated from standardised values (center=TRUE, 
scale=TRUE or e.g. r.mapcalc standardised_map = (map - mean) / stddev). 
E.g., comparing the results of MODIS raw and MODIS scaled with 0.0001 
should give eigenvalue #x of MODIS scaled = 1E-8 * eigenvalue #x of 
MODIS raw.


BTW, the rescaling method of i.pca is not very convincing, as pointed 
out by Augustin Lobo. IMHO, fool-proof would be normalization (x - mean) 
/ stddev.


[1] http://lists.osgeo.org/pipermail/grass-user/2009-March/049306.html
___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-04-01 Thread Markus Metz


Edzer Pebesma wrote:

Markus, a few notes:

- if you do PCA on uncentered data, by computing the eigenvalues of the
uncentered covariance matrix, this implies that bands with a larger mean
will get more influence on the final PCAs. I have sofar not managed
finding an argument why this would be desirable.
  
Add it to wiki? E.g. bands entered in a PCA should have the same mean, 
but normalization is also an option.

- if you do PCA on (band-mean)/sd(band), it means that you first
normalize (scale) 

I think scale and normalize are two different things.

each variable to mean zero and unit variance. This
procedure is identical to doing PCA on the correlation matrix. It means
that, unlike for unscaled variables, variables with larger variance will
not get more influence on the PCA than others. For image analysis I can
see a place for both; if bands with low variance indicate insignificant
and perhaps noisy information, you may downweight them. 
Variance is dependent on range, I would rather use something like 
coefficient of variation (stddev/mean) to get some scale-independent 
indicator on the amount of information in a given band. A downscaled 
band (e.g. MODIS scale of 0.0001) has still the same information but 
lower variance.

- Only in case of normalized variables, or equivalently PCA on
correlations, it makes sense to select PC's with an eigenvalue larger
than 1. The reasoning is fairly weak, but goes like this: if a PC has
eigenvalue  1, it explains more variance than any of the original
variables, which all have variance 1.
  
Sounds good to me, why should I use a component that explains less than 
any of the original bands? And the whole purpose of a PCA is variable 
reduction to get a new set of variables, each explaining the whole 
dataset better than one of the original variables/bands. A PCA produces 
as many components as input variables, so some selection is usually 
necessary for further processing, could also be % explained variance. 
OTOH, sometimes only the first component is of interest. There may be 
exceptions for imagery processing, e.g. haze reduction (would have to 
read up on imagery processing too to say anything more about where 
components with eigenvalue  1 could be useful).


___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-04-01 Thread Nikos Alexandris
Markus:
 It seems that i.pca output is supposed to be identical to 
 prcomp(center=FALSE, scale=FALSE) output in R, because a PCA is 
 scale-sensitive and the eigenvalue as reported by i.pca is the variance 
 of the raw, unstandardised data.

The thing is that with the SPOT data all seems fine and i.pca ==
prcomp(x, center=TRUE, scale=FALSE) which is not the case for the MODIS
bands I work with.


 If outputs are not identical, either R or grass do some hidden
 modification or there is a bug in either grass or R (all within
 limits, e.g. identical up to the 5th digit in scientific format is
 fine?).
 Some textbooks give a rule of thumb for further analysis to use only 
 components with an eigenvalue =1

I think this depends on what you are trying to achieve. Of course,
components with small(-er) eigenvalues include more noizzze. In my
change detection project I used *only* components with eigenvalues  1.


 which obviously only works if the eigenvalue is calculated from
 standardised values (center=TRUE, scale=TRUE or e.g. r.mapcalc
 standardised_map = (map - mean) / stddev).

 E.g., comparing the results of MODIS raw and MODIS scaled with 0.0001 
 should give eigenvalue #x of MODIS scaled = 1E-8 * eigenvalue #x of
 MODIS raw.

I didn't find the time to rescale and re-test. I will... at some
point :-)


 BTW, the rescaling method of i.pca is not very convincing, as pointed 
 out by Augustin Lobo. IMHO, fool-proof would be normalization (x -
 mean) / stddev.

Kind regards, Nikos

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-04-01 Thread Nikos Alexandris
On Wed, 2009-04-01 at 18:21 +0200, Edzer Pebesma wrote:
 Markus, a few notes:
 
 - if you do PCA on uncentered data, by computing the eigenvalues of the
 uncentered covariance matrix, this implies that bands with a larger mean
 will get more influence on the final PCAs. I have sofar not managed
 finding an argument why this would be desirable.
 - if you do PCA on (band-mean)/sd(band), it means that you first
 normalize (scale) each variable to mean zero and unit variance. This
 procedure is identical to doing PCA on the correlation matrix. It means
 that, unlike for unscaled variables, variables with larger variance will
 not get more influence on the PCA than others. For image analysis I can
 see a place for both; if bands with low variance indicate insignificant
 and perhaps noisy information, you may downweight them. Or not, if they
 contain (equally) important information. Scaling becomes urgent when you
 compute PCAs from a mix of things with uncomparable units, such as image
 bands and DTMs.
 - Only in case of normalized variables, or equivalently PCA on
 correlations, it makes sense to select PC's with an eigenvalue larger
 than 1. The reasoning is fairly weak, but goes like this: if a PC has
 eigenvalue  1, it explains more variance than any of the original
 variables, which all have variance 1.
 
 Maybe I should Cc: this to the wiki.
 --
 Edzer

Nice to see this expert-comments! Really helpful to understand the
process better.

Thanks, Nikos

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Agustin Lobo

Nikos,
I'm probably missing a message (I do not see which is
exactly the difference between the spot and the modis
example), but
Perhaps you must use
r.mapcalc to rescale and not r.rescale, or if using r.rescale you must 
use r.mapcalc afterwards to rewrite the output raster, as r.rescale

is modifying the categories only and not the actual raster values which
is what i.pca is probably using. Maybe, as i.pca is quite old, it 
considers the categories only up to 255. In any case, using r.mapcalc 
you know better what you are doing.


Also, note that you must rescale
using the mean, not max min values. In other words
r.mapcalc band1c = band1 - mean(band1)

Agus



Nikos Alexandris wrote:

Roger:
Good, thanks. There you say that you are using some MODIS surface 
reflectance products. I guess it will be easier to check things if the 
data (GRASS location) are available, so that others can try the same 
calculations. Would it be possible to make one or more test sets 
available, and link them from thw Wiki?


Nikos:

Finally,
I am, sort of, free again :-).


Sorry for being late. Couldn't make it earlier. I've updated the
wiki-page [1].



I have prepaired a grass-location
containing the 3 modis bands I've used for all PCA related examples. I
will eventually upload them in gregis [check previous post] but if there is 
another place
(GRASS-wiki?) to upload them would be also ok.


I have currently some access problem in gregis this why I can't upload
the MODIS bands there. Is there any other place to put them for now?


Now, I am more confused than before. Or maybe not...

There seems to be no problem with the SPOT images, that is, the results
of i.pca are (almost) identical with the results of R's prcomp(x,
center=TRUE) *with* centering. This is completely different than with
the example using MODIS data. 


Perhaps it depends on the range of the input data?! The SPOT bands are
up to max. 255. The MODIS bands are:

# band 2
r.info -r mod_b2

min=0
max=5504

# band 6
r.info -r mod_b6

min=93
max=5488

# band 7
r.info -r mod_b7

min=0
max=4133


I ran the same PCA tests (in R) with the _rescaled_ as like with the
original MODIS bands (using r.rescale) but no luck, I get the same
contradictory results: _i.pca_ matches with _prcomp(x, center=FALSE)_.

I suspect that the range of the input data is the reason... but I am not
sure.  This is why I tried to repeat the test with _rescaled_ MODIS
bands. Given that the r.rescale products are kind of a rules file
under the cats directory, is r.rescale safe? Should I instead
rescale the data with r.mapcalc or is there no difference between
r.rescale and r.mapcalc? 


All explained in the *under construction* wiki. Kind regards, Nikos
---

[1] http://grass.osgeo.org/wiki/Principal_Component_Analysis

___
grass-stats mailing list
grass-st...@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-stats



--
Dr. Agustin Lobo
Institut de Ciencies de la Terra Jaume Almera (CSIC)
LLuis Sole Sabaris s/n
08028 Barcelona
Spain
Tel. 34 934095410
Fax. 34 934110012
email: agustin.l...@ija.csic.es
http://www.ija.csic.es/gt/obster
___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Nikos Alexandris
Agustin:
 Nikos, I'm probably missing a message (I do not see which is
 exactly the difference between the spot and the modis example),

Hi Agus. The difference is:

Using SPOT (range up to max. 255):
i.pca == prcomp(x, center=TRUE, scale=FALSE)

Using MODIS (range varries between bands, up to max. ~5500):
i.pca == prcomp(x, center=FALSE, scale=FALSE)


  but Perhaps you must use r.mapcalc to rescale and not r.rescale, or
 if using r.rescale you must use r.mapcalc afterwards to rewrite the
 output raster, as r.rescale is modifying the categories only and not
 the actual raster values which is what i.pca is probably using. Maybe,
 as i.pca is quite old, it considers the categories only up to 255. In
 any case, using r.mapcalc you know better what you are doing.

That's the confirmation I needed. I think rescaling the MODIS data to
0~255 will confirm that:

IF r.info -r = max. 255
THEN i.pca == prcomp(x, center=TRUE, scale=FALSE)


 Also, note that you must rescale
 using the mean, not max min values. In other words
 r.mapcalc band1c = band1 - mean(band1)

Thanks for the tip. Maybe this can be added in the grass-wiki FAQ !?

Cheers, Nikos

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Nikos Alexandris
On Tue, 2009-03-31 at 12:56 +0200, Edzer Pebesma wrote:
 I tried to improve the wiki page, but never got access.
 --
 Edzer

Edzer, it definitely needs to be improved. Hopefully you'll get access
while the thing is boiling.

@admins: Please, we need access ;-)
Kindest regards, Nikos

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Markus Metz


Nikos Alexandris wrote:

Agustin:
  

Nikos, I'm probably missing a message (I do not see which is
exactly the difference between the spot and the modis example),



Hi Agus. The difference is:

Using SPOT (range up to max. 255):
i.pca == prcomp(x, center=TRUE, scale=FALSE)

Using MODIS (range varries between bands, up to max. ~5500):
  
Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day 
L3 Global 500m.
As I would use the max possible for SPOT (255) I would also use the max 
possible for other sensors, not the observed max, otherwise statistics 
that don't do proper normalization on the fly are not comparable between 
different sensors.
But any PCA should do normalization beforehand, not sure what R or i.pca 
are doing.

Please excuse my random comments :-)

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Nikos Alexandris

Agustin:   
  Nikos, I'm probably missing a message (I do not see which is
  exactly the difference between the spot and the modis example),
  

Nikos:
  Hi Agus. The difference is:
  Using SPOT (range up to max. 255):
  i.pca == prcomp(x, center=TRUE, scale=FALSE)
  Using MODIS (range varries between bands, up to max. ~5500):
[...]

Markus M:
 Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day 
 L3 Global 500m.

I am referring to the range of the specific MODIS bands that I've used,
that is:
# band 2
r.info -r mod_b2

min=0
max=5504

# band 6
r.info -r mod_b6

min=93
max=5488

# band 7
r.info -r mod_b7

min=0
max=4133


 As I would use the max possible for SPOT (255) I would also use the max 
 possible for other sensors, not the observed max, otherwise statistics 
 that don't do proper normalization on the fly are not comparable between 
 different sensors.

 But any PCA should do normalization beforehand, not sure what R or i.pca 
 are doing.

That's the question _now_ (if I am right with my tests): why isn't
normalisation (=data centering in this case) performed with the MODIS
bands I use?


 Please excuse my random comments :-)

Markus, your comments are required :-)
Cheers, Nikos

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Markus Neteler
On Tue, Mar 31, 2009 at 5:00 PM, Markus Metz
markus.metz.gisw...@googlemail.com wrote:
 Nikos Alexandris wrote:
 Using MODIS (range varries between bands, up to max. ~5500):


 Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day L3
 Global 500m.

Remember that you have to rescale these data first:

https://lpdaac.usgs.gov/lpdaac/products/modis_products_table
- MOD09Q1  Terra   Surface Reflectance Bands 1–2   Tile250m8 Day

https://lpdaac.usgs.gov/lpdaac/products/modis_products_table/surface_reflectance/8_day_l3_global_250m/v5/terra
   - layers

... MULTIPLY BY SCALE FACTOR ...

best,
Markus
___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Nikos Alexandris
On Tue, 2009-03-31 at 17:13 +0200, Markus Neteler wrote:
 On Tue, Mar 31, 2009 at 5:00 PM, Markus Metz
 markus.metz.gisw...@googlemail.com wrote:
  Nikos Alexandris wrote:
  Using MODIS (range varries between bands, up to max. ~5500):
 
 
  Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day L3
  Global 500m.
 
 Remember that you have to rescale these data first:
 
 https://lpdaac.usgs.gov/lpdaac/products/modis_products_table
 - MOD09Q1Terra   Surface Reflectance Bands 1–2   Tile250m8 Day
 
 https://lpdaac.usgs.gov/lpdaac/products/modis_products_table/surface_reflectance/8_day_l3_global_250m/v5/terra
- layers
 
 ... MULTIPLY BY SCALE FACTOR ...

 best,
 Markus

Yep, I know that. But why is it wrong to just use them as they are?
Regards, Nikos

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Nikos Alexandris
Edzer:
 With Markus' help I got access, and corrected/extended quite a few things,
 including why R sometimes doesn't print loadings (when they're close to
 0).

Great!


 I do worry about the wiki that states results are similar between R and
 grass, when the differences appear e.g. in the third or second digit of a
 loading (eigenvector value). Clearly something is is going on, here.

So you think it's really important? Isn't it a matter of pixel-inclusion
in the various algorithms/programs?

Nikos

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Markus Metz


Nikos Alexandris wrote:

Nikos:
  

Hi Agus. The difference is:
Using SPOT (range up to max. 255):
  
That implied to me that the max possible value of SPOT was used, not the 
max observed (often the max observed is also the max possible). In my 
experience, when comparing imagery or derived products from different 
sensors, the least errors occur if everything is rescaled to the same 
range, e.g. 0 - 255 for surface reflectance or -1 - 1 for NDVI, whatever 
is convenient or appropriate.
Note that the scale factor for MODIS is 0.0001 which would give you a 
new range of -0.01 - 1.6 (I hope I got my math right), not 0 - 255 as 
for e.g. SPOT or Landsat.



___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Nikos Alexandris

Nikos:  
  Hi Agus. The difference is:
  Using SPOT (range up to max. 255):

Markus M:  
 That implied to me that the max possible value of SPOT was used, not the 
 max observed (often the max observed is also the max possible).

Right. Sorry for not being precise.


 In my experience, when comparing imagery or derived products from different 
 sensors, the least errors occur if everything is rescaled to the same 
 range, e.g. 0 - 255 for surface reflectance or -1 - 1 for NDVI, whatever 
 is convenient or appropriate.

 Note that the scale factor for MODIS is 0.0001 which would give you a 
 new range of -0.01 - 1.6 (I hope I got my math right), not 0 - 255 as 
 for e.g. SPOT or Landsat.

The thing is by multiplying by 0.0001 thing are worse concerning the
*eigenvalues* (the eigenvectors are the same):


# using the MODIS bands as they are
r.info -r mod_b2

min=0
max=5504


# use of i.pca gives
r.info -h pca_mod_b267.1

[...]
Data Description:
   generated by i.pca
Comments:
   Eigen values, (vectors), and [percent importance]:
   PC1 6307563.04 (-0.6353,-0.6485,-0.4192)[98.71%]
   PC2  78023.63 (-0.7124, 0.2828, 0.6422)[1.22%]
   PC3   4504.60 (-0.2979, 0.7067,-0.6417)[0.07%]
   

# using MODIS bands multiplied by 0.0001
r.info -r mod_b2_x

min=0
max=0.5504


# using i.pca gives
r.info -h pca.mod_x.1

[...]   
Data Description:
   generated by i.pca
Comments:
   Eigen values, (vectors), and [percent importance]:
   PC1  0.06 (-0.6353,-0.6485,-0.4192)[98.71%]
   PC2  0.00 (-0.7124, 0.2828, 0.6422)[1.22%]
   PC3  0.00 (-0.2979, 0.7067,-0.6417)[0.07%]

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Markus Metz


Nikos Alexandris wrote:


The thing is by multiplying by 0.0001 thing are worse concerning the
*eigenvalues* (the eigenvectors are the same):


# using the MODIS bands as they are
r.info -r mod_b2

min=0
max=5504


# use of i.pca gives
r.info -h pca_mod_b267.1

[...]
Data Description:
   generated by i.pca
Comments:
   Eigen values, (vectors), and [percent importance]:
   PC1 6307563.04 (-0.6353,-0.6485,-0.4192)[98.71%]
   PC2  78023.63 (-0.7124, 0.2828, 0.6422)[1.22%]
   PC3   4504.60 (-0.2979, 0.7067,-0.6417)[0.07%]
   


# using MODIS bands multiplied by 0.0001
r.info -r mod_b2_x

min=0
max=0.5504


# using i.pca gives
r.info -h pca.mod_x.1

[...]   
Data Description:

   generated by i.pca
Comments:
   Eigen values, (vectors), and [percent importance]:
   PC1  0.06 (-0.6353,-0.6485,-0.4192)[98.71%]
   PC2  0.00 (-0.7124, 0.2828, 0.6422)[1.22%]
   PC3  0.00 (-0.2979, 0.7067,-0.6417)[0.07%]

  
OK, I don't have the full discussion on i.pca in my head, so I don't 
know how much sense my comments make. The loadings and percentages 
explained variance are identical, that's good. The Eigenvalues are not, 
it seems they were calculated from unstandardised (raw) values. For 
imagery processing, that may be desired, for other applications AFAIK it 
is required that input variables variables (here different bands) are 
standardised first so they can be combined and principal components 
extracted. I'm more familiar with non-spatial PCA, so it's high time I 
read the manual of i.pca, and the new wiki page on it...


___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-31 Thread Nikos Alexandris

Nikos:
  The thing is by multiplying by 0.0001 thing are worse concerning the
  *eigenvalues* (the eigenvectors are the same):

  # use of i.pca gives
  r.info -h pca_mod_b267.1
[...]
 Eigen values, (vectors), and [percent importance]:
 PC1 6307563.04 (-0.6353,-0.6485,-0.4192)[98.71%]
 PC2  78023.63 (-0.7124, 0.2828, 0.6422)[1.22%]
 PC3   4504.60 (-0.2979, 0.7067,-0.6417)[0.07%]

  # using i.pca gives
  r.info -h pca.mod_x.1
[...]
 Eigen values, (vectors), and [percent importance]:
 PC1  0.06 (-0.6353,-0.6485,-0.4192)[98.71%]
 PC2  0.00 (-0.7124, 0.2828, 0.6422)[1.22%]
 PC3  0.00 (-0.2979, 0.7067,-0.6417)[0.07%]


Markus M:   
 OK, I don't have the full discussion on i.pca in my head, so I don't 
 know how much sense my comments make. The loadings and percentages 
 explained variance are identical, that's good.

Yep.


 The Eigenvalues are not, it seems they were calculated from unstandardised 
 (raw) values.

Note: the percent importance is nothing else than just transforemd
eigenvalues (that is: sum-up all eigenvalues and say the sum is the
100%, take then the percent of each eigenvalue).

The fact that the 2nd an d 3rd eigenvalues  in the above example are
0.00 is a (another) print-out/report issue I think.

However, the multiplication of the MODIS bands with the recommended
factor (0.0001) does nothing to the way i.pca treats the data.

##
I am convinced that i.pca wrongly _depends_ currently on the range of
the input data whether to apply data centering or not.

I just need to rescale the MODIS bands in to 0,255 and confirm my
skepsis. If I am wrong then it might be even more complicated!!
##


  For imagery processing, that may be desired, for other applications AFAIK it 
 is required that input variables variables (here different bands) are 
 standardised first so they can be combined and principal components 
 extracted. 

As Augustin Lobo and others suggested, data centering should be
performed. And me, with my limited knowledge and experience think the
same.


 I'm more familiar with non-spatial PCA, so it's high time I read the
 manual of i.pca, and the new wiki page on it...

Markus, I might missed to update some important sentences. So handle
with care :-)

Cheers, Nikos

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-30 Thread Nikos Alexandris

Roger:
   Good, thanks. There you say that you are using some MODIS surface 
   reflectance products. I guess it will be easier to check things if the 
   data (GRASS location) are available, so that others can try the same 
   calculations. Would it be possible to make one or more test sets 
   available, and link them from thw Wiki?

Nikos:
 Finally,
 I am, sort of, free again :-).

Sorry for being late. Couldn't make it earlier. I've updated the
wiki-page [1].


 I have prepaired a grass-location
 containing the 3 modis bands I've used for all PCA related examples. I
 will eventually upload them in gregis [check previous post] but if there is 
 another place
 (GRASS-wiki?) to upload them would be also ok.

I have currently some access problem in gregis this why I can't upload
the MODIS bands there. Is there any other place to put them for now?


Now, I am more confused than before. Or maybe not...

There seems to be no problem with the SPOT images, that is, the results
of i.pca are (almost) identical with the results of R's prcomp(x,
center=TRUE) *with* centering. This is completely different than with
the example using MODIS data. 

Perhaps it depends on the range of the input data?! The SPOT bands are
up to max. 255. The MODIS bands are:

# band 2
r.info -r mod_b2

min=0
max=5504

# band 6
r.info -r mod_b6

min=93
max=5488

# band 7
r.info -r mod_b7

min=0
max=4133


I ran the same PCA tests (in R) with the _rescaled_ as like with the
original MODIS bands (using r.rescale) but no luck, I get the same
contradictory results: _i.pca_ matches with _prcomp(x, center=FALSE)_.

I suspect that the range of the input data is the reason... but I am not
sure.  This is why I tried to repeat the test with _rescaled_ MODIS
bands. Given that the r.rescale products are kind of a rules file
under the cats directory, is r.rescale safe? Should I instead
rescale the data with r.mapcalc or is there no difference between
r.rescale and r.mapcalc? 

All explained in the *under construction* wiki. Kind regards, Nikos
---

[1] http://grass.osgeo.org/wiki/Principal_Component_Analysis

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-23 Thread Markus Neteler
On Thu, Mar 12, 2009 at 8:26 AM, Hamish hamis...@yahoo.com wrote:

 Nikos:
  http://grass.osgeo.org/wiki/Principal_Component_Analysis

...
 better yet, use the standard Spearfish SPOT imagery sample dataset
 for the examples. (as m.eigensystem and now i.pca help page
 examples do, probably others)

Let me suggest the Landsat imagery in the North Carolina location
which are more recent (the Spearfish *imagery* sample dataset
isn't that great).

Markus
___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-12 Thread Hamish

Nikos:
  http://grass.osgeo.org/wiki/Principal_Component_Analysis

Roger:
 Good, thanks. There you say that you are using some MODIS
 surface reflectance products. I guess it will be easier to
 check things if the data (GRASS location) are available, so
 that others can try the same calculations. Would it be possible
 to make one or more test sets available, and link them from thw
 Wiki?

better yet, use the standard Spearfish SPOT imagery sample dataset
for the examples. (as m.eigensystem and now i.pca help page
examples do, probably others)

is there a way to change text color in MediaWiki? it might be
easier to highlight m.eigensystem output rows by color instead
of '#'s.


idea for possible google summer of code 2009 project: overhaul i.*
modules/lib for grass7. US$4.5k bounty :)


cheers,
Hamish



  

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-10 Thread Nikos Alexandris
On Tue, 2009-03-03 at 14:26 +0100, Nikos Alexandris wrote:
 On Tue, 2009-03-03 at 13:18 +0100, Markus Neteler wrote:
  On Tue, Mar 3, 2009 at 12:26 PM, Hamish hamis...@yahoo.com wrote:
  ...
  
   it will be really good to finally have all this documented.
  
  I find it hard to follow these long mails. Why not enjoying the
  GRASS Wiki to stabilize the documentation and comparisons?
  
  Markus
 
 Yes, why not? Hmmm, you mean to put these stuff already in the Wiki
 before we are sure about the one last and very important variable
 (=eigenvalues reported by  i.pca)?
 
 Kind regards, Nikos 

I've started the PCA grass-wiki page. I am trying to build it... step by
step. Expert advice is always welcome and highly appreciated.

http://grass.osgeo.org/wiki/Principal_Component_Analysis

Kind regards, Nikos

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-10 Thread Roger Bivand

On Tue, 10 Mar 2009, Nikos Alexandris wrote:


On Tue, 2009-03-03 at 14:26 +0100, Nikos Alexandris wrote:

On Tue, 2009-03-03 at 13:18 +0100, Markus Neteler wrote:

On Tue, Mar 3, 2009 at 12:26 PM, Hamish hamis...@yahoo.com wrote:
...


it will be really good to finally have all this documented.


I find it hard to follow these long mails. Why not enjoying the
GRASS Wiki to stabilize the documentation and comparisons?

Markus


Yes, why not? Hmmm, you mean to put these stuff already in the Wiki
before we are sure about the one last and very important variable
(=eigenvalues reported by  i.pca)?

Kind regards, Nikos


I've started the PCA grass-wiki page. I am trying to build it... step by
step. Expert advice is always welcome and highly appreciated.

http://grass.osgeo.org/wiki/Principal_Component_Analysis


Good, thanks. There you say that you are using some MODIS surface 
reflectance products. I guess it will be easier to check things if the 
data (GRASS location) are available, so that others can try the same 
calculations. Would it be possible to make one or more test sets 
available, and link them from thw Wiki?


Roger



Kind regards, Nikos

___
grass-stats mailing list
grass-st...@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-stats



--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: roger.biv...@nhh.no

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-10 Thread Nikos Alexandris

 Nikos:
  I've started the PCA grass-wiki page. I am trying to build it... step by
  step. Expert advice is always welcome and highly appreciated.
  http://grass.osgeo.org/wiki/Principal_Component_Analysis


 Roger:
 Good, thanks. There you say that you are using some MODIS surface 
 reflectance products. I guess it will be easier to check things if the 
 data (GRASS location) are available, so that others can try the same 
 calculations. Would it be possible to make one or more test sets 
 available, and link them from thw Wiki?


Absolutely! I need some time though (=some days?, currently
overloaded :-).

Kindest regards, Nikos

P.S. I was informed today that I have to wait another 2 weeks to get the
book Applied Spatial Data Analysis with R :-(

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-03 Thread Markus Neteler
On Tue, Mar 3, 2009 at 12:26 PM, Hamish hamis...@yahoo.com wrote:
...

 it will be really good to finally have all this documented.

I find it hard to follow these long mails. Why not enjoying the
GRASS Wiki to stabilize the documentation and comparisons?

Markus
___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

2009-03-03 Thread Nikos Alexandris
On Tue, 2009-03-03 at 13:18 +0100, Markus Neteler wrote:
 On Tue, Mar 3, 2009 at 12:26 PM, Hamish hamis...@yahoo.com wrote:
 ...
 
  it will be really good to finally have all this documented.
 
 I find it hard to follow these long mails. Why not enjoying the
 GRASS Wiki to stabilize the documentation and comparisons?
 
 Markus

Yes, why not? Hmmm, you mean to put these stuff already in the Wiki
before we are sure about the one last and very important variable
(=eigenvalues reported by  i.pca)?

Kind regards, Nikos 

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user