Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
On Sun, 2009-03-22 at 23:53 +0100, Nikos Alexandris wrote: I have prepaired a grass-location containing the 3 modis bands I've used for all PCA related examples. I will eventually upload them in gregis [1] If anyone would like to try what I tried here are the MODIS bands: # grass location https://git.berlios.de/cgi-bin/gitweb.cgi?p=gregis;a=blob_plain;f=peloponnesos/modis_peloponnese_postfire07.zip;hb=peloponnese # geotiffs https://git.berlios.de/cgi-bin/gitweb.cgi?p=gregis;a=blob_plain;f=peloponnesos/modis_peloponnese_postfire07_GeoTiff.zip;hb=peloponnese Cheers, Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Nikos: If outputs are not identical, either R or grass do some hidden modification or there is a bug in either grass or R (all within limits, e.g. identical up to the 5th digit in scientific format is fine?). Some textbooks give a rule of thumb for further analysis to use only components with an eigenvalue =1 I think this depends on what you are trying to achieve. Of course, components with small(-er) eigenvalues include more noizzze. In my change detection project I used *only* components with eigenvalues 1. Hmm.. Markus, I was too quick yesterday. That's not true. All of the PC's I've used have eigenvalue 1. Sorry :-p ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
RE: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
than 1. The reasoning is fairly weak, but goes like this: if a PC has eigenvalue 1, it explains more variance than any of the original variables, which all have variance 1. Maybe I should Cc: this to the wiki. -- Edzer Or even better include it in the docs if there is anything in your post that does a better job explaining the module. I'm completely unfamiliar with these modules, but could you suggest some portions of your explanation that would be useful for inclusion in the documentation? ~ Eric. ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Markus Neteler wrote: Remember that you have to rescale these data first: https://lpdaac.usgs.gov/lpdaac/products/modis_products_table - MOD09Q1 Terra Surface Reflectance Bands 1–2 Tile 250m 8 Day https://lpdaac.usgs.gov/lpdaac/products/modis_products_table/surface_reflectance/8_day_l3_global_250m/v5/terra - layers ... MULTIPLY BY SCALE FACTOR ... Nikos wrote: Yep, I know that. But why is it wrong to just use them as they are? FWIW, the MODIS/Aqua Chlorophyll-a product requires more than simple linear rescaling: r.mapcalc ${map}.chlor_A = 10^(($Slope * $map) + $Intercept) http://grass.osgeo.org/wiki/MODIS#Processing_2 so you may need to take care if working directly with raw 0-65535 data. Hamish ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Edzer Pebesma wrote: Markus Metz wrote: I'm more familiar with non-spatial PCA, so it's high time I read the manual of i.pca, and the new wiki page on it... I think there's no such thing as spatial or non-spatial PCA. There's just PCA. That was a feeble attempt to buy time to go through some statistics literature ;-) So it seems that this thread is about the different values for eigenvalues. AFAIKT, the answer is in the very first post of this thread [1]. It seems that i.pca output is supposed to be identical to prcomp(center=FALSE, scale=FALSE) output in R, because a PCA is scale-sensitive and the eigenvalue as reported by i.pca is the variance of the raw, unstandardised data. If outputs are not identical, either R or grass do some hidden modification or there is a bug in either grass or R (all within limits, e.g. identical up to the 5th digit in scientific format is fine?). Some textbooks give a rule of thumb for further analysis to use only components with an eigenvalue =1 which obviously only works if the eigenvalue is calculated from standardised values (center=TRUE, scale=TRUE or e.g. r.mapcalc standardised_map = (map - mean) / stddev). E.g., comparing the results of MODIS raw and MODIS scaled with 0.0001 should give eigenvalue #x of MODIS scaled = 1E-8 * eigenvalue #x of MODIS raw. BTW, the rescaling method of i.pca is not very convincing, as pointed out by Augustin Lobo. IMHO, fool-proof would be normalization (x - mean) / stddev. [1] http://lists.osgeo.org/pipermail/grass-user/2009-March/049306.html ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Edzer Pebesma wrote: Markus, a few notes: - if you do PCA on uncentered data, by computing the eigenvalues of the uncentered covariance matrix, this implies that bands with a larger mean will get more influence on the final PCAs. I have sofar not managed finding an argument why this would be desirable. Add it to wiki? E.g. bands entered in a PCA should have the same mean, but normalization is also an option. - if you do PCA on (band-mean)/sd(band), it means that you first normalize (scale) I think scale and normalize are two different things. each variable to mean zero and unit variance. This procedure is identical to doing PCA on the correlation matrix. It means that, unlike for unscaled variables, variables with larger variance will not get more influence on the PCA than others. For image analysis I can see a place for both; if bands with low variance indicate insignificant and perhaps noisy information, you may downweight them. Variance is dependent on range, I would rather use something like coefficient of variation (stddev/mean) to get some scale-independent indicator on the amount of information in a given band. A downscaled band (e.g. MODIS scale of 0.0001) has still the same information but lower variance. - Only in case of normalized variables, or equivalently PCA on correlations, it makes sense to select PC's with an eigenvalue larger than 1. The reasoning is fairly weak, but goes like this: if a PC has eigenvalue 1, it explains more variance than any of the original variables, which all have variance 1. Sounds good to me, why should I use a component that explains less than any of the original bands? And the whole purpose of a PCA is variable reduction to get a new set of variables, each explaining the whole dataset better than one of the original variables/bands. A PCA produces as many components as input variables, so some selection is usually necessary for further processing, could also be % explained variance. OTOH, sometimes only the first component is of interest. There may be exceptions for imagery processing, e.g. haze reduction (would have to read up on imagery processing too to say anything more about where components with eigenvalue 1 could be useful). ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Markus: It seems that i.pca output is supposed to be identical to prcomp(center=FALSE, scale=FALSE) output in R, because a PCA is scale-sensitive and the eigenvalue as reported by i.pca is the variance of the raw, unstandardised data. The thing is that with the SPOT data all seems fine and i.pca == prcomp(x, center=TRUE, scale=FALSE) which is not the case for the MODIS bands I work with. If outputs are not identical, either R or grass do some hidden modification or there is a bug in either grass or R (all within limits, e.g. identical up to the 5th digit in scientific format is fine?). Some textbooks give a rule of thumb for further analysis to use only components with an eigenvalue =1 I think this depends on what you are trying to achieve. Of course, components with small(-er) eigenvalues include more noizzze. In my change detection project I used *only* components with eigenvalues 1. which obviously only works if the eigenvalue is calculated from standardised values (center=TRUE, scale=TRUE or e.g. r.mapcalc standardised_map = (map - mean) / stddev). E.g., comparing the results of MODIS raw and MODIS scaled with 0.0001 should give eigenvalue #x of MODIS scaled = 1E-8 * eigenvalue #x of MODIS raw. I didn't find the time to rescale and re-test. I will... at some point :-) BTW, the rescaling method of i.pca is not very convincing, as pointed out by Augustin Lobo. IMHO, fool-proof would be normalization (x - mean) / stddev. Kind regards, Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
On Wed, 2009-04-01 at 18:21 +0200, Edzer Pebesma wrote: Markus, a few notes: - if you do PCA on uncentered data, by computing the eigenvalues of the uncentered covariance matrix, this implies that bands with a larger mean will get more influence on the final PCAs. I have sofar not managed finding an argument why this would be desirable. - if you do PCA on (band-mean)/sd(band), it means that you first normalize (scale) each variable to mean zero and unit variance. This procedure is identical to doing PCA on the correlation matrix. It means that, unlike for unscaled variables, variables with larger variance will not get more influence on the PCA than others. For image analysis I can see a place for both; if bands with low variance indicate insignificant and perhaps noisy information, you may downweight them. Or not, if they contain (equally) important information. Scaling becomes urgent when you compute PCAs from a mix of things with uncomparable units, such as image bands and DTMs. - Only in case of normalized variables, or equivalently PCA on correlations, it makes sense to select PC's with an eigenvalue larger than 1. The reasoning is fairly weak, but goes like this: if a PC has eigenvalue 1, it explains more variance than any of the original variables, which all have variance 1. Maybe I should Cc: this to the wiki. -- Edzer Nice to see this expert-comments! Really helpful to understand the process better. Thanks, Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Nikos, I'm probably missing a message (I do not see which is exactly the difference between the spot and the modis example), but Perhaps you must use r.mapcalc to rescale and not r.rescale, or if using r.rescale you must use r.mapcalc afterwards to rewrite the output raster, as r.rescale is modifying the categories only and not the actual raster values which is what i.pca is probably using. Maybe, as i.pca is quite old, it considers the categories only up to 255. In any case, using r.mapcalc you know better what you are doing. Also, note that you must rescale using the mean, not max min values. In other words r.mapcalc band1c = band1 - mean(band1) Agus Nikos Alexandris wrote: Roger: Good, thanks. There you say that you are using some MODIS surface reflectance products. I guess it will be easier to check things if the data (GRASS location) are available, so that others can try the same calculations. Would it be possible to make one or more test sets available, and link them from thw Wiki? Nikos: Finally, I am, sort of, free again :-). Sorry for being late. Couldn't make it earlier. I've updated the wiki-page [1]. I have prepaired a grass-location containing the 3 modis bands I've used for all PCA related examples. I will eventually upload them in gregis [check previous post] but if there is another place (GRASS-wiki?) to upload them would be also ok. I have currently some access problem in gregis this why I can't upload the MODIS bands there. Is there any other place to put them for now? Now, I am more confused than before. Or maybe not... There seems to be no problem with the SPOT images, that is, the results of i.pca are (almost) identical with the results of R's prcomp(x, center=TRUE) *with* centering. This is completely different than with the example using MODIS data. Perhaps it depends on the range of the input data?! The SPOT bands are up to max. 255. The MODIS bands are: # band 2 r.info -r mod_b2 min=0 max=5504 # band 6 r.info -r mod_b6 min=93 max=5488 # band 7 r.info -r mod_b7 min=0 max=4133 I ran the same PCA tests (in R) with the _rescaled_ as like with the original MODIS bands (using r.rescale) but no luck, I get the same contradictory results: _i.pca_ matches with _prcomp(x, center=FALSE)_. I suspect that the range of the input data is the reason... but I am not sure. This is why I tried to repeat the test with _rescaled_ MODIS bands. Given that the r.rescale products are kind of a rules file under the cats directory, is r.rescale safe? Should I instead rescale the data with r.mapcalc or is there no difference between r.rescale and r.mapcalc? All explained in the *under construction* wiki. Kind regards, Nikos --- [1] http://grass.osgeo.org/wiki/Principal_Component_Analysis ___ grass-stats mailing list grass-st...@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-stats -- Dr. Agustin Lobo Institut de Ciencies de la Terra Jaume Almera (CSIC) LLuis Sole Sabaris s/n 08028 Barcelona Spain Tel. 34 934095410 Fax. 34 934110012 email: agustin.l...@ija.csic.es http://www.ija.csic.es/gt/obster ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Agustin: Nikos, I'm probably missing a message (I do not see which is exactly the difference between the spot and the modis example), Hi Agus. The difference is: Using SPOT (range up to max. 255): i.pca == prcomp(x, center=TRUE, scale=FALSE) Using MODIS (range varries between bands, up to max. ~5500): i.pca == prcomp(x, center=FALSE, scale=FALSE) but Perhaps you must use r.mapcalc to rescale and not r.rescale, or if using r.rescale you must use r.mapcalc afterwards to rewrite the output raster, as r.rescale is modifying the categories only and not the actual raster values which is what i.pca is probably using. Maybe, as i.pca is quite old, it considers the categories only up to 255. In any case, using r.mapcalc you know better what you are doing. That's the confirmation I needed. I think rescaling the MODIS data to 0~255 will confirm that: IF r.info -r = max. 255 THEN i.pca == prcomp(x, center=TRUE, scale=FALSE) Also, note that you must rescale using the mean, not max min values. In other words r.mapcalc band1c = band1 - mean(band1) Thanks for the tip. Maybe this can be added in the grass-wiki FAQ !? Cheers, Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
On Tue, 2009-03-31 at 12:56 +0200, Edzer Pebesma wrote: I tried to improve the wiki page, but never got access. -- Edzer Edzer, it definitely needs to be improved. Hopefully you'll get access while the thing is boiling. @admins: Please, we need access ;-) Kindest regards, Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Nikos Alexandris wrote: Agustin: Nikos, I'm probably missing a message (I do not see which is exactly the difference between the spot and the modis example), Hi Agus. The difference is: Using SPOT (range up to max. 255): i.pca == prcomp(x, center=TRUE, scale=FALSE) Using MODIS (range varries between bands, up to max. ~5500): Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day L3 Global 500m. As I would use the max possible for SPOT (255) I would also use the max possible for other sensors, not the observed max, otherwise statistics that don't do proper normalization on the fly are not comparable between different sensors. But any PCA should do normalization beforehand, not sure what R or i.pca are doing. Please excuse my random comments :-) ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Agustin: Nikos, I'm probably missing a message (I do not see which is exactly the difference between the spot and the modis example), Nikos: Hi Agus. The difference is: Using SPOT (range up to max. 255): i.pca == prcomp(x, center=TRUE, scale=FALSE) Using MODIS (range varries between bands, up to max. ~5500): [...] Markus M: Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day L3 Global 500m. I am referring to the range of the specific MODIS bands that I've used, that is: # band 2 r.info -r mod_b2 min=0 max=5504 # band 6 r.info -r mod_b6 min=93 max=5488 # band 7 r.info -r mod_b7 min=0 max=4133 As I would use the max possible for SPOT (255) I would also use the max possible for other sensors, not the observed max, otherwise statistics that don't do proper normalization on the fly are not comparable between different sensors. But any PCA should do normalization beforehand, not sure what R or i.pca are doing. That's the question _now_ (if I am right with my tests): why isn't normalisation (=data centering in this case) performed with the MODIS bands I use? Please excuse my random comments :-) Markus, your comments are required :-) Cheers, Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
On Tue, Mar 31, 2009 at 5:00 PM, Markus Metz markus.metz.gisw...@googlemail.com wrote: Nikos Alexandris wrote: Using MODIS (range varries between bands, up to max. ~5500): Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day L3 Global 500m. Remember that you have to rescale these data first: https://lpdaac.usgs.gov/lpdaac/products/modis_products_table - MOD09Q1 Terra Surface Reflectance Bands 1–2 Tile250m8 Day https://lpdaac.usgs.gov/lpdaac/products/modis_products_table/surface_reflectance/8_day_l3_global_250m/v5/terra - layers ... MULTIPLY BY SCALE FACTOR ... best, Markus ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
On Tue, 2009-03-31 at 17:13 +0200, Markus Neteler wrote: On Tue, Mar 31, 2009 at 5:00 PM, Markus Metz markus.metz.gisw...@googlemail.com wrote: Nikos Alexandris wrote: Using MODIS (range varries between bands, up to max. ~5500): Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day L3 Global 500m. Remember that you have to rescale these data first: https://lpdaac.usgs.gov/lpdaac/products/modis_products_table - MOD09Q1Terra Surface Reflectance Bands 1–2 Tile250m8 Day https://lpdaac.usgs.gov/lpdaac/products/modis_products_table/surface_reflectance/8_day_l3_global_250m/v5/terra - layers ... MULTIPLY BY SCALE FACTOR ... best, Markus Yep, I know that. But why is it wrong to just use them as they are? Regards, Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Edzer: With Markus' help I got access, and corrected/extended quite a few things, including why R sometimes doesn't print loadings (when they're close to 0). Great! I do worry about the wiki that states results are similar between R and grass, when the differences appear e.g. in the third or second digit of a loading (eigenvector value). Clearly something is is going on, here. So you think it's really important? Isn't it a matter of pixel-inclusion in the various algorithms/programs? Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Nikos Alexandris wrote: Nikos: Hi Agus. The difference is: Using SPOT (range up to max. 255): That implied to me that the max possible value of SPOT was used, not the max observed (often the max observed is also the max possible). In my experience, when comparing imagery or derived products from different sensors, the least errors occur if everything is rescaled to the same range, e.g. 0 - 255 for surface reflectance or -1 - 1 for NDVI, whatever is convenient or appropriate. Note that the scale factor for MODIS is 0.0001 which would give you a new range of -0.01 - 1.6 (I hope I got my math right), not 0 - 255 as for e.g. SPOT or Landsat. ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Nikos: Hi Agus. The difference is: Using SPOT (range up to max. 255): Markus M: That implied to me that the max possible value of SPOT was used, not the max observed (often the max observed is also the max possible). Right. Sorry for not being precise. In my experience, when comparing imagery or derived products from different sensors, the least errors occur if everything is rescaled to the same range, e.g. 0 - 255 for surface reflectance or -1 - 1 for NDVI, whatever is convenient or appropriate. Note that the scale factor for MODIS is 0.0001 which would give you a new range of -0.01 - 1.6 (I hope I got my math right), not 0 - 255 as for e.g. SPOT or Landsat. The thing is by multiplying by 0.0001 thing are worse concerning the *eigenvalues* (the eigenvectors are the same): # using the MODIS bands as they are r.info -r mod_b2 min=0 max=5504 # use of i.pca gives r.info -h pca_mod_b267.1 [...] Data Description: generated by i.pca Comments: Eigen values, (vectors), and [percent importance]: PC1 6307563.04 (-0.6353,-0.6485,-0.4192)[98.71%] PC2 78023.63 (-0.7124, 0.2828, 0.6422)[1.22%] PC3 4504.60 (-0.2979, 0.7067,-0.6417)[0.07%] # using MODIS bands multiplied by 0.0001 r.info -r mod_b2_x min=0 max=0.5504 # using i.pca gives r.info -h pca.mod_x.1 [...] Data Description: generated by i.pca Comments: Eigen values, (vectors), and [percent importance]: PC1 0.06 (-0.6353,-0.6485,-0.4192)[98.71%] PC2 0.00 (-0.7124, 0.2828, 0.6422)[1.22%] PC3 0.00 (-0.2979, 0.7067,-0.6417)[0.07%] ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Nikos Alexandris wrote: The thing is by multiplying by 0.0001 thing are worse concerning the *eigenvalues* (the eigenvectors are the same): # using the MODIS bands as they are r.info -r mod_b2 min=0 max=5504 # use of i.pca gives r.info -h pca_mod_b267.1 [...] Data Description: generated by i.pca Comments: Eigen values, (vectors), and [percent importance]: PC1 6307563.04 (-0.6353,-0.6485,-0.4192)[98.71%] PC2 78023.63 (-0.7124, 0.2828, 0.6422)[1.22%] PC3 4504.60 (-0.2979, 0.7067,-0.6417)[0.07%] # using MODIS bands multiplied by 0.0001 r.info -r mod_b2_x min=0 max=0.5504 # using i.pca gives r.info -h pca.mod_x.1 [...] Data Description: generated by i.pca Comments: Eigen values, (vectors), and [percent importance]: PC1 0.06 (-0.6353,-0.6485,-0.4192)[98.71%] PC2 0.00 (-0.7124, 0.2828, 0.6422)[1.22%] PC3 0.00 (-0.2979, 0.7067,-0.6417)[0.07%] OK, I don't have the full discussion on i.pca in my head, so I don't know how much sense my comments make. The loadings and percentages explained variance are identical, that's good. The Eigenvalues are not, it seems they were calculated from unstandardised (raw) values. For imagery processing, that may be desired, for other applications AFAIK it is required that input variables variables (here different bands) are standardised first so they can be combined and principal components extracted. I'm more familiar with non-spatial PCA, so it's high time I read the manual of i.pca, and the new wiki page on it... ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Nikos: The thing is by multiplying by 0.0001 thing are worse concerning the *eigenvalues* (the eigenvectors are the same): # use of i.pca gives r.info -h pca_mod_b267.1 [...] Eigen values, (vectors), and [percent importance]: PC1 6307563.04 (-0.6353,-0.6485,-0.4192)[98.71%] PC2 78023.63 (-0.7124, 0.2828, 0.6422)[1.22%] PC3 4504.60 (-0.2979, 0.7067,-0.6417)[0.07%] # using i.pca gives r.info -h pca.mod_x.1 [...] Eigen values, (vectors), and [percent importance]: PC1 0.06 (-0.6353,-0.6485,-0.4192)[98.71%] PC2 0.00 (-0.7124, 0.2828, 0.6422)[1.22%] PC3 0.00 (-0.2979, 0.7067,-0.6417)[0.07%] Markus M: OK, I don't have the full discussion on i.pca in my head, so I don't know how much sense my comments make. The loadings and percentages explained variance are identical, that's good. Yep. The Eigenvalues are not, it seems they were calculated from unstandardised (raw) values. Note: the percent importance is nothing else than just transforemd eigenvalues (that is: sum-up all eigenvalues and say the sum is the 100%, take then the percent of each eigenvalue). The fact that the 2nd an d 3rd eigenvalues in the above example are 0.00 is a (another) print-out/report issue I think. However, the multiplication of the MODIS bands with the recommended factor (0.0001) does nothing to the way i.pca treats the data. ## I am convinced that i.pca wrongly _depends_ currently on the range of the input data whether to apply data centering or not. I just need to rescale the MODIS bands in to 0,255 and confirm my skepsis. If I am wrong then it might be even more complicated!! ## For imagery processing, that may be desired, for other applications AFAIK it is required that input variables variables (here different bands) are standardised first so they can be combined and principal components extracted. As Augustin Lobo and others suggested, data centering should be performed. And me, with my limited knowledge and experience think the same. I'm more familiar with non-spatial PCA, so it's high time I read the manual of i.pca, and the new wiki page on it... Markus, I might missed to update some important sentences. So handle with care :-) Cheers, Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Roger: Good, thanks. There you say that you are using some MODIS surface reflectance products. I guess it will be easier to check things if the data (GRASS location) are available, so that others can try the same calculations. Would it be possible to make one or more test sets available, and link them from thw Wiki? Nikos: Finally, I am, sort of, free again :-). Sorry for being late. Couldn't make it earlier. I've updated the wiki-page [1]. I have prepaired a grass-location containing the 3 modis bands I've used for all PCA related examples. I will eventually upload them in gregis [check previous post] but if there is another place (GRASS-wiki?) to upload them would be also ok. I have currently some access problem in gregis this why I can't upload the MODIS bands there. Is there any other place to put them for now? Now, I am more confused than before. Or maybe not... There seems to be no problem with the SPOT images, that is, the results of i.pca are (almost) identical with the results of R's prcomp(x, center=TRUE) *with* centering. This is completely different than with the example using MODIS data. Perhaps it depends on the range of the input data?! The SPOT bands are up to max. 255. The MODIS bands are: # band 2 r.info -r mod_b2 min=0 max=5504 # band 6 r.info -r mod_b6 min=93 max=5488 # band 7 r.info -r mod_b7 min=0 max=4133 I ran the same PCA tests (in R) with the _rescaled_ as like with the original MODIS bands (using r.rescale) but no luck, I get the same contradictory results: _i.pca_ matches with _prcomp(x, center=FALSE)_. I suspect that the range of the input data is the reason... but I am not sure. This is why I tried to repeat the test with _rescaled_ MODIS bands. Given that the r.rescale products are kind of a rules file under the cats directory, is r.rescale safe? Should I instead rescale the data with r.mapcalc or is there no difference between r.rescale and r.mapcalc? All explained in the *under construction* wiki. Kind regards, Nikos --- [1] http://grass.osgeo.org/wiki/Principal_Component_Analysis ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
On Thu, Mar 12, 2009 at 8:26 AM, Hamish hamis...@yahoo.com wrote: Nikos: http://grass.osgeo.org/wiki/Principal_Component_Analysis ... better yet, use the standard Spearfish SPOT imagery sample dataset for the examples. (as m.eigensystem and now i.pca help page examples do, probably others) Let me suggest the Landsat imagery in the North Carolina location which are more recent (the Spearfish *imagery* sample dataset isn't that great). Markus ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Nikos: http://grass.osgeo.org/wiki/Principal_Component_Analysis Roger: Good, thanks. There you say that you are using some MODIS surface reflectance products. I guess it will be easier to check things if the data (GRASS location) are available, so that others can try the same calculations. Would it be possible to make one or more test sets available, and link them from thw Wiki? better yet, use the standard Spearfish SPOT imagery sample dataset for the examples. (as m.eigensystem and now i.pca help page examples do, probably others) is there a way to change text color in MediaWiki? it might be easier to highlight m.eigensystem output rows by color instead of '#'s. idea for possible google summer of code 2009 project: overhaul i.* modules/lib for grass7. US$4.5k bounty :) cheers, Hamish ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
On Tue, 2009-03-03 at 14:26 +0100, Nikos Alexandris wrote: On Tue, 2009-03-03 at 13:18 +0100, Markus Neteler wrote: On Tue, Mar 3, 2009 at 12:26 PM, Hamish hamis...@yahoo.com wrote: ... it will be really good to finally have all this documented. I find it hard to follow these long mails. Why not enjoying the GRASS Wiki to stabilize the documentation and comparisons? Markus Yes, why not? Hmmm, you mean to put these stuff already in the Wiki before we are sure about the one last and very important variable (=eigenvalues reported by i.pca)? Kind regards, Nikos I've started the PCA grass-wiki page. I am trying to build it... step by step. Expert advice is always welcome and highly appreciated. http://grass.osgeo.org/wiki/Principal_Component_Analysis Kind regards, Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
On Tue, 10 Mar 2009, Nikos Alexandris wrote: On Tue, 2009-03-03 at 14:26 +0100, Nikos Alexandris wrote: On Tue, 2009-03-03 at 13:18 +0100, Markus Neteler wrote: On Tue, Mar 3, 2009 at 12:26 PM, Hamish hamis...@yahoo.com wrote: ... it will be really good to finally have all this documented. I find it hard to follow these long mails. Why not enjoying the GRASS Wiki to stabilize the documentation and comparisons? Markus Yes, why not? Hmmm, you mean to put these stuff already in the Wiki before we are sure about the one last and very important variable (=eigenvalues reported by i.pca)? Kind regards, Nikos I've started the PCA grass-wiki page. I am trying to build it... step by step. Expert advice is always welcome and highly appreciated. http://grass.osgeo.org/wiki/Principal_Component_Analysis Good, thanks. There you say that you are using some MODIS surface reflectance products. I guess it will be easier to check things if the data (GRASS location) are available, so that others can try the same calculations. Would it be possible to make one or more test sets available, and link them from thw Wiki? Roger Kind regards, Nikos ___ grass-stats mailing list grass-st...@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-stats -- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: roger.biv...@nhh.no ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Nikos: I've started the PCA grass-wiki page. I am trying to build it... step by step. Expert advice is always welcome and highly appreciated. http://grass.osgeo.org/wiki/Principal_Component_Analysis Roger: Good, thanks. There you say that you are using some MODIS surface reflectance products. I guess it will be easier to check things if the data (GRASS location) are available, so that others can try the same calculations. Would it be possible to make one or more test sets available, and link them from thw Wiki? Absolutely! I need some time though (=some days?, currently overloaded :-). Kindest regards, Nikos P.S. I was informed today that I have to wait another 2 weeks to get the book Applied Spatial Data Analysis with R :-( ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
On Tue, Mar 3, 2009 at 12:26 PM, Hamish hamis...@yahoo.com wrote: ... it will be really good to finally have all this documented. I find it hard to follow these long mails. Why not enjoying the GRASS Wiki to stabilize the documentation and comparisons? Markus ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
On Tue, 2009-03-03 at 13:18 +0100, Markus Neteler wrote: On Tue, Mar 3, 2009 at 12:26 PM, Hamish hamis...@yahoo.com wrote: ... it will be really good to finally have all this documented. I find it hard to follow these long mails. Why not enjoying the GRASS Wiki to stabilize the documentation and comparisons? Markus Yes, why not? Hmmm, you mean to put these stuff already in the Wiki before we are sure about the one last and very important variable (=eigenvalues reported by i.pca)? Kind regards, Nikos ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user