Re: size correction discriminant functions analyses
G'day Dr. Kidd, I've just started using a new computer and I put the wrong email address in my signature. The one below is the correct address. Did you have some comments on my last posting? See ya, Brett * Brett Human Shark Researcher 27 Southern Ave West Beach SA 5024 Australia +61 8 8356 6891 [EMAIL PROTECTED] * == Replies will be sent to list. For more information see http://life.bio.sunysb.edu/morph/morphmet.html.
RE: size correction discriminant functions analyses
A couple of observations: Using a correlation matrix rather than a covariance matrix has nothing to do with whether the data are normally distributed or not. One usually wants to use a covariance matrix. However, if the variables are in various units that cannot be made consistent, then one gives up and uses a correlation matrix. The main issue about using some of the other methods that were suggested is whether the groups (clusters) are defined a priori or are groups you are trying to discover in the data. If you know the groups in advance, then it makes sense to consider CVA, CPCA, MANOVA, etc. You will then run into the problem that I mentioned in my prior message: you need more observations than variables. The comments about normality were a bit off the point. With data such as you describe there is no expectation that the entire data set be consistent with a multivariate normal distribution. What you want is for the distributions within the clusters to be normal. For those, your sample sizes will be even smaller, so it is difficult to perform serious tests with data such as yours. Since you want to find clusters of species, you really do not want your entire dataset to be consistent with sampling from a single normally distributed population. CVA = canonical variates analysis. NMDS = nonmetric multidimensional scaling. NMDS would be a good thing to try on your data. It is similar to a PCA ordination but is not constrained to axes that are linear functions of your original variables. It usually does a better job of summarizing distances between points in a low-dimensional space. - F. James Rohlf, SUNY Stony Brook, NY 11794-5245 FAX: 1-631-632-7626 www: http://life.bio.sunysb.edu/ee/rohlf
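Rohlf's two technical points (standardising cannot change normality, and you need more observations than variables) are easy to check numerically. The sketch below is a minimal illustration, assuming Python with NumPy and made-up lognormal data rather than Brett's measurements; the helper names `eigen_desc` and `skewness` are hypothetical. It shows that z-scoring leaves skewness untouched, that correlation-matrix PCA is just covariance-matrix PCA of z-scores, and that the covariance matrix is singular when specimens are fewer than variables.

```python
import numpy as np


def eigen_desc(matrix):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix."""
    values, vectors = np.linalg.eigh(matrix)
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]


def skewness(x):
    """Per-column sample skewness (simple moment estimator, no bias correction)."""
    centred = x - x.mean(axis=0)
    m2 = (centred ** 2).mean(axis=0)
    m3 = (centred ** 3).mean(axis=0)
    return m3 / m2 ** 1.5


rng = np.random.default_rng(1)
# Hypothetical data: 40 specimens, 3 right-skewed measurements on very different scales.
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(40, 3)) * np.array([1.0, 10.0, 100.0])
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)   # z-scores

cov_values, _ = eigen_desc(np.cov(raw, rowvar=False))
corr_values, _ = eigen_desc(np.corrcoef(raw, rowvar=False))

# 1) Standardising is a linear rescaling: the skewness of each variable is unchanged.
print(np.allclose(skewness(raw), skewness(z)))                           # True
# 2) Correlation-matrix PCA is covariance-matrix PCA of the z-scores.
print(np.allclose(corr_values, eigen_desc(np.cov(z, rowvar=False))[0]))  # True
# 3) On the raw scale the variable with the largest units dominates PC1.
print(np.round(cov_values / cov_values.sum(), 3))

# 4) With fewer specimens (5) than variables (10) the covariance matrix is singular.
few = rng.normal(size=(5, 10))
print(np.linalg.matrix_rank(np.cov(few, rowvar=False)))                  # 4, i.e. n - 1
```

Point 3 is the usual argument for a correlation matrix when units cannot be made consistent; point 4 is why CVA or MANOVA fails on small samples with many measurements.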
Re: size correction discriminant functions analyses
G'day all, Thanks to everyone for your comments. They've been a great help, and I'm glad that my question sparked a bit of discussion on the subject. After some pondering, I've got a few more questions and some more details on the way I analysed my data. Although I was looking for species clustering, I wasn't terribly concerned with quantifying it, and was using PCA more as a visualisation technique to explore my data. In the future I will try the various methods suggested to quantify the clustering. Another thing concerns multivariate normality. I did not use a variance-covariance matrix; instead I used a correlation matrix. I was under the assumption that by transforming the covariances into z-scores, I would have a greater chance of my data being (or approaching) multivariate normal. Is that right? Also, to test whether my data are normally distributed: if I did separate PCAs for each population and a population was normally distributed, would I expect to see an ellipsoid with its greatest length along PC1 in a PCA plot? With regard to obtaining singular matrices when # measures > # specimens, this did happen to me, and the way I 'got round' it was to first regress every measurement against total length and then, by looking at the slopes of the regressions, choose which measurements showed the greatest potential for differentiating between species. Because I was using PCA just as a qualitative tool, I didn't think this was much of a problem. However, if I want to do a quantitative analysis such as discriminant analysis, can I still use this same method of choosing measures, or am I restricted to stepwise methods using the whole data set? Forgive my ignorance, but what are NMDS and CVA? I assume PCO is principal coordinates analysis? I would also appreciate a PDF of the Darroch and Mosimann paper if available.
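Regressing every measurement against total length is usually done on log-transformed data, where the slope reads directly as an allometric coefficient (about 1 for isometry). The sketch below is a minimal version of that step, assuming NumPy, ordinary least squares and simulated data; `allometric_slopes` is a hypothetical helper, not a routine from any package mentioned in the thread.

```python
import numpy as np


def allometric_slopes(total_length, measurements):
    """OLS slope and intercept of log(measurement) on log(total length), per column.

    Slope near 1: isometry; above 1: positive allometry; below 1: negative allometry.
    """
    x = np.log(total_length)
    y = np.log(measurements)                       # specimens x measurements
    design = np.column_stack([np.ones_like(x), x])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercepts, slopes = coefs                     # row 0: intercepts, row 1: slopes
    return slopes, intercepts


rng = np.random.default_rng(2)
tl = rng.uniform(200, 700, size=30)                # hypothetical total lengths (mm)
noise = rng.normal(0, 0.02, size=(30, 3))
# Three hypothetical measurements with true allometric exponents 1.0, 0.8 and 1.2.
meas = np.exp(np.log([0.1, 0.5, 0.02]) + np.outer(np.log(tl), [1.0, 0.8, 1.2]) + noise)

slopes, intercepts = allometric_slopes(tl, meas)
print(np.round(slopes, 2))
```

Fitting this separately for each species and comparing the slopes is one way to see which measurements change differently with size.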
A final point, to perhaps spark more debate or at least to motivate some thought: I have found it very difficult to get a basic understanding of the application of multivariate stats to morphometrics because the textbooks available are very technical. An equation may be meaningful to the gurus, but it doesn't mean a whole lot to me. It is also one thing to describe how a procedure works, but another thing to implement it when you are ignorant of the software available. I think there is a great need for a textbook that can introduce the new student to this field without using equations to describe what's going on. There, I've said it; let the slaughter begin. Thanks, Brett * Brett Human Shark Researcher 27 Southern Ave West Beach SA 5024 Australia +61 8 8356 6891 [EMAIL PROTECTED] *
RE: size correction discriminant functions analyses
This is for Brett Human: I have tried to respond to your latest posting but the address you give is bouncing. Rob Kidd At: [EMAIL PROTECTED]
Re: size correction discriminant functions analyses
Dennis Slice wrote: "1) PCA makes no assumptions about the distribution (multivariate normal or otherwise) of your data. It is a procedure that simply produces the linear combinations of variables with maximum variance subject to orthogonality to other such axes." OK, but variance may or may not be a meaningful parameter for non-normal data. Dennis Slice wrote: "If you are interested in size relationships, regress variables on some meaningful measure of size." If only I had a meaningful measure of size ... :-) Oyvind Hammer Geological Museum University of Oslo
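Hammer's caveat that variance need not be meaningful for non-normal data is easiest to see with a single outlier, a case Dennis Slice describes later in the thread: PC1 still is the maximum-variance direction, but it simply points at the outlier. A minimal sketch, assuming NumPy and a simulated, roughly spherical sample; `first_pc` is a hypothetical helper.

```python
import numpy as np


def first_pc(data):
    """Unit vector of the first principal component (largest-variance direction)."""
    centred = data - data.mean(axis=0)
    # The right singular vectors of the centred data are the PCA axes.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]


rng = np.random.default_rng(3)
sample = rng.normal(size=(50, 3))          # hypothetical sample, no dominant direction
print(np.round(first_pc(sample), 2))       # some arbitrary direction

# Add one wild value, e.g. a transcription error, far out on the third variable.
with_outlier = np.vstack([sample, [0.0, 0.0, 100.0]])
pc1 = first_pc(with_outlier)
# The sign of a PC is arbitrary, so compare absolute values.
print(abs(pc1[2]) > 0.99)                  # True: PC1 now just points at the outlier
```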
Re: size correction discriminant functions analyses
Marta, I have a PDF version of the Darroch and Mosimann Biometrika paper. What is your e-mail address, so I can send it directly to you? Marc Moniz [EMAIL PROTECTED]
Re: size correction discriminant functions analyses
As I understand PCA, its main goal is to reduce the dimensionality of a problem without losing too much information. In other words, according to Prof. Rohlf, the purpose of PCA is to give you a low-dimensional space that accounts for as much variation as possible. However, I agree with Oyvind that many scientists use PCA as a visualization device, projecting a multivariate data set onto a sheet of paper. On the other hand, testing multivariate normality before applying any multivariate technique is one of the most serious problems, because in most cases nobody does it, and anyone who tries may choose the wrong way. We (biologists and paleontologists) really need a definite guide to follow when we face this problem. Best regards --- Dr. Ashraf M. T. Elewa Associate Professor Geology Department Faculty of Science Minia University Egypt [EMAIL PROTECTED] http://myprofile.cos.com/aelewa - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, May 19, 2004 04:29 Subject: Re: size correction discriminant functions analyses Just a comment on this one, from a pragmatic point of view. It is of course true that PCA is only *guaranteed* to produce components maximizing variance if you have multivariate normality. The theory of PCA is based on this assumption. But in many cases, PCA is used purely as a visualization device, projecting a multivariate data set onto a sheet of paper so we can see it. For visualization of non-normal data, one could play around with different techniques, such as PCA, PCO, NMDS, projection pursuit etc., and then find that PCA does (or does not) perform well for the given data set. There is no law against making any linear combination you want of your variates, if it reveals information. For example, PCA may be perfectly adequate for resolving two well-separated groups, if the within-group variance is relatively small.
Of course, when using PCA for non-normal data one must be a little careful and not over-interpret the results (especially not the component loadings), but I think it's too harsh to dismiss its use totally. I'm sure the hard-liners will flame me to pieces for this email, but I hope they will at least give me credit for my courage :-) Dr. Oyvind Hammer Geological Museum University of Oslo PCA Analysis assumes multivariate normality. Kathleen M. Robinette, Ph.D. Principal Research Anthropologist Air Force Research Laboratory
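Hammer's example of two well-separated groups with small within-group variance can be simulated directly: the pooled sample is bimodal (clearly not multivariate normal), yet PC1 still separates the groups. A minimal sketch, assuming NumPy and invented group means and sizes.

```python
import numpy as np

rng = np.random.default_rng(4)
# Two hypothetical groups, 25 specimens each, 4 variables; means 10 units apart on
# every variable, within-group standard deviation 1. The pooled sample is bimodal.
group_a = rng.normal(loc=0.0, scale=1.0, size=(25, 4))
group_b = rng.normal(loc=10.0, scale=1.0, size=(25, 4))
data = np.vstack([group_a, group_b])
labels = np.array([0] * 25 + [1] * 25)

centred = data - data.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[0]                        # PC1 scores

explained = singular_values ** 2 / np.sum(singular_values ** 2)
print(np.round(explained[0], 2))                # share of variance on PC1

# No overlap between the groups along PC1 (the sign of PC1 itself is arbitrary).
a, b = scores[labels == 0], scores[labels == 1]
separated = (a.max() < b.min()) or (b.max() < a.min())
print(separated)                                # True
```

Shrinking the gap between the means, or inflating the within-group spread, is a quick way to see where this stops working.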
Re: size correction discriminant functions analyses
I don't know what caused the earlier message to be largely void of content, but I think the original communication was to correct the Red Book reference. The date is 1985, not 1982. -ds -- Dennis E. Slice, Ph.D. Department of Biomedical Engineering Division of Radiologic Sciences Wake Forest University School of Medicine Winston-Salem, North Carolina, USA 27157-1022 Phone: 336-716-5384 Fax: 336-716-2870
Re: size correction discriminant functions analyses
Dear colleagues, About the above discussion on linear measurement data for multivariate analysis, I should say that most of the time my problem (and, I expect, the problem of many people who work with such data) is not the number of rows/columns (which is usually OK, at least in the cases I have seen), nor multivariate normality (I use the R-project program, which has a test of multivariate normality, so it is easy to test), nor lack of homogeneity of variances (this is a bit more dodgy, but the references I saw state that testing univariate homogeneity of variances, e.g. with a Bartlett test, should give a good indication for the data). The problem that (I suppose) most biologists encounter is collinearity between variables... which strongly influences the representation given by the PCA. I think this also happens in NMDS, discriminant and canonical analysis. I probably did not make myself clear in my email. I am sorry... For me, it is very interesting that these things are debated on the list, and that different people offer different solutions and bibliography; it is really nice. Regarding the article from Biometrika, does anyone have the PDF? We don't have the journal in this college. Regarding the robustness of the techniques to lack of normality, I agree with our colleague (so... I share your feeling of daring to state it... hehe ;-)) Thank you all, Cheers, Marta
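Marta's main concern is collinearity between measurements. One common diagnostic, not one proposed in the thread, is the variance inflation factor (VIF), which equals the diagonal of the inverse correlation matrix. A minimal sketch, assuming NumPy and simulated measurements that all track a hypothetical overall size.

```python
import numpy as np


def variance_inflation_factors(data):
    """VIF for each column: the diagonal of the inverse correlation matrix.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all others.
    """
    corr = np.corrcoef(data, rowvar=False)
    return np.diag(np.linalg.inv(corr))


rng = np.random.default_rng(5)
size = rng.uniform(10, 50, size=60)                # hypothetical overall size
head = 0.20 * size + rng.normal(0, 0.3, size=60)   # both measurements track size closely
fin = 0.15 * size + rng.normal(0, 0.3, size=60)
unrelated = rng.normal(0, 1, size=60)              # a variable independent of size

vifs = variance_inflation_factors(np.column_stack([size, head, fin, unrelated]))
print(np.round(vifs, 1))   # large for the size-linked variables, close to 1 for the last
```

The inverse fails outright when a variable is an exact linear combination of the others, which is the extreme form of the same problem.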
Re: size correction discriminant functions analyses
I applaud your courage, Dr. Hammer. I hope everyone appreciates how intimidating this list of experts can be. I also agree with your point that PCA can be used when the data are not multivariate normal if you are just using it to visualize information, or if you simply know what it is doing. I am a fan of using any and all analyses that help in figuring out what is happening. However, in order to understand the results and what you are visualizing, you have to understand both the input data and what the statistical analysis is doing. Sometimes the information that seems to be revealed is an artifact of violated assumptions, and if the observer doesn't realize this it is very easy to come to the wrong conclusion. I thought that what the analysis was doing, and how to interpret it, were the original questions we were discussing, although I admit to reading the e-mails quickly. The original e-mail indicated that perhaps size and shape confounding was causing the odd-looking results. If the shapes are the same but the sizes are different, then the only source of non-normality would be multiple modes. This may not be a serious enough violation to cause interpretability problems. However, from the description of the problem and the results, it sounded to me that in addition to multiple modes there are multiple variance/covariance matrices. That was making it difficult to interpret the results, and since PCA is based upon the variance/covariance matrix, this will yield difficult-to-interpret or even invalid components. Separating the analysis into subgroups will allow them to visualize and test the differences in the modes and in the variance/covariance matrices, and in that way understand the source of the differences between the groups. Maybe the common PCA method someone else mentioned might do this as well; I am not familiar with it. Thanx all again for your attention and patience, Kath Kathleen M. Robinette, Ph.D.
Principal Research Anthropologist Air Force Research Laboratory AFRL/HEPA 2800 Q Street Wright-Patterson AFB, OH 45433-7947 (937) 255-8810 DSN 785-8810 FAX (937) 255-8752 e-mail:[EMAIL PROTECTED]
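Robinette suggests splitting the data into subgroups and testing the differences in their variance/covariance matrices. One standard test for this is Box's M; it is named here as a swapped-in technique, not one cited in the thread, and it is known to be sensitive to departures from normality. A minimal sketch, assuming NumPy and simulated juvenile/adult groups that mirror her example of a correlation that disappears in adults; `box_m` is a hypothetical helper.

```python
import numpy as np


def box_m(groups):
    """Box's M test for equal covariance matrices, with Box's chi-square approximation.

    `groups` is a list of (specimens x variables) arrays, one per group.
    Returns (statistic, degrees_of_freedom); compare the statistic with a
    chi-square distribution on that many degrees of freedom.
    """
    k = len(groups)
    p = groups[0].shape[1]
    ns = np.array([g.shape[0] for g in groups])
    n_total = ns.sum()
    covs = [np.cov(g, rowvar=False) for g in groups]          # unbiased (n - 1) covariances
    pooled = sum((n - 1) * s for n, s in zip(ns, covs)) / (n_total - k)

    # M = (N - k) ln|S_pooled| - sum_i (n_i - 1) ln|S_i|
    m = (n_total - k) * np.linalg.slogdet(pooled)[1]
    m -= sum((n - 1) * np.linalg.slogdet(s)[1] for n, s in zip(ns, covs))

    # Box's small-sample correction factor
    c = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))) * (
        np.sum(1.0 / (ns - 1)) - 1.0 / (n_total - k)
    )
    df = p * (p + 1) * (k - 1) // 2
    return m * (1 - c), df


rng = np.random.default_rng(6)
# Hypothetical juveniles: two measurements strongly correlated while growing ...
juveniles = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=40)
# ... and hypothetical adults in which that correlation has disappeared.
adults = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], size=40)

stat, df = box_m([juveniles, adults])
print(round(float(stat), 1), df)   # 3 degrees of freedom; 5% critical value about 7.81
```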
Re: size correction discriminant functions analyses
Dr. Hammer, Please consider your courage credited. -ds A couple of points about PCA in general: 1) PCA makes no assumptions about the distribution (multivariate normal or otherwise) of your data. It is a procedure that simply produces the linear combinations of variables with maximum variance subject to orthogonality to other such axes. Distributional assumptions only come into play for (some) significance-testing procedures. 2) PC1 will only identify size variation if size variation is the source of the greatest variation in your sample. Sex, species, habitat, etc. could all be determinants (not in the matrix sense 8-) ) of PC1, or some combination of these could be. In general, if you have data with some extreme outlier (e.g., a transcription error), then PC1 will (probably) just point to (or pi radians away from) the direction of that outlier relative to the main sample, which will still be the linear combination of maximum variance. What people often want PCA to do is either a) identify iso/allometry due to size variation in a sample or b) separate out sexes, species, or other groups. PCA is optimal for neither of these and could be quite misleading in both cases. If you are interested in size relationships, regress variables on some meaningful measure of size. If you are interested in group differences, look into CVA. If you have many more variables than specimens, you might do either of the above in a reduced PCA space, provided you check carefully whether your limited data suggest you are capturing the salient aspects of a space of reduced dimension resulting from the tight correlations among your variables. Otherwise, you must wave your hands vigorously before proceeding. See Marcus's 1990 Blue Book chapter for a nice discussion of PCA and related methods. Books specifically on principal components, by Jackson, by Jolliffe and by other authors, are available. -ds -- Dennis E. Slice, Ph.D. Department of Biomedical Engineering Division of Radiologic Sciences Wake Forest University School of Medicine Winston-Salem, North Carolina, USA 27157-1022 Phone: 336-716-5384 Fax: 336-716-2870
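For group differences, Slice (like Rohlf) points to CVA. The sketch below is a minimal textbook formulation, assuming NumPy and simulated data: it solves B a = λ W a for the between-group (B) and pooled within-group (W) SSCP matrices and returns canonical scores. It is not the implementation of any particular package and does no significance testing. Because W must be invertible, it also illustrates Rohlf's requirement of more specimens than variables; `cva` is a hypothetical helper.

```python
import numpy as np


def cva(data, labels):
    """Canonical variates analysis (textbook formulation).

    Finds axes a solving B a = lambda W a, where W and B are the pooled within-group
    and between-group SSCP matrices. W must be invertible, which needs
    N - groups >= variables. Returns eigenvalues (descending) and canonical scores.
    """
    grand_mean = data.mean(axis=0)
    p = data.shape[1]
    within = np.zeros((p, p))
    between = np.zeros((p, p))
    for g in np.unique(labels):
        x = data[labels == g]
        dev = x - x.mean(axis=0)
        within += dev.T @ dev
        diff = (x.mean(axis=0) - grand_mean)[:, None]
        between += x.shape[0] * (diff @ diff.T)

    # Whiten with the Cholesky factor of W (W = L L^T) to get a symmetric eigenproblem.
    chol_inv = np.linalg.inv(np.linalg.cholesky(within))
    values, u = np.linalg.eigh(chol_inv @ between @ chol_inv.T)
    order = np.argsort(values)[::-1]
    values, u = values[order], u[:, order]
    vectors = chol_inv.T @ u                  # canonical axes in the original variables
    scores = (data - grand_mean) @ vectors
    return values, scores


rng = np.random.default_rng(7)
# Hypothetical: three species, 20 specimens each, four measurements.
means = np.array([[0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]])
data = np.vstack([rng.normal(mu, 1.0, size=(20, 4)) for mu in means])
labels = np.repeat([0, 1, 2], 20)

values, scores = cva(data, labels)
print(np.round(values, 3))   # only min(groups - 1, variables) = 2 are clearly non-zero
```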
Re: size correction discriminant functions analyses
Dear Brett, If the problem is separating size and shape, then, fortunately, in my edited book Morphometrics: Applications in Biology and Paleontology (Springer-Verlag, 2004) you will find a chapter written by Garcia-Rodriguez et al. They used sheared PCA and successfully separated size and shape as separate components. Although there are more recent techniques for doing this, I recommend reading that chapter to see how they separated size and shape in an excellent and easy manner. Best regards. Ashraf --- Dr. Ashraf M. T. Elewa Associate Professor Geology Department Faculty of Science Minia University Egypt [EMAIL PROTECTED] http://myprofile.cos.com/aelewa - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, May 17, 2004 05:09 Subject: size correction discriminant functions analyses Dear morphometricians, I have recently reviewed 3 genera of catsharks that display a great deal of morphological conservation within the genera; however, there is also prominent sexual dimorphism (profoundly so in some species). There is quite a bit of shape variation between juveniles and adults, in one genus in particular, but I think that the shape variation is being obscured by the size component. I have a sizeable morphometric data set (# measures # taxa specimens) and have used principal components analysis on the raw data to explore shape variation within each of the genera (not between). The first component was always a general component and accounted for more than 85-90% of the variation in most instances; therefore the bipolar components contributed relatively little to the overall shape variation, resulting in crowded PCA plots. The main reference I have used for the analyses to date has been Pimentel, 1979, Morphometrics: The Multivariate Analysis of Biological Data; however, it doesn't deal with size correction.
Can anyone suggest a review that deals with size correction, or can I convert my data to ratios and then log-transform them? I am also looking for reviews of canonical discriminant function analysis and stepwise discriminant function analysis in an attempt to quantify differences between species within a genus. Thanks for your help. Brett Brett Human Shark Researcher 27 Southern Ave West Beach SA 5024 Australia +61 8 8356 6891 [EMAIL PROTECTED]
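On Brett's question about converting to ratios and then log-transforming: one widely used version of this idea divides each measurement by the specimen's geometric mean and takes logs (often called log shape ratios). The sketch below assumes NumPy and invented measurements; `log_shape_ratios` is a hypothetical helper. It removes isometric size only: allometric shape change that goes with size stays in the shape variables.

```python
import numpy as np


def log_shape_ratios(measurements):
    """Split log measurements into a size score and size-free shape variables.

    Size is the log geometric mean of each specimen's measurements; the shape
    variables are log(x_j) - size, i.e. logs of the ratios x_j / geometric mean.
    Each row of `shape` sums to zero, so its covariance matrix is singular (rank p - 1).
    """
    logs = np.log(measurements)
    size = logs.mean(axis=1)
    shape = logs - size[:, None]
    return size, shape


specimen = np.array([[120.0, 35.0, 18.0, 9.0]])   # hypothetical measurements (mm)
larger = specimen * 2.5                            # same shape, isometrically scaled

size_small, shape_small = log_shape_ratios(specimen)
size_large, shape_large = log_shape_ratios(larger)
print(np.allclose(shape_small, shape_large))       # True: isometric size change removed
print(np.round(size_large - size_small, 4))        # log(2.5), about 0.9163
```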
Re: size correction discriminant functions analyses
There is a method called common PCA which seems to overcome the problem of non-multinormality of an overall sample that includes several subsamples, each with different central moments. The source to read is: Flury B. 1988. Common principal components and related multivariate models. NY: Wiley. 258 p. Cheers, Igor - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, May 18, 2004 10:09 PM Subject: Re: size correction discriminant functions analyses Dear Brett and Marta, I think the problem you are encountering may not be the size-versus-shape issue, but a Normal distribution issue. PCA Analysis assumes multivariate normality. I know that for human beings the distribution of men and women combined is often not Multivariate Normal. It is bimodal, and the male and female variance-covariance structures are different. This dramatically affects the correlation and covariance matrices and yields misleading components. I would assume this could be true for catsharks as well, and suspect that is why you found such a large amount of variation seemingly explained by your first component. We have found that for humans the lack of Normality is big enough that it requires separate PCA analyses for men and women, and in some cases separate analyses by ethnicity as well. In addition, it sounds to me as though you have additional modes or non-normalities due to age. (I generally only work with adults.) Have you checked whether your data are Normally distributed? If they aren't, you could consider separating your samples into subgroups (gender and age groups) that are normally distributed, prior to PCA analysis. In other words, you would do a PCA analysis for each group, rather than just one PCA for all of them combined. I don't know how difficult this may be, not knowing your data. Or you might look into classification methods that do not depend upon the normality assumption.
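Robinette's suggestion of one PCA per subgroup (sex, age class) rather than a single pooled PCA can be sketched as follows. This is a minimal illustration, assuming NumPy and simulated "females" and "males" with different means and covariance structures; `pca_by_group` is a hypothetical helper. It does not implement Flury's common PCA, which fits shared axes across groups by a different algorithm.

```python
import numpy as np


def pca_by_group(data, labels):
    """Run a separate covariance-matrix PCA for each group.

    Returns {group: (eigenvalues in descending order, eigenvectors as columns)}.
    """
    results = {}
    for g in np.unique(labels):
        cov = np.cov(data[labels == g], rowvar=False)
        values, vectors = np.linalg.eigh(cov)
        order = np.argsort(values)[::-1]
        results[g] = (values[order], vectors[:, order])
    return results


rng = np.random.default_rng(8)
# Hypothetical females and males with different means and different covariance structure.
females = rng.multivariate_normal([50, 20], [[4.0, 3.0], [3.0, 4.0]], size=30)
males = rng.multivariate_normal([60, 22], [[4.0, 0.0], [0.0, 1.0]], size=30)
data = np.vstack([females, males])
labels = np.array(["F"] * 30 + ["M"] * 30)

for group, (values, vectors) in pca_by_group(data, labels).items():
    # Proportion of variance per PC, and the loadings of PC1, for each group.
    print(group, np.round(values / values.sum(), 2), np.round(vectors[:, 0], 2))
```

Comparing the PC1 loadings across groups is an informal version of the covariance comparison she describes.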
Re: size correction discriminant functions analyses
Just a comment on this one, from a pragmatic point of view. It is of course true that PCA is only *guaranteed* to produce components maximizing variance if you have multivariate normality. The theory of PCA is based on this assumption. But in many cases, PCA is used purely as a visualization device, projecting a multivariate data set onto a sheet of paper so we can see it. For visualization of non-normal data, one could play around with different techniques, such as PCA, PCO, NMDS, projection pursuit etc., and then find that PCA does (or does not) perform well for the given data set. There is no law against making any linear combination you want of your variates, if it reveals information. For example, PCA may be perfectly adequate for resolving two well-separated groups, if the within-group variance is relatively small. Of course, when using PCA for non-normal data one must be a little careful and not over-interpret the results (especially not the component loadings), but I think it's too harsh to dismiss its use totally. I'm sure the hard-liners will flame me to pieces for this email, but I hope they will at least give me credit for my courage :-) Dr. Oyvind Hammer Geological Museum University of Oslo PCA Analysis assumes multivariate normality. Kathleen M. Robinette, Ph.D. Principal Research Anthropologist Air Force Research Laboratory == Replies will be sent to list. For more information see http://life.bio.sunysb.edu/morph/morphmet.html.
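Hammer's example (PCA resolving two well-separated groups when the within-group variance is relatively small) can be checked on simulated data. The sketch below uses plain NumPy, computing an SVD of the column-centred data. The group sizes, spreads and shift are made-up values chosen for illustration; they do not come from the catshark data.

```python
import numpy as np


def pca_scores(data, n_components=2):
    """Project the rows of `data` onto their leading principal components.

    Uses the SVD of the column-centred data; returns the scores and the
    proportion of total variance carried by each retained component.
    """
    x = np.asarray(data, dtype=float)
    centred = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
n_per_group, n_vars = 50, 4
# Simulated, hypothetical data: small within-group spread (sd 0.5),
# groups separated by a shift along the first two variables.
group_a = rng.normal(0.0, 0.5, size=(n_per_group, n_vars))
group_b = rng.normal(0.0, 0.5, size=(n_per_group, n_vars)) + np.array([5.0, 5.0, 0.0, 0.0])
scores, explained = pca_scores(np.vstack([group_a, group_b]))

pc1_a, pc1_b = scores[:n_per_group, 0], scores[n_per_group:, 0]
print("PC1 share of variance:", round(float(explained[0]), 3))
print("groups separated on PC1:", pc1_a.max() < pc1_b.min() or pc1_b.max() < pc1_a.min())
```

Here the between-group shift dominates the total variance, so PC1 lines up with it and the two groups fall on opposite sides of the first axis. In keeping with Hammer's caution, this says nothing about whether the loadings are interpretable.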
Re: size correction discriminant functions analyses
Useful, though sometimes technical, information, critiques, and expositions on the traditional use of ratios in morphometric analysis can be found in: Bookstein, F. L. 1991. Morphometric Tools for Landmark Data: Geometry and Biology. (The Orange Book) and Bookstein, F. L., Chernoff, B., Elder, R., Humphries, J., Smith, G., and Strauss, R. 1985. Morphometrics in Evolutionary Biology. The Geometry of Size and Shape Change, with Examples from Fishes. (The Red Book) Information on general multivariate methods can be found in a number of places; my favorites are: Krzanowski, W. J. 1996. Principles of Multivariate Analysis. A User's Perspective. - Readable, conversational text distinguished from technical details by font. Carroll, J. D. and Green, P. E. 1997. Mathematical Tools for Applied Multivariate Analysis - an excellent exposition of the geometry of multivariate analysis. And good summaries have been provided by our late friend in: Marcus, L. F. 1990. Traditional morphometrics. In Rohlf and Bookstein (eds.) Proceedings of the Michigan morphometrics workshop. (The Blue Book). Marcus, L. F. 1993. Some aspects of multivariate statistics for morphometrics. In Marcus, Bello, and García-Valdecasas (eds.) Contributions to Morphometrics. (The Black Book) -ds -- Dennis E. Slice, Ph.D. Department of Biomedical Engineering Division of Radiologic Sciences Wake Forest University School of Medicine Winston-Salem, North Carolina, USA 27157-1022 Phone: 336-716-5384 Fax: 336-716-2870 == Replies will be sent to list. For more information see http://life.bio.sunysb.edu/morph/morphmet.html.
Re: size correction discriminant functions analyses
You may also try looking at: Bookstein FL (1989) 'Size and shape': a comment on semantics. Systematic Zoology 38:173-180. Marc Moniz -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Tuesday, May 18, 2004 9:51 AM To: [EMAIL PROTECTED] Subject: Re: size correction discriminant functions analyses Brett: Darroch and Mosimann (1985) is a frequently-cited paper that talks about scale adjustment for both PCA and CVA. They use log-shape data that are ln-transformed ratios. That paper should be a useful starting point. Darroch JN, Mosimann JE (1985) Canonical and principal components of shape. Biometrika 72:241-252. Good luck, Tim Cole At 10:09 AM 5/17/2004 -0400, you wrote: Dear morphometrician, I have recently reviewed 3 genera of catsharks that display a great deal of morphological conservation within the genera, however, there is also prominent sexual dimorphism present (profoundly so in some species). There is quite a bit of shape variation between juveniles and adults, in one genus in particular, but I think that the shape variation is being obscured by the size component. I have a sizeable morphometric data set (# measures # taxa specimens) and have used principal components analysis on the raw data to explore shape variation within each of the genera (not between). The first component was always a general component and accounted for more than 85-90% of the variation in most instances, therefore the bipolar components only contributed relatively little to the overall shape variation resulting in crowded PCA plots. The main reference I have used for the analyses to date has been 'Pimentel. 1979. Morphometrics. The multivariate analysis of biological data' however, it doesn't deal with size correction. Can anyone suggest a review that deals with size correction, or can I convert my data to ratios and then log transform the data?
I am also looking for reviews of canonical discriminant functions analysis and stepwise discriminant function analysis in an attempt to quantitate differences between species within a genus. Thanks for your help. Brett Brett Human Shark Researcher 27 Southern Ave West Beach SA 5024 Australia 61 8 8356 6891 [EMAIL PROTECTED] == Replies will be sent to list. For more information see http://life.bio.sunysb.edu/morph/morphmet.html. Theodore M. Cole III, Ph.D. Department of Basic Medical Science School of Medicine University of Missouri - Kansas City 2411 Holmes St. Kansas City, MO 64108 USA Phone: (816) 235 -1829 FAX: (816) 235 - 6517 e-mail: [EMAIL PROTECTED] www: http://c.faculty.umkc.edu/colet == Replies will be sent to list. For more information see http://life.bio.sunysb.edu/morph/morphmet.html.
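Tim Cole's pointer to Darroch and Mosimann (1985) concerns log-shape ratios. The sketch below shows one common form. It assumes that size is measured by each specimen's geometric mean (Mosimann's geometric-mean size) and that shape is what is left after dividing by it. It is not a reproduction of the paper's full treatment, so consult the paper for how they carry this into PCA and CVA. The example measurements are hypothetical.

```python
import numpy as np


def log_shape_ratios(measurements):
    """Split positive measurements into log size and log-shape ratios.

    Size is taken as the geometric mean of each specimen's measurements;
    shape is log(x_j / geometric_mean) = log(x_j) - mean_k log(x_k).
    """
    x = np.asarray(measurements, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log-shape ratios require strictly positive measurements")
    log_x = np.log(x)
    log_size = log_x.mean(axis=1, keepdims=True)  # log of the geometric mean
    shape = log_x - log_size
    return log_size.ravel(), shape


# Hypothetical specimens: the second is an exact 3x enlargement of the first,
# the third has a relatively longer second measurement.
specimens = np.array([
    [10.0, 4.0, 2.5],
    [30.0, 12.0, 7.5],
    [10.0, 6.0, 2.5],
])
log_size, shape = log_shape_ratios(specimens)
print(np.round(log_size, 3))
print(np.round(shape, 3))
```

Each row of `shape` sums to zero, so the shape variables are linearly dependent. Their covariance matrix has rank at most p - 1, and a PCA on them will always show one component with zero variance.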
Re: size correction discriminant functions analyses
You mention that you have many more variables than specimens. As a result, you cannot use the various alternatives that you list. Discriminant functions, canonical variates, etc. all require that the pooled within-group covariance matrix be based on a sample size larger than the number of variables. If that is not true then the matrix will be singular and the analysis will blow up. Stepwise methods appear to get around this problem because they consider fewer variables at one time. Their main limitation is in interpretation: one cannot conclude that the variables in the best set are the important ones and the other variables are unimportant. One also cannot interpret the various probabilities produced by such methods as probabilities from usual tests of significance. They need to be adjusted for the fact they result from testing many combinations of variables and groups. They are just convenient indices. -- F. James Rohlf State University of New York, Stony Brook, NY 11794-5245 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Monday, May 17, 2004 10:10 AM To: [EMAIL PROTECTED] Subject: size correction discriminant functions analyses Dear morphometrician, I have recently reviewed 3 genera of catsharks that display a great deal of morphological conservation within the genera, however, there is also prominent sexual dimorphism present (profoundly so in some species). There is quite a bit of shape variation between juveniles and adults, in one genus in particular, but I think that the shape variation is being obscured by the size component. I have a sizeable morphometric data set (# measures # taxa specimens) and have used principal components analysis on the raw data to explore shape variation within each of the genera (not between). 
The first component was always a general component and accounted for more than 85-90% of the variation in most instances, therefore the bipolar components only contributed relatively little to the overall shape variation resulting in crowded PCA plots. The main reference I have used for the analyses to date has been 'Pimentel. 1979. Morphometrics. The multivariate analysis of biological data' however, it doesn't deal with size correction. Can anyone suggest a review that deals with size correction, or can I convert my data to ratios and then log transform the data? I am also looking for reviews of canonical discriminant functions analysis and stepwise discriminant function analysis in an attempt to quantitate differences between species within a genus. Thanks for your help. Brett Brett Human Shark Researcher 27 Southern Ave West Beach SA 5024 Australia 61 8 8356 6891 [EMAIL PROTECTED] == Replies will be sent to list. For more information see http://life.bio.sunysb.edu/morph/morphmet.html.
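Rohlf's point about the pooled within-group covariance matrix becoming singular can be made concrete. Each group's deviations from its own mean sum to zero. With n specimens in g groups, the pooled within-group matrix therefore has rank at most n - g, and if the number of variables exceeds that, the matrix cannot be inverted. The sketch below uses random, hypothetical data (12 specimens in 3 groups) purely to show the rank limit.

```python
import numpy as np


def pooled_within_group_cov(data, groups):
    """Pooled within-group covariance matrix: sum of within-group SSCP / (n - g)."""
    x = np.asarray(data, dtype=float)
    labels = np.asarray(groups)
    n, p = x.shape
    group_ids = np.unique(labels)
    dof = n - len(group_ids)
    if dof < 1:
        raise ValueError("need more specimens than groups")
    sscp = np.zeros((p, p))
    for g in group_ids:
        dev = x[labels == g] - x[labels == g].mean(axis=0)
        sscp += dev.T @ dev
    return sscp / dof


rng = np.random.default_rng(42)
# Hypothetical design: 12 specimens in 3 groups of 4.
labels = np.repeat(["sp1", "sp2", "sp3"], 4)
many_vars = rng.normal(size=(12, 20))   # 20 measurements > 12 - 3 = 9
few_vars = rng.normal(size=(12, 5))     # 5 measurements <= 9

w_many = pooled_within_group_cov(many_vars, labels)
w_few = pooled_within_group_cov(few_vars, labels)
print("rank with 20 variables:", np.linalg.matrix_rank(w_many))  # at most 9, so singular
print("rank with 5 variables:", np.linalg.matrix_rank(w_few))
```

With 20 variables the 20 x 20 matrix has rank 9 at best, so discriminant functions or canonical variates that need its inverse cannot be computed.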
Re: size correction discriminant functions analyses
Dear Brett and Marta, I think the problem you are encountering may not be the size-versus-shape issue, but a Normal distribution issue. PCA Analysis assumes multivariate normality. I know for human beings the distribution of men and women combined is often not Multivariate Normal. It is bi-modal and the male and female variance-covariance structure is different. This dramatically affects the correlation and covariance matrices and provides misleading components. I would assume this could be true for catsharks as well, and suspect that is why you found such a large amount of variation seemingly explained by your first component. We have found that for humans the lack of Normality is big enough that it requires doing separate PCA analyses for men and women, and in some cases separate analyses by ethnicity as well. In addition, it sounds to me that you have additional modes or non-normalities due to age. (I generally only work with adults.) Have you checked to see if your data is Normally distributed? If it isn't you could consider separating your samples into subgroups (gender and age groups) that are normally distributed, prior to PCA analysis. In other words, you would do a PCA analysis for each group, rather than just one PCA for all of them combined. I don't know how difficult this may be, not knowing your data. Or you might check into classification methods that do not depend upon the normality assumption. Most discriminant analyses also assume that the attributes of the entities within each group are Multivariate Normal, and that the variance-covariance structures of the entity attributes are equal across groups. You might be OK with the within-group normality assumption, but if there are important shape differences due to age or gender as you say then you may not be OK with the assumption of equal variance-covariance across groups.
For example, there may be a strong correlation (covariance) between two attributes in younger growing catsharks that disappears when they reach adulthood. This would cause a difference in the covariance structure. You could break your data into groups and look at the differences/similarities in the variance/covariance matrices. This will tell you a lot about the similarities and differences between your groups as well. Hope this is helpful, Kathleen M. Robinette, Ph.D. Principal Research Anthropologist Air Force Research Laboratory AFRL/HEPA 2800 Q Street Wright-Patterson AFB, OH 45433-7947 (937) 255-8810 DSN 785-8810 FAX (937) 255-8752 e-mail:[EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Tuesday, May 18, 2004 9:50 AM To: [EMAIL PROTECTED] Subject: Re: size correction discriminant functions analyses Dear Brett, I have the same problem. I found several approaches in the literature, but no efficient or clear review... well, there were some, but too mathematical for me as a simple biologist. From what I know, it is complicated to work with ratios (which have difficult statistical properties). On the other hand, you also have the problem of collinearity between variables (I imagine). I found some approaches to solve this, but none was universal or definitive. There is an article by Lleonart et al. that proposes a simple formula, but it has been much discussed, and a statistics lecturer told me that it is not new. On the other hand, the other day I saw that the Ade4 lab standardises the columns by the mean. I tried this, and it worked very well... it gave much clearer results. My supervisor said to use PCA as it is and simply consider the first component to be 'size'... however, this did not give clear pictures of the data... so I am as trapped as I was at the beginning. I suppose in the end all these hypotheses are possible and correct, and most will give very similar answers.
I am also puzzled by the range of multivariate techniques that give similar answers... particularly because in many cases different authors (and statistical packages) call the same techniques by different names, which really confuses things. I started to write a summary (which I can send you) of information I found in several books... and in the end, as I see it now, things are much simpler: they mainly consist of a couple of methods with variations, which go by different names. On the other hand, people on the R list have discussed stepwise analysis a lot, and some do not recommend it at all... so some care should be taken on this point as well. Anyway, I can recommend a free online manual for the VEGAN package (from www.R-project.org), which I found very good and which compares many methods using the same data: http://cc.oulu.fi/%7Ejarioksa/opetus/metodi/index.html hope this helps somehow, or at least shows solidarity with your question ;-) Please let me know if you finally find 'a' answer :-) Best wishes, Marta
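Kathleen's suggestion to break the data into groups and compare their variance-covariance matrices can start simply. Compute each group's covariance matrix and the angle between the groups' first principal axes. The sketch below uses simulated, hypothetical "juvenile" and "adult" data that mirror her example of a correlation that disappears in adults. It is descriptive only, not a formal test. Box's M test, or the common principal components framework Igor mentions (Flury 1988), addresses the comparison more rigorously.

```python
import numpy as np


def group_covariances(data, groups):
    """Sample covariance matrix of the variables within each group."""
    x = np.asarray(data, dtype=float)
    labels = np.asarray(groups)
    return {g: np.cov(x[labels == g], rowvar=False) for g in np.unique(labels)}


def first_pc(cov):
    """Eigenvector of the largest eigenvalue (eigh returns ascending order)."""
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, -1]


def pc1_angle_degrees(cov_a, cov_b):
    """Angle between two groups' first principal axes, ignoring sign."""
    cos = abs(first_pc(cov_a) @ first_pc(cov_b))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))


rng = np.random.default_rng(1)
# Hypothetical covariance structures: in "juveniles" variables 0 and 1 are
# strongly correlated; in "adults" that correlation has disappeared.
juv_cov = np.array([[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 0.2]])
adult_cov = np.diag([1.0, 0.3, 0.2])
data = np.vstack([
    rng.multivariate_normal(np.zeros(3), juv_cov, size=500),
    rng.multivariate_normal(np.zeros(3), adult_cov, size=500),
])
labels = np.array(["juvenile"] * 500 + ["adult"] * 500)

covs = group_covariances(data, labels)
for name, cov in covs.items():
    corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    print(f"{name}: r(var0, var1) = {corr:.2f}")
print("angle between PC1 axes:", round(pc1_angle_degrees(covs["juvenile"], covs["adult"]), 1))
```

A large angle between first axes means a single pooled PCA would blend two different covariance structures. This is the situation Kathleen warns can produce misleading components.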
Re: size correction discriminant functions analyses
-- Dennis E. Slice, Ph.D. Department of Biomedical Engineering Division of Radiologic Sciences Wake Forest University School of Medicine Winston-Salem, North Carolina, USA 27157-1022 Phone: 336-716-5384 Fax: 336-716-2870 == Replies will be sent to list. For more information see http://life.bio.sunysb.edu/morph/morphmet.html.