Re: size correction discriminant functions analyses

2004-05-29 Thread Brett Human
G'day Dr. Kidd,

I've just started using a new computer and I put the wrong email address on
my signature. The one below is the correct address.

Did you have some comments about my last posting?

see ya,

Brett

*
Brett Human
Shark Researcher
27 Southern Ave
West Beach SA 5024
Australia
+61 8 8356 6891
[EMAIL PROTECTED]
*

==
Replies will be sent to list.
For more information see http://life.bio.sunysb.edu/morph/morphmet.html.


RE: size correction discriminant functions analyses

2004-05-26 Thread F. James Rohlf
A couple of observations:

Using a correlation matrix rather than a covariance matrix has nothing to do
with whether the data are normally distributed or not. One usually wants to
use a covariance matrix. However if the variables are in various units that
cannot be made consistent then one gives up and uses a correlation matrix.
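
A small numerical sketch of this point, in Python with NumPy. The data, the scales and the helper names are invented for illustration; nothing here comes from the catshark data set. It checks that the correlation matrix is simply the covariance matrix of the z-scored variables, that z-scoring leaves the shape of a variable's distribution (here its skewness) unchanged, and that the choice of matrix mainly changes how much weight the large-scale variables receive.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 specimens, 3 skewed measurements on very different scales.
X = rng.exponential(size=(50, 3)) * np.array([1.0, 10.0, 100.0])

def skewness(v):
    """Moment-based skewness of a 1-D sample."""
    d = v - v.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

def eigenvalues_desc(M):
    """Eigenvalues of a symmetric matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]

cov = np.cov(X, rowvar=False)        # covariance matrix of the raw data
cor = np.corrcoef(X, rowvar=False)   # correlation matrix of the raw data

# z-scores: centre each column and divide by its standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# The correlation matrix equals the covariance matrix of the z-scores.
same = np.allclose(cor, np.cov(Z, rowvar=False))

# Share of total variance carried by PC1 under each choice of matrix.
share_cov = eigenvalues_desc(cov)[0] / np.trace(cov)
share_cor = eigenvalues_desc(cor)[0] / np.trace(cor)

print("correlation == covariance of z-scores:", same)
print("skewness before / after z-scoring:", skewness(X[:, 0]), skewness(Z[:, 0]))
print("PC1 share (covariance, correlation):", share_cov, share_cor)
```

With the covariance matrix the variable on the largest scale dominates PC1; with the correlation matrix every variable enters with unit variance. Neither choice makes the data any more or less normal.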

The main issue about using some of the other methods that were suggested is
whether the groups (clusters) are defined a priori or are groups you are
trying to discover in the data. If you know the groups in advance then it
makes sense to consider CVA, CPCA, MANOVA, etc. You will then run into the
problem that I mentioned in my prior message: you need more observations
than variables.
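
To illustrate why more observations than variables are needed, here is a minimal sketch in Python with NumPy (the sample sizes and random data are hypothetical). With n specimens the covariance matrix has rank at most n - 1, so with fewer specimens than measurements it is singular and cannot be inverted, which is what CVA and MANOVA require.

```python
import numpy as np

rng = np.random.default_rng(1)
n_specimens, n_measures = 8, 12   # hypothetical: fewer specimens than measurements
X = rng.normal(size=(n_specimens, n_measures))

S = np.cov(X, rowvar=False)       # 12 x 12 covariance matrix
rank = np.linalg.matrix_rank(S)   # cannot exceed n_specimens - 1

print("covariance matrix shape:", S.shape)
print("rank:", rank, "(full rank would be", n_measures, ")")
```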

The comments about normality were a bit off the point. With data such as you
describe there is no expectation that the entire data set be consistent with
a multivariate normal distribution. What you want is for the distributions
within the clusters to be normal. For those your sample sizes will be even
smaller so it is difficult to perform serious tests with data such as yours.
Since you want to find clusters of species you really do not want your
entire dataset to be consistent with sampling from a single normally
distributed population.

CVA = canonical variates analysis
NMDS = nonmetric multidimensional scaling analysis. 

NMDS would be a good thing to try on your data. It is similar to a PCA
ordination but is not constrained to the axes being linear functions of your
original variables. It usually does a better job of summarizing distances
between points in a low dimensional space.
 
-
F. James Rohlf -SUNY Stony Brook, NY 11794-5245
FAX: 1-631-632-7626 www: http://life.bio.sunysb.edu/ee/rohlf

 


Re: size correction discriminant functions analyses

2004-05-25 Thread morphmet
G'day all,

Thanks to everyone for your comments. They've been a great help, and I'm
glad that my question sparked a bit of discussion on the subject.

After some pondering, I've got a few more questions and some more
details on the way I analysed my data. Although I was looking for
species clustering, I wasn't terribly concerned with quantifying any
clustering, and was using PCA more as a visualisation technique to
explore my data. In the future I will try the various methods suggested
to try to quantify the clustering.

Another thing was with regard to the issue of multivariate normality. I
did not use a variance-covariance matrix; instead I used a correlation
matrix. I was under the assumption that by transforming the covariances
into z-scores, I would have a greater chance of my data being (or
approaching) multivariate normal? Also, for testing whether my data are
normally distributed, if I were to do separate PCAs for each population
and a population was normally distributed, would I expect to see an
ellipsoid with its greatest length along PC1 in a PCA plot?

With regard to obtaining singular matrices when # measures > #
specimens, this did happen to me, and the way I 'got round' it was to
first regress every measurement against total length and then, by looking
at the slopes of the regressions, choose the measurements that showed the
greatest potential for between-species differentiation. Because I was
using PCA just as a qualitative tool, I didn't think it was much of a
problem; however, if I want to do quantitative analysis such as
discriminant analysis, can I still use this same method of choosing
measures, or am I restricted to stepwise methods using the whole data
set?
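
The screening step described above might look roughly like this in Python with NumPy. Working on log-transformed values, so that each slope is an allometric coefficient, is an assumption made here; the message does not say whether raw or log values were regressed. The measurement names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
total_length = rng.uniform(200.0, 600.0, size=n)   # hypothetical total lengths (mm)

# Two made-up measurements: one isometric (slope 1), one positively allometric (slope 1.5).
measures = {
    "head_length": 0.20 * total_length * np.exp(rng.normal(0.0, 0.02, size=n)),
    "fin_height": 0.001 * total_length**1.5 * np.exp(rng.normal(0.0, 0.02, size=n)),
}

def loglog_slope(measure, size):
    """Slope of log(measure) regressed on log(size) by ordinary least squares."""
    slope, _intercept = np.polyfit(np.log(size), np.log(measure), 1)
    return slope

slopes = {name: loglog_slope(values, total_length) for name, values in measures.items()}
for name, slope in slopes.items():
    print(f"{name}: slope = {slope:.2f}")
```

Fitting the same regression separately for each species and comparing the slopes would be the per-species version of this screen.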

Forgive my ignorance, but what are NMDS and CVA? I assume PCO is
principal coordinates analysis? I would also appreciate a pdf of the
Darroch & Mosimann paper if available.

A final point, to perhaps spark more debate or at least to motivate some
thought, is that I have found it very difficult to get a basic
understanding of the application of multivariate stats to morphometrics
because the textbooks available are very technical. An equation may be
meaningful to the gurus, but it doesn't mean a whole lot to me. It is
also one thing to describe how a procedure works, but it's another thing
to implement it when you are ignorant of the software available. I think
there is a great need for a textbook that can introduce the new student
to this field without using equations to describe what's going on. There
- I've said it, let the slaughter begin.

Thanks,

Brett

*
Brett Human
Shark Researcher
27 Southern Ave
West Beach SA 5024
Australia
+61 8 8356 6891
[EMAIL PROTECTED]
*


RE: size correction discriminant functions analyses

2004-05-25 Thread Dr Robert Kidd


This is for Brett Human - I have tried to respond to your latest posting but
the address you give is bouncing.

Rob Kidd
At: [EMAIL PROTECTED]




Re: size correction discriminant functions analyses

2004-05-21 Thread morphmet
 1) PCA makes no assumptions about the distribution (multivariate normal
 or otherwise) of your data. It is a procedure that simply produces the
 linear combinations of variables with maximum variance subject to
 orthogonality to other such axes.

OK, but variance may or may not be a meaningful parameter
for non-normal data.


 If you are interested in size relationships, regress variables on some
 meaningful measure of size.

If I only had a meaningful measure of size ...
:-)


Oyvind Hammer
Geological Museum
University of Oslo


Re: size correction discriminant functions analyses

2004-05-21 Thread morphmet

Marta,

I have a pdf version of the Darroch & Mosimann Biometrika paper. What
is your e-mail address, so I can send it directly to you?

Marc Moniz
[EMAIL PROTECTED]



Re: size correction discriminant functions analyses

2004-05-20 Thread morphmet
As I understand PCA, its main goal is to reduce the dimensionality of
a problem without losing too much information. In other words,
according to Prof. Rohlf, the purpose of PCA is to give you a low
dimensional space that accounts for as much variation as possible. However,
I agree with Oyvind that many scientists use PCA as a visualization device,
projecting a multivariate data set onto a sheet of paper.

On the other hand, testing for multivariate normality before applying any
multivariate data analysis technique is one of the most serious problems,
because in most cases no one does it, and anyone who tries may go about it
the wrong way. Actually, we (biologists and paleontologists) need a definite
guide to follow when we face such a problem.

Best regards

---
Dr. Ashraf M. T. Elewa
Associate Professor
Geology Department
Faculty of Science
Minia University
Egypt
[EMAIL PROTECTED]
http://myprofile.cos.com/aelewa
- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, May 19, 2004 04:29 ?
Subject: Re: size correction  discriminant functions analyses


 Just a comment on this one, from a pragmatic point of view.

 It is of course true that PCA is only *guaranteed* to
 produce components maximizing variance if you have
 multivariate normality. The theory of PCA is based on this
 assumption. But in many cases, PCA is used purely as a
 visualization device, projecting a multivariate data set
 onto a sheet of paper so we can see it. For visualization
 of non-normal data, one could play around with different
 techniques, such as PCA, PCO, NMDS, projection pursuit etc.,
 and then find that PCA does (or does not) perform well
 for the given data set. There is no law against making
 any linear combination you want of your variates, if it
 reveals information. For example, PCA may be perfectly
 adequate for resolving two well-separated groups, if
 the within-group variance is relatively small.

 Of course, when using PCA for non-normal data one must
 be a little careful and not over-interpret the results
 (especially not the component loadings), but I think
 it's too harsh to dismiss its use totally.

 I'm sure the hard-liners will flame me to pieces for
 this email, but I hope they will at least give me
 credit for my courage  :-)


 Dr. Oyvind Hammer
 Geological Museum
 University of Oslo



  PCA Analysis assumes multivariate normality.
 
  Kathleen M. Robinette, Ph.D.
  Principal Research Anthropologist
  Air Force Research Laboratory





Re: size correction discriminant functions analyses

2004-05-20 Thread morphmet
Don't know what happened to cause the earlier message largely void of content, but I 
think the original communication was to correct the Red Book reference.
The date is 1985, not 1982. -ds

On Tue, 2004-05-18 at 14:12, [EMAIL PROTECTED] wrote:
 --
  Dennis E. Slice, Ph.D.
  Department of Biomedical Engineering
  Division of Radiologic Sciences
  Wake Forest University School of Medicine
  Winston-Salem, North Carolina, USA
  27157-1022
  Phone: 336-716-5384
  Fax: 336-716-2870
-- 
Dennis E. Slice, Ph.D.
Department of Biomedical Engineering
Division of Radiologic Sciences
Wake Forest University School of Medicine
Winston-Salem, North Carolina, USA 
27157-1022
Phone: 336-716-5384
Fax: 336-716-2870





Re: size correction discriminant functions analyses

2004-05-20 Thread morphmet
Dear colleagues,


About the above discussion on linear measurement data for multivariate
analysis, I should state that most times my problem (and I expect the problem
of many people who work with such data) is not the number of rows/columns
(which most times is OK, at least in the cases I saw), nor multivariate
normality (I use the R-project program, which has a test of multivariate
normality, so it is easy to test), nor lack of homogeneity of variances (this
is a bit more dodgy, but the references I saw state that if you test
univariate homogeneity of variances (e.g. Bartlett test) it should give a
good indication of the data variances).
The problem that (I suppose) most biologists encounter is the collinearity
between variables... which strongly influences the representation given by the
PCA. I think this also happens in NMDS, discriminant and canonical analysis.

I probably did not make myself clear in the email. I am sorry...
For me, it is very interesting that these things are debated on the list, and
that different people show different solutions and bibliography; it is really nice.

In relation to the article from Biometrika, does anyone have the pdf? We don't
have the journal in this college.
In relation to the robustness of the techniques to lack of normality, I agree
with our colleague (so... I share your feelings of daring to state it...
jijijij ;-))

thank you for all,
Cheers,
Marta




Re: size correction discriminant functions analyses

2004-05-20 Thread morphmet
I applaud your courage, Dr. Hammer.  I hope everyone appreciates how intimidating this 
list of experts can be. 

I also agree with your point that PCA can be used when the data are not multivariate 
normal if you are just using it to visualize information, or if you just know what it 
is doing for that matter.  I am a fan of using any and all analyses that help in 
figuring out what is happening.  However, in order to understand the results and what 
you are visualizing you have to understand both the data input and what the 
statistical analysis is doing.  Sometimes the information that seems to be revealed is 
an artifact of violation of the assumptions and if the observer doesn't realize this 
it is very easy to come to the wrong conclusion.   

I thought the original questions we were discussing were what the analysis was
doing and how to interpret it, although I admit to reading the e-mails quickly. The
original e-mail indicated that perhaps size and shape confounding was causing their
odd looking results.  If the shapes are the same, but the sizes are different then the
source of the non-normality would be multiple modes only.  This may not be a serious
enough violation to cause interpretability problems.  However, it sounded to me from
the description of the problem and the results that in addition to multiple modes
there are multiple variance/covariance matrices. That was making it difficult to
interpret the results, and since PCA is based upon the variance/covariance matrix, it
will result in components that are difficult to interpret or even invalid.  Separating
the analysis into subgroups will allow them to visualize and test the differences in
the modes and in the variance/covariance matrices and in that way understand
the source of the differences in the groups.

Maybe the common PCA analysis someone else mentioned might do this as well.  I am 
not familiar with that method.

Thanx all again for your attention and patience,
Kath



Kathleen M. Robinette, Ph.D.
Principal Research Anthropologist
Air Force Research Laboratory
AFRL/HEPA
2800 Q Street
Wright-Patterson AFB, OH 45433-7947
(937) 255-8810
DSN 785-8810
FAX (937) 255-8752
e-mail:[EMAIL PROTECTED] 



Re: size correction discriminant functions analyses

2004-05-20 Thread morphmet
Dr. Hammer, Please consider your courage credited. -ds

A couple of points about PCA in general:

1) PCA makes no assumptions about the distribution (multivariate normal
or otherwise) of your data. It is a procedure that simply produces the
linear combinations of variables with maximum variance subject to
orthogonality to other such axes. Distribution assumptions only come
into play for (some) significance testing procedures.

2) PC1 will only identify size variation if size variation is the source
of the greatest variation in your sample. Sex, species, habitat, etc.
could all be determinants (not in the matrix sense 8-) ) of PC1 or some
combination of these.

In general, if you have data with some extreme outlier (e.g.,
transcription error), then PC1 will (probably) just point to (or pi
radians away from) the direction of that outlier relative to the main
sample, which will still be the linear combination of maximum variance.
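
The outlier effect is easy to reproduce. The following is only a sketch in Python with NumPy, using made-up data and a hypothetical helper `first_pc`: PC1 is taken as the leading eigenvector of the covariance matrix, and a single gross error swings it toward the outlier.

```python
import numpy as np

def first_pc(X):
    """Unit vector of maximum variance: leading eigenvector of the covariance matrix."""
    S = np.cov(X, rowvar=False)
    _evals, evecs = np.linalg.eigh(S)   # eigenvalues are returned in ascending order
    return evecs[:, -1]

rng = np.random.default_rng(2)
# Hypothetical sample: 40 specimens, 2 measurements, most variation along the first axis.
X = rng.normal(size=(40, 2)) * np.array([3.0, 1.0])
clean_pc1 = first_pc(X)

# Add one gross "transcription error" far out along the second axis.
X_bad = np.vstack([X, [0.0, 200.0]])
bad_pc1 = first_pc(X_bad)

# Unit vector from the mean of the main sample to the outlier.
to_outlier = X_bad[-1] - X.mean(axis=0)
to_outlier /= np.linalg.norm(to_outlier)

# |cosine| near 1 means PC1 points at (or pi radians away from) the outlier.
cos_clean = abs(clean_pc1 @ to_outlier)
cos_bad = abs(bad_pc1 @ to_outlier)
print("alignment with outlier direction, clean sample:", cos_clean)
print("alignment with outlier direction, with outlier:", cos_bad)
```

The absolute value is taken because the sign of an eigenvector is arbitrary.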

What people often want PCA to do is either a) identify iso/allometry
due to size variation in a sample or b) separate out sexes, species, or
other groups. PCA is optimal for neither of these and could be quite
misleading in both cases.

If you are interested in size relationships, regress variables on some
meaningful measure of size. If you are interested in group differences,
look into CVA. 
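
As a rough illustration of how CVA differs from PCA, here is a bare-bones sketch in Python with NumPy. The function `cva` and the simulated two-group data are hypothetical, and this is a textbook-style eigen-decomposition of W^-1 B rather than any particular package's implementation. The groups differ only in the second variable, while the first variable carries most of the total variance.

```python
import numpy as np

def cva(X, groups):
    """Canonical variates: eigenvectors of W^-1 B, largest eigenvalue first."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))   # pooled within-group sums of squares and cross-products
    B = np.zeros((p, p))   # between-group sums of squares and cross-products
    for g in np.unique(groups):
        Xg = X[groups == g]
        centred = Xg - Xg.mean(axis=0)
        W += centred.T @ centred
        d = Xg.mean(axis=0) - grand_mean
        B += len(Xg) * np.outer(d, d)
    # W must be invertible, which needs more specimens than variables.
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(evals.real)[::-1]
    return evals.real[order], evecs.real[:, order]

rng = np.random.default_rng(3)
n = 30
noise_sd = np.array([5.0, 1.0, 1.0])   # variable 0 is noisy but uninformative
grp_a = rng.normal(size=(n, 3)) * noise_sd
grp_b = rng.normal(size=(n, 3)) * noise_sd + np.array([0.0, 3.0, 0.0])
X = np.vstack([grp_a, grp_b])
groups = np.array(["A"] * n + ["B"] * n)

evals, evecs = cva(X, groups)
cv1 = evecs[:, 0]
pc1 = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]

print("variable with largest |loading| on CV1:", np.argmax(np.abs(cv1)))
print("variable with largest |loading| on PC1:", np.argmax(np.abs(pc1)))
```

PC1 follows the high-variance variable, whereas CV1 follows the variable that actually separates the groups.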

If you have many more variables than specimens, you might do either of
the above in a reduced PCA space if you check carefully to see if your
limited data suggest you are capturing salient aspects of a space of
reduced dimension resulting from the tight correlations amongst your
variables. Otherwise, you must wave your hands vigorously before
proceeding.

See Marcus 1990 Blue Book chapter for a nice discussion of PCA and
related methods. 

Books by Jackson and Jolliffe and other authors specifically on Principal
Components are available.

-ds


-- 
Dennis E. Slice, Ph.D.
Department of Biomedical Engineering
Division of Radiologic Sciences
Wake Forest University School of Medicine
Winston-Salem, North Carolina, USA 
27157-1022
Phone: 336-716-5384
Fax: 336-716-2870





Re: size correction discriminant functions analyses

2004-05-19 Thread morphmet
Dear Brett,
If the problem is separating size and shape, then, fortunately, in my edited
book titled Morphometrics: Applications in Biology and Paleontology
(Springer-Verlag, 2004) you will find a chapter written by
Garcia-Rodriguez et al. They used sheared PCA and successfully
separated size and shape as separate components. Although there
are more recent techniques for doing that, I recommend that you read
this chapter to see how they separated size and shape in an
excellent and easy manner.
Best regards.
Ashraf

---
Dr. Ashraf M. T. Elewa
Associate Professor
Geology Department
Faculty of Science
Minia University
Egypt
[EMAIL PROTECTED]
http://myprofile.cos.com/aelewa
- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, May 17, 2004 05:09 ?
Subject: size correction  discriminant functions analyses


 Dear morphometrician,

 I have recently reviewed 3 genera of catsharks that display a great deal
of morphological conservation within the genera, however, there is also
prominent sexual dimorphism present (profoundly so in some species). There
is quite a bit of shape variation between juveniles and adults, in one genus
in particular, but I think that the shape variation is being obscured by the
size component.

 I have a sizeable morphometric data set (# measures  # taxa  specimens)
and have used principal components analysis on the raw data to explore shape
variation within each of the genera (not between). The first component was
always a general component and accounted for more than 85-90% of the
variation in most instances, therefore the bipolar components only
contributed relatively little to the overall shape variation resulting in
crowded PCA plots.

 The main reference I have used for the analyses to date has been
'Pimentel. 1979. Morphometrics. The multivariate analysis of biological
data'; however, it doesn't deal with size correction. Can anyone suggest a
review that deals with size correction, or can I convert my data to ratios
and then log transform the data?
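
On the ratio question, one widely used form of "ratios, then logs" is the log shape ratio: each measurement is divided by a size variable for that specimen and the result is log-transformed. The sketch below, in Python with NumPy, takes the geometric mean of all measurements as the size variable. That choice, the function name and the numbers are assumptions for illustration only; dividing by total length instead would be another option.

```python
import numpy as np

def log_shape_ratios(X):
    """Return (log shape ratios, log size) using geometric-mean size per specimen."""
    logs = np.log(np.asarray(X, dtype=float))
    log_size = logs.mean(axis=1, keepdims=True)   # log of the geometric mean
    return logs - log_size, log_size.ravel()

# Hypothetical measurements (mm). Specimens 0 and 1 have identical proportions
# but different size; specimen 2 has a relatively longer first measurement.
X = [[10.0, 20.0, 40.0],
     [20.0, 40.0, 80.0],
     [30.0, 20.0, 40.0]]

shape, log_size = log_shape_ratios(X)
print(np.round(shape, 3))
print(np.round(log_size, 3))
```

Specimens that differ only in overall size get identical rows of shape values, and each row sums to zero, so one dimension is lost to size.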

 I am also looking for reviews of canonical discriminant functions analysis
and stepwise discriminant function analysis in an attempt to quantitate
differences between species within a genus.

 Thanks for your help.

 Brett

 
  Brett Human
  Shark Researcher
  27 Southern Ave
  West Beach SA 5024
  Australia
  61 8 8356 6891
  [EMAIL PROTECTED]
  





Re: size correction discriminant functions analyses

2004-05-19 Thread morphmet
There is a method called common PCA which seems to overcome the problem of
non-multinormality of an overall sample that includes several subsamples, all
with different central moments. The source to read is:

Flury B. 1988. Common principal components and related multivariate models.
NY: Wiley. 258 p.

Cheers,

Igor
- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, May 18, 2004 10:09 PM
Subject: Re: size correction  discriminant functions analyses


 Dear Brett and Marta,

 I think the problem you are encountering may not be the size-versus-shape
issue, but a Normal distribution issue.  PCA Analysis assumes multivariate
normality.  I know for human beings the distribution of men and women
combined is often not Multivariate Normal.  It is bi-modal and the male and
female variance-covariance structure is different.  This dramatically
affects the correlation and covariance matrices and provides misleading
components.  I would assume this could be true for catsharks as well, and
suspect that is why you found such a large amount of variation seemingly
explained by your first component.  We have found that for humans the lack
of Normality is big enough that it requires doing separate PCA analyses for
men and women, and in some cases separate analyses by ethnicity as well.  In
addition, it sounds to me that you have additional modes or non-normalities
due to age.  (I generally only work with adults.)  Have you checked to see
if your data is Normally distributed?  If it isn't you could consider
separating your samples into
subgroups (gender and age groups) that are normally distributed, prior to
PCA analysis.  In other words, you would do a PCA analysis for each group,
rather than just one PCA for all of them combined.   I don't know how
difficult this may be, not knowing your data.  Or you might check into
classification methods that do not depend upon the normality assumption.

 Most discriminant analyses also assume that the attributes of the entities
within each group are Multivariate Normal, and that the variance-covariance
structures of the entity attributes are equal across groups.  You might be
OK with the within-group normality assumption, but if there are important
shape differences due to age or gender as you say then you may not be OK
with the assumption of equal variance-covariance across groups.   For
example, there may be a strong correlation (covariance) between two
attributes in younger growing catsharks that disappears when they reach
adulthood.  This would cause a difference in the covariance structure.  You
could break your data into groups and look at the differences/similarities
in the variance/covariance matrices.  This will tell you a lot about the
similarities and differences between your groups as well.

 Hope this is helpful,

 Kathleen M. Robinette, Ph.D.
 Principal Research Anthropologist
 Air Force Research Laboratory
 AFRL/HEPA
 2800 Q Street
 Wright-Patterson AFB, OH 45433-7947
 (937) 255-8810
 DSN 785-8810
 FAX (937) 255-8752
 e-mail:[EMAIL PROTECTED]



 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of [EMAIL PROTECTED]
 Sent: Tuesday, May 18, 2004 9:50 AM
 To: [EMAIL PROTECTED]
 Subject: Re: size correction  discriminant functions analyses


 Dear Brett,

 I have the same problem. I found several approaches in the literature, but no
 efficient or clear review... well, there were some, but too mathematical for
 me as a simple biologist.
 As far as I know, it is complicated to work with ratios (which have difficult
 statistical properties). On the other hand, you also have the problem of
 collinearity between variables (I imagine).
 I found some approaches to solve this, but none was universal or definitive.
 There is an article by Leonart et al. that proposes a simple formula, but it
 has been much discussed, and a statistical lecturer told me that it is not
 recent.
 On the other hand, in the Ade4 lab, I saw the other day that they standardise
 the columns with the mean. I tried this, and it was very good... it gave much
 clearer results.
 My supervisor said to use PCA as it is and simply consider that the first
 component is 'size'... however, this did not give clear images of the data...
 thus I am as trapped as at the beginning. I suppose in the end all these
 hypotheses are possible and correct, and most will give very similar answers.

 I am also puzzled by the range of multivariate techniques that give similar
 answers... particularly because in many cases different authors (and
 statistical packages) call the same techniques by different names, which
 really confuses things. I started to do a summary of it (which I can send
 you) from information I found in several books... in the end, as I see it
 now, things are much simpler, and mainly consist of a couple of methods with
 variations, which gives rise to the different names. On the other hand,
 people from the R list have discussed stepwise analysis a lot

Re: size correction discriminant functions analyses

2004-05-19 Thread morphmet
Just a comment on this one, from a pragmatic point of view.

It is of course true that PCA is only *guaranteed* to
produce components maximizing variance if you have
multivariate normality. The theory of PCA is based on this
assumption. But in many cases, PCA is used purely as a
visualization device, projecting a multivariate data set
onto a sheet of paper so we can see it. For visualization
of non-normal data, one could play around with different
techniques, such as PCA, PCO, NMDS, projection pursuit etc.,
and then find that PCA does (or does not) perform well
for the given data set. There is no law against making
any linear combination you want of your variates, if it
reveals information. For example, PCA may be perfectly
adequate for resolving two well-separated groups, if
the within-group variance is relatively small.
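
The point about well-separated groups can be illustrated with a small
sketch. Everything here is an assumption made for illustration: the data
are synthetic (two hypothetical groups of 30 specimens, 5 variables each),
and PCA is computed directly from the SVD of the mean-centred data rather
than with any particular package. The pooled sample is bimodal, so it is
not multivariate normal, yet the first component should still separate
the groups.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: two groups whose means are far apart relative to
# the within-group spread, so the pooled sample is bimodal.
group_a = rng.normal(loc=0.0, scale=0.5, size=(30, 5))
group_b = rng.normal(loc=4.0, scale=0.5, size=(30, 5))
X = np.vstack([group_a, group_b])

# PCA on the covariance matrix via SVD of the mean-centred data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                  # specimen scores on each component
explained = s**2 / np.sum(s**2)     # share of total variance per component

# Check whether the two groups overlap along the first component.
pc1_a, pc1_b = scores[:30, 0], scores[30:, 0]
separated = pc1_a.max() < pc1_b.min() or pc1_b.max() < pc1_a.min()

print("PC1 share of variance:", round(float(explained[0]), 3))
print("groups do not overlap on PC1:", separated)
```

Note that in this sketch the large share of variance on the first
component reflects the between-group difference, which is one reason the
loadings should be interpreted with care.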

Of course, when using PCA for non-normal data one must
be a little careful and not over-interpret the results
(especially not the component loadings), but I think
it's too harsh to dismiss its use totally.

I'm sure the hard-liners will flame me to pieces for
this email, but I hope they will at least give me
credit for my courage  :-)


Dr. Oyvind Hammer
Geological Museum
University of Oslo



 PCA Analysis assumes multivariate normality.

 Kathleen M. Robinette, Ph.D.
 Principal Research Anthropologist
 Air Force Research Laboratory



==
Replies will be sent to list.
For more information see http://life.bio.sunysb.edu/morph/morphmet.html.


Re: size correction discriminant functions analyses

2004-05-18 Thread morphmet
Useful, though sometimes technical, information, critiques, and
expositions on the traditional use of ratios in morphometric analysis
can be found in:

Bookstein, F. L. 1991. Morphometric Tools for Landmark Data: Geometry
and Biology. (The Orange Book)

and

Bookstein, F. L., Chernoff, B., Elder, R., Humphries, J., Smith, G., and
Strauss, R. 1982. Morphometrics in Evolutionary Biology. The Geometry of
Size and Shape Change, with Examples from Fishes. (The Red Book)

Information on general multivariate methods can be found in a number of
places, my favorites are:

Krzanowski, W. J. 1996. Principles of Multivariate Analysis. A User's
Perspective. - Readable, conversational text distinguished from
technical details by font.

Carroll, J. D. and Green, P. E. 1997. Mathematical Tools for Applied
Multivariate Analysis - an excellent exposition of the geometry of
multivariate analysis.

And good summaries have been provided by our late friend in:

Marcus, L. F. 1990. Traditional morphometrics. In Rohlf and Bookstein
(eds.) Proceedings of the Michigan morphometrics workshop. (The Blue
Book).

Marcus, L. F. 1993. Some aspects of multivariate statistics for
morphometrics. In Marcus, Bello, and García-Valdecasas (eds)
Contributions to Morphometrics. (The Black Book)

-ds

-- 
Dennis E. Slice, Ph.D.
Department of Biomedical Engineering
Division of Radiologic Sciences
Wake Forest University School of Medicine
Winston-Salem, North Carolina, USA 
27157-1022
Phone: 336-716-5384
Fax: 336-716-2870



==
Replies will be sent to list.
For more information see http://life.bio.sunysb.edu/morph/morphmet.html.


Re: size correction discriminant functions analyses

2004-05-18 Thread morphmet
You may also try looking at:

Bookstein FL (1989)  'Size and shape': a comment on semantics.
Systematic Zoology 38:173-180.


Marc Moniz


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of [EMAIL PROTECTED]
Sent: Tuesday, May 18, 2004 9:51 AM
To: [EMAIL PROTECTED]
Subject: Re: size correction  discriminant functions analyses


Brett:
Darroch and Mosimann (1985) is a frequently cited paper that talks about
scale adjustment for both PCA and CVA.  They use log-shape data that are
ln-transformed ratios.  That paper should be a useful starting point.

Darroch JN, Mosimann JE (1985) Canonical and principal components of
shape.  Biometrika 72:241-252.
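
As a rough sketch of what ln-transformed ratios look like in practice,
the snippet below divides each measurement by a per-specimen size
variable and takes logs. The choice of the geometric mean of all
measurements as the size variable is an assumption made here for
illustration (other size variables are possible), and the function name
and the data are hypothetical.

```python
import numpy as np

def log_shape_ratios(measurements):
    """Return (log-shape ratios, log size) for a specimens x variables array.

    Size is taken to be the geometric mean of each specimen's measurements
    (an illustrative assumption); all measurements must be positive.
    """
    logs = np.log(np.asarray(measurements, dtype=float))
    # The log of the geometric mean equals the mean of the logs.
    log_size = logs.mean(axis=1, keepdims=True)
    # log(x_ij / size_i) = log(x_ij) - log(size_i)
    return logs - log_size, log_size.ravel()

# Hypothetical data: specimen 2 is specimen 1 scaled up by a factor of 2.
data = np.array([[10.0, 4.0, 2.5],
                 [20.0, 8.0, 5.0],
                 [12.0, 6.0, 2.0]])
shape, log_size = log_shape_ratios(data)

print(np.allclose(shape[0], shape[1]))      # isometric copies share one shape
print(np.allclose(shape.sum(axis=1), 0.0))  # each row sums to zero
```

Because each row of log-shape ratios sums to zero, the resulting
covariance matrix is singular by construction, which matters for any
method that needs to invert it.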

Good luck,
Tim Cole


At 10:09 AM 5/17/2004 -0400, you wrote:
Dear morphometrician,

I have recently reviewed 3 genera of catsharks that display a great deal
of morphological conservation within the genera, however, there is also
prominent sexual dimorphism present (profoundly so in some species). There
is quite a bit of shape variation between juveniles and adults, in one
genus in particular, but I think that the shape variation is being
obscured by the size component.

I have a sizeable morphometric data set (# measures  # taxa  specimens)
and have used principal components analysis on the raw data to explore
shape variation within each of the genera (not between). The first
component was always a general component and accounted for more than
85-90% of the variation in most instances, therefore the bipolar
components only contributed relatively little to the overall shape
variation resulting in crowded PCA plots.

The main reference I have used for the analyses to date has been
'Pimentel. 1979. Morphometrics. The multivariate analysis of biological
data' however, it doesn't deal with size correction. Can anyone suggest a
review that deals with size correction, or can I convert my data to ratios
and then log transform the data?

I am also looking for reviews of canonical discriminant functions analysis
and stepwise discriminant function analysis in an attempt to quantitate
differences between species within a genus.

Thanks for your help.

Brett


  Brett Human
  Shark Researcher
  27 Southern Ave
  West Beach SA 5024
  Australia
  61 8 8356 6891
  [EMAIL PROTECTED]
  





Theodore M. Cole III, Ph.D.
Department of Basic Medical Science
School of Medicine
University of Missouri - Kansas City
2411 Holmes St.
Kansas City, MO  64108
USA

Phone: (816) 235 -1829
FAX: (816) 235 - 6517
e-mail: [EMAIL PROTECTED]
www:  http://c.faculty.umkc.edu/colet





==
Replies will be sent to list.
For more information see http://life.bio.sunysb.edu/morph/morphmet.html.







Re: size correction discriminant functions analyses

2004-05-18 Thread morphmet
You mention that you have many more variables than specimens. As a result,
you cannot use the various alternatives that you list. Discriminant
functions, canonical variates, etc. all require that the pooled within-group
covariance matrix be based on a sample size larger than the number of
variables. If that is not true then the matrix will be singular and the
analysis will blow up.  
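
The singularity can be checked numerically. The sketch below is only an
illustration under stated assumptions: the data are random numbers, the
group sizes (3 groups of 4 specimens) and variable counts (20 versus 5)
are made up, and `pooled_within_cov` is a hypothetical helper written
here, not a function from any package. With n specimens in g groups, the
pooled within-group covariance matrix has rank at most n - g, so it
cannot be inverted when there are more variables than that.

```python
import numpy as np

def pooled_within_cov(X, labels):
    """Pooled within-group covariance matrix (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for g in groups:
        Xg = X[labels == g]
        centred = Xg - Xg.mean(axis=0)   # remove each group's own mean
        W += centred.T @ centred         # accumulate within-group SSCP
    return W / (X.shape[0] - len(groups))

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 4)         # 3 groups x 4 specimens = 12
X_wide = rng.normal(size=(12, 20))       # 20 variables, only 12 specimens
X_narrow = X_wide[:, :5]                 # same specimens, 5 variables

rank_wide = np.linalg.matrix_rank(pooled_within_cov(X_wide, labels))
rank_narrow = np.linalg.matrix_rank(pooled_within_cov(X_narrow, labels))

# Rank is bounded by n - g = 12 - 3 = 9, so the 20 x 20 matrix is singular.
print("rank with 20 variables:", rank_wide, "of 20")
print("rank with 5 variables:", rank_narrow, "of 5")
```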

Stepwise methods appear to get around this problem because they consider
fewer variables at one time. Their main limitation is in interpretation: one
cannot conclude that the variables in the best set are the important ones
and the other variables are unimportant. One also cannot interpret the
various probabilities produced by such methods as probabilities from usual
tests of significance. They need to be adjusted for the fact they result
from testing many combinations of variables and groups. They are just
convenient indices.

--
F. James Rohlf
State University of New York, Stony Brook, NY 11794-5245

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
 Behalf Of [EMAIL PROTECTED]
 Sent: Monday, May 17, 2004 10:10 AM
 To: [EMAIL PROTECTED]
 Subject: size correction  discriminant functions analyses
 
 Dear morphometrician,
 
 I have recently reviewed 3 genera of catsharks that display a great deal
 of morphological conservation within the genera, however, there is also
 prominent sexual dimorphism present (profoundly so in some species). There
 is quite a bit of shape variation between juveniles and adults, in one
 genus in particular, but I think that the shape variation is being
 obscured by the size component.
 
 I have a sizeable morphometric data set (# measures  # taxa  specimens)
 and have used principal components analysis on the raw data to explore
 shape variation within each of the genera (not between). The first
 component was always a general component and accounted for more than
 85-90% of the variation in most instances, therefore the bipolar
 components only contributed relatively little to the overall shape
 variation resulting in crowded PCA plots.
 
 The main reference I have used for the analyses to date has been
 'Pimentel. 1979. Morphometrics. The multivariate analysis of biological
 data' however, it doesn't deal with size correction. Can anyone suggest a
 review that deals with size correction, or can I convert my data to ratios
 and then log transform the data?
 
 I am also looking for reviews of canonical discriminant functions analysis
 and stepwise discriminant function analysis in an attempt to quantitate
 differences between species within a genus.
 
 Thanks for your help.
 
 Brett
 
 
  Brett Human
  Shark Researcher
  27 Southern Ave
  West Beach SA 5024
  Australia
  61 8 8356 6891
  [EMAIL PROTECTED]
  
 
 
 
 ==
 Replies will be sent to list.
 For more information see http://life.bio.sunysb.edu/morph/morphmet.html.






Re: size correction discriminant functions analyses

2004-05-18 Thread morphmet
Dear Brett and Marta,

I think the problem you are encountering may not be the size-versus-shape issue, but a 
Normal distribution issue.  PCA Analysis assumes multivariate normality.  I know for 
human beings the distribution of men and women combined is often not Multivariate 
Normal.  It is bi-modal and the male and female variance-covariance structure is 
different.  This dramatically affects the correlation and covariance matrices and 
provides misleading components.  I would assume this could be true for catsharks as 
well, and suspect that is why you found such a large amount of variation seemingly 
explained by your first component.  We have found that for humans the lack of 
Normality is big enough that it requires doing separate PCA analyses for men and 
women, and in some cases separate analyses by ethnicity as well.  In addition, it 
sounds to me that you have additional modes or non-normalities due to age.  (I 
generally only work with adults.)  Have you checked to see if your data is Normally
distributed?  If it isn't you could consider separating your samples into subgroups 
(gender and age groups) that are normally distributed, prior to PCA analysis.  In 
other words, you would do a PCA analysis for each group, rather than just one PCA for 
all of them combined.   I don't know how difficult this may be, not knowing your data. 
 Or you might check into classification methods that do not depend upon the normality 
assumption.  

Most discriminant analyses also assume that the attributes of the entities within each 
group are Multivariate Normal, and that the variance-covariance structures of the 
entity attributes are equal across groups.  You might be OK with the within-group 
normality assumption, but if there are important shape differences due to age or 
gender as you say then you may not be OK with the assumption of equal 
variance-covariance across groups.   For example, there may be a strong correlation 
(covariance) between two attributes in younger growing catsharks that disappears when 
they reach adulthood.  This would cause a difference in the covariance structure.  You 
could break your data into groups and look at the differences/similarities in the 
variance/covariance matrices.  This will tell you a lot about the similarities and 
differences between your groups as well.  
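
A minimal sketch of that last suggestion follows. It is purely
descriptive and rests on assumptions made for illustration: the data are
simulated, with two hypothetical traits that are strongly correlated in
"juvenile" specimens and uncorrelated in "adult" specimens, and
`group_covariances` is a helper defined here, not a library function. It
is not a formal test of equal covariance matrices (such as Box's M test).

```python
import numpy as np

def group_covariances(X, labels):
    """Return a dict mapping each group label to its covariance matrix."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {str(g): np.cov(X[labels == g], rowvar=False)
            for g in np.unique(labels)}

rng = np.random.default_rng(1)

# Simulated example: traits covary in juveniles but not in adults.
juveniles = rng.multivariate_normal([0.0, 0.0],
                                    [[1.0, 0.9], [0.9, 1.0]], size=200)
adults = rng.multivariate_normal([5.0, 5.0],
                                 [[1.0, 0.0], [0.0, 1.0]], size=200)
X = np.vstack([juveniles, adults])
labels = np.array(["juvenile"] * 200 + ["adult"] * 200)

covs = group_covariances(X, labels)
for name, S in covs.items():
    print(name)
    print(np.round(S, 2))

# Largest element-wise difference between the two covariance matrices.
diff = np.abs(covs["juvenile"] - covs["adult"]).max()
print("largest absolute difference:", round(float(diff), 2))
```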

Hope this is helpful,

Kathleen M. Robinette, Ph.D.
Principal Research Anthropologist
Air Force Research Laboratory
AFRL/HEPA
2800 Q Street
Wright-Patterson AFB, OH 45433-7947
(937) 255-8810
DSN 785-8810
FAX (937) 255-8752
e-mail:[EMAIL PROTECTED] 



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Tuesday, May 18, 2004 9:50 AM
To: [EMAIL PROTECTED]
Subject: Re: size correction  discriminant functions analyses


Dear Brett,

I have the same problem. I found several approaches in the literature, but no
efficient or clear review... well, there were some, but too mathematical for me as
a simple biologist.
As far as I know, it is complicated to work with ratios (which have difficult
statistical properties). On the other hand, you also have the problem of
collinearity between variables (I imagine).
I found some approaches to solve this, but none was universal or definitive.
There is an article by Leonart et al. that proposes a simple formula, but it
has been much discussed, and a statistical lecturer told me that it is not recent.
On the other hand, in the Ade4 lab, I saw the other day that they standardise the
columns with the mean. I tried this, and it was very good... it gave much clearer
results.
My supervisor said to use PCA as it is and simply consider that the first
component is 'size'... however, this did not give clear images of the data...
thus I am as trapped as at the beginning. I suppose in the end all these hypotheses
are possible and correct, and most will give very similar answers.

I am also puzzled by the range of multivariate techniques that give similar
answers... particularly because in many cases different authors (and
statistical packages) call the same techniques by different names, which
really confuses things. I started to do a summary of it (which I can send
you) from information I found in several books... in the end, as I see it
now, things are much simpler, and mainly consist of a couple of methods with
variations, which gives rise to the different names. On the other hand, people from
the R list have discussed stepwise analysis a lot, and some do not recommend it at
all... so some care should be taken on this point as well. Anyway, I can point you to
a free online manual from the VEGAN package (from
www.R-project.org) which for me was very good and compares many methods using
the same data: http://cc.oulu.fi/%7Ejarioksa/opetus/metodi/index.html

Hope this helps somehow, or at least shows solidarity with your question ;-)

Please let me know if you finally find 'an' answer :-)

Best wishes,

Marta

Re: size correction discriminant functions analyses

2004-05-18 Thread morphmet
--
 Dennis E. Slice, Ph.D.
 Department of Biomedical Engineering
 Division of Radiologic Sciences
 Wake Forest University School of Medicine
 Winston-Salem, North Carolina, USA
 27157-1022
 Phone: 336-716-5384
 Fax: 336-716-2870


==
Replies will be sent to list.
For more information see http://life.bio.sunysb.edu/morph/morphmet.html.