RE: [MORPHMET] Mahalanobis distance in cluster analysis of shape variables

2016-01-30 Thread F. James Rohlf
The distinction is that Mahalanobis distance should be thought of as a 
statistical distance. For a single variable it is like a z-score (a difference 
divided by a standard deviation). It is not a measure of the absolute amount of 
difference. In the multivariate case Mahalanobis distance is relative to the 
amount of variation in the direction of the difference (that is what taking 
into account within-group covariation gives you).
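
A minimal sketch of the point above, assuming NumPy and a made-up 2 x 2 within-group covariance matrix (the numbers and the helper name `mahalanobis` are illustrative, not from this thread). Two differences of equal Euclidean length give different Mahalanobis distances, depending on how much within-group variation lies in their direction:

```python
import numpy as np

def mahalanobis(diff, cov):
    """Mahalanobis distance for a difference vector, given a covariance matrix."""
    diff = np.asarray(diff, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Solve cov @ x = diff rather than forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Hypothetical within-group covariance: variance 4 on axis 1, variance 1 on axis 2.
within_cov = np.array([[4.0, 0.0],
                       [0.0, 1.0]])

# Two mean differences with the same Euclidean length but different directions.
diff_high_var = np.array([2.0, 0.0])  # along the high-variance axis
diff_low_var = np.array([0.0, 2.0])   # along the low-variance axis

for label, d in [("high-variance direction", diff_high_var),
                 ("low-variance direction", diff_low_var)]:
    print(label,
          "| Euclidean:", float(np.linalg.norm(d)),
          "| Mahalanobis:", mahalanobis(d, within_cov))
```

In the one-variable case the same function reduces to the absolute z-score: a difference of 3 with variance 4 gives 3 / 2 = 1.5.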

 

Both Mahalanobis and Euclidean distances are valid. It depends on what you wish 
“distance” to mean. In morphometrics, do you want to cluster based on how 
similar shapes are (in terms of distance in Kendall shape space) or based on 
the degree of statistical overlap in population samples (e.g., the degree to 
which specimens from the two groups might be misidentified)?

 

A practical problem with Mahalanobis distance in many morphometric studies is 
that landmark data are usually high dimensional, so very large samples within 
groups are needed for reliable results.
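
The sample-size issue can be illustrated with a small sketch, assuming NumPy and simulated data (the sample size, number of variables, and random values are all hypothetical stand-ins for real shape variables). A covariance matrix estimated from n specimens has rank at most n - 1, so when there are more variables than that it cannot be inverted, and the inverse is what Mahalanobis distance needs:

```python
import numpy as np

rng = np.random.default_rng(0)

n_specimens = 5     # hypothetical small within-group sample
n_variables = 10    # e.g. 5 two-dimensional landmarks -> 10 coordinates

# Simulated data standing in for shape variables (not real landmarks).
X = rng.normal(size=(n_specimens, n_variables))

# Sample covariance matrix of the variables (n_variables x n_variables).
S = np.cov(X, rowvar=False)

rank = np.linalg.matrix_rank(S)
print("covariance shape:", S.shape)
print("rank:", rank, "of", n_variables)

# With n specimens the covariance has rank at most n - 1, so here it is
# singular and has no ordinary inverse.
is_singular = rank < n_variables
print("singular:", is_singular)
```

Even when the sample is large enough for the matrix to be invertible, directions with very little estimated variance can dominate the distance, which is one way to read the call for very large samples.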

 



F. James Rohlf, Distinguished Professor, Emeritus. Ecology & Evolution

Research Professor, Anthropology

Stony Brook University

 


Re: [MORPHMET] Mahalanobis distance in cluster analysis of shape variables

2016-01-30 Thread Elahep
Dear Joseph,

Thanks for your detailed explanation. As recommended by Claude in 
"Morphometrics with R" (2008), it is better to use the Mahalanobis distance 
for clustering group means, because it is scaled by the within-group 
variance-covariance. In my analysis, I calculated the mean relative warp 
scores for each population and then carried out a UPGMA cluster analysis 
based on the Euclidean distance. The results were satisfying and congruent 
with my other results. Following the book and other articles, I ran the 
same analysis based on the Mahalanobis distance in the PAST software, but 
every time I ran it the software reported the error "Invalid floating point 
operation", so I couldn't see the Mahalanobis-based clustering. (I couldn't 
work out why this error happens.)

Euclidean distance worked for me, but I was curious to understand whether 
my analysis is statistically meaningful.

Thanks again for your answer,
Elahe
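
The workflow described in this message (group means, then a distance matrix scaled by the within-group variance-covariance) might be sketched as follows. This assumes NumPy and simulated scores; the population names, sample sizes, and the helper `distance_matrices` are hypothetical, and this is not a reconstruction of what PAST does internally:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated scores for three hypothetical populations, 40 specimens each,
# on 3 shape variables (stand-ins for relative warp scores).
true_means = {"A": [0.0, 0.0, 0.0], "B": [1.0, 0.0, 0.0], "C": [0.0, 0.5, 0.0]}
sds = np.array([2.0, 0.5, 1.0])  # unequal within-group spread per variable
groups = {name: rng.normal(loc=m, scale=sds, size=(40, 3))
          for name, m in true_means.items()}

names = list(groups)
means = np.array([groups[g].mean(axis=0) for g in names])

# Pooled within-group covariance: within-group cross-products summed over
# groups, divided by (total N - number of groups).
n_total = sum(len(x) for x in groups.values())
pooled = sum((len(x) - 1) * np.cov(x, rowvar=False) for x in groups.values())
pooled = pooled / (n_total - len(groups))

def distance_matrices(means, cov):
    """Pairwise Euclidean and Mahalanobis distances between group means."""
    k = len(means)
    euclid = np.zeros((k, k))
    mahal = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            d = means[i] - means[j]
            euclid[i, j] = np.sqrt(d @ d)
            mahal[i, j] = np.sqrt(d @ np.linalg.solve(cov, d))
    return euclid, mahal

euclid, mahal = distance_matrices(means, pooled)
print(names)
print(np.round(euclid, 2))
print(np.round(mahal, 2))
```

Either matrix could then be handed to an average-linkage (UPGMA) routine, for example SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` on the condensed form of the matrix. If the pooled covariance is singular or nearly so, the solve step fails or becomes unstable; whether that is what lies behind the error reported above is not something this sketch can confirm.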


-- 
MORPHMET may be accessed via its webpage at http://www.morphometrics.org
--- 
You received this message because you are subscribed to the Google Groups 
"MORPHMET" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to morphmet+unsubscr...@morphometrics.org.


Re: [MORPHMET] Mahalanobis distance in cluster analysis of shape variables

2016-01-30 Thread Joseph Kunkel
I cannot speak directly to why it is frequently used in GM cluster analysis, 
but I would like to mention how I look at Mahalanobis distance based on its 
calculation.

Mahalanobis distance is not a pure distance metric like Euclidean or Manhattan 
distance; as you have stated, it is ‘standardized’.  What does that really mean?  
It sounds superficially good.

One way of computing it is to rotate the k-landmark data set to simplest form, 
treating the landmarks as factors.  This approach treats all landmarks as 
having a common covariance structure in XY (or XYZ in three dimensions).  That 
is already a stretch, since not all landmarks can be assumed to have the same 
covariance structure.  In addition, the landmarks have all already been centered 
about their centroid and rotated to coincide, which has eliminated a degree of 
freedom of variability, and that can have consequences.  

Furthermore, not all species' landmarks can be expected to have the same 
covariance structure, which is an assumption made in the ordinary Mahalanobis 
distance application to strut analysis between populations or species.  The 
assumption of similar data structure of course applies to the null hypothesis 
where there is no difference.  The typical statistical test explodes when the 
null hypothesis is falsified, so just when you want the Mahalanobis distance 
metric to be accurate it starts misbehaving.

After rotation to simplest axes one does a 1 df F-test between each of the 
landmarks.  These tests are all independent, so they can be summed together to 
produce a k df F-test, which is Mahalanobis D squared.  So Mahalanobis D is 
the square root of the sum of independent F-tests, but those F-tests are based 
on all sorts of assumptions about the variance of the landmarks.  I imagine one 
could modify the calculation of D by limiting the sum to the principal 
components that account for the top 95 or 99% of the variance.
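
One possible reading of this paragraph, sketched with NumPy: rotate to the principal axes of a within-group covariance matrix, form one squared standardized difference per axis, and sum them, optionally keeping only the leading axes up to a chosen share of the variance. The covariance values, the mean difference, and the function name `d2_by_principal_axes` are all made up for illustration, and treating each per-axis term as (difference squared) / (variance on that axis) is an assumption about what is intended here:

```python
import numpy as np

def d2_by_principal_axes(diff, cov, var_fraction=1.0):
    """Sum squared standardized differences along principal axes of cov.

    Axes are ordered by decreasing variance; only the leading axes needed
    to reach var_fraction of the total variance are included.
    """
    evals, evecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]             # reorder to descending
    evals, evecs = evals[order], evecs[:, order]

    cum = np.cumsum(evals) / evals.sum()
    # Number of leading axes needed to reach the requested fraction.
    k = min(int(np.searchsorted(cum, var_fraction)) + 1, len(evals))

    scores = evecs.T @ diff                     # difference in rotated axes
    terms = scores**2 / evals                   # one term per axis
    return float(terms[:k].sum()), k

# Hypothetical within-group covariance and mean difference.
cov = np.array([[4.0, 1.0, 0.0],
                [1.0, 2.0, 0.5],
                [0.0, 0.5, 0.2]])
diff = np.array([1.0, -0.5, 0.3])

d2_full, k_full = d2_by_principal_axes(diff, cov)
d2_95, k_95 = d2_by_principal_axes(diff, cov, var_fraction=0.95)

print("all axes:", k_full, "D^2 =", d2_full)
print("95% of variance:", k_95, "D^2 =", d2_95)
```

With all axes included the sum equals the usual quadratic form for D squared. Truncation drops the low-variance axes, which are the ones where a small difference is divided by a small variance, so the truncated value can never exceed the full one.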

Many times applications of analytical techniques are judged by whether they 
‘work’ or not.   If a clustering method works for you, use it(?).  I am of the 
opinion that I use statistics to convince myself rather than the audience.   A 
confluence of many arguments is used to make a case.

Joe

-·.  .· ·.  .><º>·.  .· ·.  .><º>·.  .· ·.  .><º> .··.· >=-   
=º}><
Joseph G. Kunkel, Research Professor
UNE Biddeford ME 04005
http://www.bio.umass.edu/biology/kunkel/





[MORPHMET] Mahalanobis distance in cluster analysis of shape variables

2016-01-30 Thread Elahep


Hello all,


I have seen in many GM articles people use Mahalanobis distance for cluster 
analysis. What is the advantage of using Mahalanobis distance over 
Euclidean distance as a similarity measure in cluster analysis of shape 
variables?

As far as I know, Mahalanobis distance is a standardized form of Euclidean 
distance: it standardizes the data, adjusting for correlation between 
variables, and weights all variables equally. 

Why is this distance measure so frequently used in GM cluster analysis?


Thanks in advance

Elahe
