[MORPHMET] Re: procD.allometry with group inclusion

2016-12-12 Thread Christy Hipsley
Hi Mike, 

I have a follow-up to Tsung's procD.allometry question -

If the initial HOS test is nonsignificant for (shape~size, ~group), but the 
following ANOVA table is significant for size and group effects, can one 
interpret that as differences in y-intercept but not slope? So basically 
the HOS test determined that the size:group interaction did not improve the 
model, and so removed it from the ANOVA formula? In that case, the ANOVA 
model is (shape~size+group), correct? That makes perfect sense looking at 
the output graphs, but I just want to be sure. And if one wanted to then 
determine which groups differed in y-intercept, would one set up the 
advanced.procD.lm model like this and compare the pairwise LS means? 
advanced.procD.lm(Y ~ size, ~ size+group, groups = ~group, slope = NULL, 
iter=1)
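Spelled out with more iterations, and with shape, size and group standing in
for my actual objects (argument usage as I understand the current geomorph
release, so please correct me if the calls are off), that would be:

library(geomorph)

# HOS test plus ANOVA, with the grouping factor supplied via f2
fit <- procD.allometry(shape ~ size, f2 = ~ group, iter = 999)
summary(fit)  # if the HOS test is n.s., the reported ANOVA should be for shape ~ size + group

# pairwise comparison of LS means (y-intercepts), assuming a common slope
advanced.procD.lm(shape ~ size, ~ size + group,
                  groups = ~ group, slope = NULL, iter = 999)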

Thanks if you can confirm that I'm setting this up correctly,
Christy

On Thursday, December 8, 2016 at 7:37:33 PM UTC+11, Tsung Fei Khang wrote:
>
> Hi all,
>
> I would like to use procD.allometry to study allometry in two species. 
>
> I understand that the function returns the regression score for each 
> specimen as Reg.proj, and that the calculation is obtained as:
> s = Xa, where X is the n x p matrix of Procrustes shape variables and a is
> the p x 1 vector of regression coefficients normalized to unit length. I am able to 
> verify this computation from first principles when all samples are presumed 
> to come from the same species. 
>
> However, what happens when we are interested in more than 1 species (say 
> 2)? I could run procD.allometry by including the species labels via 
> f2=~gps, where gps gives the species labels. Is there just 1 regression 
> vector (which feels weird, since this should be species-specific), or 2? If 
> the latter, how can I recover both vectors? And what is the difference between 
> including f2=~gps with all the data and making two separate runs of 
> procD.allometry, one for the samples from species 1 and another for the samples 
> from species 2?
>
> Thanks for any help.
>
> Rgds,
>
> TF
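For the record, the s = Xa computation described in the quoted message can be
reproduced by hand along these lines (a sketch only, not geomorph's exact code,
with Y an n x p matrix of Procrustes shape variables and size a vector of sizes):

fit <- lm(Y ~ size)
b   <- coef(fit)["size", ]   # p regression coefficients, one per shape variable
a   <- b / sqrt(sum(b^2))    # normalized to unit length
s   <- Y %*% a               # regression scores (Reg.proj), one per specimen

# With a grouping factor included as Y ~ size + group there is still a single
# common "size" coefficient vector (a common slope); two separate runs, one per
# species, would each estimate their own.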



Re: brief comment on non-significance Re: [MORPHMET] procD.allometry with group inclusion

2016-12-12 Thread andrea cardini
Thanks both. I fully agree. I purposely didn't mention the 
opposite case (detecting tiny effects with very large samples), so as not to 
raise too many issues at the same time. There's an example of that kind 
(high power in large samples) in an old paper of mine where, with a 
total N of more than 1000, we found significantly different slopes, but the R2 
with separate lines was about 43% versus 41% with parallel lines, and the 
angles between trajectories were on average relatively small.


If R2s are also reported (and samples are large enough for those R2s to be 
accurate), readers can judge for themselves whether an effect is really 
important (and not just statistically significant). Unfortunately, R2s 
are still missing from many papers I read or review, and that concerns not 
only regressions but also simple pairwise comparisons of group means 
(pairs of taxa, sexes etc.).



Thanks everyone for your feedback!

Cheers


Andrea





On 12/12/16 15:51, Adams, Dean [EEOBS] wrote:


Andrea,

I agree that one must consider both statistical significance and 
biological meaningfulness in evaluating patterns. Considering one of 
these without the other can often get one into trouble.


Your post concerned the inability to statistically detect differences 
due to sample size limitations, and the possibility of concluding 
homogeneity from this result when it may not be the case. But as Mike 
mentioned, the opposite is also a concern. In fact, one might recall a 
discussion some months ago on Morphmet on this very issue; where large 
samples afforded the ability to discern allometric differences between 
groups, but where those statistical differences may not be 
biologically important. In both cases, critical thinking and a merger 
of statistical result and biological knowledge of the system are 
required to arrive at a well-reasoned understanding of the patterns in 
the data.


Best,

Dean

Dr. Dean C. Adams

Professor

Department of Ecology, Evolution, and Organismal Biology

   Department of Statistics

Iowa State University

www.public.iastate.edu/~dcadams/ 



phone: 515-294-3834

From: Mike Collyer [mailto:mlcoll...@gmail.com]
Sent: Monday, December 12, 2016 8:34 AM
To: andrea cardini 
Cc: morphmet@morphometrics.org
Subject: Re: brief comment on non-significance Re: [MORPHMET] 
procD.allometry with group inclusion


Andrea,

My opinion on this is that the researcher who has collected the data 
must retain at all times a biological wisdom that supersedes a 
suggested course of action based on results from a statistical test. 
 If the purpose of a study is to assess the allometric pattern of 
shape variation within populations, then maybe the results of a 
homogeneity of slopes test can be an unnecessary burden.  If a 
researcher wants to compare the mean shapes of different groups but is 
concerned that allometric variation might differ among groups, then a 
homogeneity of slopes test could be an important first step, but I 
agree that a non-significant result should not spur the researcher to 
immediately conclude a common allometry or no allometry is 
appropriate.  Sample size, variation in size among groups, and 
appropriate distributions of specimen size within groups might all be 
things to think about.


The point you make about a potential type II error is a real concern. 
 The opposite problem is also a real concern.  One might have very 
large sample sizes and sufficient statistical power to suggest that 
allometric slopes are heterogeneous.  However, the coefficient of 
determination and/or effect size for size:group interaction might be 
quite small.  Just because there is a low probability of finding as 
large an effect based on thousands of random permutations, is one 
ready to accept that different groups have evolved unique allometric 
trajectories?  It is easy to forget that the choice of “significance 
level” - the a priori acceptable rate of type I error - is arbitrary. 
 Making strong inferential decisions based on a binary decision for an 
arbitrary criterion is probably not wise.  I would argue that instead 
of focusing on a P-value, one could just as arbitrarily, but perhaps 
more justifiably, choose a coefficient of determination of R^2 = 0.10 
or an effect size of 2 SD as a criterion for whether to retain or omit 
the interaction coefficients that allow for heterogeneous slopes.


*** Warning: pedantic discussion on model selection starts here.  Skip 
if unappealing.


One could also turn to model selection approaches.  However, I think 
multivariate generalization for indices like AIC is an area lacking 
needed theoretical research for high-dimensional shape data.  There 
are two reasons for this.  First, the oft-defined AIC is -2 times the model 
log-likelihood + 2K, where K is the number of coefficients in a linear 
model (rank of the model design matrix) + 1, the additional 1 accounting 
for the error variance.  This is a 

RE: brief comment on non-significance Re: [MORPHMET] procD.allometry with group inclusion

2016-12-12 Thread Adams, Dean [EEOBS]
Andrea,

I agree that one must consider both statistical significance and biological 
meaningfulness in evaluating patterns.  Considering one of these without the 
other can often get one into trouble.

Your post concerned the inability to statistically detect differences due to 
sample size limitations, and the possibility of concluding homogeneity from 
this result when it may not be the case. But as Mike mentioned, the opposite is 
also a concern. In fact, one might recall a discussion some months ago on 
Morphmet on this very issue; where large samples afforded the ability to 
discern allometric differences between groups, but where those statistical 
differences may not be biologically important. In both cases, critical thinking 
and a merger of statistical result and biological knowledge of the system are 
required to arrive at a well-reasoned understanding of the patterns in the data.

Best,

Dean

Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
   Department of Statistics
Iowa State University
www.public.iastate.edu/~dcadams/
phone: 515-294-3834

From: Mike Collyer [mailto:mlcoll...@gmail.com]
Sent: Monday, December 12, 2016 8:34 AM
To: andrea cardini 
Cc: morphmet@morphometrics.org
Subject: Re: brief comment on non-significance Re: [MORPHMET] procD.allometry 
with group inclusion

Andrea,

My opinion on this is that the researcher who has collected the data must 
retain at all times a biological wisdom that supersedes a suggested course of 
action based on results from a statistical test.  If the purpose of a study is 
to assess the allometric pattern of shape variation within populations, then 
maybe the results of a homogeneity of slopes test can be an unnecessary burden. 
 If a researcher wants to compare the mean shapes of different groups but is 
concerned that allometric variation might differ among groups, then a 
homogeneity of slopes test could be an important first step, but I agree that a 
non-significant result should not spur the researcher to immediately conclude a 
common allometry or no allometry is appropriate.  Sample size, variation in 
size among groups, and appropriate distributions of specimen size within groups 
might all be things to think about.

The point you make about a potential type II error is a real concern.  The 
opposite problem is also a real concern.  One might have very large sample 
sizes and sufficient statistical power to suggest that allometric slopes are 
heterogeneous.  However, the coefficient of determination and/or effect size 
for size:group interaction might be quite small.  Just because there is a low 
probability of finding as large an effect based on thousands of random 
permutations, is one ready to accept that different groups have evolved unique 
allometric trajectories?  It is easy to forget that the choice of “significance 
level” - the a priori acceptable rate of type I error - is arbitrary.  Making 
strong inferential decisions based on a binary decision for an arbitrary 
criterion is probably not wise.  I would argue that instead of focusing on a 
P-value, one could just as arbitrarily, but perhaps more justifiably, choose a 
coefficient of determination of R^2 = 0.10 or an effect size of 2 SD as a 
criterion for whether to retain or omit the interaction coefficients that allow 
for heterogeneous slopes.

*** Warning: pedantic discussion on model selection starts here.  Skip if 
unappealing.

One could also turn to model selection approaches.  However, I think 
multivariate generalization for indices like AIC is an area lacking needed 
theoretical research for high-dimensional shape data.  There are two reasons 
for this.  First, the oft-defined AIC is -2 times the model log-likelihood + 2K, where K is 
the number of coefficients in a linear model (rank of the model design matrix) 
+ 1, the additional 1 accounting for the error variance.  
This is a simplification for univariate data.  The second half of the equation 
is actually 2[pk + 0.5p(p+1)], where p is the number of shape variables and k 
is the rank of the design matrix.  (One might define p as the rank of the shape 
variable matrix - the number of actual dimensions in the tangent space, also 
equal to the number of principal components with positive eigenvalues 
from a PCA - if using high-dimensional data or small samples.)  Notice 
that substituting 1 for p in this equation gets one back to the 2K, as defined 
first.  The pk part of the equation represents the dimensions of linear model 
coefficients; the 0.5p(p+1) part represents the dimensions of the error 
covariance matrix.  The reason this is important is that one might have picked 
up along the way that a delta AIC of 1-2 means two models are comparable (as if 
with equal likelihoods, they differ by around 1 parameter or less).  This rule 
of thumb would have to be augmented with highly 

Re: brief comment on non-significance Re: [MORPHMET] procD.allometry with group inclusion

2016-12-12 Thread Mike Collyer
Andrea,

My opinion on this is that the researcher who has collected the data must 
retain at all times a biological wisdom that supersedes a suggested course of 
action based on results from a statistical test.  If the purpose of a study is 
to assess the allometric pattern of shape variation within populations, then 
maybe the results of a homogeneity of slopes test can be an unnecessary burden. 
 If a researcher wants to compare the mean shapes of different groups but is 
concerned that allometric variation might differ among groups, then a 
homogeneity of slopes test could be an important first step, but I agree that a 
non-significant result should not spur the researcher to immediately conclude a 
common allometry or no allometry is appropriate.  Sample size, variation in 
size among groups, and appropriate distributions of specimen size within groups 
might all be things to think about.

The point you make about a potential type II error is a real concern.  The 
opposite problem is also a real concern.  One might have very large sample 
sizes and sufficient statistical power to suggest that allometric slopes are 
heterogeneous.  However, the coefficient of determination and/or effect size 
for size:group interaction might be quite small.  Just because there is a low 
probability of finding as large an effect based on thousands of random 
permutations, is one ready to accept that different groups have evolved unique 
allometric trajectories?  It is easy to forget that the choice of “significance 
level” - the a priori acceptable rate of type I error - is arbitrary.  Making 
strong inferential decisions based on a binary decision for an arbitrary 
criterion is probably not wise.  I would argue that instead of focusing on a 
P-value, one could just as arbitrarily, but perhaps more justifiably, choose a 
coefficient of determination of R^2 = 0.10 or an effect size of 2 SD as a 
criterion for whether to retain or omit the interaction coefficients that allow 
for heterogeneous slopes.
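To make that concrete, the interaction's share of the variation can be computed
directly from the full and reduced fits (a base-R sketch with hypothetical
objects Y, size and group; not geomorph code):

full    <- lm(Y ~ size * group)   # unique slopes per group
reduced <- lm(Y ~ size + group)   # common slope
ss.total        <- sum(scale(Y, scale = FALSE)^2)             # total SS about the mean
ss.interaction  <- sum(resid(reduced)^2) - sum(resid(full)^2) # SS for size:group
rsq.interaction <- ss.interaction / ss.total
# one might retain heterogeneous slopes only if rsq.interaction exceeds a
# pre-chosen value such as the R^2 = 0.10 mentioned above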

*** Warning: pedantic discussion on model selection starts here.  Skip if 
unappealing.

One could also turn to model selection approaches.  However, I think 
multivariate generalization for indices like AIC is an area lacking needed 
theoretical research for high-dimensional shape data.  There are two reasons 
for this.  First, the oft-defined AIC is -2 times the model log-likelihood + 2K, where K is 
the number of coefficients in a linear model (rank of the model design matrix) 
+ 1, the additional 1 accounting for the error variance.  
This is a simplification for univariate data.  The second half of the equation 
is actually 2[pk + 0.5p(p+1)], where p is the number of shape variables and k 
is the rank of the design matrix.  (One might define p as the rank of the shape 
variable matrix - the number of actual dimensions in the tangent space, also 
equal to the number of principal components with positive eigenvalues 
from a PCA - if using high-dimensional data or small samples.)  Notice 
that substituting 1 for p in this equation gets one back to the 2K, as defined 
first.  The pk part of the equation represents the dimensions of linear model 
coefficients; the 0.5p(p+1) part represents the dimensions of the error 
covariance matrix.  The reason this is important is that one might have picked 
up along the way that a delta AIC of 1-2 means two models are comparable (as if 
with equal likelihoods, they differ by about one parameter or less).  With 
highly multivariate data, this rule of thumb would have to be scaled up to 
roughly 1*p to 2*p, which makes it hard to have a good general sense of when 
models are comparable unless one takes into consideration how many shape 
variables are in use.
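As a quick illustration of how that penalty grows with the number of shape
variables (hypothetical numbers only):

aic.penalty <- function(p, k) 2 * (p * k + 0.5 * p * (p + 1))

aic.penalty(p = 1,  k = 3)   # univariate case: 2K with K = k + 1, i.e. 8
aic.penalty(p = 40, k = 3)   # 40 shape variables: 1880
# adding one term to the model (k -> k + 1) changes the penalty by 2 * p, which
# is why the delta-AIC guideline of 1-2 has to be rescaled to roughly 1*p to 2*p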

Second, the log-likelihood involves calculating the determinant of the error 
covariance matrix, which is problematic for singular matrices, as might be 
found with high-dimensional shape data.  Recently, colleagues and I have used 
plots of the log of the trace of error covariance matrices versus the log of 
parameter penalties - the 2[pk + 0.5p(p+1)] part - as a way of scanning 
candidate models for the one or two that have lower error relative to the 
number of parameters in the model.  Such an approach allows one to have no 
allometric slope, a common allometric slope, and unique allometric slopes, in 
combination with other important factors, and consider many models at once.  
But again, there is a certain level of arbitrariness to this.
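A rough sketch of that kind of scan, again in base R with hypothetical objects
Y, size and group (not the exact code we used):

candidates <- list(
  no.allometry       = Y ~ group,
  common.allometry   = Y ~ size + group,
  unique.allometries = Y ~ size * group
)
p <- ncol(Y)
scan <- t(sapply(candidates, function(f) {
  fit <- lm(f)
  k   <- qr(model.matrix(fit))$rank
  c(log.trace   = log(sum(resid(fit)^2) / (nrow(Y) - 1)),  # trace of the error covariance matrix
    log.penalty = log(2 * (p * k + 0.5 * p * (p + 1))))    # parameter penalty
}))
plot(scan[, "log.penalty"], scan[, "log.trace"],
     xlab = "log parameter penalty", ylab = "log trace of error covariance")
text(scan[, "log.penalty"], scan[, "log.trace"], rownames(scan), pos = 3)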

*** End pedantic discussion

There are other issues that can be quite real with real data.  For example, if 
one wishes to consider if there are shape differences among groups but first 
wishes to address if there is meaningful allometric shape variation, and 
whether there might be different allometries among groups, a homogeneity of 
slopes test might be done.  But what if it is revealed that one group has all 
small specimens and one group has all large specimens?  The 

Re: [MORPHMET] PC Scores From Other PCA in MorphoJ

2016-12-12 Thread Chris Klingenberg

Dear Andrea

For the "PC Scores From Other PCA" procedure, you need a dataset (which 
contains the items for which you want to compute the scores) and a PCA.


The dialog box has two drop-down menus, one for selecting the dataset 
and the other for selecting the PCA. When you select a dataset, the set 
of PCAs that is available in the second drop-down menu changes: only 
those PCAs are shown that are consistent with the dataset in the number 
of landmarks, dimensionality, object symmetry etc. So PCAs do appear and 
disappear in the drop-down menu depending on what dataset you have selected.


Covariance matrices are only involved as far as a covariance matrix is 
necessary to run a PCA. The covariance matrices themselves do not appear 
in connection with the procedure.
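If one wants to reproduce the projection outside MorphoJ, the operation itself
is simple. Assuming the procedure centres the new data on the mean of the
dataset the PCA was computed from, it amounts to something like this in R
(hypothetical objects):

# newdata:     n x p matrix of shape coordinates to be scored
# pca.mean:    mean shape (length p) of the dataset used for the original PCA
# pca.vectors: p x k matrix of eigenvectors (PC loadings) from that PCA
scores <- sweep(newdata, 2, pca.mean) %*% pca.vectors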


I hope this helps.

Best wishes,
Chris


On 12/12/2016 12:18, andrea cardini wrote:

Today is my morphmet-day.

A question, this time, please.


Is there anyone with experience on the "PC Scores From Other PCA" in 
MorphoJ?


I am trying to project shape coordinates on a matrix I imported and 
attached to a dataset with mean shapes, but the attached dataset does 
not show up in the menu: I can see the parent dataset (before 
averaging) with the imported covariance matrix but, as soon as I 
select the mean shapes (despite being the data to which the covar. 
matrix is attached), the imported covar. matrix vanishes. That means 
that I can project the parent dataset (non-averaged) onto the vectors 
from the imported matrix but can't do it with the mean shapes (which 
is my aim).



I'll do it in NTSYSpc or R, but I'd be curious to understand what I am 
getting wrong in MorphoJ.


Thanks in advance for any suggestion.


Cheers


Andrea





--
***
Christian Peter Klingenberg
School of Biological Sciences
University of Manchester
Michael Smith Building
Oxford Road
Manchester M13 9PT
United Kingdom

Web site: http://www.flywings.org.uk
E-mail: c...@manchester.ac.uk
Phone: +44 161 2753899
Skype: chris_klingenberg
***




[MORPHMET] PC Scores From Other PCA in MorphoJ

2016-12-12 Thread andrea cardini

Today is my morphmet-day.

A question, this time, please.


Is there anyone with experience on the "PC Scores From Other PCA" in 
MorphoJ?


I am trying to project shape coordinates on a matrix I imported and 
attached to a dataset with mean shapes, but the attached dataset does 
not show up in the menu: I can see the parent dataset (before averaging) 
with the imported covariance matrix but, as soon as I select the mean 
shapes (despite being the data to which the covar. matrix is attached), 
the imported covar. matrix vanishes. That means that I can project the 
parent dataset (non-averaged) onto the vectors from the imported matrix 
but can't do it with the mean shapes (which is my aim).



I'll do it in NTSYSpc or R, but I'd be curious to understand what I am 
getting wrong in MorphoJ.


Thanks in advance for any suggestion.


Cheers


Andrea



--

Dr. Andrea Cardini
Researcher, Dipartimento di Scienze Chimiche e Geologiche, Università di Modena 
e Reggio Emilia, Via Campi, 103 - 41125 Modena - Italy
tel. 0039 059 2058472

Adjunct Associate Professor, School of Anatomy, Physiology and Human Biology, 
The University of Western Australia, 35 Stirling Highway, Crawley WA 6009, 
Australia

E-mail address: alcard...@gmail.com, andrea.card...@unimore.it
WEBPAGE: https://sites.google.com/site/alcardini/home/main

FREE Yellow BOOK on Geometric Morphometrics: 
http://www.italian-journal-of-mammalogy.it/public/journals/3/issue_241_complete_100.pdf

ESTIMATE YOUR GLOBAL FOOTPRINT: 
http://www.footprintnetwork.org/en/index.php/GFN/page/calculators/




Re: brief comment on non-significance Re: [MORPHMET] procD.allometry with group inclusion

2016-12-12 Thread Carmelo Fruciano

Dear Andrea (and all),
I think you make a very interesting point.

My comment will come across as very unsophisticated, and I certainly  
look forward to reading the comments of more statistically literate  
list users.


To me, it seems that, as long as we interpret non-significant results  
as "failure to reject the null hypothesis", that encompasses cases due  
to low statistical power, cases due to the null hypothesis being true  
(parallelism of the two regression lines), and so on.


Of course, one can elaborate - based on experience and/or further  
evidence - on the most likely reasons for a non-significant test. But,  
as you rightly write, one has to be careful and understand a sort of  
"acceptance of increased error" when making statements on why a test  
might be non-significant.


You also make a good point that power analyses can be helpful.  
However, to me it feels like in many situations power analyses can be  
limited by the difficulty of defining realistic effect sizes and, in  
some cases, other factors influencing inference (e.g., non-isotropic  
variation).


Best,
Carmelo




andrea cardini wrote:


Dear All,

if I may, I'd add a brief comment on the interpretation of  
non-significant results. I'd appreciate it if this were checked by those  
with a proper understanding of and background in stats (which I  
don't have!).


I use Mike's sentence on non-significant slopes as an example but  
the issue is a general one, although I find it particularly tricky  
in the context of comparing trajectories (allometries or other)  
across groups. Mike wisely said "approximately" ("If not significant,  
then the slope vectors are APPROXIMATELY parallel"). With  
permutations, one might be able to perform tests even when sample  
sizes are small (and maybe, which is even more problematic,  
heterogeneous across groups): then, non-significance could simply  
mean that the samples are not large enough to make strong statements  
(rejection of the null hypothesis) with confidence (i.e., statistical power  
is low). Especially with short trajectories (allometries or other),  
one might find n.s. slopes with very large angles between  
the vectors, a case in which it is probably hard to conclude that the  
allometries really are parallel.


Small samples are a curse of many studies in taxonomy and  
evolution. We've done a couple of exploratory (not-very-rigorous!)  
empirical analyses of the effect of reducing sample sizes on means,  
variances, vector angles etc. in geometric morphometrics (Cardini &  
Elton, 2007, Zoomorphol.; Cardini et al., 2015, Zoomorphol.), and  
some, probably most, of these estimates blow up when N goes down.  
That happened even when differences were relatively large (species  
separated by several million years of independent evolution, or  
samples including domestic breeds hugely different from their wild  
counterparts).


Unless one has done power analyses and/or has very large samples,  
I'd be careful with the interpretations. There's plenty on this in  
the (for me, difficult) statistical literature. One can certainly do  
sophisticated power analyses in R, and one of the programs of the TPS  
series (TPSPower), although probably and unfortunately not used by many,  
was written by Jim exactly for this purpose (possibly  
not for power analyses in the case of MANCOVAs/vector angles, but  
certainly for the simpler case of comparisons of means).


Cheers


Andrea


On 11/12/16 19:17, Mike Collyer wrote:

Dear Tsung,

The geomorph function, advanced.procD.lm, allows one to extract  
group slopes and model coefficients.  In fact, procD.allometry is a  
specialized function that uses advanced.procD.lm to perform the HOS  
test and then uses procD.lm to produce an ANOVA table, depending on  
the results of the HOS test.  It also uses the coefficients and  
fitted values from procD.lm to generate the various types of  
regression scores.  In essence, procD.allometry is a function that  
carries out several analyses with geomorph base functions, procD.lm  
and advanced.procD.lm, in a specified way.  By comparison, the  
output is more limited, but one can use the base functions to get  
much more output.


In advanced.procD.lm, if one specifies groups and a slope, one of  
the outputs is a matrix of slope vectors.  Also, one can perform  
pairwise tests to compare either the correlation or angle between  
slope vectors.
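For the two-species case that would look something like the call below  
(shape, size and species are placeholders for your objects; argument names  
as in the current geomorph release, so check against the help page):

advanced.procD.lm(shape ~ size + species, ~ size * species,
                  groups = ~ species, slope = ~ size,
                  angle.type = "deg", iter = 999)
# the output then includes a matrix of species-specific slope vectors and
# pairwise tests of the correlations/angles between them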


Regarding the operation of the HOS test, it is a permutational test  
that does the following: calculate the sum of squared residuals for  
a “full” model, shape ~ size + group + size:group and the same for  
a “reduced” model, shape ~ size + group.  (The sum of squared  
residuals is the trace of the error SSCP matrix, which is the same  
as the sum, over every shape variable, of that variable's summed squared  
residuals.) The difference between these two values is the sum of  
squares for the size:group effect.  If significantly large (i.e.,  
is found
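A minimal base-R sketch of that logic, including an RRPP-style permutation step
for illustration (hypothetical objects Y, size and group; this is not
geomorph's exact implementation):

hos.ss <- function(Y, size, group) {
  full    <- lm(Y ~ size * group)   # unique slopes per group
  reduced <- lm(Y ~ size + group)   # common slope
  # sum of squared residuals = trace of the error SSCP matrix
  sum(resid(reduced)^2) - sum(resid(full)^2)  # SS attributable to size:group
}

ss.obs <- hos.ss(Y, size, group)

# permute the reduced-model residuals, add them back to the reduced-model
# fitted values, and recompute the statistic to build a null distribution
set.seed(1)
reduced <- lm(Y ~ size + group)
fit.red <- fitted(reduced); res.red <- resid(reduced)
ss.rand <- replicate(999, {
  Y.rand <- fit.red + res.red[sample(nrow(res.red)), , drop = FALSE]
  hos.ss(Y.rand, size, group)
})
p.value <- mean(c(ss.obs, ss.rand) >= ss.obs)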