[MORPHMET] Re: number of landmarks and sample size

2017-05-31 Thread mitte...@univie.ac.at
Adding more (semi)landmarks inevitably increases the spatial resolution and 
thus allows one to capture finer anatomical details, whether relevant to 
the biological question or not. This can be advantageous for the 
reconstruction of shapes, especially when producing 3D morphs by warping 
dense surface representations. Basic developmental or evolutionary trends, 
group structures, etc., are often visible in an ordination analysis with a 
smaller set of relevant landmarks; finer anatomical resolution does not 
necessarily affect these patterns. However, adding more landmarks cannot 
reduce or remove any signal that was found with fewer landmarks, but 
it can make ordination analyses and the interpretation of distances and angles 
in shape space more challenging.

An excess of variables (landmarks) over specimens does NOT pose problems to 
statistical methods such as the computation of mean shapes and Procrustes 
distances, PCA, PLS, and the multivariate regression of shape coordinates 
on some independent variable (shape regression). These methods are based on 
averages or regressions computed for each variable separately, or on the 
decomposition of a covariance matrix. 
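To illustrate this point, here is a minimal pure-Python sketch on synthetic data (8 specimens, 30 variables; all helper names are hypothetical, not from any morphometrics package). The mean, the covariance matrix, and the first principal component (found by power iteration, our choice of eigen-solver) are all computed without inverting the covariance matrix, so having more variables than specimens is no obstacle.

```python
import random

def column_means(X):
    """Average of each variable (column) over specimens (rows)."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def covariance(X):
    """Sample covariance matrix (p x p) built from per-variable deviations."""
    n, p = len(X), len(X[0])
    m = column_means(X)
    C = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - m[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                C[a][b] += d[a] * d[b]
    return [[c / (n - 1) for c in r] for r in C]

def first_pc(C, iters=500):
    """Leading eigenvalue and eigenvector of C by power iteration (no inversion)."""
    p = len(C)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the eigenvalue for the converged unit vector
    eigval = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
    return eigval, v

# Synthetic data: 8 specimens, 30 variables, one dominant axis of variation u.
random.seed(1)
p, n = 30, 8
u = [random.gauss(0, 1) for _ in range(p)]
u_len = sum(x * x for x in u) ** 0.5
u = [x / u_len for x in u]
X = []
for _ in range(n):
    t = random.gauss(0, 5)  # specimen score along u
    X.append([t * u[j] + random.gauss(0, 0.1) for j in range(p)])

C = covariance(X)
lam1, pc1 = first_pc(C)
total_var = sum(C[j][j] for j in range(p))
print(f"PC1 explains {lam1 / total_var:.1%} of total variance")
```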

Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and 
relative eigenanalysis, require the inversion of a full-rank covariance 
matrix, which implies an excess of specimens over variables. The same 
applies to many multivariate parametric test statistics, such as 
Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full 
rank and thus can never be subjected to any of these methods without prior 
variable reduction. In fact, reliable results can only be obtained if there 
are many times more specimens than variables, which usually requires variable 
reduction by PCA, PLS or other techniques, or the regularization of 
covariance matrices (which is more common in the bioinformatic community).
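A small sketch of why such inversions fail. The sample covariance matrix of n specimens has rank at most n - 1, and Procrustes superimposition removes further dimensions (translation, rotation, scale), so 2D shape coordinates of k landmarks span at most 2k - 4 dimensions (3k - 7 in 3D). The rank helper is a plain Gauss-Jordan elimination written only for illustration; the function names are hypothetical, and in practice one would use a numerical library.

```python
import random

def shape_dimensions(k, dim=2):
    """Dimensions left after Procrustes superimposition of k landmarks:
    remove translation (dim), rotation (dim*(dim-1)/2) and scale (1)."""
    return k * dim - dim - dim * (dim - 1) // 2 - 1

def covariance(X):
    """Sample covariance matrix of specimens (rows) by variables (columns)."""
    n, p = len(X), len(X[0])
    m = [sum(r[j] for r in X) / n for j in range(p)]
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in X) / (n - 1)
             for b in range(p)] for a in range(p)]

def matrix_rank(M, tol=1e-9):
    """Numerical rank via Gauss-Jordan elimination with partial pivoting."""
    A = [row[:] for row in M]
    rows, cols = len(A), len(A[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < tol:
            continue  # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rows):
            if r != rank:
                f = A[r][c] / A[rank][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[rank])]
        rank += 1
    return rank

random.seed(0)
n_specimens, n_vars = 6, 20  # more variables than specimens
X = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_specimens)]
C = covariance(X)
print("covariance is", n_vars, "x", n_vars, "but has rank", matrix_rank(C))
print("2D shape dimensions for 18 landmarks:", shape_dimensions(18))
```

A singular covariance matrix like this one has no inverse, which is why Mahalanobis distances, CVA, or Hotelling's T2 cannot be computed on raw shape coordinates without first reducing the variables.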

For these reasons, I do not see any disadvantage in measuring a large 
number of landmarks, except perhaps for the time it takes. If time is an 
issue, one can optimize landmark schemes as suggested by Jim or Aki.

Best,

Philipp

-- 
MORPHMET may be accessed via its webpage at http://www.morphometrics.org
--- 
You received this message because you are subscribed to the Google Groups 
"MORPHMET" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to morphmet+unsubscr...@morphometrics.org.


RE: [MORPHMET] number of landmarks and sample size

2017-05-31 Thread F. James Rohlf
Another, though non-statistical, approach to judging whether one has an 
appropriate number of landmarks, or perhaps too many, is to use the tpsSuper 
software. 

One could start with many landmarks and confirm (one hopes) that the average 
unwarped image is clear, implying that the landmarks have captured the variation 
not only of the landmarks but also of the structures around them. You can then 
remove landmarks and see whether the average looks fuzzier. If so, that reflects 
variation not well tracked by the chosen landmarks. If there is little change, 
then the landmarks you removed are not really necessary to track the variation 
in the sample. One could then continue the process. Clearly the issue is not 
just the number of landmarks but where they are located relative to the 
variation among the specimens. This process could be automated to try 
combinations of landmarks such that some measure of variation in pixels of the 
unwarped images is minimized. I seem to remember that Mardia made a suggestion 
like that many years ago.
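One way the automated search could look, as a sketch only. An exhaustive search over all landmark subsets grows combinatorially, so this uses greedy backward elimination instead (our substitution). The `score` callable is a hypothetical placeholder for "variation in pixels of the mean unwarped image"; it would have to come from your own image processing and is not part of tpsSuper. The toy score and its importance values are invented for illustration.

```python
def backward_elimination(landmarks, score, max_increase, min_size=3):
    """Greedy backward elimination of landmarks.

    score(subset) -> float is a user-supplied (hypothetical) measure of how
    poorly a landmark subset tracks the sample, e.g. pixel variance of the
    mean unwarped image; lower is better. Landmarks are dropped one at a time
    while the score stays within max_increase of the full-set score.
    """
    current = frozenset(landmarks)
    full_score = score(current)
    history = [(current, full_score)]
    while len(current) > min_size:
        # try removing each remaining landmark and keep the least harmful drop
        best_score, best_lm = min((score(current - {lm}), lm) for lm in current)
        if best_score - full_score > max_increase:
            break  # every further removal makes the average noticeably fuzzier
        current = current - {best_lm}
        history.append((current, best_score))
    return current, history

# Hypothetical stand-in for "fuzziness": each landmark tracks some amount of
# variation, and leaving it out adds that amount to the score.
importance = {1: 5.0, 2: 0.01, 3: 4.0, 4: 0.02, 5: 3.0}

def toy_fuzziness(subset):
    return sum(w for lm, w in importance.items() if lm not in subset)

kept, steps = backward_elimination(importance, toy_fuzziness,
                                   max_increase=0.1, min_size=1)
print(sorted(kept))  # [1, 3, 5]
```

Greedy elimination can miss the best subset when landmarks are only informative in combination; trying random restarts or small exhaustive searches on the final few landmarks would be natural refinements.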
_ _ _ _ _ _ _ _ _
F. James Rohlf, Distinguished Prof. Emeritus
Stony Brook University
Depts. of Anthropology and of Ecology & Evolution

-Original Message-
From: Murat Maga [mailto:m...@uw.edu] 
Sent: Wednesday, May 31, 2017 12:33 PM
To: Mike Collyer ; Lea Wolter 
Cc: MORPHMET 
Subject: RE: [MORPHMET] number of landmarks and sample size

RE: [MORPHMET] number of landmarks and sample size

2017-05-31 Thread Murat Maga
I want to chime in on Mike's comment about landmark density changing the 
effect size. Nicolas Navarro and I did something similar in the context of 
quantitative genetics of mandible shape and came to a similar conclusion using 
2D, 3D, and 3D semilandmark sets on the same dataset.

Navarro N, Maga AM. 2016. Does 3D Phenotyping Yield Substantial Insights in the 
Genetics of the Mouse Mandible Shape? G3: Genes, Genomes, Genetics 6:1153–1163.


-Original Message-
From: Mike Collyer [mailto:mlcoll...@gmail.com] 
Sent: Wednesday, May 31, 2017 7:43 AM
To: Lea Wolter 
Cc: MORPHMET 
Subject: Re: [MORPHMET] number of landmarks and sample size


Re: [MORPHMET] number of landmarks and sample size

2017-05-31 Thread Mike Collyer
Dear Lea,

I see others have responded to your inquiry, already.  I thought I would add an 
additional perspective.

Your question about statistical significance requires asking a follow-up 
question.  What statistical methods would you intend to use to evaluate 
“significance”?  If you are worried about the number of landmarks, your concern 
suggests you might be using parametric test statistics frequently associated 
with MANOVA, like Wilks' lambda or Pillai's trace.  Indeed, when using these 
statistics and converting them to approximate F values, one must have many more 
specimens than landmarks (more error degrees of freedom than shape variables, 
to be more precise), if “significance” is to be inferred from probabilities 
associated with F-distributions.  Therefore, limiting the number of landmarks 
might be a goal.
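A rough pre-check of the condition described above, as a hedged sketch: for a one-way MANOVA the error degrees of freedom are N - g (total specimens minus groups), compared with the number of 2D Procrustes shape variables (2k - 4). It does not account for any further dimensions lost to semilandmark sliding or for more complex designs, and clearing this bar does not guarantee adequate power. The function names are hypothetical.

```python
def shape_variables(n_landmarks, dim=2):
    # Procrustes removes translation, rotation and scale: 2k-4 (2D), 3k-7 (3D)
    return n_landmarks * dim - dim - dim * (dim - 1) // 2 - 1

def manova_df_check(group_sizes, n_landmarks, dim=2):
    """Compare one-way MANOVA error df (N - g) with the number of shape variables."""
    error_df = sum(group_sizes) - len(group_sizes)
    p = shape_variables(n_landmarks, dim)
    return error_df, p, error_df > p

# Lea's design: about 22 specimens in each of 5 populations, 18 landmarks in 2D
print(manova_df_check([22] * 5, 18))            # (105, 32, True)
# The design from the paper Lea mentions: 3 x 20 and 1 x 10 specimens
print(manova_df_check([20, 20, 20, 10], 18))    # (66, 32, True)
# Same samples with 40 landmarks: too many shape variables
print(manova_df_check([20, 20, 20, 10], 40))    # (66, 76, False)
```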

When using resampling procedures to conduct ANOVA, using fewer landmarks can 
paradoxically decrease effect sizes, as an overly simplified definition of 
shape becomes implied.  We demonstrated this in our paper: Collyer, M.L., D.J. 
Sekora, and D.C. Adams. 2015. A method for analysis of phenotypic change for 
phenotypes described by high-dimensional data. Heredity. 115: 357-365.  This is 
consistent with Andrea’s comment about quality over quantity with the caveat 
that limited quantity precludes quality.  In other words, too few landmarks 
translate into a limited ability to discern shape differences, because the shape 
being compared is overly simple.  In the paper, we used two separate landmark configurations: 
one with few landmarks and the other with the same landmarks plus sliding 
semilandmarks between fixed points, on different populations of fish.  We found 
that adding the semilandmarks increased the effect size for population 
differences and sexual dimorphism.  But if we constrained our analyses to 
parametric MANOVA for our small samples, we would have to use the simpler 
landmark configurations and live with the results.
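A simplified sketch of a resampling ANOVA on multivariate shape data with an effect size expressed as a standard deviate of the permutation distribution. It is not the randomized residual permutation procedure (RRPP) described in the Collyer et al. (2015) paper, nor its implementation in any R package: it simply permutes group labels and standardizes the observed F against the permuted values. Data are synthetic and the function names are hypothetical.

```python
import random
import statistics

def _mean(rows):
    p = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(p)]

def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def f_stat(X, groups):
    """One-way F from sums of squared Euclidean distances over all variables."""
    n, labels = len(X), sorted(set(groups))
    grand = _mean(X)
    ss_total = sum(_sq_dist(r, grand) for r in X)
    ss_between = 0.0
    for g in labels:
        rows = [r for r, lab in zip(X, groups) if lab == g]
        ss_between += len(rows) * _sq_dist(_mean(rows), grand)
    k = len(labels)
    ss_within = ss_total - ss_between
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_anova(X, groups, n_perm=499, seed=0):
    """Permute group labels; return observed F, p-value and a Z-like effect size."""
    rng = random.Random(seed)
    f_obs = f_stat(X, groups)
    labels = list(groups)
    f_null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        f_null.append(f_stat(X, labels))
    p_value = (1 + sum(f >= f_obs for f in f_null)) / (n_perm + 1)
    z = (f_obs - statistics.mean(f_null)) / statistics.stdev(f_null)
    return f_obs, p_value, z

# Synthetic example: two groups of 10 specimens, 20 shape variables,
# group "B" shifted by 1 unit in every variable.
data_rng = random.Random(7)
groups = ["A"] * 10 + ["B"] * 10
X = [[data_rng.gauss(1.0 if g == "B" else 0.0, 1.0) for _ in range(20)]
     for g in groups]
f_obs, p_value, z = permutation_anova(X, groups)
print(f"F = {f_obs:.2f}, p = {p_value:.3f}, Z = {z:.1f}")
```

Because nothing here inverts a covariance matrix, the test runs even when shape variables outnumber specimens, which is the situation in which a parametric MANOVA would be unavailable.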

I do not wish to suggest that adding more landmarks is better.  Overkill is 
certainly a concern.  I would suggest though that statistical power would be 
for me less of a concern than a proper characterization of the shape I wish to 
compare among samples.  If I suspect curvature is important but am afraid to 
use (semi)landmarks that would allow me to assess the curvature differences 
among groups, opting instead to use just the endpoints of a structure because I 
am worried about statistical power, then I just allowed a statistical procedure 
to take me away from the biologically relevant question I sought to address.  
Andrea is correct that quality is better than quantity, but quantity can be a 
burden in either direction (too few or too many).  Additionally, statistical 
power will vary among statistical methods.  Reconsidering methods might be as 
important as reconsidering landmark configurations.

Regards!
Mike



> On May 4, 2017, at 5:19 AM, Lea Wolter  wrote:
> 
> Hello everyone,
> 
> I am new in the field of geometric morphometrics and have a question for my 
> bachelor thesis.
> 
> I am not sure how many landmarks I should use at most with regard to the sample 
> size. I have a sample of about 22 individuals per population or maybe a bit 
> less (using sternum and epigyne of spiders) with 5 populations. 
> I have read a paper in which they use 18 landmarks with an even lower sample 
> size (3 populations with 20 individuals, 1 with 10). But I have also heard 
> that I should use twice as many individuals per population as landmarks... 
> 
> Is there perhaps some mathematical formula to tell whether it would be 
> statistically significant? Could you recommend a paper?
> 
> Because of the symmetry of the epigyne I am now thinking of using just one 
> half of it for setting landmarks (so I get 5 instead of 9 landmarks). For the 
> sternum I thought about 7 or 9 landmarks, so at most I would also get 18 
> landmarks like in the paper. 
> 
> I would also like to use two type specimens in the analysis, but I have only 
> one such individual per population... would that be total nonsense from a 
> statistical point of view?
> 
> Thanks very much for your help!
> 
> Best regards
> Lea




Re: [MORPHMET] Re: number of landmarks and sample size

2017-05-31 Thread andrea cardini

Dear All,
I'd like to add a few comments on sampling (landmarks but also 
specimens). I hope that some of the other subscribers, who know much 
more than I do about morphometrics, will refine and correct my points.



A very short note on my two papers. They make a very simple point: if one 
is landmarking just one side of a structure with object symmetry simply 
to speed up data collection, then mirror-reconstructing the missing side 
will produce a nicer visualization and probably yield shape data that are 
closer to those obtained by landmarking both sides. The difference may 
be tiny, and I said "probably" because I am reporting results of 
empirical studies: out of 11-12 datasets, all but one had shape 
distances closer to those of the full bilateral landmark data after 
mirror-reconstructing the missing side. This did not work in one dataset, 
which happened to have a very large amount of fluctuating asymmetry.
To what extent these results are generalizable, I can't say, but everyone 
can run a small preliminary analysis to check this in her/his own data.
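A minimal sketch of mirror-reconstructing the missing side of a 2D configuration, assuming the symmetry axis is the straight line through the first and last midline landmarks (a least-squares line through all midline points would be an alternative). This is not necessarily the exact procedure used in the papers mentioned, and the landmark coordinates are hypothetical.

```python
def reflect_point(pt, a, b):
    """Reflect point pt across the straight line through points a and b (2D)."""
    (px, py), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    fx, fy = ax + t * dx, ay + t * dy  # foot of the perpendicular on the axis
    return (2 * fx - px, 2 * fy - py)

def mirror_complete(midline, one_side):
    """Return midline + digitized side + reflected copy of that side.
    Assumes the symmetry axis passes through the first and last midline landmarks."""
    a, b = midline[0], midline[-1]
    return list(midline) + list(one_side) + [reflect_point(p, a, b) for p in one_side]

# Hypothetical half-configuration: 2 midline landmarks and 3 paired landmarks
midline = [(0.0, 0.0), (0.0, 2.0)]
right_side = [(1.0, 0.5), (1.2, 1.0), (0.8, 1.8)]
full = mirror_complete(midline, right_side)
print(full[-3:])
```

The reconstructed configuration is perfectly symmetric by construction, which is why this approach cannot recover fluctuating asymmetry and may do poorly in datasets where asymmetry is large.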


I fully agree with Aki that, if time, money etc. are not a constraint, 
even when one is not interested in asymmetry, it is better to measure 
both sides. That's in fact true also for structures with matching symmetry.



In terms of the choice of landmarks, I wish to stress (once more!) that 
quality may be more important than quantity: first one should think carefully 
about what she/he wants to measure, which will relate to the specific 
question being asked, and then decide where and how many landmarks 
to use. There are at least two wonderful papers on this issue that I have 
suggested several times:

Oxnard & O'Higgins, 2009, Biological Theory 4(1): 84–97.
Klingenberg, 2008, Evolutionary Biology 35: 186–190.

Then, especially for semilandmarks, I guess that as Aki (and others 
before) suggested, one can see what a good compromise is between 
information and the number of points (maybe considering also, but not 
principally, the visualization).



For sample size, one should consider whether differences are presumably 
big (and a small sample might be OK...ish) or small (as in most 
microevolutionary studies, which generally require large N). I believe 
that Rohlf, already in the early days of geometric morphometrics, had 
written software for exploring statistical power in shape data 
(TPSPower) but I am not sure if he kept developing it. In any case, 
power and sensitivity (to sampling) analyses are certainly available in R.
With small differences, although resampling methods may allow one to perform 
tests even with tiny samples, power will be low and estimates (say, mean 
size and shape, variance and covariance etc.) are likely to be inaccurate.
Unfortunately, the most interesting taxa are often rare populations (or 
fossils) for which specimens are difficult to find.
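A crude simulation-based power sketch, for illustration only: it is not TPSPower or any R package. It assumes independent N(0, 1) variation in every variable and the same mean shift (in SD units) in every variable for one of two groups, then counts how often a permutation test on the distance between group means rejects at `alpha`. Real shape data are correlated and anisotropic, so treat the numbers as a rough guide. All function names are hypothetical.

```python
import random

def mean_distance_stat(X, labels):
    """Squared Euclidean distance between the two group mean vectors."""
    p = len(X[0])
    g0 = [r for r, lab in zip(X, labels) if lab == 0]
    g1 = [r for r, lab in zip(X, labels) if lab == 1]
    m0 = [sum(r[j] for r in g0) / len(g0) for j in range(p)]
    m1 = [sum(r[j] for r in g1) / len(g1) for j in range(p)]
    return sum((a - b) ** 2 for a, b in zip(m0, m1))

def permutation_p(X, labels, rng, n_perm):
    """Permutation p-value for the difference between group means."""
    obs = mean_distance_stat(X, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_distance_stat(X, shuffled) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def simulated_power(n_per_group, n_vars, shift, n_sims=50, n_perm=99,
                    alpha=0.05, seed=0):
    """Fraction of simulated datasets in which the permutation test rejects H0.
    Toy model: independent N(0, 1) variables, group 1 shifted by `shift` in each."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    rejections = 0
    for _ in range(n_sims):
        X = [[rng.gauss(shift * lab, 1.0) for _ in range(n_vars)] for lab in labels]
        if permutation_p(X, labels, rng, n_perm) <= alpha:
            rejections += 1
    return rejections / n_sims
```

For example, comparing `simulated_power(10, 32, 0.3)` with `simulated_power(30, 32, 0.3)` shows how power changes with the number of specimens per group for 32 shape variables under this toy model.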


A couple of people told me that there's an important paper coming out 
soon on sampling error in geometric morphometrics and it might suggest 
that one really needs huge samples. I would not be surprised and suspect 
that the few empirical studies we did (a couple of papers in 
Zoomorphology) were overoptimistic despite already suggesting (more or 
less) that one might need several dozens of specimens even when 
differences are relatively large and the number of landmarks was not 
particularly large. Again, they were empirical studies and one cannot 
say how generalizable they are.
Anyway, I look forward to this new paper and hope it will be announced 
in MORPHMET, and I look forward to Aki's paper as well.



Cheers

Andrea


On 29/05/17 18:35, Aki Watanabe wrote:

Dear Lea,

Unfortunately, there isn't (yet) a magic mathematical formula to 
determine whether you've sampled enough landmarks, but there are some 
exploratory approaches you can take to see if your landmark sampling 
is converging to the "true" shape variation. One simple thing you can do 
is sample as many landmarks as you can on a representative sampling of 
specimens, then create a PC morphospace. Then, subsample the landmarks 
(e.g., 75%, 50%, 25% of the landmarks) and see if the PC morphospaces 
from these subsampled datasets mirror the distribution of shapes in the 
full dataset. If the morphospaces begin deviating from the PC 
morphospace of the full dataset, then you have a visual cue that the 
subsampling is not adequately characterizing the shape variation of your 
specimens. As for a statistical test of landmark sampling, I suppose one 
can test for correlation between the subsampled and full datasets, but 
because the subsampled and full datasets will be auto-correlated to some 
extent, the null would have to reflect this.
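One possible form of the correlation test, sketched under stated assumptions: compare inter-specimen distance matrices from the full and subsampled landmark sets with a Mantel-type correlation. Coordinates are assumed to be already superimposed (the subset is not re-superimposed), and the data are synthetic. As noted above, the usual Mantel null (random relabelling of specimens) ignores the fact that the subset is part of the full set, so a "significant" result is expected and says little; the correlation value itself is the more useful convergence indicator.

```python
import math
import random

def distance_matrix(configs, subset):
    """Euclidean distances between specimens using only landmarks in `subset`.
    configs: list of specimens, each a list of (x, y) landmarks (assumed superimposed)."""
    n = len(configs)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = sum((configs[i][k][0] - configs[j][k][0]) ** 2 +
                    (configs[i][k][1] - configs[j][k][1]) ** 2 for k in subset)
            D[i][j] = D[j][i] = math.sqrt(s)
    return D

def upper(D, order=None):
    """Upper-triangle entries of D, optionally with specimens relabelled by `order`."""
    n = len(D)
    o = order if order is not None else list(range(n))
    return [D[o[i]][o[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def mantel(D1, D2, n_perm=999, seed=0):
    """Correlation of two distance matrices; null from relabelling specimens in D2."""
    rng = random.Random(seed)
    a = upper(D1)
    r_obs = pearson(a, upper(D2))
    order = list(range(len(D1)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(a, upper(D2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Synthetic sample: 12 specimens x 10 landmarks on an ellipse of varying width.
rng = random.Random(42)
base = [(math.cos(2 * math.pi * i / 10), math.sin(2 * math.pi * i / 10))
        for i in range(10)]
configs = []
for _ in range(12):
    width = rng.gauss(1.0, 0.1)
    configs.append([(x * width + rng.gauss(0, 0.02), y + rng.gauss(0, 0.02))
                    for x, y in base])

D_full = distance_matrix(configs, range(10))
D_half = distance_matrix(configs, rng.sample(range(10), 5))  # 50% subsample
r_obs, p_obs = mantel(D_full, D_half)
print(f"r = {r_obs:.3f}, p = {p_obs:.3f}")
```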


Alternatively, I have a script that automatically subsamples the 
landmarks of a given dataset and creates a plot to see how well the 
subsampled datasets converge to the point distribution of the full 
dataset. If you are interested, I would be happy to describe the 
technique in more detail
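A hypothetical reconstruction of the kind of convergence check described here (not Aki's actual script): for each subsampling fraction, draw random landmark subsets and record how well the inter-specimen distances from each subset correlate with those from the full set. Coordinates are assumed to be already superimposed, the data are synthetic, and the summary (mean and minimum correlation per fraction) is our choice; a plot of these values against the fraction would give a convergence curve.

```python
import math
import random
import statistics

def condensed_distances(configs, subset):
    """Pairwise Euclidean distances between specimens, using landmarks in `subset`."""
    n = len(configs)
    return [math.sqrt(sum((configs[i][k][0] - configs[j][k][0]) ** 2 +
                          (configs[i][k][1] - configs[j][k][1]) ** 2 for k in subset))
            for i in range(n) for j in range(i + 1, n)]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) *
                           sum((y - mb) ** 2 for y in b))

def convergence_curve(configs, fractions=(0.75, 0.5, 0.25), reps=20, seed=0):
    """For each fraction, draw `reps` random landmark subsets and record the mean
    and minimum correlation between subset and full-set specimen distances."""
    rng = random.Random(seed)
    k = len(configs[0])
    full = condensed_distances(configs, range(k))
    curve = {}
    for f in fractions:
        m = max(2, round(f * k))
        rs = [pearson(full, condensed_distances(configs, rng.sample(range(k), m)))
              for _ in range(reps)]
        curve[f] = (statistics.mean(rs), min(rs))
    return curve

# Synthetic sample: 15 specimens x 12 landmarks on an ellipse of varying width.
data_rng = random.Random(3)
base = [(math.cos(2 * math.pi * i / 12), math.sin(2 * math.pi * i / 12))
        for i in range(12)]
configs = []
for _ in range(15):
    width = data_rng.gauss(1.0, 0.1)
    configs.append([(x * width + data_rng.gauss(0, 0.02), y + data_rng.gauss(0, 0.02))
                    for x, y in base])

curve = convergence_curve(configs)
for frac, (mean_r, min_r) in curve.items():
    print(f"{frac:.0%} of landmarks: mean r = {mean_r:.3f}, min r = {min_r:.3f}")
```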