Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-11 Thread Justin Bagley
Hi Will,

I think you meant to say that you are writing a study design paper
presenting results of simulations and power analysis to determine
appropriate sample sizes for multivariate analyses in geometric
morphometrics. But I would think that would have already been settled by
now, and possibly would be more relevant for certain clustering methods.
The only parameterized PCA variant I am aware of is Kernel PCA, which is a
nonlinear PCA method used for pattern analysis (e.g. used in image
analysis), but that is not often employed in biological geometric
morphometrics papers (at least, those that I frequently come across). When
kernels are used they usually are meant to estimate densities of
reduced-dimensionality data like CS, or PCs as shape variables.

Best,

Justin

Justin C. Bagley, Ph.D.
Postdoctoral Scholar
Plant Evolutionary Genomics Laboratory
Department of Biology
Virginia Commonwealth University
Richmond, VA 23284-2012
jcbag...@vcu.edu

Senior/Postdoctoral Research Associate
Departamento de Zoologia
Universidade de Brasília
Campus Universitário Darcy Ribeiro
70910-900 Brasília, DF, Brasil
Website: http://www.justinbagley.org
Lattes CV: http://lattes.cnpq.br/0028570120872581

On Wed, May 31, 2017 at 6:41 PM, William Gelnaw  wrote:

> I'm currently working on a paper that deals with the problem of
> over-parameterizing PCA in morphometrics.  The recommendations that I'm
> making in the paper are that you should try to have at least 3 times as
> many samples as variables.  That means that if you have 10 2D landmarks
> (20 coordinates), you should have at least 60 specimens that you measure.  Based on
> simulations, if you have fewer than 3 specimens per variable, you quickly
> start getting eigenvalues for a PCA that are very different from known true
> eigenvalues.  I did a literature survey and about a quarter of
> morphometrics studies in the last decade haven't met that standard.  A good
> way to test if you have enough samples is to do a jackknife analysis.  If
> you cut out about 10% of your observations and still get the same
> eigenvalues, then your results are probably stable.
>   I hope this helps.
>   - Will
>
> On Wed, May 31, 2017 at 1:31 PM, mitte...@univie.ac.at <
> mitte...@univie.ac.at> wrote:
>
>> Adding more (semi)landmarks inevitably increases the spatial resolution
>> and thus allows one to capture finer anatomical details - whether relevant
>> to the biological question or not. This can be advantageous for the
>> reconstruction of shapes, especially when producing 3D morphs by warping
>> dense surface representations. Basic developmental or evolutionary trends,
>> group structures, etc., often are visible in an ordination analysis with a
>> smaller set of relevant landmarks; finer anatomical resolution does not
>> necessarily affect these patterns. However, adding more landmarks cannot
>> reduce or even remove any signals that were found with fewer landmarks,
>> although it can make ordination analyses and the interpretation of
>> distances and angles in shape space more challenging.
>>
>> An excess of variables (landmarks) over specimens does NOT pose problems
>> to statistical methods such as the computation of mean shapes and
>> Procrustes distances, PCA, PLS, and the multivariate regression of shape
>> coordinates on some independent variable (shape regression). These methods
>> are based on averages or regressions computed for each variable separately,
>> or on the decomposition of a covariance matrix.
>>
>> Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and
>> relative eigenanalysis, require the inversion of a full-rank covariance
>> matrix, which implies an excess of specimens over variables. The same
>> applies to many multivariate parametric test statistics, such as
>> Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full
>> rank and thus can never be subjected to any of these methods without prior
>> variable reduction. In fact, reliable results can only be obtained if there
>> are many times more specimens than variables, which usually requires variable
>> reduction by PCA, PLS or other techniques, or the regularization of
>> covariance matrices (which is more common in the bioinformatics community).
>>
>> For these reasons, I do not see any disadvantage in measuring a large
>> number of landmarks, except perhaps for a waste of time. If time is an
>> issue, one can optimize landmark schemes as suggested by Jim or Aki.
>>
>> Best,
>>
>> Philipp
>>
>> --
>> MORPHMET may be accessed via its webpage at http://www.morphometrics.org
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "MORPHMET" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to morphmet+unsubscr...@morphometrics.org.
>>
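Philipp's distinction above, between methods that only decompose a covariance matrix and methods that must invert it, can be illustrated with a minimal sketch. The data are random numbers, and the sizes (15 specimens, 40 coordinates) and the choice of k = 5 retained PCs are assumptions made purely for illustration; this is not anyone's published workflow.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 15, 40                      # hypothetical: 15 specimens, 40 coordinates
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)

# PCA via SVD works whether or not there are more variables than specimens
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals = s**2 / (n - 1)           # eigenvalues of the sample covariance matrix
scores = U * s                     # PC scores, one row per specimen

# The p x p sample covariance matrix has rank at most n - 1, so it is singular
S = np.cov(X, rowvar=False)
rank = np.linalg.matrix_rank(S)
print("covariance matrix is", S.shape, "with rank", rank)

# Variable reduction: squared Mahalanobis distances from the first k PC scores
k = 5                              # assumed number of retained PCs
Z = scores[:, :k]
S_k = np.cov(Z, rowvar=False)      # k x k and invertible because k < n - 1
d2 = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(S_k), Z)
print("squared Mahalanobis distances to the mean:", np.round(d2, 2))
```

How many PCs to retain is itself a judgment call that the sketch does not address.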

Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-03 Thread Norman MacLeod
In discussions like these it would be helpful if writers could clarify
whether they are referring to biological homology, topological homology or
"semantic homology". These are not the same thing, and the whole issue of
"homology" in geometric morphometrics has always seemed, at least to me, to
be very confused. For example, refer to the definitions of "homology" and
"landmark" in the Glossary on the SB Morphometrics web site. Because it
means different things to different specialists, homology isn't a term to be
thrown around as lightly as morphometricians seem prone to do. Imprecise or
ambiguous usage makes sentences difficult or impossible for me to
understand, and I suspect it confuses others as well.

Norm MacLeod



Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-03 Thread alcardini
Hi Philipp,
I am not worried about the number of variables (although I am not sure
one needs thousands of highly correlated points on a relatively simple
structure, and I seem to remember that Gunz and you suggest starting with
many and then reducing as appropriate).

Regardless of whether point homology makes sense, I am worried that
many users believe that semilandmarks (maybe after sliding according
to purely mathematical principles) are the same as "traditional
landmarks" with a clear one-to-one correspondence. Even saying that
what's "homologous" is the curve or surface is tricky, because at the
end of the day that curve/surface is discretized using points, shape
distances are based on those points and there are many ways of placing
points with no clear "homology" (figure 7 of Oxnard & O'Higgins,
2009); indeed, in an ontogenetic study of the cranial vault, for
instance, where sutures may become invisible in adults and therefore
cannot be used as a "boundary", semilandmarks close to the sutures may
end up on different bones in different stages/individuals.

Semilandmarks are a fantastic tool, which I am happy to use when
needed, but they have their own limitations, which one should be aware
of.
Cheers

Andrea




-- 

Dr. Andrea Cardini
Researcher, Dipartimento di Scienze Chimiche e Geologiche, Università
di Modena e Reggio Emilia, Via Campi, 103 - 41125 Modena - Italy
tel. 0039 059 2058472

Adjunct Associate Professor, School of Anatomy, Physiology and Human
Biology, The University of Western Australia, 35 Stirling Highway,
Crawley WA 6009, Australia

E-mail address: alcard...@gmail.com, andrea.card...@unimore.it
WEBPAGE: https://sites.google.com/site/alcardini/home/main

FREE Yellow BOOK on Geometric Morphometrics:
http://www.italian-journal-of-mammalogy.it/public/journals/3/issue_241_complete_100.pdf

ESTIMATE YOUR GLOBAL FOOTPRINT:
http://www.footprintnetwork.org/en/index.php/GFN/page/calculators/




Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread David Thulman
Hello,
I'm an archaeologist who works on artifacts in North America. Not many of
us use landmark geometric morphometrics (LGM), but even we can't seem to
agree on how many landmarks (LMs) are appropriate. Because I use discriminant
function analysis (DFA) as the workhorse for discriminating groups of
artifacts, I worry about the misuse of that technique. One thing I've read
(e.g., Qiao et al. 2009) with regard to DFA is that too many variables (LMs)
can affect its discriminatory power through data piling or the related
phenomenon of overfitting. I have seen this in my practice but have not
tested it rigorously. By reducing the number of LMs, I can sometimes get
better discrimination between groups.
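
The overfitting concern can be illustrated with a minimal sketch on pure-noise data, where there is no true group difference to find. Everything here is an assumption for illustration: the two-group discriminant is a bare-bones NumPy version (not any particular package's DFA), the group sizes are 20 and 20, and the variable counts 2, 10 and 30 are arbitrary. The point is only to compare resubstitution accuracy with leave-one-out accuracy as the number of variables grows.

```python
import numpy as np

def lda_fit(X, y):
    """Two-group linear discriminant; pinv tolerates a near-singular covariance."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    R = np.vstack([X0 - m0, X1 - m1])          # within-group residuals
    Sw = R.T @ R / (len(X) - 2)                # pooled within-group covariance
    w = np.linalg.pinv(Sw) @ (m1 - m0)         # discriminant direction
    c = w @ (m0 + m1) / 2                      # cut-off halfway between the means
    return w, c

def lda_predict(X, w, c):
    return (X @ w > c).astype(int)

def accuracies(X, y):
    """Return (resubstitution accuracy, leave-one-out accuracy)."""
    w, c = lda_fit(X, y)
    resub = float(np.mean(lda_predict(X, w, c) == y))
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i          # refit without specimen i
        w_i, c_i = lda_fit(X[mask], y[mask])
        hits += int(lda_predict(X[i:i + 1], w_i, c_i)[0] == y[i])
    return resub, hits / len(y)

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)                      # hypothetical: 20 specimens per group
results = {}
for p in (2, 10, 30):
    X = rng.normal(size=(40, p))               # pure noise, no real group signal
    results[p] = accuracies(X, y)
    print(f"p={p:3d}  resubstitution={results[p][0]:.2f}  leave-one-out={results[p][1]:.2f}")
```

If the resubstitution figure climbs with the number of variables while the leave-one-out figure does not, the apparent discrimination is coming from overfitting rather than from the artifacts.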

The number of artifacts (specimens) is not a problem. I'm about to embark on
a regional analysis using thousands.

Does anyone who understands this phenomenon better than I do care to
comment?

Thanks,
Dave Thulman




Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread mitte...@univie.ac.at
I think a few topics get mixed up here.

Of course, a sample can be too small to be representative (as in Andrea's 
example), and one should think carefully about the measures to take. It is 
also clear that an increase in sample size reduces standard errors of 
statistical estimates, including that of a covariance matrix and its 
eigenvalues. But, as mentioned by Dean, the standard errors of the 
eigenvalues are of secondary interest in PCA.

If one has a clear expectation about the signal in the data - and if one 
does not aim at new discoveries - a few specific measurements may suffice, 
perhaps even a few distance measurements. But effective exploratory 
analyses have always been a major strength of geometric morphometrics, 
enabled by the powerful visualization methods together with the large 
number of measured variables.

Andrea, I am actually curious what worries you if one "collects between 
2700 and 10 400 homologous landmarks from each rib" (whatever the term 
"homologous" is supposed to mean here)? 

Compared to many other disciplines in contemporary biology and biomedicine, 
a few thousand variables are not particularly many. Consider, for instance, 
2D and 3D image analysis, FEA, and all the "omics", with millions and 
billions of variables. In my opinion, the challenge with these "big data" 
is not statistical power in testing a signal, but finding the signal - the
low-dimensional subspace of interest - in the first place. But this applies
to 50 or 100 variables as well, not only to thousands or millions. If no
prior expectation about this signal existed (which the mere presence of so
many variables usually implies), no hypothesis test should be performed at
all. Ignoring this rule is one of the main reasons why so many GWAS
and voxel-based morphometry studies fail to replicate.
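
A minimal simulation sketch of why testing many variables without a prior expectation is hazardous: two groups are drawn from the same distribution, so every "significant" variable is a false positive. The group size (20), the number of variables (1000) and the cut-off |t| > 2.02 (roughly the two-sided 5% level at about 38 degrees of freedom) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group, n_vars = 20, 1000     # hypothetical sizes

# Two groups drawn from the same distribution: no true signal anywhere
A = rng.normal(size=(n_per_group, n_vars))
B = rng.normal(size=(n_per_group, n_vars))

# Two-sample t statistic computed for every variable at once
se = np.sqrt(A.var(axis=0, ddof=1) / n_per_group + B.var(axis=0, ddof=1) / n_per_group)
t = (A.mean(axis=0) - B.mean(axis=0)) / se

# Count variables that would pass an uncorrected test at roughly the 5% level
n_false = int(np.sum(np.abs(t) > 2.02))
print(f"{n_false} of {n_vars} null variables look 'significant' at roughly the 5% level")
```

On average about 5% of the null variables pass such an uncorrected threshold, which is the kind of spurious "signal" that fails to replicate.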

Best wishes,

Philipp



RE: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread Murat Maga
Just to comment.

While it is worthwhile to investigate these issues, in my experience sample sizes
are limited not because investigators are unwilling to measure more specimens,
but because there are no additional specimens to include in the analysis,
especially for studies based on natural populations or historical collections.

M





RE: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread Adams, Dean [EEOBS]
Will,

I’m not quite sure what over-parameterizing means in the case of PCA, as it is
simply a rigid rotation of the data space and does not provide parameters for
statistical inference.
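
The "rigid rotation" point can be checked numerically with a minimal sketch: projecting centered data onto all eigenvectors of the covariance matrix is an orthogonal transformation (a rotation, possibly with a reflection), so pairwise distances between specimens and the total variance are unchanged. The random 12 x 6 data set is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 6)) @ rng.normal(size=(6, 6))   # hypothetical correlated data
Xc = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix form an orthonormal basis
# (eigh returns eigenvalues in ascending order)
S = np.cov(Xc, rowvar=False)
eigvals, V = np.linalg.eigh(S)
scores = Xc @ V                     # scores on all PCs

def pairwise_dist(A):
    """Euclidean distances between all pairs of rows."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

print(np.allclose(V.T @ V, np.eye(6)))                          # V is orthogonal
print(np.allclose(pairwise_dist(Xc), pairwise_dist(scores)))    # distances preserved
print(np.isclose(eigvals.sum(), np.trace(S)))                   # total variance preserved
```

All three checks hold only when every PC is kept; discarding PCs turns the rotation into a projection and distances are then only approximated.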

As for the distribution of eigenvalues, this of course is based on the 
underlying covariance matrix for the traits, which in turn will be affected by 
sample size. However, when traits become even mildly correlated (as is 
certainly the case for landmark coordinates), the distribution of eigenvalues 
of the covariance matrix becomes much better behaved. Specifically, the 
eigenvalues associated with low and high PC axes are less extreme than is 
observed with uncorrelated traits. That implies greater stability in their 
estimation, as the covariance matrix is further from singular (see the large 
statistical literature on the condition of a covariance matrix and subsequent 
estimation issues for ill-behaved covariance matrices).

Best,

Dean


Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
   Department of Statistics
Iowa State University
www.public.iastate.edu/~dcadams/
phone: 515-294-3834


Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread William Gelnaw
I'm currently working on a paper that deals with the problem of
over-parameterizing PCA in morphometrics.  The recommendations that I'm
making in the paper are that you should try to have at least 3 times as
many samples as variables.  That means that if you have 10 2D landmarks
(20 coordinates), you should have at least 60 specimens that you measure.  Based on
simulations, if you have fewer than 3 specimens per variable, you quickly
start getting eigenvalues for a PCA that are very different from known true
eigenvalues.  I did a literature survey and about a quarter of
morphometrics studies in the last decade haven't met that standard.  A good
way to test if you have enough samples is to do a jackknife analysis.  If
you cut out about 10% of your observations and still get the same
eigenvalues, then your results are probably stable.
  I hope this helps.
  - Will
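
The stability check described above can be sketched as follows. This is a minimal illustration, not the procedure from the paper in preparation: it uses repeated random deletion of about 10% of specimens (strictly a delete-d subsampling variant of the jackknife), and the simulated data, the number of replicates (200) and the number of eigenvalues tracked (3) are all assumptions.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance of X (rows = specimens), largest first."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values of centered data
    return s**2 / (X.shape[0] - 1)

def jackknife_eigenvalues(X, drop_frac=0.10, n_reps=200, n_keep=3, seed=0):
    """Drop a random drop_frac of specimens n_reps times and recompute the
    first n_keep eigenvalues. Returns an (n_reps, n_keep) array."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_drop = max(1, int(round(drop_frac * n)))
    out = np.empty((n_reps, n_keep))
    for r in range(n_reps):
        keep = rng.permutation(n)[n_drop:]     # indices of retained specimens
        out[r] = pca_eigenvalues(X[keep])[:n_keep]
    return out

# Hypothetical data: 60 specimens, 20 variables (10 2D landmarks)
rng = np.random.default_rng(1)
true_sd = np.linspace(2.0, 0.2, 20)
X = rng.normal(size=(60, 20)) * true_sd

full = pca_eigenvalues(X)[:3]
reps = jackknife_eigenvalues(X)
rel_spread = reps.std(axis=0) / full
print("full-sample eigenvalues:", np.round(full, 3))
print("relative spread across delete-10% replicates:", np.round(rel_spread, 3))
```

A small relative spread suggests the leading eigenvalues are stable under modest loss of specimens; how small is small enough remains a judgment call.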

On Wed, May 31, 2017 at 1:31 PM, mitte...@univie.ac.at <
mitte...@univie.ac.at> wrote:

> Adding more (semi)landmarks inevitably increases the spatial resolution
> and thus allows one to capture finer anatomical details - whether relevant
> to the biological question or not. This can be advantageous for the
> reconstruction of shapes, especially when producing 3D morphs by warping
> dense surface representations. Basic developmental or evolutionary trends,
> group structures, etc., often are visible in an ordination analysis with a
> smaller set of relevant landmarks; finer anatomical resolution does not
> necessarily affect these patterns. However, adding more landmarks cannot
> reduce or even remove any signals that were found with fewer landmarks, but
> it can make ordination analyses and the interpretation of distances and
> angles in shape space more challenging.
>
> An excess of variables (landmarks) over specimens does NOT pose problems
> to statistical methods such as the computation of mean shapes and
> Procrustes distances, PCA, PLS, and the multivariate regression of shape
> coordinates on some independent variable (shape regression). These methods
> are based on averages or regressions computed for each variable separately,
> or on the decomposition of a covariance matrix.
>
> Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and
> relative eigenanalysis, require the inversion of a full-rank covariance
> matrix, which implies an excess of specimens over variables. The same
> applies to many multivariate parametric test statistics, such as
> Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full
> rank and thus can never be subjected to any of these methods without prior
> variable reduction. In fact, reliable results can only be obtained if there
> are many more specimens than variables, which usually requires variable
> reduction by PCA, PLS or other techniques, or the regularization of
> covariance matrices (which is more common in the bioinformatics community).
>
> For these reasons, I do not see any disadvantage of measuring a large
> number of landmarks, except for a waste of time perhaps. If life time is an
> issue, one can optimize landmark schemes as suggested by Jim or Aki.
>
> Best,
>
> Philipp
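
[Editorial sketch, not part of the original message.] The rank argument
quoted above can be illustrated numerically. The data below are simulated
and all sizes (30 specimens, 40 variables, 5 retained PCs) are hypothetical
choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 40                       # hypothetical: fewer specimens than variables
X = rng.normal(size=(n, p))
groups = np.repeat([0, 1], n // 2)
X[groups == 1, 0] += 1.0            # shift one variable in the second group

Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
rank = np.linalg.matrix_rank(S)     # at most n - 1, so S cannot be inverted here

# Variable reduction: keep the first k PC scores, with k well below n
k = 5
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T

def pooled_cov(Z, g):
    """Pooled within-group covariance matrix of the rows of Z."""
    labels = np.unique(g)
    ss = sum(
        (Z[g == a] - Z[g == a].mean(axis=0)).T @ (Z[g == a] - Z[g == a].mean(axis=0))
        for a in labels
    )
    return ss / (len(Z) - len(labels))

# The k x k pooled covariance of the scores is invertible
W = pooled_cov(scores, groups)
d = scores[groups == 0].mean(axis=0) - scores[groups == 1].mean(axis=0)
mahal = float(np.sqrt(d @ np.linalg.solve(W, d)))
print("rank of full covariance:", rank, "of", p)
print("Mahalanobis distance on", k, "PCs:", mahal)
```

The choice of k changes the resulting distance, so it should be reported
along with the result.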

-- 
MORPHMET may be accessed via its webpage at http://www.morphometrics.org
--- 
You received this message because you are subscribed to the Google Groups 
"MORPHMET" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to morphmet+unsubscr...@morphometrics.org.


Re: [MORPHMET] Re: number of landmarks and sample size

2017-05-31 Thread andrea cardini

Dear All,
I'd like to add a few comments on sampling (landmarks but also 
specimens). I hope that some of the other subscribers, who know much 
more than I do about morphometrics, will refine and correct my points.



A very short one on my two papers. They make a very simple point: if one 
is landmarking just one side of a structure with object symmetry simply 
to speed up data collection, then mirror-reconstructing the missing side 
will give a nicer visualization and probably yield shape data that are 
closer to those obtained by landmarking both sides. The difference may 
be tiny and I said "probably" because I am reporting results of 
empirical studies: out of 11-12 datasets, all but one had shape 
distances closer to those of the full bilateral landmark data after 
mirror-reconstructing the missing side. This did not work in one dataset 
which happened to have a very large amount of fluctuating asymmetry.
To what extent these results are generalizable, I can't say but everyone 
can plan a small preliminary analysis to check it in her/his own data.
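
[Editorial sketch, not part of the original message.] One simple way to
mirror-reconstruct a missing side is to fit a plane to the midline landmarks
and reflect the measured side across it. This is only an assumed approach for
illustration and is not necessarily the procedure used in the papers
mentioned above; the coordinates are invented.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points: returns (centroid, unit normal).

    Needs at least three non-collinear midline landmarks.
    """
    c = points.mean(axis=0)
    # The normal is the direction of least variance of the midline landmarks
    _, _, Vt = np.linalg.svd(points - c)
    return c, Vt[-1]

def mirror(points, centroid, normal):
    """Reflect points across the plane given by centroid and unit normal."""
    dist = (points - centroid) @ normal
    return points - 2.0 * np.outer(dist, normal)

# Hypothetical specimen: 4 midline landmarks on x = 0, 3 right-side landmarks
midline = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
right = np.array([[1.0, 0.2, 0.1], [0.8, 0.9, 0.3], [0.5, 0.5, 1.2]])

c, nrm = fit_plane(midline)
left = mirror(right, c, nrm)
full = np.vstack([midline, right, left])   # reconstructed bilateral configuration
print(left)
```

Comparing shape distances from such reconstructed data against a small set
of fully bilateral specimens is the kind of preliminary check suggested above.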


I fully agree with Aki that, if time, money etc. are not a constraint, 
even when one is not interested in asymmetry, it is better to measure 
both sides. That's in fact true also for structures with matching symmetry.



In terms of the choice of landmarks, I wish to stress (once more!) that 
quality may be more important than quantity: first one should think 
carefully about what she/he wants to measure, which will relate to the 
specific question being asked, and then decide where and how many 
landmarks to use. There are at least two wonderful papers that I have 
suggested several times on this issue:

Oxnard & O'Higgins, 2009, Biological Theory 4(1), 84–97.
Klingenberg, 2008, Evolutionary Biology 35, 186–190.

Then, especially for semilandmarks, I guess that as Aki (and others 
before) suggested, one can see what a good compromise is between 
information and the number of points (maybe considering also, but not 
principally, the visualization).



For sample size, one should consider whether differences are presumably 
big (and a small sample might be OK...ish) or small (as in most 
microevolutionary studies, which generally require large N). I believe 
that Rohlf, already in the early days of geometric morphometrics, had 
written software for exploring statistical power in shape data 
(TPSPower) but I am not sure if he kept developing it. In any case, 
power and sensitivity (to sampling) analyses are certainly available in R.
With small differences, although resampling methods may allow one to 
perform tests even with tiny samples, power will be low and estimates (say, 
mean size and shape, variance and covariance etc.) will likely be inaccurate.
Unfortunately, often, the most interesting taxa are rare populations (or 
fossils) for which specimens are difficult to find.
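
[Editorial sketch, not part of the original message.] A simulation-based
power estimate for a permutation test of a difference in means could be set
up along the following lines. This is not TPSPower or any specific R
function; the test statistic, the effect sizes and the simulated isotropic
normal data are assumptions made purely for illustration.

```python
import numpy as np

def mean_distance(X, g):
    """Euclidean distance between the two group mean vectors."""
    return np.linalg.norm(X[g].mean(axis=0) - X[~g].mean(axis=0))

def permutation_pvalue(X, g, n_perm, rng):
    """Permutation p-value for the distance between group means."""
    obs = mean_distance(X, g)
    count = 1                                  # the observed labelling counts once
    for _ in range(n_perm):
        if mean_distance(X, rng.permutation(g)) >= obs:
            count += 1
    return count / (n_perm + 1)

def estimate_power(n_per_group, effect, p=10, n_sim=50, n_perm=99,
                   alpha=0.05, seed=0):
    """Share of simulated datasets in which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    g = np.repeat([True, False], n_per_group)
    hits = 0
    for _ in range(n_sim):
        X = rng.normal(size=(2 * n_per_group, p))
        X[g, 0] += effect                      # shift the first variable in one group
        if permutation_pvalue(X, g, n_perm, rng) <= alpha:
            hits += 1
    return hits / n_sim

# Hypothetical small effect (0.5 SD in one of 10 variables)
for n in (5, 10, 20, 40):
    print(n, "per group -> estimated power", estimate_power(n, effect=0.5))
```

With real data, the simulated samples would ideally be generated from an
estimated mean shape and covariance matrix rather than from independent
standard normal variables.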


A couple of people told me that there's an important paper coming out 
soon on sampling error in geometric morphometrics and it might suggest 
that one really needs huge samples. I would not be surprised and suspect 
that the few empirical studies we did (a couple of papers in 
Zoomorphology) were overoptimistic despite already suggesting (more or 
less) that one might need several dozens of specimens even when 
differences are relatively large and the number of landmarks was not 
particularly large. Again, they were empirical studies and one cannot 
say how generalizable they are.
Anyway, I look forward to this new paper and hope it will be announced 
in MORPHMET, and I also look forward to Aki's paper.



Cheers

Andrea


On 29/05/17 18:35, Aki Watanabe wrote:

Dear Lea,

Unfortunately, there isn't (yet) a magic mathematical formula to 
determine whether you've sampled enough landmarks, but there are some 
exploratory approaches you can take to see if your landmark sampling 
is converging to the "true" shape variation. One simple thing you can do 
is sample as many landmarks as you can on a representative sample of 
specimens, then create a PC morphospace. Then, subsample the landmarks 
(e.g., 75%, 50%, 25% of the landmarks) and see if the PC morphospaces 
from these subsampled datasets mirror the distribution of shapes in the 
full dataset. If the morphospaces begin deviating from the PC 
morphospace of the full dataset, then you have a visual cue that the 
subsampling is not adequately characterizing the shape variation of your 
specimens. In terms of a formal statistical test for landmark 
sampling, I suppose one can test for correlation between the subsampled 
and full datasets, but because the two will be auto-correlated to some 
extent, the null would have to reflect this.
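
[Editorial sketch, not part of the original message.] The subsampling idea
above could be explored roughly as follows. This is not Aki's script: it
assumes the coordinates are already Procrustes-aligned, does not
re-superimpose the subsets, and uses the correlation between inter-specimen
distances in PC space as an assumed measure of fit. The data are simulated.

```python
import numpy as np

def pc_scores(coords, k=2):
    """First k PC scores of an (n_specimens, n_landmarks, n_dims) array."""
    X = coords.reshape(coords.shape[0], -1)
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def pairwise_distances(Z):
    """Upper-triangle vector of Euclidean distances between rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    d = np.sqrt((diff**2).sum(axis=-1))
    return d[np.triu_indices(len(Z), k=1)]

def subsample_fit(coords, fraction, n_reps=50, k=2, seed=0):
    """Mean correlation between full and subsampled PC-space distances."""
    rng = np.random.default_rng(seed)
    n_lm = coords.shape[1]
    m = max(2, int(round(fraction * n_lm)))
    d_full = pairwise_distances(pc_scores(coords, k))
    fits = []
    for _ in range(n_reps):
        idx = rng.choice(n_lm, size=m, replace=False)   # random landmark subset
        d_sub = pairwise_distances(pc_scores(coords[:, idx, :], k))
        fits.append(np.corrcoef(d_full, d_sub)[0, 1])
    return float(np.mean(fits))

# Hypothetical aligned data: 30 specimens, 40 2D landmarks, one shape trend
rng = np.random.default_rng(2)
n_spec, n_lm = 30, 40
mean_shape = rng.normal(size=(n_lm, 2))
latent = rng.normal(size=n_spec)
loading = rng.normal(size=(n_lm, 2)) * 0.1
noise = rng.normal(scale=0.02, size=(n_spec, n_lm, 2))
coords = mean_shape + latent[:, None, None] * loading + noise

for frac in (0.75, 0.5, 0.25):
    print(frac, subsample_fit(coords, frac))
```

As noted above, the subsampled and full datasets share landmarks, so a high
correlation is expected to some degree and is not a significance test.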


Alternatively, I have a script that automatically subsamples the 
landmarks of a given dataset and creates a plot to see how well the 
subsampled datasets converge to the point distribution of the full 
dataset. If you are interested, I would be happy to describe the 
technique in more detail