[MORPHMET] Multivariate Analysis of Genotype-Phenotype Association

2016-02-25 Thread mitte...@univie.ac.at
I would like to announce a new paper, of which the early view pdf is 
already online.

Mitteroecker P, Cheverud JM, Pavlicev M (2016) Multivariate Analysis of 
Genotype-Phenotype Association. Genetics

http://www.genetics.org/content/early/2016/02/18/genetics.115.181339

It offers an exploratory strategy for mapping multivariate data and is 
particularly suited for geometric morphometrics. The new method identifies 
patterns of allelic variation (genetic latent variables) that are maximally 
associated - in terms of effect size - with patterns of phenotypic 
variation (phenotypic latent variables). It thereby separates phenotypic 
features under strong genetic control from less genetically determined 
features and thus permits an analysis of the multivariate structure of 
genotype-phenotype association, including its "dimensionality" and the 
clustering of genetic and phenotypic variables within this association. 

Best,

Philipp Mitteroecker

-- 
MORPHMET may be accessed via its webpage at http://www.morphometrics.org
--- 
You received this message because you are subscribed to the Google Groups 
"MORPHMET" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to morphmet+unsubscr...@morphometrics.org.


Re: [MORPHMET] Sliding Semilandmarks

2016-02-21 Thread mitte...@univie.ac.at

As Michael described, the average shape configuration affects the sliding 
when used as reference for the TPS; the final configurations thus are 
sample-dependent. However, if the curves/surfaces are covered densely 
enough by the semilandmarks (e.g., so that no semilandmark can slide 
away from a relevant region), Procrustes distances are quite stable. Dense 
sampling can also improve the estimation of the tangents.

If the semilandmarks slide a lot relative to the local curvature, they get 
off the curve. Of course, they can be projected back, but the following 
trick often is sufficient: Instead of the full amount of sliding, let all 
the semilandmarks slide just a fraction of the computed distance, say 20% 
(multiply T by 0.2 in equation 4 of Gunz et al. 2005). Then update the 
tangents and let the semilandmarks slide again a fraction of the computed 
distance, etc. This requires more iterations but keeps the semilandmarks 
closer to the curve or surface.
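
The damped sliding described above can be sketched as follows. This is only 
a minimal illustration under stated assumptions, not the algorithm of Gunz 
et al. (2005): it assumes a 2D open curve, slides by minimizing the 
least-squares (Procrustes) distance to a fixed reference instead of bending 
energy, and estimates tangents from neighbouring points. The names 
`unit_tangents` and `damped_slide` and the toy half circle are hypothetical.

```python
import numpy as np

def unit_tangents(curve):
    """Estimate unit tangent vectors of an open 2D curve from neighbouring points."""
    d = np.gradient(curve, axis=0)  # central differences (one-sided at the ends)
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def damped_slide(curve, reference, fraction=0.2, n_iter=50):
    """Slide semilandmarks towards `reference` in small steps along updated tangents.

    Assumptions: least-squares sliding (not bending energy); the first and
    last points are treated as fixed landmarks.
    """
    x = np.asarray(curve, dtype=float).copy()
    ref = np.asarray(reference, dtype=float)
    for _ in range(n_iter):
        u = unit_tangents(x)                # update the tangents in every iteration
        t = np.sum(u * (ref - x), axis=1)   # full least-squares sliding distance
        t[[0, -1]] = 0.0                    # do not slide the two end landmarks
        x += fraction * t[:, None] * u      # slide only a fraction of that distance
    return x

# Toy example: unevenly spaced points on a half circle, evenly spaced reference
theta_ref = np.linspace(0.0, np.pi, 13)
theta = theta_ref + 0.1 * np.sin(2.0 * theta_ref)
reference = np.column_stack([np.cos(theta_ref), np.sin(theta_ref)])
curve = np.column_stack([np.cos(theta), np.sin(theta)])
slid = damped_slide(curve, reference)
```

Setting `fraction=1.0` and `n_iter=1` gives the undamped single step for 
comparison; plotting both results against the half circle is a simple way to 
check how far the semilandmarks leave the curve.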

Also when minimizing Procrustes distance instead of BE, these distances are 
reduced relative to the sample average. But as for the superimposition 
itself, the sample configuration has only limited effect on the final 
configurations for small to moderate shape variation. (If variation is very 
large, the analysis is problematic anyway.) Note that the full sample must 
be slid together for a joint analysis (i.e., don't slide each population 
separately and then analyze them together). 

The choice of the minimization criterion (Proc dist versus BE) can lead to 
different configurations. For most datasets, this difference is negligible, 
but in some situations it can matter. For example, when minimizing Proc 
dist, semilandmarks can change their order or slide across a real landmark, 
whereas this is almost impossible when minimizing BE (changing the order 
would entail a very high BE). On the other hand, minimizing BE does not minimize 
affine shape variation (because it has zero BE). If affine shape variation 
is not constrained by real landmarks, this can lead to strange results. For 
instance, I had a dataset of mandibular cross-sections, which were U-shaped 
with real landmarks only at the two upper ends and semilandmarks 
in-between. Affine variation thus was not properly controlled. After BE 
sliding, the group differences comprised a lot of (meaningless) affine 
differences. I thus decided to minimize Proc dist. Usually, though, I 
prefer minimizing BE because it is closer to our biological understanding 
of homology, including the preservation of landmark order and large-scale 
shape features. Minimizing BE leads to smoother TPS deformation grids, 
whereas minimizing Proc dist leads to a smaller sum of squares.

Note that when updating the reference configuration in each iteration, the 
algorithm can converge to quite undesired minima (e.g. all semilandmarks 
collapse to a single point). This can be avoided by iterating just a few 
times, which is usually enough, or by keeping the reference constant at 
some point in the algorithm. In general, the more the semilandmarks are 
constrained by real landmarks and the smoother the curves, the more stable 
is the algorithm.

Because of these issues, it is important to apply the semilandmark 
algorithm carefully, especially for 3D surfaces. Always check the tangents 
and how the semilandmarks slide along these tangents. Check how the total 
sliding reduces from one iteration to the next, and interpret the final 
pattern of shape variation in the light of the property being minimized.

Best wishes,

Philipp Mitteroecker






Am Donnerstag, 18. Februar 2016 18:41:44 UTC+1 schrieb Collyer, Michael:
>
> Andrea, 
>
> I like to think of semilandmark sliding as iteratively finding fitted 
> (predicted) values for the generalized linear model fit described by Gunz 
> et al. (2005) (equation 4), and updating coordinates by these values until 
> there is no more meaningful change (with regard to an acceptable 
> criterion).  If Bending energy is not used, the bending energy matrix is 
> replaced by an identity matrix (i.e., independence), which produces the 
> minimized Procrustes distance version of the sliding algorithm.  (This is 
> the same as ordinary least squares being a simplification of generalized 
> least squares by using an identity matrix for the covariance matrix in GLS 
> estimation of parameters.)  Calculating the bending energy matrix requires 
> using the reference configuration.  The hat matrix calculated in the 
> process is typically post-multiplied by the target coordinates centered by 
> the reference configuration.  Changing the reference should, therefore, 
> change the solution.  Also, let’s not forget that with surface points, if 
> we follow the Gunz et al. (2005) recommendation, 5 nearest neighbors are 
> used to estimate the principal components for defining a tangent plane. 
>  One could use more nearest neighbors, which would change the tangent 
> planes.  One could 

[MORPHMET] Postdoc and Ph.D. position at the Univ. of Vienna on the modelling of developmental canalization in

2016-08-12 Thread mitte...@univie.ac.at
In the working group of Philipp Mitteroecker in the Department of 
Theoretical Biology, University of Vienna, a two-year postdoc position and 
a three-year Ph.D. position are vacant. We are searching for enthusiastic 
persons, who are dedicated to interdisciplinary research in biology and 
physical anthropology. The positions are part of an FWF-funded project on 
developmental canalization in the human head, which comprises mathematical, 
statistical, and genetical approaches using a diversity of 2D and 3D 
morphometric data. The Dept. of Theoretical Biology is well known for its 
experience in quantitative and theoretical work in evolutionary and 
developmental biology and is part of a strong national and international 
research network (http://theoretical.univie.ac.at; 
http://www.univie.ac.at/evolvienna/).

The postdoc should primarily work on the mathematical and statistical 
modelling of growth processes and the quantitative genetic analysis of 
craniofacial shape (GWAS). A strong background in mathematical or 
statistical biology, programming experience in Mathematica or R, as well as 
excellent English writing and speech skills are required. A Ph.D. degree in 
a related field must be completed by the date of hire. Expertise in 
geometric morphometrics, human anatomy, EvoDevo, or genetics is 
advantageous. The candidate should have a publication record demonstrating 
his or her skills.

The Ph.D. student should primarily work on the geometric morphometric analysis of 
cranial and facial development. A background in human anatomy, physical 
anthropology, or orthodontics, along with basic experience in programming 
and statistical analysis, as well as good English writing and speech skills 
are required. A master's degree in a related field must be completed by 
the date of hire. Expertise in geometric morphometrics, genetics, or 
orthodontics is advantageous. 

Please submit applications including CV, list of publications, and a 
statement of research interests to Philipp Mitteroecker 
(philipp.mitteroec...@univie.ac.at) until Sept. 5.



[MORPHMET] PhD position in biological anthropology and morphometrics

2017-01-25 Thread mitte...@univie.ac.at
We seek a PhD student for a three-year position in Vienna, working on the 
morphometrics of human pelvises in relation to childbirth and motherhood:  

www.oeaw.ac.at/fileadmin/subsites/Jobs/Job_offer_VAMOS_PhD_position_1_.pdf

Best,

Philipp Mitteroecker 



[MORPHMET] Re: Does a research sample need to be normally distributed (male/female ratio) for PCA?

2017-05-25 Thread mitte...@univie.ac.at
As an exploratory technique, PCA makes no distributional assumptions; it is 
used to explore the empirical distribution of the data. The sample does not 
need to be balanced with regard to sex or other grouping variables, but 
larger groups have a stronger effect on the PCA than smaller groups.

The origin of the coordinate system is arbitrary. However, many software 
packages center the data so that the origin (i.e. where the axes intersect) 
equals the mean value. 
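
A minimal sketch of the second point, assuming simulated data (the group 
sizes, the six variables, and the helper names `pca_scores` and `pairwise` 
are hypothetical and not taken from the posted dataset). It computes PCA 
scores from column-centred data, so the origin equals the sample mean, and 
then moves the origin without changing the scatter:

```python
import numpy as np

def pca_scores(data):
    """PCA scores via SVD of the column-centred data matrix."""
    centred = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt.T

rng = np.random.default_rng(0)
# Hypothetical unbalanced sample: 30 "female" and 5 "male" specimens, 6 variables
females = rng.normal(0.0, 1.0, size=(30, 6))
males = rng.normal(1.5, 1.0, size=(5, 6))
scores = pca_scores(np.vstack([females, males]))

# Moving the plotted origin only relabels the axes; the scatter is unchanged
shifted = scores + np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])

def pairwise(a):
    """Matrix of Euclidean distances between all rows of `a`."""
    return np.linalg.norm(a[:, None, :] - a[None, :, :], axis=2)
```

Because the scores are centred, their column means are zero; the pairwise 
distances among specimens are the same for `scores` and `shifted`.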



Am Donnerstag, 25. Mai 2017 09:58:31 UTC+2 schrieb Helmi Hadi:
>
> Dear morphometricians, 
>
> Does a sample need to be normally distributed when conducting PCA in 
> geometric morphometrics? Sometimes due to research constraints there are no 
> samples of the opposite sex. Someone was asking me this question, and I do 
> not have the answer. When I look at the data distribution, there is quite 
> an imbalance male/female population. However, the classifiers male/female 
> and species are there and you can sort of tell which group belongs to 
> where. My only fear is that the confidence ellipse for the males are being 
> "gravitated" towards the females for one species as that species does not 
> have any male specimens. Attached are the file which I have recreated the 
> dataset based on memory. 
>
> Is this kind of data acceptable or publishable? 
>
> My own personal question is based on the GMM results given in MorphoJ. The 
> PC1/PC2 axes does not intersect at the middle (which I have personally 
> drawn the dotted line there). I don't mind this output, but does it matter 
> to have the axes cut at the 0 value? The data distribution does not 
> change with the change of axes lines. I noticed some GMM papers have the 
> axes at 0. 
>
> Thanks all for the help,
>
> Helmi Hadi,
> School of Health Sciences, 
> Universiti Sains Malaysia
>



Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread mitte...@univie.ac.at
I think a few topics get mixed up here.

Of course, a sample can be too small to be representative (as in Andrea's 
example), and one should think carefully about the measures to take. It is 
also clear that an increase in sample size reduces standard errors of 
statistical estimates, including that of a covariance matrix and its 
eigenvalues. But, as mentioned by Dean, the standard errors of the 
eigenvalues are of secondary interest in PCA.

If one has a clear expectation about the signal in the data - and if one 
does not aim at new discoveries - a few specific measurements may suffice, 
perhaps even a few distance measurements. But effective exploratory 
analyses have always been a major strength of geometric morphometrics, 
enabled by the powerful visualization methods together with the large 
number of measured variables.

Andrea, I am actually curious what worries you if one "collects between 
2700 and 10 400 homologous landmarks from each rib" (whatever the term 
"homologous" is supposed to mean here)? 

Compared to many other disciplines in contemporary biology and biomedicine, 
a few thousand variables are not particularly many. Consider, for instance, 
2D and 3D image analysis, FEA, and all the "omics", with millions and 
billions of variables. In my opinion, the challenge with these "big data" 
is not statistical power in testing a signal, but finding the signal - the 
low-dimensional subspace of interest - in the first place. But this applies 
to 50 or 100 variables as well, not only to thousands or millions. If no 
prior expectation about this signal existed (which the mere presence of so 
many variables usually implies), no hypothesis test should be performed at 
all. The ignorance of this rule is one of the main reasons why so many GWAS 
and voxel-based morphometry studies fail to be replicable.

Best wishes,

Philipp



[MORPHMET] Re: number of landmarks and sample size

2017-05-31 Thread mitte...@univie.ac.at
Adding more (semi)landmarks inevitably increases the spatial resolution and 
thus allows one to capture finer anatomical details - whether relevant to 
the biological question or not. This can be advantageous for the 
reconstruction of shapes, especially when producing 3D morphs by warping 
dense surface representations. Basic developmental or evolutionary trends, 
group structures, etc., often are visible in an ordination analysis with a 
smaller set of relevant landmarks; finer anatomical resolution does not 
necessarily affect these patterns. However, adding more landmarks cannot 
reduce or even remove any signals that were found with fewer landmarks, but 
it can make ordination analyses and the interpretation of distances and 
angles in shape space more challenging.

An excess of variables (landmarks) over specimens does NOT pose problems to 
statistical methods such as the computation of mean shapes and Procrustes 
distances, PCA, PLS, and the multivariate regression of shape coordinates 
on some independent variable (shape regression). These methods are based on 
averages or regressions computed for each variable separately, or on the 
decomposition of a covariance matrix. 

Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and 
relative eigenanalysis, require the inversion of a full-rank covariance 
matrix, which implies an excess of specimens over variables. The same 
applies to many multivariate parametric test statistics, such as 
Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full 
rank and thus can never be subjected to any of these methods without prior 
variable reduction. In fact, reliable results can only be obtained if there 
are manifold more specimens than variables, which usually requires variable 
reduction by PCA, PLS or other techniques, or the regularization of 
covariance matrices (which is more common in the bioinformatic community).
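
The contrast between the two groups of methods can be illustrated with a 
small sketch. It assumes simulated data with hypothetical dimensions (10 
specimens, 40 variables) and an arbitrary choice of k = 3 principal 
components; with so few specimens the resulting distances are only meant to 
show the mechanics, not to be reliable estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 40                        # hypothetical: 10 specimens, 40 shape variables
x = rng.normal(size=(n, p))
centred = x - x.mean(axis=0)

# PCA works regardless of p > n: at most n - 1 components carry variance
u, s, vt = np.linalg.svd(centred, full_matrices=False)
rank = np.linalg.matrix_rank(centred)

# The p x p covariance matrix is singular, so it cannot be inverted directly
cov = centred.T @ centred / (n - 1)

# Variable reduction first: Mahalanobis distances within the first k PCs
k = 3
scores = centred @ vt[:k].T
cov_k = scores.T @ scores / (n - 1)  # k x k and of full rank
inv_k = np.linalg.inv(cov_k)
# Squared Mahalanobis distance of each specimen from the sample mean
d2 = np.einsum("ij,jk,ik->i", scores, inv_k, scores)
```

Here `rank` is n - 1 rather than p, which is why the full covariance matrix 
has no inverse, whereas the reduced k x k matrix does.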

For these reasons, I do not see any disadvantage of measuring a large 
number of landmarks, except for a waste of time perhaps. If life time is an 
issue, one can optimize landmark schemes as suggested by Jim or Aki.

Best,

Philipp



[MORPHMET] Open call for a "Professor of Theoretical Evolutionary Biology" at the University of Vienna

2018-04-05 Thread mitte...@univie.ac.at
Dear morphometrics community,

Perhaps this call is of interest to some of you.

Best,

Philipp


At the Faculty of Life Sciences of the University of Vienna the position of 
a "University Professor of Theoretical Evolutionary Biology" is to be 
filled.
 
The advertised professorship shall cover the advancement and application of 
theoretical approaches – including conceptual, mathematical, and 
statistical analysis – to different levels of organismal complexity. The 
candidate should have a background both in biology and a theoretical or 
computational discipline, with an emphasis on interdisciplinary and 
integrative research. Preference is given to approaches to understand 
biological systems from the molecular to the inter-organismal level, 
encompassing developmental to evolutionary time scales. To foster and 
complement cooperation among the research groups, the candidate's research 
should link to animal development, behavior or morphology, in an 
evolutionary context. The candidate should be enthusiastic to teach a 
theoretical discipline (e.g. mathematics, statistics, systems theory) and 
its application to evolutionary and organismal biology.

For more details and application see:
http://personalwesen.univie.ac.at/jobs-recruiting/professuren/detail-seite/news/theoretical-evolutionary-biology/?no_cache=1=d6099e9c6a5bf9ed84f0a773cc6ea192

Please note that the application deadline is 15 April 2018.



Re: [MORPHMET] Re: semilandmarks in biology

2018-11-06 Thread mitte...@univie.ac.at
Yes, it was always well known that sliding adds covariance but this is 
irrelevant for most studies, especially for group mean comparisons and 
shape regressions: the kind of studies for which GMM is most efficient, as 
Jim noted. 
If you consider the change of variance-covariance structure due to (a small 
amount of) sliding as an approximately linear transformation, then the 
sliding is also largely irrelevant for CVA, relative PCA, Mahalanobis 
distance and the resulting group classifications, as they are all based on 
the relative eigenvalues of two covariance matrices and thus unaffected by 
linear transformations. In other words, in the absence of a reasonable 
biological null model, the interpretation of a single covariance structure 
is very difficult, but the way in which one covariance structure deviates 
from another can be interpreted much more easily. 
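
The invariance of relative eigenvalues under linear transformations can be 
checked numerically. The sketch below assumes two simulated full-rank 
covariance matrices and an arbitrary invertible matrix `t`; all names and 
values are hypothetical. Whether real sliding is well approximated by such 
a linear map is the premise of the argument, not something the code tests.

```python
import numpy as np

def relative_eigenvalues(s1, s2):
    """Eigenvalues of inv(s2) @ s1, sorted in descending order."""
    vals = np.linalg.eigvals(np.linalg.solve(s2, s1))
    return np.sort(vals.real)[::-1]

rng = np.random.default_rng(2)
p = 4
# Two hypothetical full-rank covariance matrices
a1 = rng.normal(size=(50, p))
a2 = rng.normal(size=(50, p)) @ np.diag([1.0, 2.0, 0.5, 1.5])
s1 = np.cov(a1, rowvar=False)
s2 = np.cov(a2, rowvar=False)

# An arbitrary invertible linear transformation of the variables
t = rng.normal(size=(p, p)) + 3.0 * np.eye(p)
s1_t = t @ s1 @ t.T
s2_t = t @ s2 @ t.T

before = relative_eigenvalues(s1, s2)
after = relative_eigenvalues(s1_t, s2_t)
```

The two vectors `before` and `after` agree up to rounding error, because 
inv(s2_t) @ s1_t is similar to inv(s2) @ s1.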

Concerning your example: The point is that there is no useful model of 
"totally random data" (but see Bookstein 2015 Evol Biol). Complete 
statistical independence of shape coordinates is geometrically impossible 
and biologically absurd. Under which biological (null) model can two parts 
of a body, especially two traits on a single skeletal element such as the 
cranium, be completely uncorrelated?  

Clearly, semilandmarks are not always necessary, but making "cool pictures" 
can be quite important in its own right for making good biology, especially 
in exploratory settings. Isn't the visualization one of the primary 
strengths of geometric morphometrics?

It is perhaps also worth noting that one can avoid a good deal of the 
additional covariance resulting from sliding. Sliding via minimizing 
bending energy introduces covariance in the position of the semilandmarks 
_along_ the curve/surface. In some of his analyses, Fred Bookstein just 
included the coordinate perpendicular to the curve/surface for the 
semilandmarks, thus discarding a large part of the covariance. Note also 
that sliding via minimizing Procrustes distance introduces only little 
covariance among semilandmarks because Procrustes distance is minimized 
independently for each semilandmark (but the homology function implied here 
is biologically not so appealing). 
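
One possible reading of the "perpendicular coordinate" idea, as a minimal 
sketch and not a reconstruction of Fred Bookstein's actual procedure: for 
2D curves that are already superimposed, keep for each semilandmark only 
its signed deviation from the mean shape along the normal of the mean 
curve. The function name `normal_coordinates` and the tangent estimate from 
neighbouring points are assumptions.

```python
import numpy as np

def normal_coordinates(shapes, mean_shape):
    """Signed deviation of each semilandmark from the mean, perpendicular to the mean curve.

    shapes: (n, k, 2) superimposed 2D configurations; mean_shape: (k, 2).
    Returns an (n, k) array, one variable per semilandmark instead of two.
    """
    d = np.gradient(mean_shape, axis=0)                    # tangent estimates
    tang = d / np.linalg.norm(d, axis=1, keepdims=True)
    normals = np.column_stack([-tang[:, 1], tang[:, 0]])   # tangents rotated by 90 degrees
    return np.einsum("nkd,kd->nk", shapes - mean_shape, normals)

# Toy check on a straight "curve": a shift along the curve is discarded,
# a shift perpendicular to it is kept
mean_shape = np.column_stack([np.arange(5.0), np.zeros(5)])
along = mean_shape + np.array([0.3, 0.0])
across = mean_shape + np.array([0.0, 0.2])
coords = normal_coordinates(np.stack([along, across]), mean_shape)
```

In this toy example the first row of `coords` is zero and the second row 
equals the perpendicular shift, so variation along the curve does not enter 
the subsequent analysis.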

Best,

Philipp



Am Dienstag, 6. November 2018 18:34:51 UTC+1 schrieb alcardini:
>
> Yes, but doesn't that also add more covariance that wasn't there in 
> the first place? 
> Neither least squares nor minimum bending energy, that we minimize for 
> sliding, are biological models: they will reduce variance but will do 
> it in ways that are totally biologically arbitrary. 
>
> In the examples I showed sliding led to the appearance of patterns 
> from totally random data and that effect was much stronger than 
> without sliding. 
> I neither advocate sliding or not sliding. Semilandmarks are different 
> from landmarks and more is not necessarily better. There are 
> definitely some applications where I find them very useful but many 
> more where they seem to be there just to make cool pictures. 
>
> As Mike said, we've already had this discussion. Besides different 
> views on what to measure and why, at that time I hadn't appreciated 
> the problem with p/n and the potential strength of the patterns 
> introduced by the covariance created by the superimposition (plus 
> sliding!). 
>
> Cheers 
>
> Andrea 
>
> On 06/11/2018, F. James Rohlf wrote: 
> > I agree with Philipp but I would like to add that the way I think about 
> > the justification for the sliding of semilandmarks is that if one were 
> > smart enough to know exactly where the most meaningful locations are 
> > along some curve then one should just place the points along the curve 
> > and computationally treat them as fixed landmarks. However, if their 
> > exact positions are to some extent arbitrary (usually the case) although 
> > still along a defined curve then sliding makes sense to me as it 
> > minimizes the apparent differences among specimens (the sliding 
> > minimizes your measure of how much specimens differ from each other or, 
> > usually, the mean shape). 
> > 
> > 
> > 
> > _ _ _ _ _ _ _ _ _ 
> > 
> > F. James Rohlf, Distinguished Prof. Emeritus 
> > 
> > 
> > 
> > Depts. of Anthropology and of Ecology & Evolution 
> > 
> > 
> > 
> > 
> > 
> > From: mitt...@univie.ac.at 
> > Sent: Tuesday, November 6, 2018 9:09 AM 
> > To: MORPHMET 
> > Subject: [MORPHMET] Re: semilandmarks in biology 
> > 
> > 
> > 
> > I agree only in part. 
> > 
> > 
> > 
> > Whether or not semilandmarks "really are needed" may be hard to say 
> > beforehand. If the signal is known well enough before the study, even a 
> > single linear distance or distance ratio may suffice. In fact, most 
> > geometric morphometric studies are characterized by an oversampling of 
> > (anatomical) landmarks as an exploratory strategy: it allows for 
> unexpected 
> > findings (and nice visualizations). 
> > 
> > 
> > 
> > Furthermore, there is a fundamental difference 

[MORPHMET] Re: Are more semi landmarks better??

2018-11-06 Thread mitte...@univie.ac.at
I'd like to respond to your question because it comes up so often.

As noted by Carmelo in the other posting, a large number of variables 
relative to the number of cases can lead to statistical problems. But often 
it does not.

In all analyses that treat each variable separately - including the 
computation of mean shapes and shape regressions - the number of variables 
does NOT matter! Also in principal component analysis (PCA) and 
between-group PCA there is NO restriction on the number of variables. 
However, the distribution of landmarks across the organism can influence 
the results. E.g., if one part - say the face - is covered only by a few 
anatomical landmarks, and another part - e.g., the neurocranium - by many 
semilandmarks, the latter one will dominate PCA results. But this holds 
true for all kinds of landmarks and variables, not only for semilandmarks.

Analyses that involve the inversion of a covariance matrix - such as 
multiple regression, CVA, relative eigenanalysis, reduced rank regression, 
and parametric multivariate tests - require a clear excess of cases over 
variables. In any truly multivariate setting (such as geometric 
morphometrics), these analyses - if unavoidable - should ALWAYS be preceded 
by some sort of variable reduction and/or factor analysis. Again, this is 
not specific to semilandmarks.

Partial least squares (PLS) is somewhat in-between these two groups. As 
shown in Bookstein's 2016 paper, the singular values (maximal covariances) 
in PLS can be strongly inflated if the number of variables is large 
compared to the number of cases. The singular vectors, however, are more 
stable.
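
The inflation of PLS singular values can be illustrated with simulated 
data. The sketch below is an assumption-laden toy example, not the analysis 
in Bookstein's paper: two blocks of variables are generated independently 
(so the true covariance between them is zero), the number of cases is fixed 
at a hypothetical n = 20, and the number of variables per block is varied.

```python
import numpy as np

def first_singular_value(x, y):
    """Largest singular value of the cross-covariance matrix between two blocks."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    cross_cov = xc.T @ yc / (x.shape[0] - 1)
    return np.linalg.svd(cross_cov, compute_uv=False)[0]

rng = np.random.default_rng(3)
n = 20                                # hypothetical number of cases
results = {}
for p in (5, 50, 200):
    x = rng.normal(size=(n, p))       # the two blocks are independent by construction
    y = rng.normal(size=(n, p))
    results[p] = first_singular_value(x, y)
```

Although the blocks are unrelated, the first singular value in `results` 
increases with the number of variables per block, which is why its 
magnitude should not be interpreted without reference to p and n.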

Essentially, the number of semilandmarks should be determined based on the 
anatomical details to be captured. More semilandmarks are not "harmful," 
perhaps just a waste of time.

Best,

Philipp Mitteroecker




 


Am Montag, 5. November 2018 18:52:57 UTC+1 schrieb Diego Ardón:
>
> Good day everybody, I actually have two questions here regarding 
> semi-landmarks:
>
> So, I was advised to use semi-landmarks, I placed them with MakeFan8, 
> saved the files as images and then used TpsDig to place all landmarks, 
> however I didn't make any distinctions between landmarks and 
> semi-landmarks. What unsettles me is (1) that I've recently come across 
> the term "sliding semi-landmarks", which leads me to believe semi-landmarks 
> should behave in a particular way. 
>
> The second thing that unsettles me is whether "more semi-landmarks" means 
> a better analysis. I can understand that most people wouldn't use 65 
> landmarks+semilandmarks because it's a painstaking job to digitize them, 
> however, in my recent reads I've come across concepts like a "Variables to 
> specimen ratio", which one paper suggested specimens should be 5 times the 
> number of variables. I do have a data set of nearly 400 specimens, but it 
> does come short if indeed I should have 65*2*5 specimens!
>
> Please, I'll appreciate some feedback :)
>


[MORPHMET] Re: semilandmarks in biology

2018-11-06 Thread mitte...@univie.ac.at
I agree only in part.

Whether or not semilandmarks "really are needed" may be hard to say 
beforehand. If the signal is known well enough before the study, even a 
single linear distance or distance ratio may suffice. In fact, most 
geometric morphometric studies are characterized by an oversampling of 
(anatomical) landmarks as an exploratory strategy: it allows for unexpected 
findings (and nice visualizations). 

Furthermore, there is a fundamental difference between sliding 
semilandmarks and other outline methods, including EFA. When establishing 
correspondence of semilandmarks across individuals, the minBE sliding 
algorithm takes the anatomical landmarks (and their stronger biological 
homology) into account, while standard EFA and related techniques cannot 
easily combine point homology with curve or surface homology. Clearly, when 
point homology exists, it should be parameterized accordingly. If smooth 
curves or surfaces exist, they should also be parameterized, whether or 
not this makes the analysis slightly more challenging.
 
Anyway, different landmarks often convey different biological signals and 
different homology criteria. For instance, Type I and Type II landmarks 
(sensu Bookstein 1991) differ fundamentally in their notion of homology. 
Whereas Type I landmarks are defined in terms of local anatomy or 
histology, a Type II landmark is a purely geometric construct, which may or 
may not coincide with notions of anatomical/developmental homology. ANY 
reasonable morphometric analysis must be interpreted in the light of the 
correspondence function employed, and the same holds true for 
semilandmarks. For this, of course, one needs to understand the basic 
properties of sliding landmarks, much as the basic properties of Procrustes 
alignment, etc. For instance, both the sliding algorithm and Procrustes 
alignment introduce correlations between shape coordinates (hence their 
reduced degrees of freedom). This is one of the reasons why I have warned 
for many years and in many publications about the biological interpretation 
of raw correlations (e.g., summarized in Mitteroecker et al. 2012 Evol 
Biol). Interpretations in terms of morphological integration or modularity 
are even more difficult because in most studies these concepts are not 
operationalized. They are either described by vague and biologically 
trivial narratives, or they are themselves defined as patterns of 
correlations, which is circular and makes most "hypotheses" untestable.

The same criticism applies to the naive interpretation of PCA scree plots 
and derived statistics. An isotropic (circular) distribution of shape 
coordinates corresponds to no biological model or hypothesis whatsoever 
(e.g., Huttegger & Mitteroecker 2011, Bookstein & Mitteroecker 2014, and 
Bookstein 2015, all three in Evol Biol). Accordingly, a deviation from 
isotropy does not itself inform about integration or modularity (in any 
reasonable biological sense).
The multivariate distribution of shape coordinates, including "dominant 
directions of variation," depends on many arbitrary factors, including the 
spacing, superimposition, and sliding of landmarks as well as on the number 
of landmarks relative to the number of cases. But all of this applies to 
both anatomical landmarks and sliding semilandmarks.

I don't understand how the fact that semilandmarks make some of these 
issues more obvious is an argument against their use.

Best,

Philipp







Am Dienstag, 6. November 2018 13:28:55 UTC+1 schrieb alcardini:
>
> As a biologist, for me, the question about whether or not to use 
> semilandmarks starts with whether I really need them and what they're 
> actually measuring.
>
> On this, among others, Klingenberg, O'Higgins and Oxnard have written some 
> very important easy-to-read papers that everyone doing morphometrics should 
> consider and carefully ponder. They can be found at: 
> https://preview.tinyurl.com/semilandmarks
>
> I've included there also an older criticism by O'Higgins on EFA and 
> related methods. As semilandmarks, EFA and similar methods for the analysis 
> of outlines measure curves (or surfaces) where landmarks might be few or 
> missing: if semilandmarks are OK because where the points map is 
> irrelevant, as long as they capture homologous curves or surfaces, the same 
> applies for EFAs and related methods; however, the opposite is also true 
> and, if there are problems with 'homology' in EFA etc., those problems are 
> there also using semilandmarks as a trick to discretize curves and 
> surfaces. 
>
> Even with those problems, one could still have valid reasons to use 
> semilandmarks but it should be honestly acknowledged that they are the best 
> we can do (for now at least) in very difficult cases. Most of the studies I 
> know (certainly a minority from a now huge literature) seem to only provide 
> post-hoc justification of the putative importance of semilandmarks: there 
> were few 'good landmarks'; 

[MORPHMET] Comment and advice on bgPCA

2019-05-28 Thread mitte...@univie.ac.at
Dear all,

I also want to comment on the recent bgPCA postings.

Andrea et al. and Fred are right that bgPCA produces ordination plots in 
which two or more groups are discriminated more (i.e., the groups overlap 
less) than they should, whenever p (number of variables) is large relative 
to n (sample size). Thanks Andrea for noticing that, or whoever figured it 
out first; it was not me, admittedly. In the case of samples from the same 
distribution (i.e., no "real" group differences), the samples can even 
appear to be distinct if p is larger than n. This phenomenon is much more 
severe in CVA than in bgPCA (as we showed in the 2011 paper), but we were 
not aware back then that it is also present in bgPCA. Please note that this 
does NOT mean that ALL results inferred from bgPCA are wrong, only those 
about group separation are biased; the relationship between group means in 
bgPCA is necessarily the same as in an ordinary PCA (but see below).
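
For readers who want to see the effect, here is a minimal simulation sketch. It assumes three groups of 20 cases drawn from the same distribution (pure independent noise), and it implements bgPCA in the simplest way I can think of: a PCA of the group means followed by projection of all cases onto those axes. The function names and the between/within ratio used as a separation measure are choices made for this illustration, not a standard implementation.

```python
import numpy as np

def bgpca_scores(X, labels):
    """Project all cases onto the principal axes of the group means."""
    groups = np.unique(labels)
    means = np.array([X[labels == g].mean(axis=0) for g in groups])
    centre = means.mean(axis=0)                  # unweighted mean of the group means
    _, _, Vt = np.linalg.svd(means - centre, full_matrices=False)
    axes = Vt[: len(groups) - 1]                 # at most (number of groups - 1) axes
    return (X - centre) @ axes.T

def separation_ratio(scores, labels):
    """Between-group over within-group sum of squares of the scores."""
    centre = scores.mean(axis=0)
    between = within = 0.0
    for g in np.unique(labels):
        s = scores[labels == g]
        between += len(s) * np.sum((s.mean(axis=0) - centre) ** 2)
        within += np.sum((s - s.mean(axis=0)) ** 2)
    return between / within

rng = np.random.default_rng(42)
labels = np.repeat(np.arange(3), 20)             # 3 groups, 20 cases each, n = 60
ratios = {}
for p in (5, 600):                               # few versus many variables
    X = rng.normal(size=(labels.size, p))        # no real group differences at all
    ratios[p] = separation_ratio(bgpca_scores(X, labels), labels)
print(ratios)
```

With p = 5 the ratio should stay close to zero, whereas with p = 600 the between-group part should dominate, so the three groups would look well separated in a plot even though they come from one distribution.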

I have two main comments and pieces of advice.

1) The simulations of identical independent noise for an increasing number 
of variables, as in Fred's current manuscript and in our 2011 paper, are 
not quite realistic because morphometric variables are highly correlated; 
the "real" degrees of freedom thus are much less than the number of 
variables. Put another way, if you set more and more landmarks on a sample 
of specimens, not every landmark introduces a new degree of freedom because 
its location may be predictable from the adjacent landmarks. Theoretically, 
there is a maximal number of degrees of freedom in a given sample that 
reflects the actual spatial scale of the shape differences studied. If the 
given shape differences are captured well by the current landmark set, 
adding more landmarks will not add any further information and not increase 
the relevant degrees of freedom. For example, if shape variation comprises 
only affine shape variation (linear scaling and shearing) of 2D landmarks, 
the relevant shape space has only two degrees of freedom (two dimensions), regardless 
of how many landmarks were measured.
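
The affine example can be checked numerically. The sketch below is only an illustration under stated assumptions: 2D landmarks placed on an ellipse, small random affine deformations plus random translations, no measurement error, and a bare-bones Procrustes routine (gpa, affine_sample, and n_shape_dimensions are hypothetical helper names). It counts the principal components whose variance exceeds 1% of the largest one.

```python
import numpy as np

def gpa(configs, n_iter=10):
    """Minimal generalized Procrustes alignment for an (n, k, 2) landmark array."""
    X = configs - configs.mean(axis=1, keepdims=True)        # remove translation
    X = X / np.linalg.norm(X, axis=(1, 2), keepdims=True)    # unit centroid size
    mean = X[0]
    for _ in range(n_iter):
        # least-squares rotation of every specimen onto the current mean
        U, _, Vt = np.linalg.svd(X.transpose(0, 2, 1) @ mean)
        X = X @ (U @ Vt)
        mean = X.mean(axis=0)
        mean = mean / np.linalg.norm(mean)
    return X

def affine_sample(k, n, eps, rng):
    """n configurations of k landmarks that differ only by an affine map and a shift."""
    angles = 2 * np.pi * np.arange(k) / k
    ref = np.column_stack([np.cos(angles), 0.6 * np.sin(angles)])  # assumed reference form
    A = np.eye(2) + eps * rng.normal(size=(n, 2, 2))               # small random affine maps
    shift = rng.normal(size=(n, 1, 2))
    return ref @ A.transpose(0, 2, 1) + shift

def n_shape_dimensions(configs, tol=0.01):
    """Count PCs of the Procrustes coordinates above tol times the largest variance."""
    aligned = gpa(configs).reshape(len(configs), -1)
    eig = np.linalg.eigvalsh(np.cov(aligned, rowvar=False))
    return int(np.sum(eig > tol * eig.max()))

rng = np.random.default_rng(0)
dims = {k: n_shape_dimensions(affine_sample(k, n=200, eps=0.01, rng=rng)) for k in (5, 40)}
print(dims)
```

Under these assumptions the count should be two for 5 landmarks as well as for 40 landmarks, although the Procrustes shape space itself has 2k - 4 = 6 and 76 dimensions, respectively.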

As a result of this, most morphometric data, even those consisting of many 
landmarks, can be described well by a small number of principal components, 
as we all know. Ideally, these few PCs capture the "real" dimensionality of 
shape space (i.e., they are some rotation of the underlying factor 
structure), which is much less than the number of landmarks. In practice, 
the problem is that every landmark entails some small independent 
measurement error, and hence the "cut-off" for the number of dimensions is 
not necessarily obvious. In the above example with only affine shape 
variation, for more than three landmarks there will still be more than two 
PCs with non-zero variance, but hopefully the first two PCs are a good estimate 
of the affine components. Methods other than ordinary PCA may do a 
better job at this task, e.g. methods that take into account spatial 
scale, such as the spatially weighted relative warps in Bookstein's orange 
book or the relative intrinsic warps in Bookstein (2015). Blame Fred for 
these names ;-) 

Many multivariate statistical analyses - including bgPCA, CVA, relative 
PCA, and also the computation of shape distances or angles between shape 
trajectories, etc. - should be performed within this subspace (i.e., based 
on the first few PCs rather than on the original shape coordinates). bgPCA 
and CVA may be considered kinds of factor rotation within this subspace 
rather than methods of variable reduction.

Hence, many of the problems described by Andrea et al. and Fred can be 
avoided by variable reduction (ordinary PCA) prior to bgPCA and related 
techniques. This requires a careful inspection of the scree plot and the 
corresponding PCs. The actual sample size must be large relative to the 
number of PCs retained (not necessarily relative to the number of 
landmarks).
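
A rough sketch of this recommendation follows. The data are hypothetical: three latent factors spread over 2000 variables plus small independent measurement error, with no real group differences, and the number of retained PCs is simply set to the known number of factors (in real data it has to come from the scree plot and the PCs themselves). The helper bgpca_separation and the between/within ratio are choices made for this illustration.

```python
import numpy as np

def bgpca_separation(X, labels):
    """Between/within sum-of-squares ratio of the bgPCA scores (PCA of group means)."""
    groups = np.unique(labels)
    means = np.array([X[labels == g].mean(axis=0) for g in groups])
    _, _, Vt = np.linalg.svd(means - means.mean(axis=0), full_matrices=False)
    scores = X @ Vt[: len(groups) - 1].T
    by_group = [scores[labels == g] for g in groups]
    centre = scores.mean(axis=0)
    between = sum(len(s) * np.sum((s.mean(axis=0) - centre) ** 2) for s in by_group)
    within = sum(np.sum((s - s.mean(axis=0)) ** 2) for s in by_group)
    return between / within

rng = np.random.default_rng(7)
n_per_group, p, n_factors = 20, 2000, 3
labels = np.repeat(np.arange(3), n_per_group)    # 3 groups from ONE distribution
n = labels.size

# hypothetical data: a few strong latent factors plus small independent error
loadings = np.linalg.qr(rng.normal(size=(p, n_factors)))[0]      # orthonormal columns
factors = rng.normal(size=(n, n_factors)) * np.array([4.0, 3.0, 2.0])
X = factors @ loadings.T + rng.normal(scale=0.2, size=(n, p))

# ordinary PCA of the pooled sample, ignoring group membership
Xc = X - X.mean(axis=0)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
pc_scores = U[:, :n_factors] * S[:n_factors]     # scores on the retained PCs

ratio_full = bgpca_separation(X, labels)             # bgPCA on all p variables
ratio_reduced = bgpca_separation(pc_scores, labels)  # bgPCA within the PC subspace
print(round(ratio_full, 3), round(ratio_reduced, 3))
```

In this setting the spurious separation should be clearly smaller after the reduction, because the many low-variance error dimensions no longer contribute to the group mean differences. The sketch says nothing about how to choose the number of PCs, which remains the difficult part.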


2) Many applications of PCA or CVA aim to combine multiple analytical steps 
that are not necessarily commensurate: 

- Exploratory study of group mean differences
- Relating multivariate mean differences across multiple groups by an 
ordination analysis
- Discrimination analysis (studying if and to what degree groups overlap in 
their distribution of individual variation)
- Perhaps even the estimation of a discrimination function, i.e., a 
combination of variables that maximally discriminates the groups.

The value or burden of having many landmarks is different for each of these 
tasks.

When "exploring" differences in average shape between groups, without 
strong prior expectations (i.e., without knowing where the signal is), it 
is clearly useful to measure as many landmarks as possible, as this 
increases spatial resolution. In contrast to Andrea, I think that 
"beautiful pictures" can be of value because morphology is a visual 
discipline, after all. For computing group means or shape regressions, p>n 
is no problem. The challenge in this step is to 

[MORPHMET] Dennis Slice

2019-06-15 Thread mitte...@univie.ac.at
 Dear subscribers to morphmet,

 With the deepest grief we must inform you of the sudden
 death on June 13 of Prof. Dennis E. Slice,
 holder of the fourth Rohlf Award for Excellence in Morphometrics
 and tireless founder and moderator of this newsgroup,
 who suffered a heart attack in his home town of
 Tallahassee, Florida. Morphometrics will not be the same
 without him.

Jim Rohlf, Fred Bookstein, Paul O'Higgins,
  Benedikt Hallgrimsson, June 15, 2019
