[MORPHMET] Multivariate Analysis of Genotype-Phenotype Association
I would like to announce a new paper, of which the early view pdf is already online. Mitteroecker P, Cheverud JM, Pavlicev M (2016) Multivariate Analysis of Genotype-Phenotype Association. Genetics http://www.genetics.org/content/early/2016/02/18/genetics.115.181339 It offers an exploratory strategy for mapping multivariate data and is particularly suited for geometric morphometrics. The new method identifies patterns of allelic variation (genetic latent variables) that are maximally associated - in terms of effect size - with patterns of phenotypic variation (phenotypic latent variables). It thereby separates phenotypic features under strong genetic control from less genetically determined features and thus permits an analysis of the multivariate structure of genotype-phenotype association, including its "dimensionality" and the clustering of genetic and phenotypic variables within this association. Best, Philipp Mitteroecker -- MORPHMET may be accessed via its webpage at http://www.morphometrics.org --- You received this message because you are subscribed to the Google Groups "MORPHMET" group. To unsubscribe from this group and stop receiving emails from it, send an email to morphmet+unsubscr...@morphometrics.org.
Re: [MORPHMET] Sliding Semilandmarks
As Michael described, the average shape configuration affects the sliding when used as reference for the TPS; the final configurations thus are sample-dependent. However, if the curves/surfaces are covered densely enough by the semilandmarks (e.g., so that no semilandmark can slide away from a relevant region), Procrustes distances are quite stable. Dense sampling can also improve the estimation of the tangents. If the semilandmarks slide a lot relative to the local curvature, they get off the curve. Of course, they can be projected back, but the following trick often is sufficient: Instead of the full amount of sliding, let all the semilandmarks slide just a fraction of the computed distance, say 20% (multiply T by 0.2 in equation 4 of Gunz et al. 2005). Then update the tangents and let the semilandmarks slide again a fraction of the computed distance, etc. This requires more iterations but keeps the semilandmarks closer to the curve or surface. Also when minimizing Procrustes distance instead of BE, these distances are reduced relative to the sample average. But as for the superimposition itself, the sample configuration has only limited effect on the final configurations for small to moderate shape variation. (If variation is very large, the analysis is problematic anyway.) Note that the full sample must be slid together for a joint analysis (i.e., don't slide each population separately and then analyze them together). The choice of the minimization criterion (Proc dist versus BE) can lead to different configurations. For most datasets, this difference is negligible, but in some situations it can matter. For example, when minimizing Proc dist, semilandmarks can change their order or slide across a real landmark, whereas this is almost impossible when minimizing BE (changing order would have a very high BE). On the other hand, minimizing BE does not minimize affine shape variation (because it has zero BE).
If affine shape variation is not constrained by real landmarks, this can lead to strange results. For instance, I had a dataset of mandibular cross-sections, which were U-shaped with real landmarks only at the two upper ends and semilandmarks in-between. Affine variation thus was not properly controlled. After BE sliding, the group differences comprised a lot of (meaningless) affine differences. I thus decided to minimize Proc dist. Usually, though, I prefer minimizing BE because it is closer to our biological understanding of homology, including the preservation of landmark order and large-scale shape features. Minimizing BE leads to smoother TPS deformation grids, whereas minimizing Proc dist leads to a smaller sum of squares. Note that when updating the reference configuration in each iteration, the algorithm can converge to quite undesired minima (e.g. all semilandmarks collapse to a single point). This can be avoided by iterating just a few times, which is usually enough, or by keeping the reference constant at some point in the algorithm. In general, the more the semilandmarks are constrained by real landmarks and the smoother the curves, the more stable the algorithm is. Because of these issues, it is important to apply the semilandmark algorithm carefully, especially for 3D surfaces. Always check the tangents and how the semilandmarks slide along these tangents. Check how the total sliding reduces from one iteration to the next, and interpret the final pattern of shape variation in the light of the property being minimized.

Best wishes, Philipp Mitteroecker

On Thursday, 18 February 2016 at 18:41:44 UTC+1, Collyer, Michael wrote:
> Andrea,
>
> I like to think of semilandmark sliding as iteratively finding fitted (predicted) values for the generalized linear model fit described by Gunz et al. (2005) (equation 4), and updating coordinates by these values until there is no more meaningful change (with regard to an acceptable criterion). If Bending energy is not used, the bending energy matrix is replaced by an identity matrix (i.e., independence), which produces the minimized Procrustes distance version of the sliding algorithm. (This is the same as ordinary least squares being a simplification of generalized least squares by using an identity matrix for the covariance matrix in GLS estimation of parameters.) Calculating the bending energy matrix requires using the reference configuration. The hat matrix calculated in the process is typically post-multiplied by the target coordinates centered by the reference configuration. Changing the reference should, therefore, change the solution. Also, let’s not forget that with surface points, if we follow the Gunz et al. (2005) recommendation, 5 nearest neighbors are used to estimate the principal components for defining a tangent plane. One could use more nearest neighbors, which would change the tangent planes. One could
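The "slide only a fraction, then update the tangents" trick from the reply above can be illustrated with a toy example. This is only a rough sketch under strong assumptions, not the algorithm of Gunz et al. (2005): it uses a 2D open curve (half of a unit circle), minimizes squared distance to a fixed reference (the Procrustes-distance criterion with the superimposition held fixed, no bending energy), estimates tangents from the two neighbouring points, and never projects back onto the curve. All function names (`tangents`, `slide`, `off_curve`) and the data are made up for illustration.

```python
import numpy as np

def tangents(curve):
    """Unit tangents at the interior points, estimated from the two neighbours."""
    d = curve[2:] - curve[:-2]
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def slide(curve, reference, fraction=1.0, n_iter=1):
    """Slide interior semilandmarks along their tangents towards the reference.

    In each iteration the sliding distance T is the projection of
    (reference - point) onto the current tangent; only `fraction` of T is
    applied, and the tangents are re-estimated before the next iteration.
    The two end points are treated as fixed landmarks.
    """
    curve = curve.copy()
    for _ in range(n_iter):
        u = tangents(curve)
        t = np.sum(u * (reference[1:-1] - curve[1:-1]), axis=1)
        curve[1:-1] += fraction * t[:, None] * u
    return curve

def off_curve(curve):
    """Largest distance of any point from the unit circle (the true curve)."""
    return np.max(np.abs(np.linalg.norm(curve, axis=1) - 1.0))

def arc(angles):
    return np.column_stack([np.cos(angles), np.sin(angles)])

s = np.linspace(0.0, 1.0, 21)
specimen = arc(np.pi * s)           # evenly spaced semilandmarks
reference = arc(np.pi * s ** 1.3)   # same curve, different spacing

full = slide(specimen, reference, fraction=1.0, n_iter=5)
damped = slide(specimen, reference, fraction=0.2, n_iter=40)

print("max distance from curve, full sliding:  ", off_curve(full))
print("max distance from curve, damped sliding:", off_curve(damped))
```

In this toy setting the damped version needs more iterations but ends up closer to the circle than the full-step version; with real data one would still inspect the tangents and the amount of sliding per iteration, as recommended above.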
[MORPHMET] Postdoc and Ph.D. position at the Univ. of Vienna on the modelling of developmental canalization in
In the working group of Philipp Mitteroecker in the Department of Theoretical Biology, University of Vienna, a two-year postdoc position and a three-year Ph.D. position are vacant. We are searching for enthusiastic persons who are dedicated to interdisciplinary research in biology and physical anthropology. The positions are part of an FWF-funded project on developmental canalization in the human head, which comprises mathematical, statistical, and genetic approaches using a diversity of 2D and 3D morphometric data. The Dept. of Theoretical Biology is well known for its experience in quantitative and theoretical work in evolutionary and developmental biology and is part of a strong national and international research network (http://theoretical.univie.ac.at; http://www.univie.ac.at/evolvienna/). The postdoc should primarily work on the mathematical and statistical modelling of growth processes and the quantitative genetic analysis of craniofacial shape (GWAS). A strong background in mathematical or statistical biology, programming experience in Mathematica or R, as well as excellent English writing and speaking skills are required. A Ph.D. degree in a related field must be completed by the date of hire. Expertise in geometric morphometrics, human anatomy, EvoDevo, or genetics is advantageous. The candidate should have a publication record demonstrating his or her skills. The Ph.D. student should primarily work on the geometric morphometric analysis of cranial and facial development. A background in human anatomy, physical anthropology, or orthodontics, along with basic experience in programming and statistical analysis, as well as good English writing and speaking skills are required. A master's degree in a related field must be completed by the date of hire. Expertise in geometric morphometrics, genetics, or orthodontics is advantageous.
Please submit applications including CV, list of publications, and a statement of research interests to Philipp Mitteroecker (philipp.mitteroec...@univie.ac.at) until Sept. 5.
[MORPHMET] PhD position in biological anthropology and morphometrics
We seek a PhD student for a three-year position in Vienna, working on the morphometrics of human pelvises in relation to childbirth and motherhood: www.oeaw.ac.at/fileadmin/subsites/Jobs/Job_offer_VAMOS_PhD_position_1_.pdf Best, Philipp Mitteroecker
[MORPHMET] Re: Does a research sample need to be normally distributed (male/female ratio) for PCA?
As an exploratory technique, PCA makes no distributional assumptions; it is used to explore the empirical distribution of the data. The sample does not need to be balanced with regard to sex or other grouping variables, but larger groups have a stronger effect on the PCA than smaller groups. The origin of the coordinate system is arbitrary. However, many software packages center the data so that the origin (i.e. where the axes intersect) equals the mean value.

On Thursday, 25 May 2017 at 09:58:31 UTC+2, Helmi Hadi wrote:
> Dear morphometricians,
>
> Does a sample need to be normally distributed when conducting PCA in geometric morphometrics? Sometimes due to research constraints there are no samples of the opposite sex. Someone was asking me this question, and I do not have the answer. When I look at the data distribution, there is quite an imbalance male/female population. However, the classifiers male/female and species are there and you can sort of tell which group belongs to where. My only fear is that the confidence ellipse for the males are being "gravitated" towards the females for one species as that species does not have any male specimens. Attached are the file which I have recreated the dataset based on memory.
>
> Is this kind of data acceptable or publishable?
>
> My own personal question is based on the GMM results given in MorphoJ. The PC1/PC2 axes do not intersect at the middle (which I have personally drawn the dotted line there). I don't mind this output, but does it matter to have the axes cut at the 0 value? The data distribution does not change with the change of axes lines. I noticed some GMM papers have the axes at 0.
>
> Thanks all for the help,
>
> Helmi Hadi,
> School of Health Sciences,
> Universiti Sains Malaysia
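A small numerical illustration of the two points in the reply above, as a sketch on made-up data: PCA can be computed for a non-normal, unbalanced sample, and when the data are centered the origin of the PC axes is simply the sample mean, which lies closer to the larger group. The group labels, sample sizes, and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up, clearly non-normal and unbalanced sample: 40 "females", 5 "males"
females = rng.uniform(-1.0, 1.0, size=(40, 6))
males = rng.uniform(-1.0, 1.0, size=(5, 6)) + 3.0
X = np.vstack([females, males])

# PCA via singular value decomposition of the mean-centered data
sample_mean = X.mean(axis=0)
Xc = X - sample_mean
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                      # PC scores; rows are specimens

# With centering, the origin of the PC plot is the sample mean
print("mean of PC scores:", np.round(scores.mean(axis=0), 12))

# The sample mean is pulled towards the larger group
dist_to_females = np.linalg.norm(sample_mean - females.mean(axis=0))
dist_to_males = np.linalg.norm(sample_mean - males.mean(axis=0))
print("distance of origin to female mean:", dist_to_females)
print("distance of origin to male mean:  ", dist_to_males)
```

Shifting the axes to another origin only relabels the plot; the configuration of the points is unchanged.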
Re: [MORPHMET] Re: number of landmarks and sample size
I think a few topics get mixed up here. Of course, a sample can be too small to be representative (as in Andrea's example), and one should think carefully about the measures to take. It is also clear that an increase in sample size reduces standard errors of statistical estimates, including that of a covariance matrix and its eigenvalues. But, as mentioned by Dean, the standard errors of the eigenvalues are of secondary interest in PCA. If one has a clear expectation about the signal in the data - and if one does not aim at new discoveries - a few specific measurements may suffice, perhaps even a few distance measurements. But effective exploratory analyses have always been a major strength of geometric morphometrics, enabled by the powerful visualization methods together with the large number of measured variables. Andrea, I am actually curious what worries you if one "collects between 2700 and 10 400 homologous landmarks from each rib" (whatever the term "homologous" is supposed to mean here)? Compared to many other disciplines in contemporary biology and biomedicine, a few thousand variables are not particularly many. Consider, for instance, 2D and 3D image analysis, FEA, and all the "omics", with millions and billions of variables. In my opinion, the challenge with these "big data" is not statistical power in testing a signal, but finding the signal - the low-dimensional subspace of interest - in the first place. But this applies to 50 or 100 variables as well, not only to thousands or millions. If no prior expectation about this signal existed (which the mere presence of so many variables usually implies), no hypothesis test should be performed at all. Ignoring this rule is one of the main reasons why so many GWAS and voxel-based morphometry studies fail to be replicable.
Best wishes, Philipp
[MORPHMET] Re: number of landmarks and sample size
Adding more (semi)landmarks inevitably increases the spatial resolution and thus allows one to capture finer anatomical details - whether relevant to the biological question or not. This can be advantageous for the reconstruction of shapes, especially when producing 3D morphs by warping dense surface representations. Basic developmental or evolutionary trends, group structures, etc., often are visible in an ordination analysis with a smaller set of relevant landmarks; finer anatomical resolution does not necessarily affect these patterns. Adding more landmarks cannot reduce or remove any signal that was found with fewer landmarks, but it can make ordination analyses and the interpretation of distances and angles in shape space more challenging. An excess of variables (landmarks) over specimens does NOT pose problems for statistical methods such as the computation of mean shapes and Procrustes distances, PCA, PLS, and the multivariate regression of shape coordinates on some independent variable (shape regression). These methods are based on averages or regressions computed for each variable separately, or on the decomposition of a covariance matrix. Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and relative eigenanalysis, require the inversion of a full-rank covariance matrix, which implies an excess of specimens over variables. The same applies to many multivariate parametric test statistics, such as Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full rank and thus can never be subjected to any of these methods without prior variable reduction. In fact, reliable results can only be obtained if there are many times more specimens than variables, which usually requires variable reduction by PCA, PLS or other techniques, or the regularization of covariance matrices (which is more common in the bioinformatic community).
For these reasons, I do not see any disadvantage of measuring a large number of landmarks, except for a waste of time perhaps. If life time is an issue, one can optimize landmark schemes as suggested by Jim or Aki. Best, Philipp
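The distinction made above between methods that decompose a covariance matrix and methods that invert it can be checked numerically. The following is a minimal sketch on simulated noise (not shape data); the sample size, number of variables, and the choice of k = 3 retained PCs are arbitrary assumptions. With more variables than specimens the covariance matrix has rank at most n - 1 and cannot be inverted, whereas PCA still works, and a Mahalanobis-type distance can be computed after reduction to a few PCs.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 60                      # more variables than specimens
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)

# The p x p covariance matrix is singular: its rank cannot exceed n - 1
S = Xc.T @ Xc / (n - 1)
rank = np.linalg.matrix_rank(S, tol=1e-8)
print("rank of covariance matrix:", rank, "out of", p)

# PCA (a decomposition, no inversion) is unproblematic
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3                              # arbitrary number of retained PCs
scores = U[:, :k] * s[:k]

# In the reduced space the covariance matrix is invertible, so a squared
# Mahalanobis distance between two specimens can be computed
S_k = np.cov(scores, rowvar=False)
d = scores[0] - scores[1]
d2 = d @ np.linalg.solve(S_k, d)
print("squared Mahalanobis distance in the space of", k, "PCs:", d2)
```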
[MORPHMET] Open call for a "Professor of Theoretical Evolutionary Biology" at the University of Vienna
Dear morphometrics community, Perhaps this call is of interest to some of you. Best, Philipp At the Faculty of Life Sciences of the University of Vienna the position of a "University Professor of Theoretical Evolutionary Biology" is to be filled. The advertised professorship shall cover the advancement and application of theoretical approaches – including conceptual, mathematical, and statistical analysis – to different levels of organismal complexity. The candidate should have a background both in biology and a theoretical or computational discipline, with an emphasis on interdisciplinary and integrative research. Preference is given to approaches to understand biological systems from the molecular to the inter-organismal level, encompassing developmental to evolutionary time scales. To foster and complement cooperation among the research groups, the candidate's research should link to animal development, behavior or morphology, in an evolutionary context. The candidate should be enthusiastic to teach a theoretical discipline (e.g. mathematics, statistics, systems theory) and its application to evolutionary and organismal biology. For more details and application see: http://personalwesen.univie.ac.at/jobs-recruiting/professuren/detail-seite/news/theoretical-evolutionary-biology/?no_cache=1=d6099e9c6a5bf9ed84f0a773cc6ea192 Please note that the application deadline is 15 April 2018.
Re: [MORPHMET] Re: semilandmarks in biology
Yes, it was always well known that sliding adds covariance but this is irrelevant for most studies, especially for group mean comparisons and shape regressions: the kind of studies for which GMM is most efficient, as Jim noted. If you consider the change of variance-covariance structure due to (a small amount of) sliding as an approximately linear transformation, then the sliding is also largely irrelevant for CVA, relative PCA, Mahalanobis distance and the resulting group classifications, as they are all based on the relative eigenvalues of two covariance matrices and thus unaffected by linear transformations. In other words, in the absence of a reasonable biological null model, the interpretation of a single covariance structure is very difficult, but the way in which one covariance structure deviates from another can be interpreted much more easily. Concerning your example: The point is that there is no useful model of "totally random data" (but see Bookstein 2015 Evol Biol). Complete statistical independence of shape coordinates is geometrically impossible and biologically absurd. Under which biological (null) model can two parts of a body, especially two traits on a single skeletal element such as the cranium, be completely uncorrelated? Clearly, semilandmarks are not always necessary, but making "cool pictures" can be quite important in its own right for making good biology, especially in exploratory settings. Isn't the visualization one of the primary strengths of geometric morphometrics? It is perhaps also worth noting that one can avoid a good deal of the additional covariance resulting from sliding. Sliding via minimizing bending energy introduces covariance in the position of the semilandmarks _along_ the curve/surface. In some of his analyses, Fred Bookstein just included the coordinate perpendicular to the curve/surface for the semilandmarks, thus discarding a large part of the covariance.
Note also that sliding via minimizing Procrustes distance introduces only little covariance among semilandmarks because Procrustes distance is minimized independently for each semilandmark (but the homology function implied here is biologically not so appealing).

Best, Philipp

On Tuesday, 6 November 2018 at 18:34:51 UTC+1, alcardini wrote:
> Yes, but doesn't that also add more covariance that wasn't there in the first place?
> Neither least squares nor minimum bending energy, that we minimize for sliding, are biological models: they will reduce variance but will do it in ways that are totally biologically arbitrary.
>
> In the examples I showed sliding led to the appearance of patterns from totally random data and that effect was much stronger than without sliding.
> I neither advocate sliding or not sliding. Semilandmarks are different from landmarks and more is not necessarily better. There are definitely some applications where I find them very useful but many more where they seem to be there just to make cool pictures.
>
> As Mike said, we've already had this discussion. Besides different views on what to measure and why, at that time I hadn't appreciated the problem with p/n and the potential strength of the patterns introduced by the covariance created by the superimposition (plus sliding!).
>
> Cheers
>
> Andrea
>
> On 06/11/2018, F. James Rohlf wrote:
> > I agree with Philipp but I would like to add that the way I think about the justification for the sliding of semilandmarks is that if one were smart enough to know exactly where the most meaningful locations are along some curve then one should just place the points along the curve and computationally treat them as fixed landmarks. However, if their exact positions are to some extent arbitrary (usually the case) although still along a defined curve then sliding makes sense to me as it minimizes the apparent differences among specimens (the sliding minimizes your measure of how much specimens differ from each other or, usually, the mean shape).
> >
> > F. James Rohlf, Distinguished Prof. Emeritus
> > Depts. of Anthropology and of Ecology & Evolution
> >
> > From: mitt...@univie.ac.at
> > Sent: Tuesday, November 6, 2018 9:09 AM
> > To: MORPHMET
> > Subject: [MORPHMET] Re: semilandmarks in biology
> >
> > I agree only in part.
> >
> > Whether or not semilandmarks "really are needed" may be hard to say beforehand. If the signal is known well enough before the study, even a single linear distance or distance ratio may suffice. In fact, most geometric morphometric studies are characterized by an oversampling of (anatomical) landmarks as an exploratory strategy: it allows for unexpected findings (and nice visualizations).
> >
> > Furthermore, there is a fundamental difference
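The statement in the reply above that relative eigenvalues of two covariance matrices are unaffected by linear transformations can be verified in a few lines. This is a sketch with made-up covariance matrices and an arbitrary invertible matrix `A`; whether sliding is close enough to a linear transformation in a given dataset is a separate, empirical question that the code does not address.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4

def random_cov(n):
    """Sample covariance matrix of made-up correlated data."""
    Z = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
    return np.cov(Z, rowvar=False)

def relative_eigenvalues(S_a, S_b):
    """Eigenvalues of inv(S_b) @ S_a, sorted in ascending order."""
    return np.sort(np.linalg.eigvals(np.linalg.solve(S_b, S_a)).real)

S1, S2 = random_cov(50), random_cov(50)

# An arbitrary invertible linear transformation of the variables, x -> A x,
# changes each covariance matrix S into A S A'
A = rng.normal(size=(p, p)) + 3.0 * np.eye(p)
T1, T2 = A @ S1 @ A.T, A @ S2 @ A.T

before = relative_eigenvalues(S1, S2)
after = relative_eigenvalues(T1, T2)
print("relative eigenvalues before:", before)
print("relative eigenvalues after: ", after)
```

The two sets of relative eigenvalues agree up to rounding error, because inv(A S2 A') (A S1 A') is similar to inv(S2) S1. The ordinary eigenvalues of S1 and of A S1 A' generally differ.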
[MORPHMET] Re: Are more semi landmarks better??
I'd like to respond to your question because it comes up so often.

As noted by Carmelo in the other posting, a large number of variables relative to the number of cases can lead to statistical problems. But often it does not.

In all analyses that treat each variable separately - including the computation of mean shapes and shape regressions - the number of variables does NOT matter! Also in principal component analysis (PCA) and between-group PCA there is NO restriction on the number of variables. However, the distribution of landmarks across the organism can influence the results. E.g., if one part - say the face - is covered only by a few anatomical landmarks, and another part - e.g., the neurocranium - by many semilandmarks, the latter one will dominate PCA results. But this holds true for all kinds of landmarks and variables, not only for semilandmarks.

Analyses that involve the inversion of a covariance matrix - such as multiple regression, CVA, relative eigenanalysis, reduced rank regression, and parametric multivariate tests - require a clear excess of cases over variables. In any truly multivariate setting (such as geometric morphometrics), these analyses - if unavoidable - should ALWAYS be preceded by some sort of variable reduction and/or factor analysis. Again, this is not specific to semilandmarks.

Partial least squares (PLS) is somewhat in-between these two groups. As shown in Bookstein's 2016 paper, the singular values (maximal covariances) in PLS can be strongly inflated if the number of variables is large compared to the number of cases. The singular vectors, however, are more stable.

Essentially, the number of semilandmarks should be determined based on the anatomical details to be captured. More semilandmarks are not "harmful," perhaps just a waste of time.

Best, Philipp Mitteroecker

On Monday, 5 November 2018 at 18:52:57 UTC+1, Diego Ardón wrote:
> Good day everybody, I actually have two questions here regarding semi-landmarks:
>
> So, I was advised to use semi-landmarks, I placed them with MakeFan8, saved the files as images and then used TpsDig to place all landmarks, however I didn't make any distinctions between landmarks and semi-landmarks. What unsettles me is (1) that I've recently come across the term "sliding semi-landmarks", which leads me to believe semi-landmarks should behave in a particular way.
>
> The second thing that unsettles me is whether "more semi-landmarks" means a better analysis. I can understand that most people wouldn't use 65 landmarks+semilandmarks because it's a painstaking job to digitize them, however, in my recent reads I've come across concepts like a "Variables to specimen ratio", which one paper suggested specimens should be 5 times the number of variables. I do have a data set of nearly 400 specimens, but it does come short if indeed I should have 65*2*5 specimens!
>
> Please, I'll appreciate some feedback :)
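The inflation of PLS singular values mentioned in the reply above can be seen in a small simulation. This is only a sketch on independent Gaussian noise, not a reproduction of the analyses in Bookstein's paper; the sample size, the numbers of variables, and the function name `first_singular_value` are assumptions for illustration. The two blocks are simulated independently, so the true covariance between them is zero, yet the first singular value of the sample cross-covariance matrix grows with the number of variables.

```python
import numpy as np

def first_singular_value(n, p, q, rng):
    """First singular value of the sample cross-covariance matrix
    between two independent blocks of pure noise (n cases, p and q variables)."""
    X = rng.normal(size=(n, p))
    Y = rng.normal(size=(n, q))
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (n - 1)
    return np.linalg.svd(C, compute_uv=False)[0]

rng = np.random.default_rng(3)
n = 30
results = {}
for p in (2, 20, 200):
    # average over 50 replicates to smooth out sampling noise
    results[p] = np.mean([first_singular_value(n, p, p, rng) for _ in range(50)])
    print(f"p = q = {p:3d}: mean first singular value = {results[p]:.2f}")
```

A permutation test or a comparison with such a null simulation is one way to judge an observed singular value; the simulation says nothing about the stability of the singular vectors.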
[MORPHMET] Re: semilandmarks in biology
I agree only in part. Whether or not semilandmarks "really are needed" may be hard to say beforehand. If the signal is known well enough before the study, even a single linear distance or distance ratio may suffice. In fact, most geometric morphometric studies are characterized by an oversampling of (anatomical) landmarks as an exploratory strategy: it allows for unexpected findings (and nice visualizations). Furthermore, there is a fundamental difference between sliding semilandmarks and other outline methods, including EFA. When establishing correspondence of semilandmarks across individuals, the minBE sliding algorithm takes the anatomical landmarks (and their stronger biological homology) into account, while standard EFA and related techniques cannot easily combine point homology with curve or surface homology. Clearly, when point homology exists, it should be parameterized accordingly. If smooth curves or surfaces exist, they should also be parameterized, whether or not this makes the analysis slightly more challenging. Anyway, different landmarks often convey different biological signals and different homology criteria. For instance, Type I and Type II landmarks (sensu Bookstein 1991) differ fundamentally in their notion of homology. Whereas Type I landmarks are defined in terms of local anatomy or histology, a Type II landmark is a purely geometric construct, which may or may not coincide with notions of anatomical/developmental homology. ANY reasonable morphometric analysis must be interpreted in the light of the correspondence function employed, and the same holds true for semilandmarks. For this, of course, one needs to understand the basic properties of sliding landmarks, much as the basic properties of Procrustes alignment, etc. For instance, both the sliding algorithm and Procrustes alignment introduce correlations between shape coordinates (hence their reduced degrees of freedom).
This is one of the reasons why I have warned for many years and in many publications about the biological interpretation of raw correlations (e.g., summarized in Mitteroecker et al. 2012 Evol Biol). Interpretations in terms of morphological integration or modularity are even more difficult because in most studies these concepts are not operationalized. They are either described by vague and biologically trivial narratives, or they are themselves defined as patterns of correlations, which is circular and makes most "hypotheses" untestable. The same criticism applies to the naive interpretation of PCA scree plots and derived statistics. An isotropic (circular) distribution of shape coordinates corresponds to no biological model or hypothesis whatsoever (e.g., Huttegger & Mitteroecker 2011, Bookstein & Mitteroecker 2014, and Bookstein 2015, all three in Evol Biol). Accordingly, a deviation from isotropy does not itself inform about integration or modularity (in any reasonable biological sense). The multivariate distribution of shape coordinates, including "dominant directions of variation," depends on many arbitrary factors, including the spacing, superimposition, and sliding of landmarks as well as on the number of landmarks relative to the number of cases. But all of this applies to both anatomical landmarks and sliding semilandmarks. I don't understand how the fact that semilandmarks make some of these issues more obvious is an argument against their use.

Best, Philipp

On Tuesday, 6 November 2018 at 13:28:55 UTC+1, alcardini wrote:
> As a biologist, for me, the question about whether or not to use semilandmarks starts with whether I really need them and what they're actually measuring.
>
> On this, among others, Klingenberg, O'Higgins and Oxnard have written some very important easy-to-read papers that everyone doing morphometrics should consider and carefully ponder. They can be found at:
> https://preview.tinyurl.com/semilandmarks
>
> I've included there also an older criticism by O'Higgins on EFA and related methods. As semilandmarks, EFA and similar methods for the analysis of outlines measure curves (or surfaces) where landmarks might be few or missing: if semilandmarks are OK because where the points map is irrelevant, as long as they capture homologous curves or surfaces, the same applies for EFAs and related methods; however, the opposite is also true and, if there are problems with 'homology' in EFA etc., those problems are there also using semilandmarks as a trick to discretize curves and surfaces.
>
> Even with those problems, one could still have valid reasons to use semilandmarks but it should be honestly acknowledged that they are the best we can do (for now at least) in very difficult cases. Most of the studies I know (certainly a minority from a now huge literature) seem to only provide post-hoc justification of the putative importance of semilandmarks: there were few 'good landmarks';
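The remark above that Procrustes alignment reduces the degrees of freedom of the shape coordinates can be made concrete with a toy simulation. The `gpa` function below is a deliberately minimal 2D generalized Procrustes analysis written for this illustration (landmarks coded as complex numbers, a few iterations, orthogonal tangent-space projection); it is an assumption-laden sketch, not the implementation of any particular software package. Independent isotropic noise is added to six landmarks, so the raw coordinates have 12 degrees of freedom; after alignment only 2k - 4 = 8 remain, which necessarily induces correlations among the coordinates.

```python
import numpy as np

def gpa(configs, n_iter=5):
    """Minimal 2D generalized Procrustes analysis.

    configs: (n, k) complex array, one row per specimen, x + iy per landmark.
    Returns the aligned configurations projected into the tangent space
    at the (unit-size) mean shape.
    """
    z = configs - configs.mean(axis=1, keepdims=True)      # remove translation
    z = z / np.linalg.norm(z, axis=1, keepdims=True)       # unit centroid size
    ref = z[0]
    for _ in range(n_iter):
        c = z @ ref.conj()                                  # complex inner products
        z = z * (c.conj() / np.abs(c))[:, None]             # optimal rotations
        ref = z.mean(axis=0)
        ref = ref / np.linalg.norm(ref)
    c = z @ ref.conj()
    z = z * (c.conj() / np.abs(c))[:, None]                 # final rotation to ref
    proj = (z @ ref.conj()).real
    return z - proj[:, None] * ref + ref                    # tangent projection

def rank_of(configs):
    """Rank of the mean-centered matrix of real coordinates (x and y columns)."""
    M = np.column_stack([configs.real, configs.imag])
    return np.linalg.matrix_rank(M - M.mean(axis=0), tol=1e-8)

rng = np.random.default_rng(4)
n, k = 100, 6
mean_shape = np.exp(2j * np.pi * np.arange(k) / k)          # made-up hexagon
noise = 0.05 * (rng.normal(size=(n, k)) + 1j * rng.normal(size=(n, k)))
raw = mean_shape + noise

aligned = gpa(raw)
rank_raw, rank_aligned = rank_of(raw), rank_of(aligned)
print("degrees of freedom before alignment:", rank_raw)
print("degrees of freedom after alignment: ", rank_aligned)
```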
[MORPHMET] Comment and advice on bgPCA
Dear all, I also want to comment on the recent bgPCA postings. Andrea et al. and Fred are right that bgPCA produces ordination plots in which two or more groups are discriminated more (i.e., the groups overlap less) than they should be, whenever p (number of variables) is large relative to n (sample size). Thanks Andrea for noticing that, or whoever figured it out first; it was not me, admittedly. In the case of samples from the same distribution (i.e., no "real" group differences), the samples can even appear to be distinct if p is larger than n. This phenomenon is much more severe in CVA than in bgPCA (as we showed in the 2011 paper), but we were not aware back then that it is also present in bgPCA. Please note that this does NOT mean that ALL results inferred from bgPCA are wrong; only those about group separation are biased. The relationship between group means in bgPCA is necessarily the same as in an ordinary PCA (but see below). I have two main comments and pieces of advice. 1) The simulations of identical independent noise for an increasing number of variables, as in Fred's current manuscript and in our 2011 paper, are not quite realistic because morphometric variables are highly correlated; the "real" degrees of freedom thus are far fewer than the number of variables. Put another way, if you set more and more landmarks on a sample of specimens, not every landmark introduces a new degree of freedom because its location may be predictable from the adjacent landmarks. Theoretically, there is a maximal number of degrees of freedom in a given sample that reflects the actual spatial scale of the shape differences studied. If the given shape differences are captured well by the current landmark set, adding more landmarks will not add any further information and will not increase the relevant degrees of freedom. 
For example, if shape variation comprises only affine shape variation (linear scaling and shearing), the relevant shape space has only two degrees of freedom (two dimensions), regardless of how many landmarks were measured. As a result, most morphometric data, even those consisting of many landmarks, can be described well by a small number of principal components, as we all know. Ideally, these few PCs capture the "real" dimensionality of shape space (i.e., they are some rotation of the underlying factor structure), which is much less than the number of landmarks. In practice, the problem is that every landmark entails some small independent measurement error, and hence the "cut-off" for the number of dimensions is not necessarily obvious. In the above example with only affine shape variation, for more than three landmarks there will still be more than two PCs with non-zero variance, but hopefully the first two PCs are a good estimate of the affine components. Methods other than ordinary PCA may do a better job for this task, e.g., methods that take into account spatial scale, such as the spatially weighted relative warps in Bookstein's orange book or the relative intrinsic warps in Bookstein (2015). Blame Fred for these names ;-) Many multivariate statistical analyses - including bgPCA, CVA, relative PCA, and also the computation of shape distances or angles between shape trajectories, etc. - should be performed within this subspace (i.e., based on the first few PCs rather than on the original shape coordinates). bgPCA and CVA may be considered kinds of factor rotation within this subspace rather than methods of variable reduction. Hence, many of the problems described by Andrea et al. and Fred can be avoided by variable reduction (ordinary PCA) prior to bgPCA and related techniques. This requires a careful inspection of the scree plot and the corresponding PCs. 
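To make the argument concrete, here is a minimal sketch in Python/NumPy; it is not the code used in the 2011 paper or in Fred's manuscript. It draws three groups from the same isotropic noise distribution with p much larger than n, runs a plain bgPCA (PCA of the unweighted group means, with all specimens projected onto those axes), and then repeats the bgPCA after an ordinary PCA reduction to a few components. The function names (`bgpca_scores`, `between_fraction`, `pca_reduce`), the sample sizes, the choice of 5 retained PCs, and the random seed are all assumptions made for illustration. Group separation is summarized as the share of the total sum of squares of the bgPC scores that lies between groups.

```python
import numpy as np


def bgpca_scores(X, groups, n_axes=2):
    """Project all specimens onto the principal axes of the group means."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    Xc = X - X.mean(axis=0)                      # centre on the grand mean
    means = np.vstack([Xc[groups == g].mean(axis=0) for g in np.unique(groups)])
    means -= means.mean(axis=0)                  # unweighted centring of the means
    _, _, Vt = np.linalg.svd(means, full_matrices=False)
    return Xc @ Vt[:n_axes].T                    # bgPC scores of the specimens


def between_fraction(scores, groups):
    """Share of the total sum of squares of the scores that is between groups."""
    groups = np.asarray(groups)
    centred = scores - scores.mean(axis=0)
    total_ss = np.sum(centred ** 2)
    between_ss = sum(
        np.sum(groups == g) * np.sum(centred[groups == g].mean(axis=0) ** 2)
        for g in np.unique(groups)
    )
    return between_ss / total_ss


def pca_reduce(X, k):
    """Ordinary PCA: scores of the specimens on the first k components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


# Three groups of 10 specimens, all from the SAME distribution (pure noise),
# with p = 300 variables, so p is much larger than n = 30.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(3), 10)
X = rng.standard_normal((30, 300))

# bgPCA on all 300 variables versus bgPCA within the first 5 ordinary PCs.
frac_full = between_fraction(bgpca_scores(X, groups), groups)
frac_reduced = between_fraction(bgpca_scores(pca_reduce(X, 5), groups), groups)

print(f"between-group share, bgPCA on all variables: {frac_full:.2f}")
print(f"between-group share, bgPCA on first 5 PCs:   {frac_reduced:.2f}")
```

With these settings the between-group share should be much larger for the bgPCA on all variables than for the bgPCA within the reduced subspace, even though there are no real group differences. Note that isotropic noise has no low-dimensional structure at all, so the sketch shows only the effect of the ratio of dimensions to sample size; with real, highly correlated shape coordinates, the number of PCs to retain still has to be judged from the scree plot.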
The actual sample size must be large relative to the number of PCs retained (not necessarily relative to the number of landmarks).

2) Many applications of PCA or CVA aim to combine multiple analytical steps that are not necessarily commensurate:
- Exploratory study of group mean differences
- Relating multivariate mean differences across multiple groups by an ordination analysis
- Discrimination analysis (studying if and to what degree groups overlap in their distribution of individual variation)
- Perhaps even the estimation of a discrimination function, i.e., a combination of variables that maximally discriminates the groups.

The value or burden of having many landmarks is different for each of these tasks. When "exploring" differences in average shape between groups, without strong prior expectations (i.e., without knowing where the signal is), it is clearly useful to measure as many landmarks as possible, as this increases spatial resolution. In contrast to Andrea, I think that "beautiful pictures" can be of value because morphology is a visual discipline, after all. For computing group means or shape regressions, p>n is no problem. The challenge in this step is to
[MORPHMET] Dennis Slice
Dear subscribers to morphmet, With the deepest grief we must inform you of the sudden death on June 13 of Prof. Dennis E. Slice, holder of the fourth Rohlf Award for Excellence in Morphometrics and tireless founder and moderator of this newsgroup, who suffered a heart attack in his home town of Tallahassee, Florida. Morphometrics will not be the same without him. Jim Rohlf, Fred Bookstein, Paul O'Higgins, Benedikt Hallgrimsson, June 15, 2019