Re: [MORPHMET] Re: number of landmarks and sample size
Hi Will,

I think you meant to say that you are writing a study design paper presenting results of simulations and power analysis to determine appropriate sample sizes for multivariate analyses in geometric morphometrics. But I would think that would have already been settled by now, and possibly would be more relevant for certain clustering methods.

The only parameterized PCA variant I am aware of is kernel PCA, a nonlinear PCA method used for pattern analysis (e.g. in image analysis), but it is not often employed in biological geometric morphometrics papers (at least, those that I frequently come across). When kernels are used, they are usually meant to estimate densities of reduced-dimensionality data such as CS, or PCs used as shape variables.

Best,
Justin

Justin C. Bagley, Ph.D.
Postdoctoral Scholar
Plant Evolutionary Genomics Laboratory
Department of Biology
Virginia Commonwealth University
Richmond, VA 23284-2012
jcbag...@vcu.edu

Senior/Postdoctoral Research Associate
Departamento de Zoologia
Universidade de Brasília
Campus Universitário Darcy Ribeiro
70910-900 Brasília, DF, Brasil
Website: http://www.justinbagley.org
Lattes CV: http://lattes.cnpq.br/0028570120872581

On Wed, May 31, 2017 at 6:41 PM, William Gelnaw wrote:

> I'm currently working on a paper that deals with the problem of
> over-parameterizing PCA in morphometrics. The recommendation that I'm
> making in the paper is that you should try to have at least 3 times as
> many samples as variables. That means that if you have 10 2D landmarks
> (20 variables), you should measure at least 60 specimens. Based on
> simulations, if you have fewer than 3 specimens per variable, you quickly
> start getting eigenvalues for a PCA that are very different from the known
> true eigenvalues. I did a literature survey, and about a quarter of
> morphometrics studies in the last decade haven't met that standard.
>
> A good way to test whether you have enough samples is to do a jackknife
> analysis. If you cut out about 10% of your observations and still get the
> same eigenvalues, then your results are probably stable.
>
> I hope this helps.
> - Will
>
> On Wed, May 31, 2017 at 1:31 PM, mitte...@univie.ac.at wrote:
>
>> Adding more (semi)landmarks inevitably increases the spatial resolution
>> and thus allows one to capture finer anatomical details - whether relevant
>> to the biological question or not. This can be advantageous for the
>> reconstruction of shapes, especially when producing 3D morphs by warping
>> dense surface representations. Basic developmental or evolutionary trends,
>> group structures, etc., are often visible in an ordination analysis with a
>> smaller set of relevant landmarks; finer anatomical resolution does not
>> necessarily affect these patterns. However, adding more landmarks cannot
>> reduce or even remove any signals that were found with fewer landmarks,
>> although it can make ordination analyses and the interpretation of
>> distances and angles in shape space more challenging.
>>
>> An excess of variables (landmarks) over specimens does NOT pose problems
>> for statistical methods such as the computation of mean shapes and
>> Procrustes distances, PCA, PLS, and the multivariate regression of shape
>> coordinates on some independent variable (shape regression). These methods
>> are based on averages or regressions computed for each variable separately,
>> or on the decomposition of a covariance matrix.
>>
>> Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and
>> relative eigenanalysis, require the inversion of a full-rank covariance
>> matrix, which implies an excess of specimens over variables. The same
>> applies to many multivariate parametric test statistics, such as
>> Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full
>> rank and thus can never be subjected to any of these methods without prior
>> variable reduction. In fact, reliable results can only be obtained if there
>> are many times more specimens than variables, which usually requires
>> variable reduction by PCA, PLS or other techniques, or the regularization
>> of covariance matrices (which is more common in the bioinformatics
>> community).
>>
>> For these reasons, I do not see any disadvantage of measuring a large
>> number of landmarks, except for a waste of time perhaps. If life time is an
>> issue, one can optimize landmark schemes as suggested by Jim or Aki.
>>
>> Best,
>>
>> Philipp
>>
>> --
>> MORPHMET may be accessed via its webpage at http://www.morphometrics.org
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "MORPHMET" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to morphmet+unsubscr...@morphometrics.org.
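The jackknife check Will describes (drop about 10% of specimens, recompute the PCA, compare eigenvalues) could be sketched roughly as below. This is only an illustration under stated assumptions: the function names are hypothetical, the data are simulated independent normal variables rather than real shape coordinates, and random subsampling stands in for whatever resampling scheme Will actually used.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance of X (rows = specimens), largest first."""
    cov = np.cov(X, rowvar=False)
    return np.linalg.eigvalsh(cov)[::-1]

def delete_fraction_jackknife(X, frac=0.10, n_rep=200, seed=0):
    """Recompute PCA eigenvalues after repeatedly dropping a random `frac` of specimens."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_keep = n - max(1, int(round(frac * n)))
    full = pca_eigenvalues(X)
    reps = np.empty((n_rep, X.shape[1]))
    for r in range(n_rep):
        keep = rng.choice(n, size=n_keep, replace=False)
        reps[r] = pca_eigenvalues(X[keep])
    return full, reps

# Hypothetical data: 60 specimens, 20 variables (e.g. 10 2D landmarks),
# simulated with decreasing standard deviations across variables.
rng = np.random.default_rng(1)
true_sd = np.linspace(3.0, 0.5, 20)
X = rng.normal(size=(60, 20)) * true_sd

full, reps = delete_fraction_jackknife(X)
# Spread of each resampled eigenvalue relative to its full-sample value.
rel_spread = reps.std(axis=0) / full
print(np.round(rel_spread[:5], 3))
```

Small relative spreads suggest stable eigenvalues. With real Procrustes coordinates the trailing eigenvalues are zero, so the ratio should be computed only for the leading, non-zero eigenvalues.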
[MORPHMET] Re: number of landmarks and sample size
Ilker,

Philipp has already explained well why, I think, this rationale is incorrect, if not dangerous, especially with regard to statistical power. As he indicated, using Procrustes residuals as data means a covariance matrix will never be full rank, owing to the invariance in size, orientation, and position of landmark configurations following GPA. At most, the data space can have kp - g dimensions, where k is the number of landmark dimensions (2 or 3), p is the number of landmarks, and g is the number of invariant dimensions due to GPA (with or without sliding landmarks), or n - 1 dimensions if (kp - g) > n - 1.

As he also pointed out, increasing the number of landmarks can increase the spatial resolution, meaning that if n - 1 is the limiting number of dimensions, the distances between specimens can increase in the (n - 1)-dimensional space that results from increasing p. If by “statistical power” one means an increased probability of rejecting a null hypothesis that population centroids (mean configurations) are the same, then increasing resolution should enhance one’s ability to reject that null hypothesis. I think tying the dimensionality of the space where the hypothesis is tested to the number of landmarks precludes appreciating Philipp’s comment about spatial resolution.

I do not wish to advocate using a limited number of PCs as shape data as a rule, but consider a choice between two data sets: one with seven fixed 2D landmarks (10 PCs after GPA) and one with the first 10 PCs obtained from configurations with hundreds of landmarks. The separation of groups in the latter case might be more prominent than in the former, hence increasing statistical power. Whether hundreds of landmarks are needed, or 50, or 20, or 10, or even only 7, or whether increasing statistical power is important, is a question that must be answered case by case with empirical results. However, placing an a priori limit on the number of landmarks one can define because of the size of the samples one can collect is a certain way to limit statistical power, especially when small samples are all that’s available.

Cheers!
Mike

> On Jun 3, 2017, at 5:31 AM, Ilker ERCAN wrote:
>
> when we perform multivariate analysis, It must be n>p otherwise determinant
> of Generalized variance equals to zero therefore it must be 2*l [...] 3*l [...]
>
> Best wishes
> Ilker ERCAN
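Mike's dimension count can be written down in a few lines. The sketch below assumes fixed landmarks only, for which GPA removes g = 4 dimensions in 2D (two for position, one for orientation, one for size) and g = 7 in 3D (three, three, and one); sliding semilandmarks remove further dimensions that this hypothetical helper does not model.

```python
def shape_space_dim(p, k, n):
    """Maximum number of non-trivial shape dimensions for n specimens
    with p fixed landmarks in k (2 or 3) dimensions after GPA."""
    if k not in (2, 3):
        raise ValueError("k must be 2 or 3")
    g = 4 if k == 2 else 7          # dimensions lost to position, orientation, size
    return min(k * p - g, n - 1)    # also capped by the sample size

# Seven fixed 2D landmarks and a large sample: 2*7 - 4 = 10 dimensions,
# matching the "10 PCs after GPA" in the message above.
print(shape_space_dim(p=7, k=2, n=100))    # 10
# Hundreds of 3D landmarks but only 30 specimens: capped at n - 1 = 29.
print(shape_space_dim(p=300, k=3, n=30))   # 29
```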
Re: [MORPHMET] Re: number of landmarks and sample size
When we perform multivariate analysis, it must be n > p; otherwise the generalized variance (the determinant of the covariance matrix) equals zero. Therefore it must be 2*l [...] 3*l [...]

Best wishes,
Ilker ERCAN
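The point about the generalized variance, and Philipp's and Mike's replies to it, can be checked numerically. The following is a minimal sketch on simulated data (the sample size and number of variables are arbitrary choices): with fewer specimens than variables the sample covariance matrix has rank n - 1, so its determinant is zero up to rounding, yet its eigendecomposition (that is, PCA) is still perfectly computable.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 20                      # fewer specimens than variables (arbitrary sizes)
X = rng.normal(size=(n, p))

S = np.cov(X, rowvar=False)        # p x p sample covariance matrix
rank = np.linalg.matrix_rank(S)    # at most n - 1 because of mean-centering

# PCA only needs the eigendecomposition, which exists even when S is singular.
eigvals = np.linalg.eigvalsh(S)[::-1]
n_nonzero = int(np.sum(eigvals > 1e-10 * eigvals[0]))

print(rank, n_nonzero)             # both equal n - 1
print(eigvals[-1])                 # numerically zero, so det(S) is zero as well
```

Methods that invert S (Mahalanobis distance, CVA, Hotelling's T2) fail here, while methods that only decompose it do not.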
Re: [MORPHMET] Re: number of landmarks and sample size
In discussions like these it would be helpful if the writer could clarify whether they are referring to the concepts of biological homology, topological homology or "semantic homology". These aren't the same things, and the whole issue of “homology” in geometric morphometrics has always seemed, at least to me, to be very confused. For example, refer to the definitions of “homology” and “landmark” in the Glossary on the SB Morphometrics web site.

Because it means different things to different specialists, homology isn't a term to be thrown around as lightly as morphometricians seem prone to do. Imprecise and/or ambiguous usage renders the meaning of sentences difficult or impossible for me to understand, and I suspect it confuses others as well.

Norm MacLeod
Re: [MORPHMET] Re: number of landmarks and sample size
Hi Philipp,

I am not worried about the number of variables (although I am not sure one needs thousands of highly correlated points on a relatively simple structure, and I seem to remember that Gunz and you suggest starting with many and then reducing as appropriate).

Regardless of whether point homology makes sense, I am worried that many users believe that semilandmarks (maybe after sliding according to purely mathematical principles) are the same as "traditional landmarks" with a clear one-to-one correspondence. Even saying that what's "homologous" is the curve or surface is tricky, because at the end of the day that curve/surface is discretized using points, shape distances are based on those points, and there are many ways of placing points with no clear "homology" (figure 7 of Oxnard & O'Higgins, 2009). Indeed, in an ontogenetic study of the cranial vault, for instance, where sutures may become invisible in adults and therefore cannot be used as a "boundary", semilandmarks close to the sutures may end up on different bones in different stages/individuals.

Semilandmarks are a fantastic tool, which I am happy to use when needed, but they have their own limitations, which one should be aware of.

Cheers

Andrea

--
Dr. Andrea Cardini
Researcher, Dipartimento di Scienze Chimiche e Geologiche, Università di Modena e Reggio Emilia, Via Campi, 103 - 41125 Modena - Italy
tel. 0039 059 2058472

Adjunct Associate Professor, School of Anatomy, Physiology and Human Biology, The University of Western Australia, 35 Stirling Highway, Crawley WA 6009, Australia

E-mail address: alcard...@gmail.com, andrea.card...@unimore.it
WEBPAGE: https://sites.google.com/site/alcardini/home/main
FREE Yellow BOOK on Geometric Morphometrics: http://www.italian-journal-of-mammalogy.it/public/journals/3/issue_241_complete_100.pdf
ESTIMATE YOUR GLOBAL FOOTPRINT: http://www.footprintnetwork.org/en/index.php/GFN/page/calculators/
Re: [MORPHMET] Re: number of landmarks and sample size
Hello,

I'm an archaeologist who works on artifacts in North America. There are not many of us who use LGM, but even we can't seem to agree on how many LMs are appropriate. Because I use discriminant function analysis as the workhorse for discriminating groups of artifacts, I worry about the misuse of that technique.

One thing I've read (e.g., Qiao et al. 2009) with regard to DFA is that too many variables (LMs) can affect its discriminatory power through data piling or the related phenomenon of overfitting. I have seen this in my practice but have not tested it rigorously. By reducing the number of LMs, I can sometimes get better discrimination between groups. The number of artifacts (specimens) is not a problem: I'm about to embark on a regional analysis using thousands.

Does anyone who understands this phenomenon better than I do care to comment?

Thanks,
Dave Thulman
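The overfitting Dave describes can be reproduced with pure noise. The sketch below is an illustration only, not his workflow: it uses the least-squares form of a two-group linear discriminant (a stand-in for DFA chosen so that the example needs only NumPy), simulated data with no true group difference, and arbitrary sample sizes. When the number of variables exceeds the number of specimens, the training specimens are separated perfectly ("data piling"), while new specimens are classified at chance.

```python
import numpy as np

def fit_two_group_discriminant(X, y):
    """Least-squares discriminant for labels y in {-1, +1}.
    lstsq returns the minimum-norm solution when variables outnumber specimens."""
    mean = X.mean(axis=0)
    w, *_ = np.linalg.lstsq(X - mean, y - y.mean(), rcond=None)
    return mean, w, y.mean()

def accuracy(X, y, model):
    """Fraction of specimens assigned to their labelled group."""
    mean, w, offset = model
    pred = np.where((X - mean) @ w + offset > 0, 1, -1)
    return float(np.mean(pred == y))

rng = np.random.default_rng(0)
y_train = np.repeat([1, -1], 20)        # 40 training specimens, two groups
y_test = np.repeat([1, -1], 500)        # 1000 new specimens
X_train = rng.normal(size=(40, 60))     # 60 noise variables, no group signal
X_test = rng.normal(size=(1000, 60))

results = {}
for p in (60, 5):                       # all variables vs. only the first five
    model = fit_two_group_discriminant(X_train[:, :p], y_train)
    results[p] = (accuracy(X_train[:, :p], y_train, model),
                  accuracy(X_test[:, :p], y_test, model))
    print(p, results[p])
```

With 60 variables the resubstitution accuracy is perfect although the groups do not differ at all, which is why cross-validated or hold-out classification rates are the ones worth reporting.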
Re: [MORPHMET] Re: number of landmarks and sample size
I think a few topics get mixed up here.

Of course, a sample can be too small to be representative (as in Andrea's example), and one should think carefully about the measures to take. It is also clear that an increase in sample size reduces standard errors of statistical estimates, including that of a covariance matrix and its eigenvalues. But, as mentioned by Dean, the standard errors of the eigenvalues are of secondary interest in PCA.

If one has a clear expectation about the signal in the data - and if one does not aim at new discoveries - a few specific measurements may suffice, perhaps even a few distance measurements. But effective exploratory analyses have always been a major strength of geometric morphometrics, enabled by the powerful visualization methods together with the large number of measured variables.

Andrea, I am actually curious what worries you if one "collects between 2700 and 10 400 homologous landmarks from each rib" (whatever the term "homologous" is supposed to mean here)?

Compared to many other disciplines in contemporary biology and biomedicine, a few thousand variables are not particularly many. Consider, for instance, 2D and 3D image analysis, FEA, and all the "omics", with millions and billions of variables. In my opinion, the challenge with these "big data" is not statistical power in testing a signal, but finding the signal - the low-dimensional subspace of interest - in the first place. But this applies to 50 or 100 variables as well, not only to thousands or millions. If no prior expectation about this signal exists (which the mere presence of so many variables usually implies), no hypothesis test should be performed at all. Ignoring this rule is one of the main reasons why so many GWAS and voxel-based morphometry studies fail to be replicable.

Best wishes,

Philipp
RE: [MORPHMET] Re: number of landmarks and sample size
Just to comment. While it is worthwhile to investigate these issues, in my experience sample sizes are limited not because investigators are unwilling to measure more specimens, but because there are no additional specimens to include in the analysis, especially for studies based on natural populations or historical collections.

M
RE: [MORPHMET] Re: number of landmarks and sample size
Will,

I’m not quite sure what over-parameterizing means in the case of PCA, as it is simply a rigid rotation of the data space and does not provide parameters for statistical inference.

As for the distribution of eigenvalues, this of course is based on the underlying covariance matrix for the traits, which in turn will be affected by sample size. However, when traits become even mildly correlated (as is certainly the case for landmark coordinates), the distribution of eigenvalues of the covariance matrix becomes much better behaved. Specifically, the eigenvalues associated with low and high PC axes are less extreme than is observed with uncorrelated traits. That implies greater stability in their estimation, as the covariance matrix is further from singular (see the large statistical literature on the condition of a covariance matrix and subsequent estimation issues for ill-behaved covariance matrices).

Best,

Dean

Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
www.public.iastate.edu/~dcadams/
phone: 515-294-3834
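One way to explore the disagreement between Will and Dean is a small simulation with a known covariance matrix. The sketch below is only an illustration under stated assumptions: the equicorrelated covariance structure, the function names, and the choice of the leading eigenvalue as the quantity to track are all hypothetical, and multivariate normal data stand in for real shape coordinates. It is not either author's simulation.

```python
import numpy as np

def leading_eigenvalue_error(true_cov, n_specimens, n_reps=200, seed=0):
    """Mean relative error of the largest sample eigenvalue versus the true one."""
    rng = np.random.default_rng(seed)
    p = true_cov.shape[0]
    true_top = np.linalg.eigvalsh(true_cov)[-1]  # eigvalsh returns ascending order
    errors = []
    for _ in range(n_reps):
        x = rng.multivariate_normal(np.zeros(p), true_cov, size=n_specimens)
        sample_cov = np.cov(x, rowvar=False)     # rows are specimens
        sample_top = np.linalg.eigvalsh(sample_cov)[-1]
        errors.append(abs(sample_top - true_top) / true_top)
    return float(np.mean(errors))

def equicorrelated_cov(p, rho):
    """Hypothetical test case: unit variances, common correlation rho between traits."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

if __name__ == "__main__":
    p = 20  # e.g. 10 2D landmarks
    for rho in (0.0, 0.3):
        cov = equicorrelated_cov(p, rho)
        for ratio in (1, 3, 10):  # specimens per variable
            err = leading_eigenvalue_error(cov, n_specimens=ratio * p)
            print(f"rho={rho:.1f}  n/p={ratio:2d}  mean relative error={err:.3f}")
```

Running it for a few correlation levels and specimen-to-variable ratios shows how the estimation error changes with both; other eigenvalues or other covariance structures may behave differently.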
Re: [MORPHMET] Re: number of landmarks and sample size
I'm currently working on a paper that deals with the problem of over-parameterizing PCA in morphometrics. The recommendation that I'm making in the paper is that you should try to have at least 3 times as many samples as variables. That means that if you have 10 2D landmarks (20 coordinate variables), you should have at least 60 specimens that you measure. Based on simulations, if you have fewer than 3 specimens per variable, you quickly start getting eigenvalues for a PCA that are very different from the known true eigenvalues. I did a literature survey, and about a quarter of morphometrics studies in the last decade haven't met that standard.

A good way to test if you have enough samples is to do a jackknife analysis. If you cut out about 10% of your observations and still get the same eigenvalues, then your results are probably stable.

I hope this helps.

- Will
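The jackknife check Will describes (drop about 10% of the specimens, redo the PCA, compare eigenvalues) can be sketched as follows. This is a minimal sketch, not Will's own procedure: the function names, the number of replicates, and the random stand-in data are assumptions made for illustration, and real shape data would first need Procrustes superimposition.

```python
import numpy as np

def pca_eigenvalues(x):
    """Eigenvalues of the sample covariance matrix, largest first."""
    return np.linalg.eigvalsh(np.cov(x, rowvar=False))[::-1]

def jackknife_eigenvalues(x, drop_fraction=0.10, n_reps=500, n_axes=3, seed=0):
    """Delete-d jackknife: repeatedly drop a fraction of specimens and redo the PCA.

    Returns an (n_reps, n_axes) array of eigenvalues from the reduced samples.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    n_keep = n - max(1, int(round(drop_fraction * n)))
    out = np.empty((n_reps, n_axes))
    for i in range(n_reps):
        keep = rng.choice(n, size=n_keep, replace=False)
        out[i] = pca_eigenvalues(x[keep])[:n_axes]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical stand-in data: 60 specimens, 20 shape variables (10 2D landmarks)
    x = rng.normal(size=(60, 20)) * np.linspace(3.0, 0.5, 20)
    full = pca_eigenvalues(x)[:3]
    reps = jackknife_eigenvalues(x)
    # Spread of the jackknifed eigenvalues relative to the full-sample values
    rel_spread = reps.std(axis=0) / full
    print("full-sample eigenvalues:", np.round(full, 2))
    print("relative jackknife spread:", np.round(rel_spread, 3))
```

A small relative spread suggests the eigenvalues are stable under removal of specimens; what counts as "small" is a judgment call that the thread does not settle.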
[MORPHMET] Re: number of landmarks and sample size
Dear all,

I completely agree with the general consensus that the research question should inform the landmark sampling. As the first step in a morphometric study, landmark sampling is definitely worth thinking about deeply since, as discussed here and in many previous studies, it can generate spurious and unintended artifacts in alignment and downstream analyses.

It's important to consider "quality over quantity" at both the individual landmark level and the level of the overall landmark configuration. For example, landmarks should be fairly evenly distributed on a structure of interest to avoid the "Pinocchio effect", where landmarks isolated from the centroid end up having more impact on the alignment because the optimization is based on *squared* distance from the centroid. Alternatively, one can use the Resistant Fit alignment to mitigate this issue.

Another note: my landmark sampling study shows that adding certain landmarks to a subsampled data set can sometimes decrease the overall fit to the full data set with the complete set of landmarks. This result further supports the "quality over quantity" idea, where choosing poor landmarks can lead to spurious characterization of shape variation, at least with respect to the full data set. Put another way, adding more landmarks does not guarantee convergence to the full shape characterization (although, from personal observation, it typically does converge).

Happy landmarking,

Aki

On Tuesday, May 9, 2017 at 12:26:04 PM UTC+1, Lea Wolter wrote:
> Hello everyone,
>
> I am new in the field of geometric morphometrics and have a question for my bachelor thesis.
>
> I am not sure how many landmarks I should use at most in regard to the sample size. I have a sample of about 22 individuals per population or maybe a bit less (using sternum and epigyne of spiders) with 5 populations. I have read a paper in which they use 18 landmarks with an even lower sample size (3 populations with 20 individuals, 1 with 10). But I have also heard that I should use twice as many individuals per population as landmarks...
>
> Maybe there is some mathematical formula for it to know if it would be statistically significant? Could you recommend some paper?
>
> Because of the symmetry of the epigyne I am now thinking of using just one half of it for setting landmarks (so I get 5 instead of 9 landmarks). For the sternum I thought about 7 or 9 landmarks, so at most I would also get 18 landmarks like in the paper.
>
> I would also like to use two type specimens in the analysis, but I have just this one individual per population... would it be total nonsense from a statistical point of view?
>
> Thanks very much for your help!
>
> Best regards
> Lea
[MORPHMET] Re: number of landmarks and sample size
Adding more (semi)landmarks inevitably increases the spatial resolution and thus allows one to capture finer anatomical details - whether relevant to the biological question or not. This can be advantageous for the reconstruction of shapes, especially when producing 3D morphs by warping dense surface representations. Basic developmental or evolutionary trends, group structures, etc., often are visible in an ordination analysis with a smaller set of relevant landmarks; finer anatomical resolution does not necessarily affect these patterns. However, adding more landmarks cannot reduce or even remove any signals that were found with fewer landmarks, but it can make ordination analyses and the interpretation of distances and angles in shape space more challenging.

An excess of variables (landmarks) over specimens does NOT pose problems to statistical methods such as the computation of mean shapes and Procrustes distances, PCA, PLS, and the multivariate regression of shape coordinates on some independent variable (shape regression). These methods are based on averages or regressions computed for each variable separately, or on the decomposition of a covariance matrix.

Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and relative eigenanalysis, require the inversion of a full-rank covariance matrix, which requires an excess of specimens over variables. The same applies to many multivariate parametric test statistics, such as Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full rank and thus can never be subjected to any of these methods without prior variable reduction. In fact, reliable results can only be obtained if there are many times more specimens than variables, which usually requires variable reduction by PCA, PLS or other techniques, or the regularization of covariance matrices (which is more common in the bioinformatics community).

For these reasons, I do not see any disadvantage of measuring a large number of landmarks, except for a waste of time perhaps. If lifetime is an issue, one can optimize landmark schemes as suggested by Jim or Aki.

Best,

Philipp
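Philipp's distinction between methods that only decompose a covariance matrix and methods that must invert it can be illustrated numerically. The sketch below uses random numbers as a hypothetical stand-in for shape coordinates (15 specimens, 40 variables); the variable names and the choice of k = 5 retained PCs are assumptions for illustration only. With real Procrustes shape coordinates the rank is reduced further, because superimposition removes the degrees of freedom for position, scale and orientation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_specimens, n_vars = 15, 40                   # more variables than specimens
x = rng.normal(size=(n_specimens, n_vars))     # hypothetical stand-in data
xc = x - x.mean(axis=0)                        # centre each variable

cov = xc.T @ xc / (n_specimens - 1)
rank = np.linalg.matrix_rank(cov)
print("covariance is", cov.shape, "with rank", rank)  # at most n_specimens - 1

# PCA needs only a decomposition, so it works regardless of the rank.
u, s, vt = np.linalg.svd(xc, full_matrices=False)
scores = u * s                                 # PC scores, one row per specimen
eigenvalues = s**2 / (n_specimens - 1)

# Mahalanobis distance needs the inverse covariance, which does not exist here.
# Variable reduction first: keep the first k PCs (k well below n_specimens).
k = 5
reduced = scores[:, :k]
reduced_cov = np.cov(reduced, rowvar=False)    # k x k and full rank
inv_cov = np.linalg.inv(reduced_cov)
diff = reduced[0] - reduced[1]
d_mahal = float(np.sqrt(diff @ inv_cov @ diff))
print("Mahalanobis distance between specimens 0 and 1 in", k, "PCs:", round(d_mahal, 3))
```

How many PCs to retain before inverting is itself a modelling decision; the value used here is arbitrary.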
Re: [MORPHMET] Re: number of landmarks and sample size
Dear All,

I'd like to add a few comments on sampling (landmarks but also specimens). I hope that some of the other subscribers, who know much more than I do about morphometrics, will refine and correct my points.

A very short one on my two papers. They make a very simple point: if one is landmarking just one side of a structure with object symmetry simply to speed up data collection, then mirror-reconstructing the missing side will make a nicer visualization and probably produce shape data which are closer to those obtained by landmarking both sides. The difference may be tiny, and I said "probably" because I am reporting results of empirical studies: out of 11-12 datasets, all but one had shape distances closer to those of the full bilateral landmark data after mirror-reconstructing the missing side. This did not work in one dataset which happened to have a very large amount of fluctuating asymmetry. To what extent these results are generalizable, I can't say, but everyone can plan a small preliminary analysis to check it in her/his own data. I fully agree with Aki that, if time, money etc. are not a constraint, even when one is not interested in asymmetry, it is better to measure both sides. That's in fact true also for structures with matching symmetry.

In terms of the choice of landmarks, I wish to stress (once more!) that quality may be more important than quantity: first one should think well about what she/he wants to measure, which will relate to the specific question being asked, and then decide about where and how many landmarks to use. There are at least two wonderful papers I suggested several times on this issue:

Oxnard & O'Higgins, 2009, Biological Theory 4(1), 84–97.
Klingenberg, 2008, Evol Biol 35:186–190

Then, especially for semilandmarks, I guess that, as Aki (and others before) suggested, one can see what a good compromise is between information and the number of points (maybe considering also, but not principally, the visualization).

For sample size, one should consider whether differences are presumably big (and a small sample might be OK...ish) or small (as in most microevolutionary studies, which generally require large N). I believe that Rohlf, already in the early days of geometric morphometrics, had written software for exploring statistical power in shape data (TPSPower), but I am not sure if he kept developing it. In any case, power and sensitivity (to sampling) analyses are certainly available in R. With small differences, although resampling methods may allow one to perform tests even with tiny samples, power will be low and estimates (say, mean size and shape, variance and covariance etc.) will likely be inaccurate. Unfortunately, often, the most interesting taxa are rare populations (or fossils) for which specimens are difficult to find.

A couple of people told me that there's an important paper coming out soon on sampling error in geometric morphometrics and it might suggest that one really needs huge samples. I would not be surprised, and I suspect that the few empirical studies we did (a couple of papers in Zoomorphology) were overoptimistic despite already suggesting (more or less) that one might need several dozens of specimens even when differences are relatively large and the number of landmarks is not particularly large. Again, they were empirical studies and one cannot say how generalizable they are. Anyway, I look forward to this new paper and hope it will be announced in MORPHMET, just as I look forward to Aki's paper.

Cheers

Andrea
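The mirror-reconstruction step Andrea mentions can be sketched for 2D landmarks as a reflection across the midline. This is a simplified illustration, not the procedure from the Cardini papers: the function names are hypothetical, the midline is assumed to be the straight line through the first and last midline landmarks, and the subsequent Procrustes superimposition is left out.

```python
import numpy as np

def reflect_across_midline(points, mid_a, mid_b):
    """Reflect 2D points across the line through midline landmarks mid_a and mid_b."""
    points = np.asarray(points, dtype=float)
    a = np.asarray(mid_a, dtype=float)
    d = np.asarray(mid_b, dtype=float) - a
    d /= np.linalg.norm(d)                    # unit vector along the midline
    rel = points - a
    along = np.outer(rel @ d, d)              # component parallel to the midline
    return a + 2.0 * along - rel              # flip the perpendicular component

def mirror_reconstruct(midline, one_side):
    """Return midline, measured-side and mirrored-side landmarks stacked together.

    The midline is taken as the line through the first and last midline
    landmarks, which is a simplifying assumption for this sketch.
    """
    midline = np.asarray(midline, dtype=float)
    one_side = np.asarray(one_side, dtype=float)
    mirrored = reflect_across_midline(one_side, midline[0], midline[-1])
    return np.vstack([midline, one_side, mirrored])

if __name__ == "__main__":
    midline = [[0.0, 0.0], [0.0, 2.0]]            # hypothetical midline landmarks
    right = [[1.0, 0.5], [1.5, 1.0], [0.8, 1.8]]  # hypothetical right-side landmarks
    print(mirror_reconstruct(midline, right))
```

The reconstructed configuration is perfectly symmetric by construction, so it carries no information about asymmetry, which is consistent with Andrea's remark that the approach worked poorly in a dataset with strong fluctuating asymmetry.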
[MORPHMET] Re: number of landmarks and sample size
Dear Lea,

Unfortunately, there isn't (yet) a magic mathematical formula to determine whether you've sampled enough landmarks, but there are some exploratory approaches you can take to see if your landmark sampling is converging to the "true" shape variation. One simple thing you can do is sample as many landmarks as you can on a representative sampling of specimens, then create a PC morphospace. Then, subsample the landmarks (e.g., 75%, 50%, 25% of the landmarks) and see if the PC morphospace from these subsampled datasets mirrors the distribution of shapes of the full dataset. If the morphospaces begin deviating from the PC morphospace of the full dataset, then you have a visual cue that the subsampling is not adequately characterizing the shape variation of your specimens. In terms of a statistically significant test for landmark sampling, I suppose one can test for correlation between the subsampled and full datasets, but because the subsampled and full datasets will be auto-correlated to some extent, the null would have to reflect this.

Alternatively, I have a script that automatically subsamples the landmarks of a given dataset and creates a plot to see how well the subsampled datasets converge to the point distribution of the full dataset. If you are interested, I would be happy to describe the technique in more detail and/or run the analysis on your dataset if you don't mind sending me the data. The script is currently under review for a journal, so it's not available yet to the public.

Also, as you mention, having more shape variables (i.e., number of landmarks x 2 or 3 depending on 2-D or 3-D landmarks) than the number of specimens will generally reduce the power of statistical tests. There are ways to counter this issue (e.g., the Q-mode approach recently proposed by Dean Adams).

Now, concerning the sampling of bilateral landmarks, Andrea Cardini has recently written a nice pair of papers on the subject:

Cardini, A. 2016. Left, right or both? Estimating and improving accuracy of one-side-only geometric morphometric analyses of cranial variation. J Zool Syst Evol Res.
Cardini, A. 2016. Lost in the other half: improving accuracy in geometric morphometric analyses of one side of bilaterally symmetric structures. Syst Biol.

These papers highlight the artifact that originates from performing Procrustes alignment on "one-side-only" datasets. At least for alignment purposes, I suggest sampling both sides of bilaterally symmetric structures.

Hope this helps.

All the best,

Aki
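The exploratory landmark-subsampling idea from Aki's reply (compare 75%, 50% and 25% subsets against the full landmark set) can be approximated with a simple stand-in. The sketch below is not Aki's script, which the thread says is unpublished: it swaps in a plain correlation between specimen-to-specimen distances from the full and subsampled landmark sets, skips re-running Procrustes superimposition on each subset, and uses hypothetical function names and random stand-in data. As noted in the thread, the subsampled and full datasets are auto-correlated, so the values are descriptive and not a significance test.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distances between specimens; coords is (n_specimens, n_landmarks, dim)."""
    flat = coords.reshape(coords.shape[0], -1)
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(coords.shape[0], k=1)
    return dist[iu]                              # upper triangle as a vector

def subsample_fit(coords, fraction, n_reps=100, seed=0):
    """Mean correlation between full-data and landmark-subsampled specimen distances."""
    rng = np.random.default_rng(seed)
    n_landmarks = coords.shape[1]
    n_keep = max(2, int(round(fraction * n_landmarks)))
    full = pairwise_distances(coords)
    fits = []
    for _ in range(n_reps):
        keep = rng.choice(n_landmarks, size=n_keep, replace=False)
        sub = pairwise_distances(coords[:, keep, :])
        fits.append(np.corrcoef(full, sub)[0, 1])
    return float(np.mean(fits))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical stand-in for aligned coordinates: 22 specimens, 18 2D landmarks
    mean_shape = rng.normal(size=(18, 2))
    coords = mean_shape + 0.05 * rng.normal(size=(22, 18, 2))
    for fraction in (0.75, 0.50, 0.25):
        fit = subsample_fit(coords, fraction)
        print(f"{int(fraction * 100)}% of landmarks: fit = {fit:.3f}")
```

A fit that drops sharply at some subset size would be the numerical analogue of the visual cue Aki describes, where the subsampled morphospace starts to deviate from the full one.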