There was a similar concern (about arbitrary p-values in simulations) expressed
in a nice, recent Oikos paper, too:
White, J. W., A. Rassweiler, J. F. Samhouri, A. C. Stier, and C. White. 2014.
Ecologists should not use statistical significance tests to interpret
simulation model results. Oikos 123(4):385-388.
I think some of these concerns are also worth heeding for those doing resampling.
It's always worth doing a quick sensitivity analysis (running the resampling
algorithm for increasing numbers of replicates) to identify the point at which
p-values (or other stats you're interested in) become stable. But there's no
need to run huge numbers of replicates if they aren't needed. See p. 40 in the paper below
(focused more on general paleontological applications than morphometrics) for
discussion and recommendations on this issue, with some advice from the
literature. The online supplement at
http://www.paleosoc.org/shortcourse2010/Resampling_KandN-G_Appendices_Oct11.doc
has some (clunky) sample R code (section 2.8) demonstrating this.
Kowalewski, M., and P. Novack-Gottshall. 2010. Resampling methods in
paleontology. Pp. 19-54. In J. Alroy, and G. Hunt, eds. Quantitative Methods in
Paleobiology. Short Courses in Paleontology 16. Paleontological Society and
Paleontological Research Institute, Ithaca, NY.
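As a rough illustration of that sensitivity analysis, here is a minimal Python sketch. It is not the R code from the Kowalewski and Novack-Gottshall supplement. It uses a generic two-sample permutation test on hypothetical toy data as a stand-in for tools like physignal, and reruns it with increasing numbers of permutations so you can see where the p-value settles. The function name perm_pvalue and the data are illustrative assumptions.

```python
import random

def perm_pvalue(x, y, n_perm, seed):
    """Generic two-sample permutation test on |mean(x) - mean(y)| (a stand-in,
    not physignal). Returns (b + 1) / (n_perm + 1), where b counts permutations
    at least as extreme as the observed statistic."""
    rng = random.Random(seed)
    n_x = len(x)
    observed = abs(sum(x) / n_x - sum(y) / len(y))
    pooled = list(x) + list(y)
    b = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / (len(pooled) - n_x))
        # Small tolerance so floating-point rounding does not drop exact ties.
        if stat >= observed - 1e-12:
            b += 1
    return (b + 1) / (n_perm + 1)

# Hypothetical toy measurements for two groups (not data from the thread).
x = [5.1, 4.8, 5.6, 5.0, 4.7, 5.4]
y = [4.6, 5.0, 4.4, 4.9, 4.5, 4.8]

# Sensitivity analysis: same seed, increasing numbers of permutations.
for n_perm in (100, 1000, 10000):
    p = perm_pvalue(x, y, n_perm, seed=1)
    print(f"{n_perm:>6} permutations: p = {p:.4f}")
```

Once successive p-values agree to the precision you intend to report, more permutations add little.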
Cheers,
Phil
On 6/2/2015 11:47 PM, Tsung Fei Khang wrote:
Dear community,
Many thanks to everyone who responded with your opinions and references. I
think set.seed() solves the reproducibility problem. For practicality, I
would just set the seed, make a single run with a high number of replicates such as
10,000, and then report a reasonable upper bound for the p-value (e.g. p-value
< 0.01 if I get something like 0.0068).
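A minimal Python sketch of that workflow, assuming a generic permutation test as a stand-in for the R functions: seeding the generator (the analogue of R's set.seed()) makes a rerun return the identical p-value, and a hypothetical helper, report_upper_bound, turns the estimate into a coarse bound such as "p < 0.01". The cutoffs, seed and toy data are illustrative assumptions, not from the thread.

```python
import random

def perm_pvalue(x, y, n_perm, seed):
    """Generic two-sample permutation test (stand-in); returns (b + 1) / (n_perm + 1)."""
    rng = random.Random(seed)  # seeded generator, analogous to set.seed() in R
    n_x = len(x)
    observed = abs(sum(x) / n_x - sum(y) / len(y))
    pooled = list(x) + list(y)
    b = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / (len(pooled) - n_x))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            b += 1
    return (b + 1) / (n_perm + 1)

def report_upper_bound(p, cutoffs=(0.001, 0.01, 0.05)):
    """Hypothetical helper: report the smallest cutoff the estimate falls below."""
    for cutoff in cutoffs:
        if p < cutoff:
            return f"p < {cutoff}"
    return f"p = {p:.3f}"

# Hypothetical toy data (not from the thread).
x = [5.1, 4.8, 5.6, 5.0, 4.7, 5.4]
y = [4.6, 5.0, 4.4, 4.9, 4.5, 4.8]

# Same seed -> same permutation sequence -> identical p-value on rerun.
p1 = perm_pvalue(x, y, 10_000, seed=2015)
p2 = perm_pvalue(x, y, 10_000, seed=2015)
print(p1 == p2, report_upper_bound(p1))
```

Reporting a bound is safest when the estimate sits clearly below the cutoff relative to its Monte Carlo error; an estimate just under 0.01 could plausibly land above it on an unseeded rerun.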
@Aki: Thank you. Using 1/#iterations is problematic because one could then get
arbitrarily small p-values just by increasing the number of iterations. The
p-value should instead converge to some value (however small) once the number of
iterations exceeds some threshold, which depends on the data set.
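To make the convergence point concrete: one common convention (an assumption here; the thread does not say which estimator physignal or Phylo.Morphol.PLS use) counts the observed statistic as one of the permutations, giving p = (b + 1)/(m + 1) for b extreme permutations out of m. The smallest value it can return is 1/(m + 1), which keeps shrinking as m grows, whereas for a nonzero true p-value the estimate converges and its Monte Carlo standard error falls roughly as 1/sqrt(m). A short Python sketch with illustrative function names:

```python
import math

def mc_pvalue(b, m):
    """Monte Carlo p-value (b + 1) / (m + 1); b = permutations at least as
    extreme as the observed statistic, m = number of permutations."""
    return (b + 1) / (m + 1)

def mc_standard_error(p, m):
    """Binomial standard error of a p-value estimated from m permutations."""
    return math.sqrt(p * (1 - p) / m)

for m in (100, 1000, 10_000, 100_000):
    # Floor: what gets reported when no permutation beats the observed value.
    floor = mc_pvalue(0, m)
    # Uncertainty if the true p-value were about 0.0068 (the value quoted above).
    se = mc_standard_error(0.0068, m)
    print(f"m = {m:>6}: smallest possible p = {floor:.1e}, SE at p = 0.0068: {se:.5f}")
```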
On Tuesday, June 2, 2015 at 3:48:37 PM UTC+8, Tsung Fei Khang wrote:
Dear community,
I would like to share my experience of using some (really cool) computational
tools for phylogenetic signal and morphological integration analysis.
I am using physignal (geomorph R package) and the Phylo.Morphol.PLS function
provided in the paper by Adams and Felice (2014; PLoS ONE, 9:e94335) in my
work. I noticed that if the same analysis is rerun with a given number of
iterations, the results may vary. Additionally, I observed that increasing the
number of iterations, up to some critical point, may push down the p-value,
depending on the data set (this did not happen with the plethspecies data
(9 species), but it did with my data set (13 species, not salamanders)). I attach
the results of 10 runs of Phylo.Morphol.PLS for each data set at 100, 1000, 1
and 10 iterations. Note that reasonably stable results are attained after
1000 iterations (the default) for the plethspecies data, whereas my data set
needs 1.
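The run-to-run variation described here is what Monte Carlo sampling predicts. A minimal Python sketch, again using a generic two-sample permutation test and hypothetical toy data in place of Phylo.Morphol.PLS, repeats the test 10 times (seeds 0 to 9) at each number of iterations and summarizes the spread of the resulting p-values; the spread should shrink roughly in proportion to 1/sqrt(iterations). The names perm_pvalue and rerun_spread are illustrative.

```python
import random
import statistics

def perm_pvalue(x, y, n_perm, seed):
    """Generic two-sample permutation test (stand-in); returns (b + 1) / (n_perm + 1)."""
    rng = random.Random(seed)
    n_x = len(x)
    observed = abs(sum(x) / n_x - sum(y) / len(y))
    pooled = list(x) + list(y)
    b = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / (len(pooled) - n_x))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            b += 1
    return (b + 1) / (n_perm + 1)

def rerun_spread(x, y, n_perm, n_runs=10):
    """Mean and standard deviation of p-values over n_runs reruns (seeds 0..n_runs-1)."""
    ps = [perm_pvalue(x, y, n_perm, seed) for seed in range(n_runs)]
    return statistics.mean(ps), statistics.stdev(ps)

# Hypothetical toy data (not from the thread).
x = [5.1, 4.8, 5.6, 5.0, 4.7, 5.4]
y = [4.6, 5.0, 4.4, 4.9, 4.5, 4.8]

for n_perm in (100, 1000, 10000):
    mean_p, sd_p = rerun_spread(x, y, n_perm)
    print(f"{n_perm:>5} iterations: mean p = {mean_p:.4f}, SD over 10 runs = {sd_p:.4f}")
```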
I think the notion that p-values returned by a permutation method are
realizations of a random variable with a certain mean and variance may be
unfamiliar to many biologists, who are accustomed to getting the same p-value
whenever the same data set is rerun through common statistical tests. Perhaps
in a future version the authors of the code could implement a check within the
functions that increases the number of iterations until the p-value "converges", so
that a more stable p-value is returned?
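One way such a checker might work, sketched in Python under stated assumptions: accumulate permutations in batches and stop once the Monte Carlo standard error of the p-value falls below a chosen tolerance, or a maximum number of permutations is reached. The function name, tolerance, batch size and cap are hypothetical; neither physignal nor Phylo.Morphol.PLS is claimed to offer this, and the permutation test is a generic stand-in. Treat it as a rough sketch rather than a validated procedure.

```python
import math
import random

def perm_pvalue_until_stable(x, y, tol=0.002, batch=1000, max_perm=200_000, seed=1):
    """Hypothetical convergence checker for a generic two-sample permutation test.
    Adds permutations in batches until the standard error of the p-value estimate
    is below tol or max_perm permutations have been run. Returns (p, m, se)."""
    rng = random.Random(seed)
    n_x = len(x)
    observed = abs(sum(x) / n_x - sum(y) / len(y))
    pooled = list(x) + list(y)
    b = m = 0
    while True:
        for _ in range(batch):
            rng.shuffle(pooled)
            stat = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / (len(pooled) - n_x))
            if stat >= observed - 1e-12:  # tolerance for floating-point ties
                b += 1
        m += batch
        p = (b + 1) / (m + 1)            # Monte Carlo p-value estimate
        se = math.sqrt(p * (1 - p) / m)  # its binomial standard error
        if se < tol or m >= max_perm:
            return p, m, se

# Hypothetical toy data (not from the thread).
x = [5.1, 4.8, 5.6, 5.0, 4.7, 5.4]
y = [4.6, 5.0, 4.4, 4.9, 4.5, 4.8]

p, m, se = perm_pvalue_until_stable(x, y)
print(f"p = {p:.4f} after {m} permutations (SE = {se:.4f})")
```

An absolute tolerance suits p-values near conventional cutoffs; for very small p-values a relative tolerance (SE as a fraction of p) would be the more natural choice.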
DISCLAIMER: This e-mail and any files transmitted with it ("Message") are
intended only for the use of the recipient(s) named above and may contain
confidential information. You are hereby notified that the taking of any action
in reliance upon, or any review, retransmission, dissemination, distribution,
printing or copying of this Message or any part thereof by anyone other than
the intended recipient(s) is strictly prohibited. If you have received this
Message in error, you should delete this Message immediately and advise the
sender by return e-mail. Opinions, conclusions and any other information in
this Message that are not related to the official business of Universiti Malaya
are understood as not issued or endorsed by any of the parties mentioned.