Before the next release we really should sort out the bias issue in sample() reported by Ottoboni and Stark in https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and filed as a bug report by Duncan Murdoch at https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.
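
A small-scale sketch of how this kind of bias can arise. It assumes (this is not stated in this message) that an index in 0..m-1 is formed as floor(m * u), where u takes 2^bits equally spaced values. The hypothetical C helper count_preimages counts how many generator outputs land on each index; when m does not divide 2^bits the counts cannot all be equal. It illustrates the mechanism only and is not R's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of R.
 * ASSUMPTION: the index is computed as floor(m * k / 2^bits), where k is
 * one of 2^bits equally likely generator outputs.
 * On return, counts[i] holds the number of outputs k that map to index i,
 * for i in 0..m-1. The caller supplies an array of at least m elements. */
void count_preimages(unsigned bits, uint32_t m, uint32_t *counts)
{
    assert(bits >= 1 && bits <= 16 && m >= 1);

    uint32_t range = (uint32_t)1 << bits;   /* number of generator outputs */

    for (uint32_t i = 0; i < m; i++)
        counts[i] = 0;

    for (uint32_t k = 0; k < range; k++) {
        /* floor(m * k / 2^bits), computed in 64-bit integer arithmetic */
        uint32_t idx = (uint32_t)(((uint64_t)m * k) >> bits);
        counts[idx]++;
    }
}
```

With bits = 4 and m = 6 the counts come out as 3, 3, 2, 3, 3, 2, so two of the six indices would be drawn with probability 2/16 rather than 3/16.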

Here are two examples of bad behavior through current R-devel:

    set.seed(123)
    m <- (2/5) * 2^32
    x <- sample(m, 1000000, replace = TRUE)
    table(x %% 2, x > m / 2)
    ##
    ##      FALSE   TRUE
    ##   0 300620 198792
    ##   1 200196 300392

    table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
    ##
    ##      0      1
    ## 429054 570946

I committed a modification to R_unif_index to address this by generating random bits (in blocks of 16) and rejection sampling, but for now this is only enabled if the environment variable R_NEW_SAMPLE is set before the first call.

Some things still needed:

- someone to look over the change and see if there are any issues
- adjustment of RNGkind to allow the old behavior to be selected
- make the new behavior the default
- adjust documentation
- ???

Unfortunately I don't have enough free cycles to do this, but I can help if someone else can take the lead.

There are two other places I found that might suffer from the same issue, in walker_ProbSampleReplace (pointed out by O & S) and in src/nmath/wilcox.c. Both can be addressed by using R_unif_index. I have done that for walker_ProbSampleReplace, but the wilcox change might need adjusting to support the standalone math library and I don't feel confident enough that I'd get that right.

Best,

luke

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
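
The bit-based rejection sampling described above can be sketched in C roughly as follows. This is a minimal illustration, not the code committed to R_unif_index: the names next16, bits_needed, random_bits and unif_index_sketch are hypothetical, and next16 is a toy linear congruential generator standing in for whatever actually supplies the random bits. The idea is to draw just enough bits (in blocks of 16) to cover 0..n-1 and to retry whenever the candidate falls outside that range.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: toy 64-bit LCG used only as a stand-in bit source for this
 * sketch; it is not R's generator. Returns 16 pseudo-random bits. */
static uint64_t lcg_state = 12345;

static uint32_t next16(void)
{
    lcg_state = lcg_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (uint32_t)(lcg_state >> 48);     /* top 16 bits of the state */
}

/* Number of bits needed to represent every value in 0..n-1 (0 when n == 1). */
static unsigned bits_needed(uint64_t n)
{
    unsigned bits = 0;
    for (uint64_t v = n - 1; v > 0; v >>= 1)
        bits++;
    return bits;
}

/* Collect random bits in blocks of 16, then keep only the low `bits` bits. */
static uint64_t random_bits(unsigned bits)
{
    uint64_t v = 0;
    for (unsigned got = 0; got < bits; got += 16)
        v = (v << 16) | next16();
    if (bits < 64)                          /* avoid an undefined 64-bit shift */
        v &= ((uint64_t)1 << bits) - 1;
    return v;
}

/* Hypothetical sketch: index in 0..n-1 by rejection sampling on random bits. */
uint64_t unif_index_sketch(uint64_t n)
{
    assert(n >= 1);

    unsigned bits = bits_needed(n);
    uint64_t candidate;
    do {
        candidate = random_bits(bits);      /* uniform on 0..2^bits - 1 */
    } while (candidate >= n);               /* reject out-of-range values */
    return candidate;
}
```

Because 2^bits is always less than 2n, each attempt is accepted with probability above 1/2, and, given uniform input bits, every accepted value in 0..n-1 is equally likely regardless of whether n divides a power of two.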