On 9/21/18 3:15 PM, Ralf Stubner wrote: > Right, the RNGs in R produce no more than 32bits, so the conversion to a > double can be reverted. If we ignore those RNGs that produce less than > 32bits for the moment, then the attached file contains a sample > implementation (without long vectors, weighted sampling or hashing). It > uses Rcpp for convenience, but I have tried to keep the C++ low.
I just realized that the mailing list had stripped the attachment. I am therefore including it at the end for reference. Meanwhile I have also cast this into a package (feedback welcome): https://www.daqana.org/dqsample/ cheerio ralf // [[Rcpp::plugins(cpp11)]] #include <cstdint> #include <Rcpp.h> /* Daniel Lemire Fast Random Integer Generation in an Interval https://arxiv.org/abs/1805.10941 */ uint32_t nearlydivisionless32(uint32_t s, uint32_t (*random32)(void)) { uint32_t x = random32(); uint64_t m = (uint64_t) x * (uint64_t) s; uint32_t l = (uint32_t) m; if (l < s) { uint32_t t = -s % s; while (l < t) { x = random32(); m = (uint64_t) x * (uint64_t) s; l = (uint32_t) m; } } return m >> 32; } uint32_t random32() { return R::runif(0, 1) * 4294967296; /* 2^32 */ } // [[Rcpp::export]] Rcpp::IntegerVector sample_int(int n, int size, bool replace = false) { Rcpp::IntegerVector result(Rcpp::no_init(size)); if (replace) { for (int i = 0; i < size; ++i) result[i] = nearlydivisionless32(n, random32) + 1; } else { Rcpp::IntegerVector tmp(Rcpp::no_init(n)); for (int i = 0; i < n; ++i) tmp[i] = i; for (int i = 0; i < size; ++i) { int j = nearlydivisionless32(n, random32); result[i] = tmp[j] + 1; tmp[j] = tmp[--n]; } } return result; } /*** R set.seed(42) sample.int(6, 10, replace = TRUE) sample.int(100, 10) set.seed(42) sample_int(6, 10, replace = TRUE) sample_int(100, 10) m <- ceiling((2/5)*2^32) set.seed(42) x <- sample.int(m, 1000000, replace = TRUE) table(x %% 2) set.seed(42) y <- sample_int(m, 1000000, replace = TRUE) table(y %% 2) set.seed(42) sample.int(m, 6, replace = TRUE) set.seed(42) sample_int(m, 6, replace = TRUE) bench::mark(orig = sample.int(m, 1000000, replace = TRUE), new = sample_int(m, 1000000, replace = TRUE), check = FALSE) bench::mark(orig = sample.int(6, 1000000, replace = TRUE), new = sample_int(6, 1000000, replace = TRUE), check = FALSE) */ -- Ralf Stubner Senior Software Engineer / Trainer daqana GmbH Dortustraße 48 14467 Potsdam T: +49 331 23 61 93 11 F: +49 331 23 61 93 90 M: +49 162 20 91 196 Mail: ralf.stub...@daqana.com Sitz: Potsdam Register: AG Potsdam HRB 27966 P Ust.-IdNr.: DE300072622 Geschäftsführer: Prof. Dr. Dr. Karl-Kuno Kunze
signature.asc
Description: OpenPGP digital signature
______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel