Yes, I would think this behavior is intentional, but obviously I don't know for sure. Looking at the code:
> parallel::clusterSetRNGStream
function (cl = NULL, iseed = NULL)
{
    cl <- defaultCluster(cl)
    oldseed <- if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
        get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    else NULL
    RNGkind("L'Ecuyer-CMRG")
    if (!is.null(iseed))
        set.seed(iseed)
    nc <- length(cl)
    seeds <- vector("list", nc)
    seeds[[1L]] <- .Random.seed
    [...]

You'll find that:

1. the stream of RNG seeds originates from .Random.seed;
2a. 'iseed' is only applied if it is non-NULL, in which case it changes the starting .Random.seed;
2b. if iseed = NULL, then .Random.seed is whatever it was when you called the function.

If you use iseed = NULL, then you need to forward the RNG state (= .Random.seed) yourself. Here's an example:

set.seed(1)
library(parallel)
cl <- parallel::makeCluster(5)

str(.Random.seed)
# int [1:626] 10403 624 -169270483 -442010614 -603558397 -222347416 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1] 7 4 2 10 10

str(.Random.seed)
# int [1:626] 10403 624 -169270483 -442010614 -603558397 -222347416 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1] 7 4 2 10 10

## Forward RNG state
sample.int(1)
# [1] 1
str(.Random.seed)
# int [1:626] 10403 1 1654269195 -1877109783 -961256264 1403523942 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1] 8 6 1 7 5

stopCluster(cl)

FYI, you see similar behavior with parallel::mclapply():

set.seed(1)
library(parallel)
RNGkind("L'Ecuyer-CMRG")
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -1.2673735 0.9045952 1.9502072
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -1.2673735 0.9045952 1.9502072

## Forward RNG state
sample.int(1)
# [1] 1
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -0.09117479 -1.07803714 0.13924063

I can see pros and cons with this behavior, but I think the default is risky.
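If you would rather not forward the state by hand before every call, one option is to draw an integer from the master's RNG and pass it as 'iseed'. The following is a minimal sketch under that assumption; set_streams_forward() is a hypothetical helper, not part of 'parallel'. Drawing the integer with sample.int() advances the master's .Random.seed, so successive calls get different worker streams, while a set.seed() at the top still makes the whole run reproducible. The cluster is created before set.seed() so that cluster start-up cannot interfere with the seeded sequence.

```r
library(parallel)

## Hypothetical helper (not part of 'parallel'): seed the workers' streams
## from an integer drawn from the master's RNG. Drawing that integer
## forwards the master's .Random.seed, so the next call gets new streams.
set_streams_forward <- function(cl) {
  iseed <- sample.int(.Machine$integer.max, 1L)
  clusterSetRNGStream(cl, iseed = iseed)
  invisible(iseed)
}

cl <- makeCluster(2)
set.seed(1)
set_streams_forward(cl)
x1 <- parSapply(cl, 1:2, function(i) runif(1))
set_streams_forward(cl)
x2 <- parSapply(cl, 1:2, function(i) runif(1))  # new streams, so x2 differs from x1
stopCluster(cl)
```

Re-running from the same set.seed(1) reproduces x1, because the same integer is drawn again.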
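For parallel::mclapply(), a similar sketch re-seeds the master with an integer drawn from its own RNG before each call, so the per-child streams come from a fresh L'Ecuyer-CMRG state every time. This assumes a fork-capable platform (mc.cores > 1 is not supported on Windows), and mclapply_forward() is a hypothetical wrapper, not part of 'parallel'. Re-seeding via set.seed() is chosen here over advancing the master's L'Ecuyer-CMRG state by a single draw: because this generator is linear, jumping ahead by a stream commutes with a single step, so a one-draw forward would leave each child's new stream only one step offset from the corresponding stream of the previous call.

```r
library(parallel)

## Hypothetical wrapper (not part of 'parallel'): before each call, draw an
## integer from the master's RNG and re-seed with it, so mclapply() derives
## its per-child streams from a fresh L'Ecuyer-CMRG state every time.
mclapply_forward <- function(X, FUN, ..., mc.cores = 2L) {
  iseed <- sample.int(.Machine$integer.max, 1L)
  set.seed(iseed, kind = "L'Ecuyer-CMRG")
  mclapply(X, FUN, ..., mc.set.seed = TRUE, mc.cores = mc.cores)
}

set.seed(1, kind = "L'Ecuyer-CMRG")
z1 <- unlist(mclapply_forward(1:2, function(n) rnorm(n)))
z2 <- unlist(mclapply_forward(1:2, function(n) rnorm(n)))  # differs from z1
```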
To illustrate: it's not hard to imagine an implementation of a resampling algorithm where you have the option to run it via lapply() or via parallel::mclapply(); there is a non-zero probability that such an implementation produces identical samples.

Proper parallel RNG can be tricky.

/Henrik

On Fri, Jun 7, 2019 at 7:09 AM Colin Gillespie <csgilles...@gmail.com> wrote:
>
> Dear All,
>
> Is the following expected behaviour?
>
> set.seed(1)
> library(parallel)
> cl = makeCluster(5)
> clusterSetRNGStream(cl, iseed = NULL)
> parSapply(cl, 1:5, function(i) sample(1:10, 1))
> # 7 4 2 10 10
> clusterSetRNGStream(cl, iseed = NULL)
> parSapply(cl, 1:5, function(i) sample(1:10, 1))
> # 7 4 2 10 10
> stopCluster(cl)
>
> The documentation could be read either way, e.g.
>
> * iseed: An integer to be supplied to set.seed, or NULL not to set
> reproducible seeds.
>
> From Details
>
> .... optionally setting the seed of the streams by set.seed(iseed)
> (otherwise they are set from the current seed of the master process:
> after selecting the L'Ecuyer generator).
>
> As may be guessed, this caught me out, since I was expecting the same
> behaviour as set.seed(NULL).
>
> Thanks
>
> Colin
>
> ----------
>
> R version 3.6.0 (2019-04-26)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.2 LTS
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel