Henrik,

the whole point and only purpose of mc* functions is to fork. That's what the multicore package was about, so if you don't want to fork, don't use mc* functions - they don't have any other purpose. I really fail to see the point - if you use mc* functions you're very explicitly asking for forking - so your argument is like saying that print() should have an option to not print anything - it just makes no sense. If you have code that is fork-incompatible, you clearly cannot use it in mcparallel - that's why there is a very explicit warning in the documentation. As I said, if you have some software that embeds R and has issues with forks, then that software should use pthread_atfork() to control the behavior.

Cheers,
Simon


> On Jan 10, 2020, at 3:34 PM, Henrik Bengtsson <henrik.bengts...@gmail.com>
> wrote:
>
> On Fri, Jan 10, 2020 at 11:23 AM Simon Urbanek
> <simon.urba...@r-project.org> wrote:
>>
>> Henrik,
>>
>> the example from the post works just fine in CRAN R for me - the post was
>> about a Homebrew build so it's conceivably a bug in their libraries.
>
> Thanks for ruling that example out.
>
>> That's exactly why I was proposing a more general solution where you can
>> simply define a function in user-space that will issue a warning or stop on
>> fork, it doesn't have to be part of core R, there are other packages that
>> use fork() as well, so what I proposed is much safer than hacking the
>> parallel package.
>
> I think this is worth pursuing and will help improve and stabilize
> things. But issuing a warning or stopping on fork will not let end
> users run the pipeline, or am I missing something?
>
> I'm trying to argue that this is still a real problem that users and
> developers run into on a regular basis. Since parallel::mclapply() is
> such a common and readily available solution, it is also low-hanging
> fruit to make it possible to have those forking functions fall back to
> sequential processing. The only(*) way to achieve this fallback
> right now is to run the same pipeline on MS Windows - I just think it
> would be very useful to have the same fallback option available on
> Unix and macOS. Having this in base R could also serve as a standard
> for other parallel/forking packages/implementations that also wish to
> have a fallback to sequential processing.
>
> ==> What would the disadvantages be of providing a mechanism/setting for
> disabling forking in the parallel::mc*** API? <==
>
> (*) One can also somewhat disable forking in 'parallel' by using
> 'cgroups' limiting the process to a single core (see also
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17641).
> That will handle code that uses mc.cores = parallel::detectCores(),
> which there is a lot of. I guess it will cause a run-time error
> ('mc.cores' must be >= 1) for code that uses the second most commonly
> used mc.cores = parallel::detectCores() - 1, which is unfortunately
> also very common. I find the use of hardcoded detectCores()
> unfortunate but that is a slightly different topic. OTOH, if there
> were a standardized option in R for disabling all types of parallel
> processing by forcing a single core, one could imagine other parallel
> APIs implementing fallbacks to sequential processing as well. (I'm
> aware that not all use cases of async processing are about
> parallelization, so it might not apply everywhere).
>
> Cheers,
>
> Henrik
>
>>
>> Cheers,
>> Simon
>>
>>
>>
>>> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson <henrik.bengts...@gmail.com>
>>> wrote:
>>>
>>> The RStudio GUI was just one example. AFAIK, and please correct me if
>>> I'm wrong, another example is where multi-threaded code is used in
>>> forked processing and that's sometimes unstable. Yet another, which
>>> might be multi-thread related or not, is
>>> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
>>>
>>> res <- parallel::mclapply(urls, function(url) {
>>>   download.file(url, basename(url))
>>> })
>>>
>>> That was reported to fail on macOS with the default method="libcurl"
>>> but not for method="curl" or method="wget".
>>>
>>> Further documentation is needed and would help but I don't believe
>>> it's sufficient to solve everyday problems. The argument for
>>> introducing an option/env var to disable forking is to give the end
>>> user a quick workaround for newly introduced bugs. Neither the
>>> developer nor the end user has full control of the R package stack,
>>> which is always in flux. For instance, the above mclapply() code might
>>> have been in a package on CRAN and then all of a sudden
>>> method="libcurl" became the new default in base R.
>>> The above mclapply() code is now buggy on macOS, and not necessarily
>>> caught by CRAN checks. The package developer might not notice this
>>> because they are on Linux or Windows. It can take a very long time
>>> before this problem is even noticed and even longer before it is
>>> tracked down and fixed. Similarly, as more and more code turns to
>>> native code and it becomes easier and easier to implement
>>> multi-threading, more and more of these bugs across package
>>> dependencies risk sneaking in the backdoor wherever forked processing
>>> is in place.
>>>
>>> For the end user, but also higher-up upstream package developers, the
>>> quickest workaround would be to disable forking. If you're
>>> conservative, you could even disable it for all of your R processing.
>>> Being able to quickly disable forking will also provide a mechanism
>>> for quickly testing the hypothesis that forking is the underlying
>>> problem, i.e. "Please retry with options(fork.allowed = FALSE)" will
>>> become handy for troubleshooting.
>>>
>>> /Henrik
>>>
>>> On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
>>> <simon.urba...@r-project.org> wrote:
>>>>
>>>> If I understand the thread correctly this is an RStudio issue and I would
>>>> suggest that the developers consider using pthread_atfork() so RStudio can
>>>> handle forking as they deem fit (bail out with an error or make RStudio
>>>> work). Note that in principle the functionality requested here can be
>>>> easily implemented in a package so R doesn’t need to be modified.
>>>>
>>>> Cheers,
>>>> Simon
>>>>
>>>> Sent from my iPhone
>>>>
>>>>>> On Jan 10, 2020, at 04:34, Tomas Kalibera <tomas.kalib...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
>>>>>> I'd like to pick up this thread started on 2019-04-11
>>>>>> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
>>>>>> Modulo all the other suggestions in this thread, would my proposal of
>>>>>> being able to disable forked processing via an option or an
>>>>>> environment variable make sense?
>>>>>
>>>>> I don't think R should be doing that. There are caveats with using fork,
>>>>> and they are mentioned in the documentation of the parallel package, so
>>>>> people can easily avoid functions that use it, and this all has been
>>>>> discussed here recently.
>>>>>
>>>>> If it is the case, we can expand the documentation in the parallel
>>>>> package, add a warning against the use of forking with RStudio, but for
>>>>> that it would be good to know at least why it is not working. From the
>>>>> GitHub issue I have the impression that it is not really known why,
>>>>> whether it could be fixed, and if so, where. The same GitHub issue
>>>>> reflects also that some people want to use forking for performance
>>>>> reasons, and even with RStudio, at least on Linux. Perhaps it could be
>>>>> fixed? Perhaps it is just some race condition somewhere?
>>>>>
>>>>> Tomas
>>>>>
>>>>>> I've prototyped a working patch that works like:
>>>>>>
>>>>>> > options(fork.allowed = FALSE)
>>>>>> > unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
>>>>>> [1] 14058 14058
>>>>>> > parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
>>>>>> [1] 14058 14058
>>>>>> > parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
>>>>>> [1] 14058.1 14058.2
>>>>>> > f <- parallel::mcparallel(Sys.getpid())
>>>>>> Error in allowFork(assert = TRUE) :
>>>>>>   Forked processing is not allowed per option ‘fork.allowed’ or
>>>>>>   environment variable ‘R_FORK_ALLOWED’
>>>>>> > cl <- parallel::makeForkCluster(1L)
>>>>>> Error in allowFork(assert = TRUE) :
>>>>>>   Forked processing is not allowed per option ‘fork.allowed’ or
>>>>>>   environment variable ‘R_FORK_ALLOWED’
>>>>>>
>>>>>> The patch is:
>>>>>>
>>>>>> Index: src/library/parallel/R/unix/forkCluster.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/forkCluster.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/forkCluster.R (working copy)
>>>>>> @@ -30,6 +30,7 @@
>>>>>>  newForkNode <- function(..., options = defaultClusterOptions, rank)
>>>>>>  {
>>>>>> +    allowFork(assert = TRUE)
>>>>>>      options <- addClusterOptions(options, list(...))
>>>>>>      outfile <- getClusterOption("outfile", options)
>>>>>>      port <- getClusterOption("port", options)
>>>>>> Index: src/library/parallel/R/unix/mclapply.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/mclapply.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/mclapply.R (working copy)
>>>>>> @@ -28,7 +28,7 @@
>>>>>>          stop("'mc.cores' must be >= 1")
>>>>>>      .check_ncores(cores)
>>>>>> -    if (isChild() && !isTRUE(mc.allow.recursive))
>>>>>> +    if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
>>>>>>          return(lapply(X = X, FUN = FUN, ...))
>>>>>>      ## Follow lapply
>>>>>> Index: src/library/parallel/R/unix/mcparallel.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/mcparallel.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/mcparallel.R (working copy)
>>>>>> @@ -20,6 +20,7 @@
>>>>>>  mcparallel <- function(expr, name, mc.set.seed = TRUE, silent = FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
>>>>>>  {
>>>>>> +    allowFork(assert = TRUE)
>>>>>>      f <- mcfork(detached)
>>>>>>      env <- parent.frame()
>>>>>>      if (isTRUE(mc.set.seed)) mc.advance.stream()
>>>>>> Index: src/library/parallel/R/unix/pvec.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/pvec.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/pvec.R (working copy)
>>>>>> @@ -25,7 +25,7 @@
>>>>>>      cores <- as.integer(mc.cores)
>>>>>>      if(cores < 1L) stop("'mc.cores' must be >= 1")
>>>>>> -    if(cores == 1L) return(FUN(v, ...))
>>>>>> +    if(cores == 1L || !allowFork()) return(FUN(v, ...))
>>>>>>      .check_ncores(cores)
>>>>>>      if(mc.set.seed) mc.reset.stream()
>>>>>>
>>>>>> with a new file src/library/parallel/R/unix/allowFork.R:
>>>>>>
>>>>>> allowFork <- function(assert = FALSE) {
>>>>>>     value <- Sys.getenv("R_FORK_ALLOWED")
>>>>>>     if (nzchar(value)) {
>>>>>>         value <- switch(value,
>>>>>>             "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
>>>>>>             "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
>>>>>>             stop(gettextf("invalid environment variable value: %s==%s",
>>>>>>                           "R_FORK_ALLOWED", value)))
>>>>>>         value <- as.logical(value)
>>>>>>     } else {
>>>>>>         value <- TRUE
>>>>>>     }
>>>>>>     value <- getOption("fork.allowed", value)
>>>>>>     if (is.na(value)) {
>>>>>>         stop(gettextf("invalid option value: %s==%s", "fork.allowed", value))
>>>>>>     }
>>>>>>     if (assert && !value) {
>>>>>>         stop(gettextf("Forked processing is not allowed per option %s or environment variable %s",
>>>>>>                       sQuote("fork.allowed"), sQuote("R_FORK_ALLOWED")))
>>>>>>     }
>>>>>>     value
>>>>>> }
>>>>>>
>>>>>> /Henrik
>>>>>>
>>>>>>> On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera
>>>>>>> <tomas.kalib...@gmail.com> wrote:
>>>>>>> On 4/15/19 11:02 AM, Iñaki Ucar wrote:
>>>>>>>> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera
>>>>>>>> <tomas.kalib...@gmail.com> wrote:
>>>>>>>>> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
>>>>>>>>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinus...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> I think it's worth saying that mclapply() works as documented
>>>>>>>>>> Mostly, yes. But it says nothing about fork's copy-on-write and
>>>>>>>>>> memory overcommitment, and that this means that it may work
>>>>>>>>>> nicely or fail spectacularly depending on whether, e.g., you
>>>>>>>>>> operate on a long vector.
>>>>>>>>> R cannot possibly replicate documentation of the underlying operating
>>>>>>>>> systems. It clearly says that fork() is used and readers who may not
>>>>>>>>> know what fork() is need to learn it from external sources.
>>>>>>>>> Copy-on-write is an elementary property of fork().
>>>>>>>> Just to be precise, copy-on-write is an optimization widely deployed
>>>>>>>> in most modern *nixes, particularly for the architectures in which R
>>>>>>>> usually runs. But it is not an elementary property; it is not even
>>>>>>>> possible without an MMU.
>>>>>>> Yes, old Unix systems without virtual memory had fork eagerly copying.
>>>>>>> Not relevant today, and certainly not for systems that run R, but indeed
>>>>>>> people interested in OS internals can look elsewhere for more precise
>>>>>>> information.
>>>>>>> Tomas
>>>>>
>>>>> ______________________________________________
>>>>> R-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
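
For reference, the user-space fallback discussed in this thread could be sketched roughly as below. This is only an illustration under stated assumptions, not part of R or of the proposed patch: the option name fork.allowed is borrowed from Henrik's proposal and is not an existing R option, and fork_allowed(), safe_cores() and mclapply_fallback() are hypothetical helper names. It also shows one way to avoid the 'mc.cores' must be >= 1 error that detectCores() - 1 triggers on a single-core setup.

```r
# Hypothetical helper: read the proposed option 'fork.allowed' (not an
# existing R option). Defaults to TRUE so behaviour is unchanged unless
# the user opts out.
fork_allowed <- function() {
  isTRUE(getOption("fork.allowed", TRUE))
}

# Hypothetical helper: leave 'reserve' cores free but never return fewer
# than one core, even when detectCores() returns 1 or NA.
safe_cores <- function(reserve = 1L) {
  n <- parallel::detectCores()
  if (is.na(n)) n <- 1L
  max(1L, n - as.integer(reserve))
}

# Hypothetical wrapper: fork via mclapply() only when forking is allowed
# and supported; otherwise fall back to sequential lapply().
mclapply_fallback <- function(X, FUN, ..., mc.cores = safe_cores()) {
  if (!fork_allowed() || .Platform$OS.type == "windows" || mc.cores <= 1L) {
    return(lapply(X, FUN, ...))
  }
  parallel::mclapply(X, FUN, ..., mc.cores = mc.cores)
}

# With forking disabled, every element is evaluated in the current process.
options(fork.allowed = FALSE)
pids <- unlist(mclapply_fallback(1:2, function(x) Sys.getpid()))
```

Note that a wrapper like this only affects code that calls it. It does nothing for packages that call parallel::mclapply() directly, which is the case Henrik's option is meant to cover.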