Henrik,

the whole point and only purpose of mc* functions is to fork. That's what the 
multicore package was about, so if you don't want to fork, don't use mc* 
functions - they don't have any other purpose. I really fail to see the point - 
if you use mc* functions you're very explicitly asking for forking - so your 
argument is like saying that print() should have an option to not print 
anything - it just makes no sense. If you have code that is fork-incompatilble, 
you clearly cannot use it in mcparallel - that's why there is a very explicit 
warning in the documentation. As I said, if you have some software that embeds 
R and has issue with forks, then that software should be use pthread_atfork() 
to control the behavior.

Cheers,
Simon



> On Jan 10, 2020, at 3:34 PM, Henrik Bengtsson <henrik.bengts...@gmail.com> 
> wrote:
> 
> On Fri, Jan 10, 2020 at 11:23 AM Simon Urbanek
> <simon.urba...@r-project.org> wrote:
>> 
>> Henrik,
>> 
>> the example from the post works just fine in CRAN R for me - the post was 
>> about homebrew build so it's conceivably a bug in their libraries.
> 
> Thanks for ruling that example out.
> 
>> That's exactly why I was proposing a more general solution where you can 
>> simply define a function in user-space that will issue a warning or stop on 
>> fork, it doesn't have to be part of core R, there are other packages that 
>> use fork() as well, so what I proposed is much safer than hacking the 
>> parallel package.
> 
> I think this is worth pursuing and will help improve and stabilize
> things.  But issuing a warning or stop on fork will not allow end
> users from running the pipeline, or am I missing something?
> 
> I'm trying to argue that this is still a real problem that users and
> developers run into on a regular basis.  Since parallel::mclapply() is
> such a common and readily available solution it is also a low hanging
> fruit to make it possible to have those forking functions fall back to
> sequential processing.  The only(*) way to achieve this fall back
> right now is to run the same pipeline on MS Windows - I just think it
> would be very useful to have the same fallback option available on
> Unix and macOS.  Having this in base R could also serve as standard
> for other parallel/forking packages/implementations who also wish to
> have a fallback to sequential processing.
> 
> ==> What would the disadvantages be to provide a mechanism/setting for
> disabling forking in the parallel::mc*** API? <==
> 
> (*) One can also somewhat disable forking in 'parallel' by using
> 'cgroups' limiting the process to a single core (see also
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17641).  That will
> handle code that uses mc.cores = parallel::detectCores(), which there
> is a lot of.  I guess it will cause run-time error (on 'mc.cores' must
> be >= 1) for code that uses the second most common used mc.cores =
> parallel::detectCores() - 1, which is unfortunately also very common.
> I find the use of hardcoded detectCores() unfortunate but that is a
> slightly different topic.  OTH, if there would a standardized option
> in R for disabling all types of parallel processing by forcing a
> single core, one could imagine other parallel APIs to implement
> fallbacks to sequential processing as well. (I'm aware that not all
> use cases of async processing is about parallelization, so it might
> not apply everywhere).
> 
> Cheers,
> 
> Henrik
> 
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>>> On Jan 10, 2020, at 10:58 AM, Henrik Bengtsson <henrik.bengts...@gmail.com> 
>>> wrote:
>>> 
>>> The RStudio GUI was just one example.  AFAIK, and please correct me if
>>> I'm wrong, another example is where multi-threaded code is used in
>>> forked processing and that's sometimes unstable.  Yes another, which
>>> might be multi-thread related or not, is
>>> https://stat.ethz.ch/pipermail/r-devel/2018-September/076845.html:
>>> 
>>> res <- parallel::mclapply(urls, function(url) {
>>> download.file(url, basename(url))
>>> })
>>> 
>>> That was reported to fail on macOS with the default method="libcurl"
>>> but not for method="curl" or method="wget".
>>> 
>>> Further documentation is needed and would help but I don't believe
>>> it's sufficient to solve everyday problems.  The argument for
>>> introducing an option/env var to disable forking is to give the end
>>> user a quick workaround for newly introduced bugs.  Neither the
>>> develop nor the end user have full control of the R package stack,
>>> which is always in flux.  For instance, above mclapply() code might
>>> have been in a package on CRAN and then all of a sudden
>>> method="libcurl" became the new default in base R.  The above
>>> mclapply() code is now buggy on macOS, and not necessarily caught by
>>> CRAN checks.  The package developer might not notice this because they
>>> are on Linux or Windows.  It can take a very long time before this
>>> problem is even noticed and even further before it is tracked down and
>>> fixed.   Similarly, as more and more code turn to native code and it
>>> becomes easier and easier to implement multi-threading, more and more
>>> of these bugs across package dependencies risk sneaking in the
>>> backdoor wherever forked processing is in place.
>>> 
>>> For the end user, but also higher-up upstream package developers, the
>>> quickest workaround would be disable forking.  If you're conservative,
>>> you could even disable it all of your R processing.  Being able to
>>> quickly disable forking will also provide a mechanism for quickly
>>> testing the hypothesis that forking is the underlying problem, i.e.
>>> "Please retry with options(fork.allowed = FALSE)" will become handy
>>> for troubleshooting.
>>> 
>>> /Henrik
>>> 
>>> On Fri, Jan 10, 2020 at 5:31 AM Simon Urbanek
>>> <simon.urba...@r-project.org> wrote:
>>>> 
>>>> If I understand the thread correctly this is an RStudio issue and I would 
>>>> suggest that the developers consider using pthread_atfork() so RStudio can 
>>>> handle forking as they deem fit (bail out with an error or make RStudio 
>>>> work).  Note that in principle the functionality requested here can be 
>>>> easily implemented in a package so R doesn’t need to be modified.
>>>> 
>>>> Cheers,
>>>> Simon
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>>> On Jan 10, 2020, at 04:34, Tomas Kalibera <tomas.kalib...@gmail.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> On 1/10/20 7:33 AM, Henrik Bengtsson wrote:
>>>>>> I'd like to pick up this thread started on 2019-04-11
>>>>>> (https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
>>>>>> Modulo all the other suggestions in this thread, would my proposal of
>>>>>> being able to disable forked processing via an option or an
>>>>>> environment variable make sense?
>>>>> 
>>>>> I don't think R should be doing that. There are caveats with using fork, 
>>>>> and they are mentioned in the documentation of the parallel package, so 
>>>>> people can easily avoid functions that use it, and this all has been 
>>>>> discussed here recently.
>>>>> 
>>>>> If it is the case, we can expand the documentation in parallel package, 
>>>>> add a warning against the use of forking with RStudio, but for that I it 
>>>>> would be good to know at least why it is not working. From the github 
>>>>> issue I have the impression that it is not really known why, whether it 
>>>>> could be fixed, and if so, where. The same github issue reflects also 
>>>>> that some people want to use forking for performance reasons, and even 
>>>>> with RStudio, at least on Linux. Perhaps it could be fixed? Perhaps it is 
>>>>> just some race condition somewhere?
>>>>> 
>>>>> Tomas
>>>>> 
>>>>>> I've prototyped a working patch that
>>>>>> works like:
>>>>>>> options(fork.allowed = FALSE)
>>>>>>> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
>>>>>> [1] 14058 14058
>>>>>>> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
>>>>>> [1] 14058 14058
>>>>>>> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
>>>>>> [1] 14058.1 14058.2
>>>>>>> f <- parallel::mcparallel(Sys.getpid())
>>>>>> Error in allowFork(assert = TRUE) :
>>>>>> Forked processing is not allowed per option ‘fork.allowed’ or
>>>>>> environment variable ‘R_FORK_ALLOWED’
>>>>>>> cl <- parallel::makeForkCluster(1L)
>>>>>> Error in allowFork(assert = TRUE) :
>>>>>> Forked processing is not allowed per option ‘fork.allowed’ or
>>>>>> environment variable ‘R_FORK_ALLOWED’
>>>>>> The patch is:
>>>>>> Index: src/library/parallel/R/unix/forkCluster.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/forkCluster.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/forkCluster.R (working copy)
>>>>>> @@ -30,6 +30,7 @@
>>>>>> newForkNode <- function(..., options = defaultClusterOptions, rank)
>>>>>> {
>>>>>> +    allowFork(assert = TRUE)
>>>>>>   options <- addClusterOptions(options, list(...))
>>>>>>   outfile <- getClusterOption("outfile", options)
>>>>>>   port <- getClusterOption("port", options)
>>>>>> Index: src/library/parallel/R/unix/mclapply.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/mclapply.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/mclapply.R (working copy)
>>>>>> @@ -28,7 +28,7 @@
>>>>>>       stop("'mc.cores' must be >= 1")
>>>>>>   .check_ncores(cores)
>>>>>> -    if (isChild() && !isTRUE(mc.allow.recursive))
>>>>>> +    if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
>>>>>>       return(lapply(X = X, FUN = FUN, ...))
>>>>>>   ## Follow lapply
>>>>>> Index: src/library/parallel/R/unix/mcparallel.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/mcparallel.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/mcparallel.R (working copy)
>>>>>> @@ -20,6 +20,7 @@
>>>>>> mcparallel <- function(expr, name, mc.set.seed = TRUE, silent =
>>>>>> FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
>>>>>> {
>>>>>> +    allowFork(assert = TRUE)
>>>>>>   f <- mcfork(detached)
>>>>>>   env <- parent.frame()
>>>>>>   if (isTRUE(mc.set.seed)) mc.advance.stream()
>>>>>> Index: src/library/parallel/R/unix/pvec.R
>>>>>> ===================================================================
>>>>>> --- src/library/parallel/R/unix/pvec.R (revision 77648)
>>>>>> +++ src/library/parallel/R/unix/pvec.R (working copy)
>>>>>> @@ -25,7 +25,7 @@
>>>>>>   cores <- as.integer(mc.cores)
>>>>>>   if(cores < 1L) stop("'mc.cores' must be >= 1")
>>>>>> -    if(cores == 1L) return(FUN(v, ...))
>>>>>> +    if(cores == 1L || !allowFork()) return(FUN(v, ...))
>>>>>>   .check_ncores(cores)
>>>>>>   if(mc.set.seed) mc.reset.stream()
>>>>>> with a new file src/library/parallel/R/unix/allowFork.R:
>>>>>> allowFork <- function(assert = FALSE) {
>>>>>>  value <- Sys.getenv("R_FORK_ALLOWED")
>>>>>>  if (nzchar(value)) {
>>>>>>      value <- switch(value,
>>>>>>         "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
>>>>>>         "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
>>>>>>          stop(gettextf("invalid environment variable value: %s==%s",
>>>>>>         "R_FORK_ALLOWED", value)))
>>>>>> value <- as.logical(value)
>>>>>>  } else {
>>>>>>      value <- TRUE
>>>>>>  }
>>>>>>  value <- getOption("fork.allowed", value)
>>>>>>  if (is.na(value)) {
>>>>>>      stop(gettextf("invalid option value: %s==%s", "fork.allowed", 
>>>>>> value))
>>>>>>  }
>>>>>>  if (assert && !value) {
>>>>>>    stop(gettextf("Forked processing is not allowed per option %s or
>>>>>> environment variable %s", sQuote("fork.allowed"),
>>>>>> sQuote("R_FORK_ALLOWED")))
>>>>>>  }
>>>>>>  value
>>>>>> }
>>>>>> /Henrik
>>>>>>> On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera 
>>>>>>> <tomas.kalib...@gmail.com> wrote:
>>>>>>> On 4/15/19 11:02 AM, Iñaki Ucar wrote:
>>>>>>>> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera 
>>>>>>>> <tomas.kalib...@gmail.com> wrote:
>>>>>>>>> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
>>>>>>>>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinus...@gmail.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>> I think it's worth saying that mclapply() works as documented
>>>>>>>>>> Mostly, yes. But it says nothing about fork's copy-on-write and 
>>>>>>>>>> memory
>>>>>>>>>> overcommitment, and that this means that it may work nicely or fail
>>>>>>>>>> spectacularly depending on whether, e.g., you operate on a long
>>>>>>>>>> vector.
>>>>>>>>> R cannot possibly replicate documentation of the underlying operating
>>>>>>>>> systems. It clearly says that fork() is used and readers who may not
>>>>>>>>> know what fork() is need to learn it from external sources.
>>>>>>>>> Copy-on-write is an elementary property of fork().
>>>>>>>> Just to be precise, copy-on-write is an optimization widely deployed
>>>>>>>> in most modern *nixes, particularly for the architectures in which R
>>>>>>>> usually runs. But it is not an elementary property; it is not even
>>>>>>>> possible without an MMU.
>>>>>>> Yes, old Unix systems without virtual memory had fork eagerly copying.
>>>>>>> Not relevant today, and certainly not for systems that run R, but indeed
>>>>>>> people interested in OS internals can look elsewhere for more precise
>>>>>>> information.
>>>>>>> Tomas
>>>>> 
>>>>> ______________________________________________
>>>>> R-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>> 
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to