This code actually happens to work for me on macOS, but I think in general you cannot rely on performing HTTP requests in fork clusters, i.e. with mclapply().
Fork clusters create worker processes by forking the R process and then _not_ executing another R binary. (This is often convenient, because the new processes inherit the memory image of the parent process.) Fork without exec is not supported on macOS: basically any call into a system library might crash, not just the HTTP-related ones. For HTTP calls I have seen errors, crashes, and sometimes it works; it depends on the combination of libcurl version, macOS version and probably luck. It usually (always?) works on Linux, but I would not rely on that, either. So, yes, this is a known issue.

Creating new processes to perform HTTP requests in parallel is very often bad practice anyway. Whenever you can, use I/O multiplexing instead: the main R process is not doing anything while the downloads run, it is just waiting for the data to come in, so you do not need more processes, you need parallel I/O. Take a look at curl::multi_add() and the related multi interface functions.

Btw. download.file() can also download files in parallel if the libcurl method is used; just give it the URLs in a character vector. That API is very restricted, though, so I suggest looking at the curl package.
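To make the curl suggestion concrete, here is a minimal, untested sketch of the multi interface, reusing the url_base and files objects from your example. Everything runs in the single main R process; libcurl performs the transfers concurrently and each done callback writes the downloaded bytes to disk (the fail callback and its warning text are just illustrative):

pool <- curl::new_pool()
invisible(lapply(files, function(f) {
  curl::multi_add(
    curl::new_handle(url = paste0(url_base, f)),
    done = function(res) writeBin(res$content, f),      # response body -> local file
    fail = function(msg) warning(f, " failed: ", msg),  # report failed transfers
    pool = pool
  )
}))
curl::multi_run(pool = pool)  # blocks until all queued requests have finished

Note that this sketch buffers each response in memory until its done callback runs; for very large files you would want to stream to disk instead, but for source packages like these it should be fine.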
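The restricted download.file() route I mentioned would be roughly this (untested; the simultaneous-download behaviour only applies to method = "libcurl", the same method that crashes for you under mclapply(), but in a single process it should be fine):

urls <- paste0(url_base, files)
download.file(urls, destfile = files, method = "libcurl")  # one call, downloads run simultaneously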
Gabor

On Thu, Sep 20, 2018 at 8:44 AM Seth Russell <seth.russ...@gmail.com> wrote:
>
> I have an lapply function call that I want to parallelize. Below is a very
> simplified version of the code:
>
> url_base <- "https://cloud.r-project.org/src/contrib/"
> files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")
> res <- parallel::mclapply(files, function(s) download.file(paste0(url_base, s), s))
>
> Instead of downloading a couple of files in parallel, I get a segfault per
> process with a 'memory not mapped' message. I've been working with Henrik
> Bengtsson on resolving this issue and he recommended I send a message to
> the R-devel mailing list.
>
> Here's the output:
>
> trying URL 'https://cloud.r-project.org/src/contrib/A3_1.0.0.tar.gz'
> trying URL 'https://cloud.r-project.org/src/contrib/ABC.RAP_0.9.0.tar.gz'
>
>  *** caught segfault ***
> address 0x11575ba3a, cause 'memory not mapped'
>
>  *** caught segfault ***
> address 0x11575ba3a, cause 'memory not mapped'
>
> Traceback:
>  1: download.file(paste0(url_base, s), s)
>  2: FUN(X[[i]], ...)
>  3: lapply(X = S, FUN = FUN, ...)
>  4: doTryCatch(return(expr), name, parentenv, handler)
>  5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
>  6: tryCatchList(expr, classes, parentenv, handlers)
>  7: tryCatch(expr, error = function(e) {
>         call <- conditionCall(e)
>         if (!is.null(call)) {
>             if (identical(call[[1L]], quote(doTryCatch)))
>                 call <- sys.call(-4L)
>             dcall <- deparse(call)[1L]
>             prefix <- paste("Error in", dcall, ": ")
>             LONG <- 75L
>             sm <- strsplit(conditionMessage(e), "\n")[[1L]]
>             w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")
>             if (is.na(w))
>                 w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b")
>             if (w > LONG)
>                 prefix <- paste0(prefix, "\n  ")
>         }
>         else prefix <- "Error : "
>         msg <- paste0(prefix, conditionMessage(e), "\n")
>         .Internal(seterrmessage(msg[1L]))
>         if (!silent && isTRUE(getOption("show.error.messages"))) {
>             cat(msg, file = outFile)
>             .Internal(printDeferredWarnings())
>         }
>         invisible(structure(msg, class = "try-error", condition = e))
>     })
>  8: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
>  9: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
> 10: FUN(X[[i]], ...)
> 11: lapply(seq_len(cores), inner.do)
> 12: parallel::mclapply(files, function(s) download.file(paste0(url_base, s), s))
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
>
> Here's my sessionInfo()
>
> > sessionInfo()
> R version 3.5.1 (2018-07-02)
> Platform: x86_64-apple-darwin16.7.0 (64-bit)
> Running under: macOS Sierra 10.12.6
>
> Matrix products: default
> BLAS/LAPACK: /usr/local/Cellar/openblas/0.3.3/lib/libopenblasp-r0.3.3.dylib
>
> locale:
> [1] en_US/en_US/en_US/C/en_US/en_US
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.5.1
>
> The version of R I'm running was installed via Homebrew with "brew install r
> --with-java --with-openblas".
>
> Also, the provided example code works as expected on Linux. And if I
> provide a non-default download method to the download.file() call, such as:
>
> res <- parallel::mclapply(files, function(s) download.file(paste0(url_base, s), s, method="wget"))
> res <- parallel::mclapply(files, function(s) download.file(paste0(url_base, s), s, method="curl"))
>
> it works correctly - no segfault. If I use method="libcurl" it does segfault.
>
> I'm not sure what steps to take to further narrow down the source of the error.
>
> Is this a known bug? If not, is this a new bug or an unexpected feature?
>
> Thanks,
> Seth

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel