The basic principle I would follow is to make sure your code only goes parallel with explicit permission from the end user. One way to do that is to accept a cluster from the caller; another is to create and shut down your own cluster if a global option is set (via options() or a mechanism of your own).
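A minimal sketch of those patterns (the option names mypkg.parallel and mypkg.cores and the function run_parallel() are hypothetical; the getDefaultCluster() fallback is the one discussed below):

run_parallel <- function(x, cl = NULL) {
  if (is.null(cl)) {
    # A cluster registered with parallel::setDefaultCluster() could be taken
    # as permission too; getDefaultCluster() returns NULL if none is set.
    cl <- parallel::getDefaultCluster()
  }
  if (is.null(cl) && isTRUE(getOption("mypkg.parallel", FALSE))) {
    # Explicit opt-in via options(): create our own cluster and make sure
    # it is shut down again, even if an error occurs.
    cl <- parallel::makeCluster(getOption("mypkg.cores", 2L))
    on.exit(parallel::stopCluster(cl), add = TRUE)
  }
  if (is.null(cl))
    return(lapply(x, sqrt))          # no permission given: stay serial
  parallel::parLapply(cl, x, sqrt)
}

A caller can then pass a cluster explicitly, register a default cluster, or run options(mypkg.parallel = TRUE) to let the function manage its own.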
If you create and shut down your own cluster you can do pretty much what you like. If you use one passed to you, it would be best to leave it in the state you found it, at least as far as the search path and global environment are concerned. So use foo::bar instead of library().

The user can also set a default cluster. You can retrieve it with getDefaultCluster(); this returns NULL if no default cluster is set. You could assume that if one is set you are allowed to use it, but it might still be a good idea to look for explicit permission via an option or an argument. Again, I would try to leave a cluster used this way in as clean a state as you can.

Best,

luke

On Sun, 24 May 2020, Ivan Krylov wrote:
Some of the packages I use make it possible to run some of the computations in parallel. For example, sNPLS::cv_snpls calls makeCluster() itself, makes sure that the package is loaded by the workers, exports the necessary variables and stops the cluster after it is finished. On the other hand, multiway::parafac accepts arbitrary cluster objects supplied by the user, but requires the user to manually preload the package on the workers. Both packages export and document the internal functions intended to run on the workers.

Are there any guidelines for the use of snow-style clusters in R packages?

I remember reading somewhere that accepting arbitrary cluster objects from the user instead of makeCluster(detectCores()) is generally considered a good idea (for multiple reasons, ranging from giving the user more control over CPU load to making it possible to run the code on a number of networked machines that the package code knows nothing about), but I couldn't find a reference for that in Writing R Extensions or the parallel package documentation.

What about preloading the package on the workers? Are there any downsides to the package code unconditionally running clusterEvalQ(cl, library(myself)) to avoid disappointing errors like "10 nodes produced errors; first error: could not find function"?

Speaking of private functions intended to be run by the package itself on the worker nodes, should they be exported? I have prepared a test package doing little more than the following:

R/fun.R:

private <- function(x) paste(x, Sys.getpid())
public <- function(cl, x) parallel::parLapply(cl, x, private)

NAMESPACE:

export(public)

The package passes R CMD check --as-cran without warnings or errors, which seems to suggest that exporting worker functions is not required.
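For reference, a minimal sketch of the unconditional preload asked about above, reusing the names from the test package ("myself", private(), public()); whether the library() call on the workers is a good idea is exactly the open question, and note that it attaches the package on each worker's search path, which the reply above suggests avoiding on a user-supplied cluster:

# R/fun.R, with the preload added
private <- function(x) paste(x, Sys.getpid())

public <- function(cl, x) {
  parallel::clusterEvalQ(cl, library(myself))  # load this package on every worker
  parallel::parLapply(cl, x, private)
}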
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa
Department of Statistics and Actuarial Science
241 Schaeffer Hall, Iowa City, IA 52242
Phone: 319-335-3386    Fax: 319-335-3017
email: luke-tier...@uiowa.edu    WWW: http://www.stat.uiowa.edu