The basic principle I would follow is to make sure your code only goes parallel with explicit permission from the end user. One way to do that is to accept a cluster from the caller; another is to create and shut down your own cluster if a global option is set (via options() or a mechanism of your own).
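A minimal sketch of those patterns (the option names mypkg.parallel and mypkg.cores and the function run_parallel() are hypothetical; the getDefaultCluster() fallback is the one discussed below):

run_parallel <- function(x, cl = NULL) {
  if (is.null(cl)) {
    # A cluster registered with parallel::setDefaultCluster() could be taken
    # as permission too; getDefaultCluster() returns NULL if none is set.
    cl <- parallel::getDefaultCluster()
  }
  if (is.null(cl) && isTRUE(getOption("mypkg.parallel", FALSE))) {
    # Explicit opt-in via options(): create our own cluster and make sure
    # it is shut down again, even if an error occurs.
    cl <- parallel::makeCluster(getOption("mypkg.cores", 2L))
    on.exit(parallel::stopCluster(cl), add = TRUE)
  }
  if (is.null(cl))
    return(lapply(x, sqrt))          # no permission given: stay serial
  parallel::parLapply(cl, x, sqrt)
}

A caller can then pass a cluster explicitly, register a default cluster, or run options(mypkg.parallel = TRUE) to let the function manage its own.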
If you create and shut down your own cluster you can do pretty much what you like. If you use one passed to you, it would be best to leave it in the state you found it, at least as far as the search path and global environment are concerned. So use foo::bar instead of library().

The user can also set a default cluster. You can retrieve it with getDefaultCluster(); this returns NULL if no default cluster is set. You could assume that if one is set you are allowed to use it, but it might still be a good idea to look for explicit permission via an option or an argument. Again, I would try to leave a cluster used this way in as clean a state as you can.

Best,

luke

On Sun, 24 May 2020, Ivan Krylov wrote:
Some of the packages I use make it possible to run some of the computations in parallel. For example, sNPLS::cv_snpls calls makeCluster() itself, makes sure that the package is loaded by the workers, exports the necessary variables and stops the cluster after it is finished. On the other hand, multiway::parafac accepts arbitrary cluster objects supplied by the user, but requires the user to manually preload the package on the workers. Both packages export and document the internal functions intended to run on the workers.

Are there any guidelines for the use of snow-style clusters in R packages?

I remember reading somewhere that accepting arbitrary cluster objects from the user instead of makeCluster(detectCores()) is generally considered a good idea (for multiple reasons, ranging from giving the user more control over CPU load to making it possible to run the code on a number of networked machines that the package code knows nothing about), but I couldn't find a reference for that in Writing R Extensions or the parallel package documentation.

What about preloading the package on the workers? Are there any downsides to the package code unconditionally running clusterEvalQ(cl, library(myself)) to avoid disappointing errors like "10 nodes produced errors; first error: could not find function"?

Speaking of private functions intended to be run by the package itself on the worker nodes, should they be exported? I have prepared a test package doing little more than the following:

R/fun.R:

private <- function(x) paste(x, Sys.getpid())
public <- function(cl, x) parallel::parLapply(cl, x, private)

NAMESPACE:

export(public)

The package passes R CMD check --as-cran without warnings or errors, which seems to suggest that exporting worker functions is not required.
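For reference, a minimal sketch of the unconditional preload asked about above, reusing the names from the test package ("myself", private(), public()); whether the library() call on the workers is a good idea is exactly the open question, and note that it attaches the package on each worker's search path, which the reply above suggests avoiding on a user-supplied cluster:

# R/fun.R, with the preload added
private <- function(x) paste(x, Sys.getpid())

public <- function(cl, x) {
  parallel::clusterEvalQ(cl, library(myself))  # load this package on every worker
  parallel::parLapply(cl, x, private)
}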
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa
Department of Statistics and Actuarial Science
241 Schaeffer Hall, Iowa City, IA 52242
Phone: 319-335-3386    Fax: 319-335-3017
email: luke-tier...@uiowa.edu    WWW: http://www.stat.uiowa.edu