Some of the packages I use make it possible to run some of the computations in parallel. For example, sNPLS::cv_snpls calls makeCluster() itself, makes sure that the package is loaded on the workers, exports the necessary variables and stops the cluster after it is finished. On the other hand, multiway::parafac accepts arbitrary cluster objects supplied by the user, but requires the user to manually preload the package on the workers. Both packages export and document the internal functions intended to run on the workers.
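To make the two designs concrete, here is a rough sketch of each. This is not the actual code of sNPLS or multiway; run_own_cluster(), run_user_cluster() and square() are hypothetical names made up for illustration, and the worker count of 2 is an arbitrary assumption.

```r
library(parallel)

# Hypothetical worker function standing in for a package-internal helper.
square <- function(x) x^2

# Style 1: the function creates its own cluster and tears it down afterwards.
run_own_cluster <- function(x, ncores = 2L) {
  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl), add = TRUE)  # stop the cluster even if an error occurs
  unlist(parLapply(cl, x, square))
}

# Style 2: the caller creates, supplies and later stops the cluster.
run_user_cluster <- function(cl, x) {
  unlist(parLapply(cl, x, square))
}

# Usage of style 1: no cluster management visible to the caller.
res_own <- run_own_cluster(1:4)

# Usage of style 2: the caller decides how the cluster is built.
cl <- makeCluster(2L)
res_user <- run_user_cluster(cl, 1:4)
stopCluster(cl)
```

In the second style the caller could just as well pass a cluster spanning several machines, which the function itself would never need to know about.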
Are there any guidelines for the use of snow-style clusters in R packages? I remember reading somewhere that accepting arbitrary cluster objects from the user instead of calling makeCluster(detectCores()) is generally considered a good idea (for multiple reasons, ranging from giving the user more control over CPU load to making it possible to run the code on a number of networked machines that the package code knows nothing about), but I couldn't find a reference for that in Writing R Extensions or in the parallel package documentation.

What about preloading the package on the workers? Are there any downsides to the package code unconditionally running clusterEvalQ(cl, library(myself)) to avoid disappointing errors like "10 nodes produced errors; first error: could not find function"?

Speaking of private functions intended to be run by the package itself on the worker nodes, should they be exported? I have prepared a test package doing little more than the following:

R/fun.R:

    private <- function(x) paste(x, Sys.getpid())
    public <- function(cl, x) parallel::parLapply(cl, x, private)

NAMESPACE:

    export(public)

The package passes R CMD check --as-cran without warnings or errors, which seems to suggest that exporting worker functions is not required.

-- 
Best regards,
Ivan

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel