Re: [Rd] [External] setting .libPaths() with parallel::clusterCall

2020-12-23 Thread Mark van der Loo
Dear Luke,

Thank you, this makes perfect sense.

I find it quite hard to express this issue in a way that is both compact
and understandable.
In any case, below is a proposed update to the documentation.

Thank you again for all your work,
Mark

Index: src/library/parallel/man/clusterApply.Rd
===
--- src/library/parallel/man/clusterApply.Rd (revision 79673)
+++ src/library/parallel/man/clusterApply.Rd (working copy)
@@ -136,6 +136,15 @@
   more efficient than \code{parApply} but do less post-processing of the
   result.

+  Functions with a \code{fun} or \code{FUN} parameter send a serialized
+  copy of that argument from the main process to each worker node.
+  When the argument is a function, calling this copy is usually
+  equivalent to calling the same function on the worker, except when
+  the function modifies its enclosing environment by assignment.
+  A notable example is \code{\link{.libPaths}}. To call the function
+  defined on each worker, so that it modifies that worker's own
+  enclosing environment, pass the name of the function as a string.
+
   A chunk size of \code{0} with static scheduling uses the default (one
   chunk per node).  With dynamic scheduling, chunk size of \code{0} has the
   same effect as \code{1} (one invocation of \code{FUN}/\code{fun} per
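The behaviour described in the proposed paragraph can be checked directly. The sketch below is only an illustration: it assumes a local PSOCK cluster (the `makeCluster()` default), uses a scratch directory under `tempdir()` in place of `./tmplib`, and its variable names are illustrative. The comments describe the behaviour reported in this thread for R 4.0.3 and r-devel.

```r
library(parallel)

# Scratch library directory for this illustration (an assumption; any
# existing directory works, since .libPaths() drops paths that do not exist).
libdir <- file.path(tempdir(), "tmplib")
dir.create(libdir, showWarnings = FALSE)

cl <- makeCluster(2)

# Function value: each worker calls an unserialized copy of the main
# process's .libPaths, so only the copy's private state changes.
invisible(clusterCall(cl, .libPaths, c(libdir, .libPaths())))
after_value <- clusterEvalQ(cl, .libPaths())

# Function name: each worker looks up and calls its own .libPaths,
# so its own library search path is updated.
invisible(clusterCall(cl, ".libPaths", c(libdir, .libPaths())))
after_name <- clusterEvalQ(cl, .libPaths())

stopCluster(cl)

# Does the scratch directory appear on the first worker's search path?
any(basename(after_value[[1]]) == "tmplib")
any(basename(after_name[[1]]) == "tmplib")
```

On the versions discussed in this thread, the first check should come out `FALSE` and the second `TRUE`.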

On Tue, Dec 22, 2020 at 2:37 PM  wrote:

> On Tue, 22 Dec 2020, Mark van der Loo wrote:
>
> > Dear all,
> >
> > It is not possible to set library paths on worker nodes with
> > parallel::clusterCall (or snow::clusterCall) and I wonder if this is
> > intended behavior.
> >
> > Example.
> >
> > library(parallel)
> > libdir <- "./tmplib"
> > if (!dir.exists(libdir)) dir.create(libdir)
> >
> > cl <- makeCluster(2)
> > clusterCall(cl, .libPaths, c(libdir, .libPaths()) )
> >
> > The output is as expected, with the extra libdir returned for each worker
> > node. However, running
> >
> > clusterEvalQ(cl, .libPaths())
> >
> > shows that the library paths have not been set.
>
> Use this:
>
>  clusterCall(cl, ".libPaths", c(libdir, .libPaths()) )
>
> This will find the function .libPaths on the workers.
>
> Your clusterCall sends across a serialized copy of your process'
> .libPaths and calls that. Usually that is equivalent to calling the
> function found by the name you used on the workers, but not when the
> function has an enclosing environment that the function modifies by
> assignment.
>
> Alternate implementations of .libPaths that are more
> serialization-friendly are possible in principle but probably not
> practical given limitations of the base package.
>
> The distinction between providing a function value or a character
> string as the function argument to clusterCall and others could
> probably use a paragraph in the help file; happy to consider a patch
> if anyone wants to take a crack at it.
>
> Best,
>
> luke
>
> >
> > If this is indeed a bug, I'm happy to file it on Bugzilla. Tested on R
> > 4.0.3 and r-devel.
> >
> > Best,
> > Mark
> > PS: a workaround is documented here:
> > https://www.markvanderloo.eu/yaRb/2020/12/17/how-to-set-library-path-on-a-parallel-r-cluster/
> >
> >
> >> sessionInfo()
> > R Under development (unstable) (2020-12-21 r79668)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Ubuntu 20.04.1 LTS
> >
> > Matrix products: default
> > BLAS:   /home/mark/projects/Rdev/R-devel/lib/libRblas.so
> > LAPACK: /home/mark/projects/Rdev/R-devel/lib/libRlapack.so
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel  stats     graphics  grDevices utils     datasets  methods
> > [8] base
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_4.1.0
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics and    Fax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
>


Re: [Rd] [External] setting .libPaths() with parallel::clusterCall

2020-12-22 Thread luke-tierney

On Tue, 22 Dec 2020, Mark van der Loo wrote:


> Dear all,
>
> It is not possible to set library paths on worker nodes with
> parallel::clusterCall (or snow::clusterCall) and I wonder if this is
> intended behavior.
>
> Example.
>
> library(parallel)
> libdir <- "./tmplib"
> if (!dir.exists(libdir)) dir.create(libdir)
>
> cl <- makeCluster(2)
> clusterCall(cl, .libPaths, c(libdir, .libPaths()) )
>
> The output is as expected, with the extra libdir returned for each worker
> node. However, running
>
> clusterEvalQ(cl, .libPaths())
>
> shows that the library paths have not been set.


Use this:

clusterCall(cl, ".libPaths", c(libdir, .libPaths()) )

This will find the function .libPaths on the workers.
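Because the name is resolved on each worker, a function passed as a string does not even need to exist in the main process. The sketch below illustrates this with a hypothetical function `worker_only` that is defined on the workers only; it assumes a local cluster created with `makeCluster()`.

```r
library(parallel)

cl <- makeCluster(2)

# Define a function in each worker's global environment only.
# 'worker_only' is a hypothetical name used for this illustration.
invisible(clusterEvalQ(cl, worker_only <- function(x) x * 2))

# The main process has no function of that name...
exists("worker_only")

# ...but passing the name as a string lets each worker find its own copy.
res <- clusterCall(cl, "worker_only", 21)

stopCluster(cl)
res
```

Passing the function value instead would require `worker_only` to exist in the main process, and the workers would then call a copy of that main-process function rather than their own.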

Your clusterCall sends across a serialized copy of your process'
.libPaths and calls that. Usually that is equivalent to calling the
function found by the name you used on the workers, but not when the
function has an enclosing environment that the function modifies by
assignment.
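The mechanism can be reproduced without a cluster, since a socket cluster essentially sends a function value to the workers via `serialize()` and `unserialize()`. In the sketch below, the hypothetical `make_counter()` builds a closure that, like `.libPaths`, keeps its state in its enclosing environment and updates it with `<<-`; a serialize/unserialize round trip stands in for the trip to a worker.

```r
# Hypothetical closure factory: the returned function keeps its state in
# the enclosing environment created by make_counter() and updates it by
# assignment with <<-.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()

# Round trip through serialize()/unserialize(), standing in for sending the
# function value to a worker. The enclosing environment is neither the
# global environment nor a namespace, so it is copied along with the function.
copy <- unserialize(serialize(counter, NULL))

copy()  # 1
copy()  # 2

# The copy updated its own environment; the original was not touched.
environment(copy)$count     # 2
environment(counter)$count  # 0
```

This is the same reason the workers' `.libPaths()` is unchanged: the call updated the private state of a copy, not the state behind the workers' own `.libPaths`.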

Alternate implementations of .libPaths that are more
serialization-friendly are possible in principle but probably not
practical given limitations of the base package.
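To make "serialization-friendly" concrete: a function that keeps its state somewhere it looks up at call time, rather than in a private enclosing environment, behaves the same whether it is passed by value or by name. The toy below is hypothetical and not a suggestion for how base R should implement `.libPaths`; it stores its state in `options()` under a made-up option name, `toy.paths`, and assumes a local cluster created with `makeCluster()`.

```r
library(parallel)

# Hypothetical toy: keeps its state in options() instead of in a private
# enclosing environment. Defined at top level, its enclosure is the global
# environment, which serialize() sends by reference rather than by value.
toy_paths <- function(new) {
  if (!missing(new)) options(toy.paths = new)
  getOption("toy.paths")
}

cl <- makeCluster(2)

# Pass the function value, as in the original report.
invisible(clusterCall(cl, toy_paths, c("a", "b")))

# Each worker's own state was updated, because options() and getOption()
# are looked up and run in each worker's session.
on_workers <- clusterEvalQ(cl, getOption("toy.paths"))

stopCluster(cl)
on_workers
```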

The distinction between providing a function value or a character
string as the function argument to clusterCall and others could
probably use a paragraph in the help file; happy to consider a patch
if anyone wants to take a crack at it.

Best,

luke



> If this is indeed a bug, I'm happy to file it on Bugzilla. Tested on R
> 4.0.3 and r-devel.
>
> Best,
> Mark
> PS: a workaround is documented here:
> https://www.markvanderloo.eu/yaRb/2020/12/17/how-to-set-library-path-on-a-parallel-r-cluster/
>
>
>> sessionInfo()
> R Under development (unstable) (2020-12-21 r79668)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04.1 LTS
>
> Matrix products: default
> BLAS:   /home/mark/projects/Rdev/R-devel/lib/libRblas.so
> LAPACK: /home/mark/projects/Rdev/R-devel/lib/libRlapack.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.0

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and    Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu