On 24/11/2016 07:30, Christian Krause wrote:
Dear all,

I’m working as an administrator of a High-Performance Computing (HPC) Cluster 
which runs on Linux. A lot of people are using R on this Linux cluster and, of 
course, the *parallel* package to speed up their computations.

It has been our collective experience that using |makeForkCluster| yields an 
overall better experience /on Linux/ than |makePSOCKcluster|, for whatever 
definition of better. Let me just summarize that it works more smoothly. I 
believe other people working with *parallel* on Linux can share this experience.

Usually, but not always. And the differences are mainly in initialization time, so they are small once workers are given a reasonable amount of work (tens of seconds each). However, as forked workers have a copy of the whole master process, forking workers can lead to excessive memory usage.

Also, we really welcomed the environment variable |MC_CORES|, which lets us 
specify (in job submit scripts) the number of CPU cores a user has been 
granted, most importantly for /dynamic resource requests/ (see below for an 
example).

Hmm, MC_CORES is primarily for mclapply() and friends, not makeCluster(). makeForkCluster() is a 'friend' so uses it, but makePSOCKcluster() was designed for distributing across a cluster of machines (whereas makeForkCluster is restricted to a single multicore machine).
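
To illustrate how MC_CORES reaches mclapply(): a minimal sketch, assuming a Unix-alike and that the "mc.cores" option was initialised from MC_CORES when the parallel package was loaded (which happens only if the option was not already set). The 2L fallback and the toy workload are placeholders for illustration.

```r
library(parallel)

# "mc.cores" is taken from the MC_CORES environment variable at package
# load time if the option is not already set; fall back to 2 otherwise.
n_cores <- getOption("mc.cores", 2L)

# Forked workers compute the squares; mc.cores caps the number of children.
squares <- mclapply(1:8, function(i) i^2, mc.cores = n_cores)
```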

What we would also appreciate - and now we finally get to the feature request - 
is another environment variable to choose the cluster type, as in:

|export MC_CLUSTER_TYPE=FORK |

Do you think something like this could be implemented in future releases?

No.  (Not least as 'MC_' refers to the former 'multicore' package.)

PSOCK and Fork clusters are not interchangeable, and the author of the code has to check if Fork can be substituted for PSOCK (which starts with a clean R environment, and that may well be assumed).

So rather, you need to ask your users to implement this in their calls to parallel::makeCluster.
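
What that might look like on the user side is sketched below. This is only an illustration, not something parallel provides: the variable name R_CLUSTER_TYPE and the helper cluster_settings() are hypothetical, and the script author still has to confirm that the code works with either cluster type, since PSOCK workers start with a clean R environment and FORK workers do not.

```r
library(parallel)

# Read the cluster settings from the environment.
# R_CLUSTER_TYPE is a hypothetical, site-defined variable; R itself ignores it.
cluster_settings <- function() {
  type <- Sys.getenv("R_CLUSTER_TYPE", unset = "PSOCK")
  type <- match.arg(type, c("PSOCK", "FORK"))  # reject anything else
  cores <- as.integer(Sys.getenv("MC_CORES", unset = "2"))
  if (is.na(cores)) cores <- 2L                # empty or malformed value
  list(type = type, cores = cores)
}

s <- cluster_settings()
cl <- makeCluster(s$cores, type = s$type)

# Toy workload; a real script would do tens of seconds of work per task.
res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
```

With this in the script, a job submit script could export R_CLUSTER_TYPE=FORK next to MC_CORES, and scripts that do not opt in are unaffected.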





      Parallel R job submit script

This works with the Univa Grid Engine and should work with other * Grid Engine 
products:

    #!/bin/bash
    # request a "parallel environment" with 2 to 20 cores
    #$ -pe smp 2-20
    # set number of cores for the R cluster to the granted value (between 2 and 20)
    export MC_CORES=$NSLOTS
    # we want this:
    export MC_CLUSTER_TYPE=FORK
    Rscript /path/to/script.R

Best Regards




--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
