I have allocations on a couple of shared-memory supercomputers, which I use to run computationally intensive R scripts on multiple cores of the same node. I have 24 cores on one, and 48 on the other.

In both cases there is a hard memory limit that is shared among the cores of a node. On the 48-core machine the limit is 255 GB; if my job requests more than that, it gets aborted.

Now, I don't fully understand resource allocation on these sorts of systems, but I do understand that the kind of "thread parallelism" done by, e.g., the `parallel` package in R isn't identical to the kind of parallelism commonly used in lower-level languages. For example, when I request a node, I ask for only one of its cores; my R script then detects the number of cores on the node and farms tasks out to them via the `foreach` package (a sketch of this setup is below). My understanding is that lower-level languages need the number of cores to be specified in the job's shell script, with a particular task handed directly to each worker.
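
Roughly, the setup looks like the following sketch (the `doParallel` backend and the toy task are just illustrative placeholders, not my actual code):

    ## Minimal sketch of the setup: detect cores, start one worker per core,
    ## and farm out tasks with foreach. The toy task is a placeholder.
    library(parallel)
    library(doParallel)
    library(foreach)

    n_cores <- detectCores()            # detect all cores on the node
    cl <- makeCluster(n_cores)          # one worker process per core
    registerDoParallel(cl)

    results <- foreach(i = seq_len(n_cores), .combine = c) %dopar% {
      sum(rnorm(1e6))                   # stand-in for the real per-task work
    }

    stopCluster(cl)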

My problem is that on the cluster my parallel R script gets terminated, because the sum of the memory requested by the workers exceeds my allocation. I don't have this problem when running on my laptop's 4 cores, presumably because the laptop has a higher memory-per-core ratio.

My question: how can I ensure that the total memory requested by N workers stays below a given threshold? Is this even possible? If not, is it possible to benchmark a process locally, collect the maximum per-worker memory used, and use that to back out the number of workers I can start given a node's memory limit? A sketch of what I have in mind is below.
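
Something like the following, where `run_one_task()` is a hypothetical placeholder for one worker's workload, and R's `gc()` counters are used as a rough proxy for per-worker memory (they only track R-managed memory):

    ## Rough local benchmark; run_one_task() is a hypothetical placeholder for
    ## one worker's workload, and 255 GB is the node limit mentioned above.
    run_one_task <- function() sum(rnorm(1e7))   # stand-in for the real work

    gc(reset = TRUE)                 # reset R's "max used" memory counters
    invisible(run_one_task())
    g <- gc()                        # last column reports "max used" in Mb
    peak_mb <- sum(g[, ncol(g)])     # Ncells + Vcells peak for this session

    node_limit_gb <- 255
    max_workers <- floor(node_limit_gb * 1024 / peak_mb)
    max_workers                      # upper bound; leave some headroom in practice

The resulting estimate could then be passed to `makeCluster(max_workers)` instead of starting one worker per core.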

Thanks in advance!

