I have allocations on a couple of shared-memory supercomputers, which I use to run computationally intensive R scripts on multiple cores of the same node. I have 24 cores on one, and 48 on the other.

In both cases there is a hard memory limit that is shared among the cores of a node. On the 48-core machine the limit is 255 GB; if my job requests more than that, it gets aborted.

Now, I don't fully understand resource allocation on these sorts of systems, but I do understand that the kind of "thread parallelism" done by, e.g., the `parallel` package in R isn't identical to the kind of parallelism commonly used in lower-level languages. For example, when I request a node, I ask for only one of its cores; my R script then detects the number of cores on the node and farms tasks out to them via the `foreach` package (a sketch of this setup is below). My understanding is that lower-level languages need the number of cores to be specified in the job's shell script, with a particular task handed directly to each worker.
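
Roughly, the setup looks like the following sketch (the `doParallel` backend and the toy task are just illustrative placeholders, not my actual code):

    ## Minimal sketch of the setup: detect cores, start one worker per core,
    ## and farm out tasks with foreach. The toy task is a placeholder.
    library(parallel)
    library(doParallel)
    library(foreach)

    n_cores <- detectCores()            # detect all cores on the node
    cl <- makeCluster(n_cores)          # one worker process per core
    registerDoParallel(cl)

    results <- foreach(i = seq_len(n_cores), .combine = c) %dopar% {
      sum(rnorm(1e6))                   # stand-in for the real per-task work
    }

    stopCluster(cl)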

My problem is that on the cluster my parallel R script gets terminated, because the sum of the memory requested by the workers exceeds my allocation. I don't have this problem when running on my laptop's 4 cores, presumably because the laptop has a higher memory-per-core ratio.

My question: how can I ensure that the total memory requested by N workers stays below a given threshold? Is this even possible? If not, is it possible to benchmark a process locally, collect the maximum per-worker memory used, and use that to back out the number of workers I can start given a node's memory limit? A sketch of what I have in mind is below.
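
Something like the following, where `run_one_task()` is a hypothetical placeholder for one worker's workload, and R's `gc()` counters are used as a rough proxy for per-worker memory (they only track R-managed memory):

    ## Rough local benchmark; run_one_task() is a hypothetical placeholder for
    ## one worker's workload, and 255 GB is the node limit mentioned above.
    run_one_task <- function() sum(rnorm(1e7))   # stand-in for the real work

    gc(reset = TRUE)                 # reset R's "max used" memory counters
    invisible(run_one_task())
    g <- gc()                        # last column reports "max used" in Mb
    peak_mb <- sum(g[, ncol(g)])     # Ncells + Vcells peak for this session

    node_limit_gb <- 255
    max_workers <- floor(node_limit_gb * 1024 / peak_mb)
    max_workers                      # upper bound; leave some headroom in practice

The resulting estimate could then be passed to `makeCluster(max_workers)` instead of starting one worker per core.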

Thanks in advance!

