Hello,

We have experienced something along those lines recently too, with Slurm v2.5.3, cr_core_memory, and the cgroup task plugin with taskaffinity=yes, constraining cores and RAM.

We did some experimentation. Without affinity, OpenMP is able to run on all the allotted cores. Enabling affinity restricts OpenMP to a single core, although POSIX threads can still run across the allotted cores. Disabling affinity again does not reset this, and OpenMP still runs on one core. For the time being, our workaround is to reboot the node and run without affinity, accepting NUMA effects on 4-way SMP nodes when a job does not take the whole node.

Hope it helps,
Carlos

On May 23, 2013 8:36 AM, "Loris Bennett" <[email protected]> wrote:
>
> "Loris Bennett" <[email protected]>
> writes:
>
> > Hi,
> >
> > We are using SLURM 2.2.7 and occasionally get a situation in which
> > multiple tasks end up on a single core.
> >
> > In the following, a 15-task job is scheduled to 2 12-core nodes. On one
> > node I have 9 processes on 9 core:
> >
> >   PID USER  PR NI  VIRT  RES SHR S  %CPU %MEM     TIME+  P COMMAND
> > 20094 loris 20  0  703m 119m 14m R 100.0  0.1 943:06.48  3 desmond
> > 20086 loris 20  0  874m 162m 16m R  99.7  0.2 941:42.19 11 desmond
> > 20087 loris 20  0  701m 114m 14m R  99.7  0.1 943:03.65  4 desmond
> > 20088 loris 20  0  701m 114m 14m R  99.7  0.1 942:25.38  7 desmond
> > 20090 loris 20  0  701m 115m 14m R  99.7  0.1 942:03.41  9 desmond
> > 20092 loris 20  0  703m 116m 14m R  99.7  0.1 942:25.60  8 desmond
> > 20093 loris 20  0  701m 115m 14m R  99.7  0.1 942:29.85  6 desmond
> > 20089 loris 20  0  703m 116m 14m R  99.5  0.1 943:00.17  2 desmond
> > 20091 loris 20  0  703m 116m 14m R  99.2  0.1 942:22.47 10 desmond
> >
> > On the other I have 6 processes on 1 core:
> >
> >   PID USER  PR NI  VIRT  RES SHR S  %CPU %MEM     TIME+  P COMMAND
> > 60216 xxxxx 20  0 16.4g  15g 10m R 598.1 16.6  11094:58  5 molpro.exe
> > 39963 loris 20  0  703m 119m 14m R  16.6  0.1 157:46.38  6 desmond
> > 39964 loris 20  0  701m 114m 14m R  16.6  0.1 157:46.38  6 desmond
> > 39965 loris 20  0  703m 117m 14m R  16.6  0.1 157:46.39  6 desmond
> > 39961 loris 20  0  704m 119m 14m R  16.3  0.1 157:46.38  6 desmond
> > 39962 loris 20  0  702m 115m 14m R  16.3  0.1 157:46.38  6 desmond
> > 39966 loris 20  0  702m 115m 14m R  16.3  0.1 157:46.38  6 desmond
> >
> > On this second node there is also an OpenMP job running 6 threads on 6
> > cores, so there are still 6 cores available for my job. However, only
> > one is used.
> >
> > Is this a known problem? If so, has it been fixed in a more recent
> > version of SLURM?
> >
> > Cheers,
> >
> > Loris
> >
> > --
> > Dr. Loris Bennett (Mr.)
> > ZEDAT, Freie Universität Berlin Email [email protected]
>
> No thoughts on this? We are using the task/affinity plugin.
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email [email protected]
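
One way to check which cores each thread of a running process is actually allowed to use is to read the Cpus_allowed_list field that Linux exposes under /proc. The following is only a minimal sketch, assuming a Linux node with /proc mounted; the names parse_cpu_list and thread_affinities are hypothetical helpers written for illustration and are not part of Slurm or OpenMP.

```python
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_affinities(pid="self"):
    """Map each thread ID of a process to the CPUs it may run on.

    Reads /proc/<pid>/task/<tid>/status, so it only works on Linux.
    """
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        with open(f"{task_dir}/{tid}/status") as fh:
            for line in fh:
                if line.startswith("Cpus_allowed_list:"):
                    result[int(tid)] = parse_cpu_list(line.split(":", 1)[1])
    return result


if __name__ == "__main__":
    # Print the allowed CPUs for every thread of this Python process.
    for tid, cpus in sorted(thread_affinities().items()):
        print(tid, cpus)
```

Passing the PID of one of the affected processes to thread_affinities (instead of the default "self") would show whether all of its threads are confined to the same single core, which may help tell an affinity mask problem apart from a scheduling one.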
