Hello,

We have experienced something along those lines recently too, with Slurm
v2.5.3, CR_Core_Memory and the cgroup task plugin with TaskAffinity=yes,
constraining cores and RAM.

We did some experimentation. Without affinity, OpenMP is able to run on
all the allotted cores. With affinity enabled, OpenMP is restricted to a
single core, while POSIX threads can still run across the allotted cores.
Disabling affinity again does not reset this, and OpenMP still runs on one
core.

For the time being, our solution is to reboot the node and run without
affinity, accepting NUMA effects on 4-way SMP nodes when a job does not
take the whole node.

Hope it helps,
Carlos
On May 23, 2013 8:36 AM, "Loris Bennett" <[email protected]> wrote:

>
> "Loris Bennett" <[email protected]> writes:
>
> > Hi,
> >
> > We are using SLURM 2.2.7 and occasionally get a situation in which
> > multiple tasks end up on a single core.
> >
> > In the following, a 15-task job is scheduled to two 12-core nodes.  On
> > one node I have 9 processes on 9 cores:
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+   P COMMAND
> > 20094 loris     20   0  703m 119m  14m R 100.0  0.1 943:06.48  3 desmond
> > 20086 loris     20   0  874m 162m  16m R 99.7  0.2 941:42.19 11 desmond
> > 20087 loris     20   0  701m 114m  14m R 99.7  0.1 943:03.65  4 desmond
> > 20088 loris     20   0  701m 114m  14m R 99.7  0.1 942:25.38  7 desmond
> > 20090 loris     20   0  701m 115m  14m R 99.7  0.1 942:03.41  9 desmond
> > 20092 loris     20   0  703m 116m  14m R 99.7  0.1 942:25.60  8 desmond
> > 20093 loris     20   0  701m 115m  14m R 99.7  0.1 942:29.85  6 desmond
> > 20089 loris     20   0  703m 116m  14m R 99.5  0.1 943:00.17  2 desmond
> > 20091 loris     20   0  703m 116m  14m R 99.2  0.1 942:22.47 10 desmond
> >
> > On the other I have 6 processes on 1 core:
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+   P COMMAND
> > 60216 xxxxx     20   0 16.4g  15g  10m R 598.1 16.6  11094:58  5 molpro.exe
> > 39963 loris     20   0  703m 119m  14m R 16.6  0.1 157:46.38  6 desmond
> > 39964 loris     20   0  701m 114m  14m R 16.6  0.1 157:46.38  6 desmond
> > 39965 loris     20   0  703m 117m  14m R 16.6  0.1 157:46.39  6 desmond
> > 39961 loris     20   0  704m 119m  14m R 16.3  0.1 157:46.38  6 desmond
> > 39962 loris     20   0  702m 115m  14m R 16.3  0.1 157:46.38  6 desmond
> > 39966 loris     20   0  702m 115m  14m R 16.3  0.1 157:46.38  6 desmond
> >
> > On this second node there is also an OpenMP job running 6 threads on 6
> > cores, so there are still 6 cores available for my job.  However, only
> > one is used.
> >
> > Is this a known problem?  If so, has it been fixed in a more recent
> > version of SLURM?
> >
> > Cheers,
> >
> > Loris
> >
> > --
> > Dr. Loris Bennett (Mr.)
> > ZEDAT, Freie Universität Berlin         Email [email protected]
>
> No thoughts on this?  We are using the task/affinity plugin.
>
> Loris
>
>
