On 9/12/17 4:54 am, Mike Cammilleri wrote:
I thought cgroups (which we are using) would prevent some of this behavior on the nodes (we are constraining CPU and RAM). I'd like there to be no I/O wait times if possible. I would like either Linux or Slurm to stop a job from grabbing more cores than it was assigned at submit time. Is there something else I should be configuring to safeguard against this behavior? If Slurm assigns 1 CPU to the task, then no matter what craziness is in the code, 1 is all they're getting. Is that possible?
That is exactly what cgroups does: a process within a cgroup that has only a single core available to it can use only that one core. If it fires up (for example) 8 threads or processes, they will all run, but they will all be contending for that single core.

You can check the cgroup for a process with:

    cat /proc/$PID/cgroup

From that you should be able to find the process's cgroup in the cpuset controller and check its cpuset.cpus file to see which cores are available to it.

You mention I/O wait times; that's separate from the number of cores available to a code. Could you elaborate a little on what you are seeing there?

There is some support for limiting block I/O per cgroup (the blkio controller) in current kernels, but I don't know when that landed or whether it will be in the kernel available to you. I also don't remember seeing any mention of support for it in Slurm.

https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt

Best of luck,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
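For anyone who wants to script the check described above, here is a minimal sketch in Python. It assumes cgroup v1 with the cpuset controller mounted at /sys/fs/cgroup/cpuset (a common default, but check your nodes); the helper names and the example Slurm cgroup path in the comments are hypothetical. It also uses os.sched_getaffinity(), which on Linux reports the set of CPUs the calling process is allowed to run on, so it reflects any cpuset restriction from inside the job.

```python
import os
from typing import Optional, Set

# Assumption: cgroup v1 cpuset controller mounted here; adjust for your nodes.
CPUSET_MOUNT = "/sys/fs/cgroup/cpuset"


def parse_cpu_list(spec: str) -> Set[int]:
    """Parse a kernel CPU list such as "0-3,8" (the format of cpuset.cpus)."""
    cpus: Set[int] = set()
    for part in spec.strip().split(","):
        if not part:
            continue  # empty file means no CPUs assigned
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpuset_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the cpuset cgroup path from the contents of /proc/<pid>/cgroup.

    cgroup v1 lines look like "4:cpuset:/slurm/uid_1000/job_42/step_0"
    (hypothetical path). Returns None if no cpuset hierarchy is listed,
    e.g. on a pure cgroup v2 system where the line is "0::/...".
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _hierarchy_id, controllers, path = parts
        if "cpuset" in controllers.split(","):
            return path
    return None


def cgroup_cpus(path: str, mount: str = CPUSET_MOUNT) -> Set[int]:
    """Read cpuset.cpus for a cgroup path under the (assumed) cpuset mount."""
    with open(os.path.join(mount, path.lstrip("/"), "cpuset.cpus")) as f:
        return parse_cpu_list(f.read())


if __name__ == "__main__":
    # os.sched_getaffinity exists only on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        print("CPUs usable by this process:", sorted(os.sched_getaffinity(0)))
    try:
        with open("/proc/self/cgroup") as f:
            path = cpuset_path(f.read())
        print("cpuset cgroup:", path)
        if path is not None:
            print("cpuset.cpus:", sorted(cgroup_cpus(path)))
    except OSError as err:
        print("could not inspect cgroups:", err)
```

Run inside a job step, the affinity set and the cpuset.cpus contents should both list only the cores Slurm allocated; a job that starts more threads than that still runs, but they all share those cores.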
