Hi, I’m curious which kernel you are running on your el6 clusters that have cgroups enabled in Slurm. I have an issue where some workloads cause hundreds to thousands of flocks to occur, related to the memory cleanup portion of the cgroup. On the SchedMD Slurm site, I see this note:
* There can be a serious performance problem with memory cgroups on conventional multi-socket, multi-core nodes in kernels prior to 2.6.38 due to contention between processors for a spinlock. This problem seems to have been completely fixed in the 2.6.38 kernel.

Does anyone know the kernel bug number, so I can find the kernel where this is fixed? I think this is what I’m seeing; can anyone confirm? I’m running kernel 2.6.32-504.3.3.el6 and Slurm version 15.08.4.

I’d like to hear who has seen this issue and what they did to resolve it. Did you upgrade to a newer kernel? If so, which one? Is there a fix in the el6 2.6.32 series?

Thanks!

Best,
Chris

--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167