Hi,

I’m curious which kernel you are running on your el6 clusters that have cgroups
enabled in Slurm.  I have an issue where some workloads cause hundreds to
thousands of flocks to occur, related to the memory cleanup portion of the
cgroup.  On the SchedMD Slurm site, I see this note:

* There can be a serious performance problem with memory cgroups on 
conventional multi-socket, multi-core nodes in kernels prior to 2.6.38 due to 
contention between processors for a spinlock. This problem seems to have been 
completely fixed in the 2.6.38 kernel.

Does anyone know what the kernel bug number was, so I can find the kernel where
this is fixed?

I think this is what I’m seeing; can anyone confirm?  I’m running kernel
2.6.32-504.3.3.el6 and Slurm 15.08.4.


I’d like to hear who has seen this issue and what they did to resolve it.
Did you upgrade to a newer kernel?  If so, which one?  Is there a fix in the
el6 2.6.32 series?  Thanks!

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167



