Hi,
this message used to be logged at debug level in 2.6 and is now logged as
an error. It seems to be an issue with cgroups, which do not allow
slurmstepd to delete that path even when no processes are accessing it.
The release agent removes it later.
Perhaps we should put it back at debug level as before, since as an error
it may concern users.
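
To illustrate the idea, here is a minimal sketch in Python. It is not the
slurmstepd code (which is C), and every name in it (remove_step_cgroup,
always_busy, the logger name) is made up for illustration. It only assumes
that the removal is an rmdir on the cgroup directory and that the failure
reported below is EBUSY: that case is logged at debug level, any other
failure stays an error.

```python
import errno
import logging
import os
import tempfile

log = logging.getLogger("cgroup_cleanup_sketch")  # hypothetical logger name


def remove_step_cgroup(path, rmdir=os.rmdir):
    """Try to remove a step cgroup directory.

    Returns True if the directory is gone afterwards, False otherwise.
    The rmdir argument exists only so the busy case can be simulated.
    """
    try:
        rmdir(path)
    except FileNotFoundError:
        # Already removed, for example by the release agent.
        return True
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            # Assumed to be harmless: the release agent cleans up later.
            log.debug("step cgroup %s still busy, leaving it for the "
                      "release agent", path)
        else:
            # Anything else is unexpected and stays visible as an error.
            log.error("problem deleting step cgroup path %s: %s", path, exc)
        return False
    return True


def always_busy(path):
    """Hypothetical stand-in for rmdir that always fails with EBUSY."""
    raise OSError(errno.EBUSY, os.strerror(errno.EBUSY), path)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    scratch = tempfile.mkdtemp()  # stands in for the step cgroup directory
    print(remove_step_cgroup(scratch, rmdir=always_busy))  # busy case
    print(remove_step_cgroup(scratch))                     # normal removal
```

The point is only the split by errno: the busy case becomes quiet while
other failures are still reported.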
On 11/06/2014 04:37 PM, Christopher Samuel wrote:
Hi folks,
We're just about to let users back onto our systems after RHEL 6.6
upgrades and moving from Slurm 2.6.x to 14.03.10.
However, running NAMD with Open-MPI 1.6.x and mpirun leads to this
error at the end of the output (which appears totally cosmetic).
[...]
The last velocity output (seq=-2) takes 0.029 seconds, 980.234 MB of memory in use
====================================================
WallClock: 117.003998 CPUTime: 117.003998 Memory: 980.234375 MB
End of program
slurmstepd: _slurm_cgroup_destroy: problem deleting step cgroup path /cgroup/freezer/slurm/uid_500/job_2497190/step_batch: Device or resource busy
I've checked the cgroup release agent config and it's all set
up correctly according to:
http://slurm.schedmd.com/cgroups.html#cleanup
Anyone got any ideas?
PS: No, I can't use srun directly as we get poor scaling. The next
thing on the list (after SC14) is to migrate to Open-MPI 1.8.4, which
is due out shortly and should address this.
cheers,
Chris
--
Thanks,
/David/Bigagli
www.schedmd.com