Hi,
This used to be logged at debug level in 2.6 and is now logged as an error. It seems to be an issue with cgroups, which do not allow that path to be deleted from slurmstepd even if no processes are accessing it.
The release agent removes it later.

Perhaps we can put it back at debug level as before, since the error message may worry users.
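
To illustrate the idea, here is a rough sketch in Python (not the actual slurmstepd code, which is C). It assumes the failure shows up as EBUSY from rmdir, matching the "Device or resource busy" text in the report below. The function name remove_step_cgroup and the injectable rmdir parameter are hypothetical, made up for this example.

```python
import errno
import logging
import os

log = logging.getLogger("stepd_sketch")  # hypothetical logger name


def remove_step_cgroup(path, rmdir=os.rmdir):
    """Try to remove a step cgroup directory; return True if it was removed.

    rmdir is injectable only so the sketch can be exercised without a real
    cgroup hierarchy.
    """
    try:
        rmdir(path)
        return True
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            # "Device or resource busy": assumed benign here, because the
            # release agent is expected to remove the directory later.
            log.debug("step cgroup %s still busy, leaving it to the release agent", path)
        else:
            # Any other failure is still reported as an error.
            log.error("problem deleting step cgroup path %s: %s", path, exc)
        return False
```

Only the EBUSY case is downgraded; other errors stay visible to the admin.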

On 11/06/2014 04:37 PM, Christopher Samuel wrote:

Hi folks,

We're just about to let users back onto our systems after RHEL 6.6
upgrades and moving from Slurm 2.6.x to 14.03.10.

However, running NAMD with Open-MPI 1.6.x and mpirun leads to this
error at the end of the output (which appears totally cosmetic).

[...]
The last velocity output (seq=-2) takes 0.029 seconds, 980.234 MB of memory in use
====================================================

WallClock: 117.003998  CPUTime: 117.003998  Memory: 980.234375 MB
End of program
slurmstepd: _slurm_cgroup_destroy: problem deleting step cgroup path /cgroup/freezer/slurm/uid_500/job_2497190/step_batch: Device or resource busy


Now I've checked the cgroup release agent config and it's all set
up correctly looking at:

http://slurm.schedmd.com/cgroups.html#cleanup

Anyone got any ideas?

PS: No, I can't use srun directly as we get poor scaling. The next
thing on the list (after SC14) is to migrate to Open-MPI 1.8.4, which
is due out shortly and should address this.

cheers,
Chris


--

Thanks,
      /David/Bigagli

www.schedmd.com
