As far as I've experienced, those error messages are harmless. We updated from 14.03.6 to 14.03.10, and one motivation was that those errors no longer appear in user job output. It may have been 14.03.9 that solved the issue of the error being written to user output. I still see it in my slurmd logs, but it does not appear to be causing problems. This is on 14.03.10 with CentOS 6.5.
- Trey

On Nov 18, 2014 3:38 AM, "Bjørn-Helge Mevik" <[email protected]> wrote:
>
> We are running slurm 14.03.7, and using cgroups to limit memory usage.
>
> We see this type of message in slurmd.log after each job:
>
> [2014-11-18T03:32:15.316] [8990773] _slurm_cgroup_destroy: problem
> deleting step cgroup path
> /dev/cgroup/freezer/slurm/uid_126634/job_8990773/step_batch: Device or
> resource busy
>
> Recently, one of our users reported seeing the following message in the
> slurm-NNN.out file:
>
> slurmstepd: _slurm_cgroup_destroy: problem deleting step cgroup path
> /dev/cgroup/freezer/slurm/uid_168662/job_8985631/step_batch: Device or
> resource busy
>
> When checking afterwards, it seems the .../uid_NNN directories are
> removed.
>
> Does anyone know what these messages mean? Should we just ignore them?
>
> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo
