You're getting that with 14.03.10? I am testing 14.03.10 (production is currently on 14.03.06), and one of the bugs it fixed was that "Device or resource busy" line. I still see it in the slurmd logs, but since upgrading to 14.03.10 it no longer appears in the user's stdout or stderr files.

- Trey

=============================
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Thu, Nov 6, 2014 at 6:38 PM, Christopher Samuel <[email protected]> wrote:
>
> Hi folks,
>
> We're just about to let users back onto our systems after RHEL 6.6
> upgrades and moving from Slurm 2.6.x to 14.03.10.
>
> However, running NAMD with Open-MPI 1.6.x and mpirun leads to this
> error at the end of the output (which appears totally cosmetic).
>
> [...]
> The last velocity output (seq=-2) takes 0.029 seconds, 980.234 MB of
> memory in use
> ====================================================
>
> WallClock: 117.003998 CPUTime: 117.003998 Memory: 980.234375 MB
> End of program
> slurmstepd: _slurm_cgroup_destroy: problem deleting step cgroup path
> /cgroup/freezer/slurm/uid_500/job_2497190/step_batch: Device or resource
> busy
>
>
> Now I've checked the cgroup release agent config and it's all set
> up correctly looking at:
>
> http://slurm.schedmd.com/cgroups.html#cleanup
>
> Anyone got any ideas?
>
> PS: No I can't use srun directly as we get poor scaling, the next
> thing in the list (after SC14) is to migrate to Open-MPI 1.8.4 which
> is due out shortly which should address this.
>
> cheers,
> Chris
> --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: [email protected] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>
