Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-23 Thread Shawn Bobbin
e.conf TaskPlugin ProctrackType JobAcctGatherType -Kevin PS Looking for similar style jobs, We have >1 day gpu users inside of cgroup, but not multi-tenant currently. 17.11.5, CentOS6 From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Shawn Bobbin <sabob...@umiacs.umd.edu> Reply-To: Slurm User

[slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-12 Thread Shawn Bobbin
Hi, We’re running Slurm 17.11.5 on RHEL 7 and have been having issues with jobs escaping their cgroup controls on GPU devices. For example, we have the following steps running: # ps auxn | grep [s]lurmstepd 0 2380 0.0 0.0 538436 3700 ? Sl 07:22 0:02 slurmstepd: [46609.0]