Matthew BETTINGER writes:
> Just curious if this option or oom setting (which we use) can leave
> the nodes in CG "completing" state.
I don't think so. As far as I know, jobs go into completing state when
Slurm is cancelling them or when they exit on their own, and stay in
that state until
Just curious if this option or oom setting (which we use) can leave the nodes
in CG "completing" state. We see CG states quite often, and the only way out
is to reboot the node. I believe it occurs when the parent process dies, gets
killed, or ends up in state Z? Thanks.
MB
On 10/8/19, 6:11 AM, "slurm-users on
- Original message -
> Maybe I missed something else...
That's right. Thanks to Bjørn-Helge, who helped me.
You must enable swapaccount in the kernel as shown here:
https://unix.stackexchange.com/questions/531480/what-does-swapaccount-1-in-grub-cmdline-linux-default-do
By default, this is
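For reference, a minimal sketch of what enabling swap accounting on the kernel command line can look like on a GRUB-based system. The file path and the existing "quiet" option are assumptions for illustration; only swapaccount=1 comes from the message above. The GRUB configuration has to be regenerated and the node rebooted before the change takes effect.

```
# /etc/default/grub (assumed location on a GRUB-based distribution)
# "quiet" stands in for whatever options are already set on the node;
# swapaccount=1 is the parameter discussed above.
GRUB_CMDLINE_LINUX_DEFAULT="quiet swapaccount=1"
```

After the reboot, the parameter should be visible in /proc/cmdline.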
Marcus Boden writes:
> you're looking for KillOnBadExit in the slurm.conf:
> KillOnBadExit
[...]
> this should terminate the job if a step or a process gets oom-killed.
That is a good tip!
But as I read the documentation (I haven't tested it), it will only kill
the job step itself, it will
Juergen Salk writes:
> that is interesting. We have a very similar setup as well. However, in
> our Slurm test cluster I have noticed that it is not the *job* that
> gets killed. Instead, the OOM killer terminates one (or more)
> *processes*
Yes, that is how the kernel OOM killer works.
This
Hello, thanks for your answers,
> - Does it work if you remove the space in "TaskPlugin=task/affinity,
> task/cgroup"? (Slurm can be quite picky when reading slurm.conf).
I made a mistake when copying and pasting; there is no space in my actual slurm.conf.
>
> - See in slurmd.log on the node(s) of
> On 19-10-08 10:36, Juergen Salk wrote:
> > * Bjørn-Helge Mevik [191008 08:34]:
> > > Jean-mathieu CHANTREIN writes:
> > >
> > > > I tried using, in slurm.conf
> > > > TaskPlugin=task/affinity, task/cgroup
> > > > SelectTypeParameters=CR_CPU_Memory
> > > > MemLimitEnforce=yes
> > > >
> > >
Hi Jürgen,
you're looking for KillOnBadExit in the slurm.conf:
KillOnBadExit
If set to 1, a step will be terminated immediately if any task is crashed
or aborted, as indicated by a non-zero exit code. With the default value of 0,
if one of the processes is crashed or aborted the other
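As a minimal sketch of the option quoted above, the corresponding slurm.conf line would look like the following. Only the parameter name and its values come from the quoted documentation; whether it does what a given site needs for oom-killed jobs is something to test first.

```
# slurm.conf fragment (sketch): terminate a step as soon as any of its
# tasks exits with a non-zero exit code. The default value is 0.
KillOnBadExit=1
```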
* Bjørn-Helge Mevik [191008 08:34]:
> Jean-mathieu CHANTREIN writes:
>
> > I tried using, in slurm.conf
> > TaskPlugin=task/affinity, task/cgroup
> > SelectTypeParameters=CR_CPU_Memory
> > MemLimitEnforce=yes
> >
> > and in cgroup.conf:
> > CgroupAutomount=yes
> > ConstrainCores=yes
> >
Jean-mathieu CHANTREIN writes:
> I tried using, in slurm.conf
> TaskPlugin=task/affinity, task/cgroup
> SelectTypeParameters=CR_CPU_Memory
> MemLimitEnforce=yes
>
> and in cgroup.conf:
> CgroupAutomount=yes
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
>
Our cgroup settings are quite a bit different, and we don’t allow jobs to swap,
but the following works to limit memory here (I know, because I get frequent
emails from users who don’t change their jobs from the default 2 GB per CPU
that we use):
CgroupMountpoint="/sys/fs/cgroup"
Hello,
I tried using, in slurm.conf
TaskPlugin=task/affinity, task/cgroup
SelectTypeParameters=CR_CPU_Memory
MemLimitEnforce=yes
and in cgroup.conf:
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
MaxSwapPercent=10
TaskAffinity=no
But when the job