Felip Moll <lip...@gmail.com> writes:

> On one hand I understand that cgroups computes a process memory as the sum
> of RSS+Cached+Semaphores+Shared segments+Swap,

One small correction: it does not count or limit swap unless you specify

ConstrainSwapSpace=yes

in cgroup.conf.  Also, cached data is not going to get the process
killed, because the kernel will free the cache when the total usage hits
the limit (it might affect performance, of course).
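
To make the two points above concrete, here is a toy model in Python.
It is an illustrative assumption, not Slurm or kernel code. The function
names and the MiB figures are hypothetical, and it ignores how the kernel
actually moves pages to swap. It shows only the accounting: swap counts
only with ConstrainSwapSpace=yes, and page cache is dropped before
anything gets killed.

```python
# Toy model (hypothetical, not Slurm/kernel source); sizes in MiB.

def counted_usage(rss: int, cache: int, swap: int, constrain_swap: bool) -> int:
    """Usage charged against the job's cgroup limit in this toy model."""
    usage = rss + cache
    if constrain_swap:  # corresponds to ConstrainSwapSpace=yes in cgroup.conf
        usage += swap
    return usage


def oom_after_reclaim(rss: int, cache: int, swap: int,
                      limit: int, constrain_swap: bool) -> bool:
    """True if usage still exceeds the limit once page cache is dropped."""
    usage = counted_usage(rss, cache, swap, constrain_swap)
    if usage <= limit:
        return False
    # The kernel frees page cache first (possibly costing I/O performance),
    # so only the non-cache part can push the job over the limit.
    return usage - cache > limit


# Cache alone over the limit: reclaimed, no kill.
print(oom_after_reclaim(rss=3000, cache=2000, swap=0, limit=4000, constrain_swap=False))
# Swap only matters when it is constrained.
print(oom_after_reclaim(rss=3000, cache=0, swap=1500, limit=4000, constrain_swap=False))
print(oom_after_reclaim(rss=3000, cache=0, swap=1500, limit=4000, constrain_swap=True))
```

In this model the first two calls return False and the last returns True.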

By default, I believe slurm will count shared memory as part of RSS (this
can be turned off with JobAcctGatherParams=NoShared), and if your job
shares the same data between several processes, the shared space will be
counted once for each process(!).  Cgroups seems to count the shared
data only once.  So if a process is killed by the OOM killer instead of
by slurm, it is probably not due to shared data.
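
A quick arithmetic sketch of the double counting. The numbers are
hypothetical: 8 processes, each with 500 MiB of private memory, all
mapping the same 2000 MiB shared segment. The helper names are made up,
and treating NoShared as simply dropping the shared part is a
simplifying assumption.

```python
# Hypothetical job: 8 processes, 500 MiB private each, 2000 MiB shared.

def slurm_style_total(private: list[int], shared: int, no_shared: bool = False) -> int:
    """Per-process RSS summed, so the shared segment is counted per process."""
    if no_shared:  # roughly what JobAcctGatherParams=NoShared aims for
        return sum(private)
    return sum(p + shared for p in private)


def cgroup_style_total(private: list[int], shared: int) -> int:
    """Cgroup-style view: shared pages are charged only once."""
    return sum(private) + shared


procs = [500] * 8
print(slurm_style_total(procs, 2000))                  # 8 * (500 + 2000)
print(cgroup_style_total(procs, 2000))                 # 8 * 500 + 2000
print(slurm_style_total(procs, 2000, no_shared=True))  # 8 * 500
```

This prints 20000, 6000 and 4000. The slurm-style sum is more than three
times what the cgroup actually charges, which is why shared data makes a
slurm kill far more likely than an OOM kill.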

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
