Felip Moll <lip...@gmail.com> writes:

> On one hand I understand that cgroups computes a process memory as the sum
> of RSS+Cached+Semaphores+Shared segments+Swap,

One small correction: It does not count or limit swap unless you specify
ConstrainSwapSpace=yes in cgroup.conf. Also, cached data is not going to
get the process killed, because the kernel will free the cache when the
total usage hits the limit (it might affect performance, of course).

By default, I believe Slurm will count shared memory as part of RSS (this
can be turned off with JobAcctGatherParams=NoShared), and if your job
shares the same data between several processes, the shared space will be
counted once for each process(!). Cgroups seem to count the shared data
only once. So if a process is killed by the OOM killer instead of by
Slurm, it is probably not due to shared data.

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
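
P.S. To make the difference between the two ways of counting shared
data concrete, here is a rough sketch in Python. The numbers (4
processes, 100 MiB private each, one 1000 MiB shared segment) and the
function names are made up for illustration, and this is a simplified
model of the arithmetic only, not how Slurm or the kernel actually
measure memory.

```python
# Simplified model: compare "shared counted once per process" with
# "shared counted once in total". All values are hypothetical.

def per_process_sum(n_procs, private_mib, shared_mib):
    """Sum of per-process RSS where each process's RSS includes the shared segment."""
    return n_procs * (private_mib + shared_mib)


def counted_once(n_procs, private_mib, shared_mib):
    """Total where the shared segment is charged a single time."""
    return n_procs * private_mib + shared_mib


if __name__ == "__main__":
    n_procs, private_mib, shared_mib = 4, 100, 1000  # assumed example values
    print("shared counted per process:", per_process_sum(n_procs, private_mib, shared_mib), "MiB")
    print("shared counted once:       ", counted_once(n_procs, private_mib, shared_mib), "MiB")
```

With these assumed numbers the per-process sum is 4400 MiB while the
counted-once total is 1400 MiB, so a job could exceed a limit under the
first way of counting long before it would under the second.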