This is interesting information indeed!

However, I might add our experience to this. We have been using:
ProctrackType=proctrack/linuxproc
TaskPlugin=task/none
on a simple SLURM cluster of desktop machines, which also have (slow, 
HDD-based) swap partitions. 

From my experience, it seems that "linuxproc" actually enforces the memory
limit by "polling" procfs regularly and killing the job if the limit is
exceeded (as I would also expect).
This leads to a problem if a user submits a job which allocates a huge amount
of memory very quickly, i.e. exceeds the foreseen memory limit in between two
"polling" intervals.

In our case, this led to heavy swapping of the desktop machines, slowing them
down to a crawl before Slurm could kill those jobs. Of course, this is even
worse if a user submits a full job array showing such nasty behaviour.
So I would still consider the cgroup enforcement much safer from the cluster
operator's point of view, at least if you have users developing custom code
(and not always testing it thoroughly beforehand).
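
For comparison, this is roughly what the cgroup-based setup looks like; a
sketch from my reading of the docs, so please check the man pages before
copying. In slurm.conf:

ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

and in cgroup.conf something like:

CgroupAutomount=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedRAMSpace=100
AllowedSwapSpace=0

With ConstrainSwapSpace=yes and AllowedSwapSpace=0 the job should not be able
to push the node into swap in the first place, which was exactly our problem
with the polling approach.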

Cheers, 
        Oliver

On 22.01.2016 at 11:33, Felip Moll wrote:
> Finally I solved the issue, in large part thanks to Carlos Fenoy's tips.
> 
> The issue was due to the NFS filesystem. This filesystem, as CF said, caches
> data while other file systems do not. Cgroups take cached data into account,
> and our user jobs use the NFS filesystem intensively.
> 
> I switched from:
> ProctrackType=proctrack/cgroup
> TaskPlugin=task/cgroup
> TaskPluginParam=
> 
> To:
> ProctrackType=proctrack/linuxproc
> TaskPlugin=task/affinity
> TaskPluginParam=Sched
> 
> 
> And in the following 11 days I didn't receive a single OOM kill and
> everything is working perfectly.
> 
> Best regards and thanks to all of you.
> Felip M
> 
> 
> --
> Felip Moll Marquès
> Computer Science Engineer
> E-Mail - lip...@gmail.com
> WebPage - http://lipix.ciutadella.es
> 
> 2015-12-18 15:09 GMT+01:00 Bjørn-Helge Mevik <b.h.me...@usit.uio.no>:
> 
> 
>     Carlos Fenoy <mini...@gmail.com> writes:
> 
>     > Barbara, I don't think that is the issue here. The killer is the OOM,
>     > not Slurm, so Slurm is not accounting the amount of memory incorrectly,
>     > but it seems that the cached memory is also accounted in the cgroup,
>     > and that is what is causing the OOM to kill gzip.
> 
>     I've seen cases where the job has copied a set of large files, which
>     makes the cgroup memory usage go right up to the limit.  I guess that is
>     cached data.  Then the job starts computing, without the job getting
>     killed.  My interpretation is that the kernel will flush the cache when a
>     process needs more memory instead of killing the process.  If I'm
>     correct, oom will _not_ kill a job due to cached data.
> 
>     --
>     Regards,
>     Bjørn-Helge Mevik, dr. scient,
>     Department for Research Computing, University of Oslo
> 
> 
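
PS: Regarding the cached-data question in the quoted mails above: one can
check what the cgroup actually counts by reading memory.stat of the job's
memory cgroup, which reports "cache" and "rss" separately. A quick sketch
(the cgroup path is a guess; it depends on the mount point, the uid/jobid
and the Slurm version):

/* Hypothetical quick check: print the cache/rss breakdown of a job's
 * memory cgroup.  Path and IDs below are made up for illustration. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *path =
        "/sys/fs/cgroup/memory/slurm/uid_1000/job_1234/memory.stat";
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    char line[256];
    while (fgets(line, sizeof line, f) != NULL) {
        /* "cache" is page-cache data, "rss" is anonymous memory; the
         * cgroup limit applies to their sum (in cgroup v1). */
        if (strncmp(line, "cache ", 6) == 0 || strncmp(line, "rss ", 4) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}

If Bjørn-Helge is right (and I believe he is), a large "cache" value close to
the limit is harmless, because the kernel reclaims it under memory pressure
instead of invoking the OOM killer.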
