Hi,

We had some users complaining about the same behaviour, but for resident memory. What we did was modify the accounting plugin to consider the proportional set size (PSS) instead of RSS. This way shared memory is accounted only once, split proportionally among the processes that map it: if 2 processes share a 4MB segment, each is accounted 2MB. Maybe a similar approach can be used in this case.
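To illustrate the difference between the two figures, here is a rough Python sketch that sums the Rss and Pss fields the Linux kernel reports in /proc/<pid>/smaps. It is not the code from our modified plugin (the SLURM plugin is written in C); the function names and the sample text are made up for illustration only.

```python
import re

# Matches only the plain "Rss:" and "Pss:" lines of /proc/<pid>/smaps,
# not related fields such as "SwapPss:" or "Pss_Anon:".
_FIELD = re.compile(r"^(Rss|Pss):\s+(\d+)\s+kB$", re.MULTILINE)


def smaps_totals(smaps_text):
    """Sum Rss and Pss (in kB) over all mappings in smaps-formatted text."""
    totals = {"Rss": 0, "Pss": 0}
    for name, kb in _FIELD.findall(smaps_text):
        totals[name] += int(kb)
    return totals


def read_totals(pid):
    """Hypothetical helper: read the totals for a live process (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        return smaps_totals(f.read())


# Made-up sample: one process mapping a 4096 kB segment shared with one
# other process, plus 100 kB of private memory.
SAMPLE = "\n".join([
    "7f0000000000-7f0000400000 rw-s 00000000 00:04 0 /SYSV00000000",
    "Rss:                4096 kB",
    "Pss:                2048 kB",
    "7f0000400000-7f0000419000 rw-p 00000000 00:00 0",
    "Rss:                 100 kB",
    "Pss:                 100 kB",
])

if __name__ == "__main__":
    print(smaps_totals(SAMPLE))
```

With RSS the shared segment is charged in full to every process that maps it, while with PSS each of the two processes is charged half of it, so summing PSS over the job counts the segment once.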

Regards,
Carles Fenoy

On Fri, Sep 19, 2014 at 6:01 PM, Riccardo Murri <[email protected]> wrote:
>
> Hello,
>
> we are having an issue with SLURM killing jobs because of virtual
> memory limits::
>
>     slurmstepd[46530]: error: Job 784 exceeded virtual memory limit
>     (416329820 > 211812352), being killed
>
> The problem is that the job above has actually negligible heap use,
> *but* it allocates a SysV shared memory segment of about 100GB. It
> seems that the size of this shared memory segment is counted towards
> *all* 4 processes in the job, instead of being counted just once.
>
> Is this expected, or did we misconfigure something?
>
> We are running 14.03.2. Possibly relevant configuration items::
>
>     # slurm.conf
>     JobAcctGatherType=jobacct_gather/linux
>     JobCompType=jobcomp/none
>     MpiDefault=none
>     ProctrackType=proctrack/pgid
>     PropagateResourceLimitsExcept=CPU
>     SelectType=select/cons_res
>     SelectTypeParameters=CR_Core_Memory
>     TaskPlugin=task/cgroup
>     VSizeFactor=101
>
>     # cgroup.conf
>     ConstrainCores=yes
>
> Thanks for any suggestion!
>
> Kind regards,
> Riccardo
>
> --
> Riccardo Murri
> http://www.s3it.uzh.ch/about/team/
>
> S3IT: Services and Support for Science IT
> University of Zurich
> Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
> Tel: +41 44 635 4222
> Fax: +41 44 635 6888

--
Carles Fenoy
