Hello,

we are having an issue with SLURM killing jobs because of virtual
memory limits::

    slurmstepd[46530]: error: Job 784 exceeded virtual memory limit (416329820 > 211812352), being killed

The problem is that the job above actually has negligible heap use,
*but* it allocates a SysV shared memory segment of about 100 GB.  It
seems that the size of this shared memory segment is counted once for
*each* of the 4 processes in the job, instead of just once for the
whole job.

Is this expected, or did we misconfigure something?
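
As a rough sanity check of this hypothesis, the numbers in the log line
can be compared with the size of the segment. The sketch below assumes
that the values reported by slurmstepd are in kilobytes (the log line
itself does not state the unit); the variable and function names are
purely illustrative.

```python
# Values copied from the slurmstepd log line above.
# ASSUMPTION: both numbers are in KB (1 KB = 1024 bytes).
REPORTED_VSIZE_KB = 416329820
VSIZE_LIMIT_KB = 211812352
NUM_PROCESSES = 4  # number of processes in the job, as stated above

KB_PER_GIB = 1024 * 1024


def per_process_share_kb(total_kb, num_processes):
    """Split the reported total evenly across the job's processes."""
    return total_kb / num_processes


share_kb = per_process_share_kb(REPORTED_VSIZE_KB, NUM_PROCESSES)
share_gib = share_kb / KB_PER_GIB

print(f"reported total / {NUM_PROCESSES} processes = {share_gib:.2f} GiB each")
print("would fit under the limit if counted once:", share_kb < VSIZE_LIMIT_KB)
```

Under that assumption, each process accounts for a share that is close
to the size of the shared segment, and a single copy of that share
would stay below the limit. This is consistent with the segment being
counted once per process, but it does not prove it.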

We are running SLURM 14.03.2. Possibly relevant configuration items::

    # slurm.conf
    JobAcctGatherType=jobacct_gather/linux
    JobCompType=jobcomp/none
    MpiDefault=none
    ProctrackType=proctrack/pgid
    PropagateResourceLimitsExcept=CPU
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory
    TaskPlugin=task/cgroup
    VSizeFactor=101

    # cgroup.conf
    ConstrainCores=yes
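
For reference, ``VSizeFactor`` sets the virtual memory limit as a
percentage of the job's real memory limit. The sketch below works
backwards from the limit in the log line to the memory request it would
correspond to. It assumes the logged limit is in kilobytes and was
derived from ``VSizeFactor=101`` alone; the helper name is hypothetical.

```python
# Limit from the slurmstepd log line and VSizeFactor from slurm.conf.
# ASSUMPTION: the limit is in KB and equals request * VSizeFactor / 100.
VSIZE_LIMIT_KB = 211812352
VSIZE_FACTOR_PERCENT = 101

KB_PER_GIB = 1024 * 1024


def implied_memory_request_kb(limit_kb, factor_percent):
    """Invert limit = request * factor / 100 to recover the request."""
    return limit_kb * 100 / factor_percent


request_kb = implied_memory_request_kb(VSIZE_LIMIT_KB, VSIZE_FACTOR_PERCENT)
request_gib = request_kb / KB_PER_GIB

print(f"implied real memory request: {request_gib:.1f} GiB")
```

If these assumptions hold, the limit corresponds to a round memory
request, which suggests the limit itself is computed as configured and
that the question is only how the usage side is being summed.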

Thanks for any suggestions!

Kind regards,
Riccardo

--
Riccardo Murri
http://www.s3it.uzh.ch/about/team/

S3IT: Services and Support for Science IT
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4222
Fax: +41 44 635 6888
