Riccardo,

Your configuration is very close to ours, and this is an issue we're facing too. We also use VSizeFactor=101, and we may set it back to "0" because of issues with how SLURM treats memory. We are on 14.03.6, and so far it seems that SLURM does not handle memory (for scheduling, preemption, etc.) as well as it handles CPUs. For example, we use swap on our compute nodes to hold jobs that are preempted via SUSPEND, but SLURM only looks at a node's available memory and ignores swap. We've had to hack the code (still cleaning it up for a proper pull request) to add an "assume_swap" option to the scheduling parameters. This allows preemption via SUSPEND (using partition preemption) even if a node has all of its memory allocated.
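As a rough sanity check on the numbers in the error you quoted, here is a small Python sketch. It assumes the values in the slurmstepd message are in KB and that the virtual memory limit is simply the job's memory request scaled by VSizeFactor percent; the variable names are mine, and none of this is taken from the SLURM source.

```python
# Values copied from the slurmstepd error in the quoted message (assumed to be KB).
reported_vsize_kb = 416329820
vsize_limit_kb = 211812352

vsize_factor = 101   # VSizeFactor from slurm.conf, a percentage
num_processes = 4    # processes in the job, per the quoted message
KB_PER_GIB = 1024 ** 2

# Assumption: limit = requested memory * VSizeFactor / 100.
requested_kb = vsize_limit_kb * 100 // vsize_factor
requested_gib = requested_kb / KB_PER_GIB

# If the shared segment is counted once per process, the reported
# total divided by the process count approximates the segment size.
per_process_gib = reported_vsize_kb / num_processes / KB_PER_GIB

print(f"implied memory request: {requested_gib:.0f} GiB")
print(f"virtual size per process: {per_process_gib:.2f} GiB")
```

Under those assumptions the limit corresponds to a 200 GiB request, and the reported total works out to a little under 100 GiB per process. That is consistent with the roughly 100GB segment being counted once for each of the 4 processes, though it does not prove that is what the accounting code does.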
Below is our config. You may try using the cgroup ProctrackType. Are you using ConstrainRAMSpace in cgroup.conf? I still plan to experiment with ConstrainSwapSpace=yes and setting our previous value for VSizeFactor to be used in MaxSwapPercent.

- Trey

# slurm.conf
JobAcctGatherType=jobacct_gather/linux
JobCompType=jobcomp/none
MpiDefault=none
ProctrackType=proctrack/cgroup
PropagateResourceLimits=NONE
SchedulerParameters=assume_swap # our local hack
SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
TaskPlugin=task/cgroup
TaskPluginParam=Sched
VSizeFactor=101

# cgroup.conf
CgroupMountpoint=/cgroup
CgroupAutomount=yes
CgroupReleaseAgentDir="/home/slurm/cgroup"
ConstrainCores=yes
TaskAffinity=yes
AllowedRAMSpace=100
AllowedSwapSpace=0
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30
ConstrainDevices=no
AllowedDevicesFile=/home/slurm/conf/cgroup_allowed_devices_file.conf

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

----- Original Message -----
> From: "Riccardo Murri" <[email protected]>
> To: "slurm-dev" <[email protected]>
> Sent: Friday, September 19, 2014 11:02:15 AM
> Subject: [slurm-dev] overcounting of SysV shared memory segments?
>
>
> Hello,
>
> we are having an issue with SLURM killing jobs because of virtual
> memory limits::
>
>     slurmstepd[46530]: error: Job 784 exceeded virtual memory limit (416329820 > 211812352), being killed
>
> The problem is that the job above has actually negligible heap use,
> *but* it allocates a SysV shared memory segment of about 100GB. It
> seems that the size of this shared memory segment is counted towards
> *all* 4 processes in the job, instead of being counted just once.
>
> Is this expected, or did we misconfigure something?
>
> We are running 14.03.2. Possibly relevant configuration items::
>
>     # slurm.conf
>     JobAcctGatherType=jobacct_gather/linux
>     JobCompType=jobcomp/none
>     MpiDefault=none
>     ProctrackType=proctrack/pgid
>     PropagateResourceLimitsExcept=CPU
>     SelectType=select/cons_res
>     SelectTypeParameters=CR_Core_Memory
>     TaskPlugin=task/cgroup
>     VSizeFactor=101
>
>     # cgroup.conf
>     ConstrainCores=yes
>
> Thanks for any suggestion!
>
> Kind regards,
> Riccardo
>
> --
> Riccardo Murri
> http://www.s3it.uzh.ch/about/team/
>
> S3IT: Services and Support for Science IT
> University of Zurich
> Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
> Tel: +41 44 635 4222
> Fax: +41 44 635 6888
