Hi,

Something doesn't seem to be working right with MaxMemPerCPU and --mem-per-cpu: the CPU count is supposed to be increased when MaxMemPerCPU is exceeded, but it isn't.
For reference, the man page says this:

    Note that if the job's --mem-per-cpu value exceeds the configured
    MaxMemPerCPU, then the user's limit will be treated as a memory limit
    per task; --mem-per-cpu will be reduced to a value no larger than
    MaxMemPerCPU; --cpus-per-task will be set and value of --cpus-per-task
    multiplied by the new --mem-per-cpu value will equal the original
    --mem-per-cpu value specified by the user.

I can't get that to happen with --mem-per-cpu, but it does happen when I use --mem. I have a partition named mic with DefMemPerCPU=2000 and MaxMemPerCPU=200 set.

I get this with --mem-per-cpu=2100:

    $ srun -p mic --mem-per-cpu=2100 ls
    srun: error: Unable to allocate resources: Memory required by task is not available

But this works and increases the number of CPUs:

    $ srun -p mic --mem=2100 ls

With --mem it outputs this to the debug log:

    [2014-05-30T14:35:23.001] debug: Setting job's pn_min_cpus to 2 due to memory limit

and the job ends up with NumCPUs=2:

    JobId=8792976 Name=ls
       UserId=wettstein(891783663) GroupId=wettstein(891783663)
       Priority=111812 Account=rcc-staff QOS=mic
       JobState=COMPLETED Reason=None Dependency=(null)
       Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
       RunTime=00:00:03 TimeLimit=1-12:00:00 TimeMin=N/A
       SubmitTime=2014-05-30T14:35:23 EligibleTime=2014-05-30T14:35:23
       StartTime=2014-05-30T14:35:23 EndTime=2014-05-30T14:35:26
       PreemptTime=None SuspendTime=None SecsPreSuspend=0
       Partition=mic AllocNode:Sid=midway-login2:27313
       ReqNodeList=(null) ExcNodeList=(null)
       NodeList=midway-mic01 BatchHost=midway-mic01
       NumNodes=1 NumCPUs=2 CPUs/Task=1 ReqS:C:T=*:*:*
       MinCPUsNode=2 MinMemoryNode=2100M MinTmpDiskNode=0
       Features=(null) Gres=(null) Reservation=(null)
       Shared=OK Contiguous=0 Licenses=(null) Network=(null)
       Command=/bin/ls
       WorkDir=/software/src/slurm

I guess either the documentation is wrong and this behavior should be described under the --mem option, or there is a bug in the logic.
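The man page rule quoted above boils down to a bit of integer arithmetic. Below is a minimal sketch of that rule as I read it, not Slurm's actual code: the function names are hypothetical, and I'm assuming ceiling division so that the new per-CPU value stays at or under MaxMemPerCPU while CPUs times memory still covers the original request.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return (a + b - 1) // b


def expected_mem_per_cpu_adjust(mem_per_cpu_mb: int, max_mem_per_cpu_mb: int):
    """Hypothetical sketch of the documented --mem-per-cpu behavior.

    Returns (cpus_per_task, new_mem_per_cpu_mb). If the request is within
    MaxMemPerCPU nothing changes; otherwise the request is treated as a
    per-task limit and spread over enough CPUs to fit under the maximum.
    """
    if mem_per_cpu_mb <= max_mem_per_cpu_mb:
        return 1, mem_per_cpu_mb
    cpus = ceil_div(mem_per_cpu_mb, max_mem_per_cpu_mb)
    new_mem = ceil_div(mem_per_cpu_mb, cpus)  # <= max, and cpus * new_mem >= original
    return cpus, new_mem


def expected_min_cpus_for_node_mem(mem_mb: int, max_mem_per_cpu_mb: int) -> int:
    """Hypothetical sketch of what --mem appears to do (pn_min_cpus)."""
    return max(1, ceil_div(mem_mb, max_mem_per_cpu_mb))


if __name__ == "__main__":
    print(expected_mem_per_cpu_adjust(2100, 2000))     # (2, 1050)
    print(expected_min_cpus_for_node_mem(2100, 2000))  # 2
```

Plugging in a 2100 MB request against a 2000 MB limit gives 2 CPUs at 1050 MB each, which lines up with the pn_min_cpus=2 seen with --mem. With the MaxMemPerCPU=200 value quoted above, the same arithmetic would give 11 CPUs instead.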
What I basically want is to use this to make users get charged for the whole node, instead of letting them request 1 CPU and all of the memory on the node.

Andy

--
andy wettstein
hpc system administrator
research computing center
university of chicago
773.702.1104
