Are you sure it worked as you expected? I always think of CPUs & RAM as independent things that need to be requested manually and independently.
On Tue, Feb 14, 2017 at 10:07 AM, Julien Collas <jul.col...@gmail.com> wrote:
> Hello,
>
> I made some tests on a simple environment, and it seems that this
> functionality works fine up to and including 15.08.13.
> With versions 16.05.6, 16.05.9, and 17.02.0-0rc1 I am not able to see
> what I would expect to see.
>
> # scontrol show part
> PartitionName=short
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=YES QoS=N/A
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=dhcpvm4-191
>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
>    OverTimeLimit=NONE PreemptMode=OFF
>    State=UP TotalCPUs=8 TotalNodes=1 SelectTypeParameters=NONE
>    DefMemPerCPU=200 MaxMemPerCPU=200
>
> # srun --mem 600 sleep 5 && scontrol show job
> JobId=25 JobName=sleep
>    UserId=root(0) GroupId=root(0) MCS_label=N/A
>    Priority=4294901754 Nice=0 Account=(null) QOS=(null)
>    JobState=COMPLETED Reason=None Dependency=(null)
>    Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
>    RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A
>    SubmitTime=2017-02-14T15:38:41 EligibleTime=2017-02-14T15:38:41
>    StartTime=2017-02-14T15:38:41 EndTime=2017-02-14T15:38:46 Deadline=N/A
>    PreemptTime=None SuspendTime=None SecsPreSuspend=0
>    Partition=short AllocNode:Sid=dhcpvm4-174:5130
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=dhcpvm4-191
>    BatchHost=dhcpvm4-191
>    NumNodes=1 NumCPUs=1 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=1,mem=600M,node=1
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>    MinCPUsNode=1 MinMemoryNode=600M MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    Gres=(null) Reservation=(null)
>    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>    Command=sleep
>    WorkDir=/root
>    Power=
>
> It doesn't help much with my problem, but ...
>
> Best regards,
>
> Julien
>
> 2017-02-02 8:53 GMT+01:00 Julien Collas <jul.col...@gmail.com>:
>>
>> Hi,
>>
>> It seems that MaxMemPerCPU is not working as I would have expected
>> (i.e., increasing the CPU count when --mem or --mem-per-cpu exceeds
>> that limit).
>>
>> Here is my partition definition:
>>
>> $ scontrol show part short
>> PartitionName=short
>>    AllowGroups=ALL DenyAccounts=data AllowQos=ALL
>>    AllocNodes=ALL Default=YES QoS=N/A
>>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>>    MaxNodes=UNLIMITED MaxTime=00:30:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
>>    Nodes=srv0029[73-80,87-95,98-99]
>>    PriorityJobFactor=1000 PriorityTier=1000 RootOnly=NO ReqResv=NO
>>    OverSubscribe=NO PreemptMode=OFF
>>    State=UP TotalCPUs=560 TotalNodes=28 SelectTypeParameters=NONE
>>    DefMemPerCPU=19000 MaxMemPerCPU=19000
>>
>> $ srun --partition=short --mem=40000 sleep 10
>> $ srun --partition=short --mem-per-cpu=40000 sleep 10
>> $ sacct
>>        JobID      User    Account    JobName   Priority   NTasks  AllocCPUS  ReqCPUS     MaxRSS  MaxVMSize     ReqMem      State
>> ------------ --------- ---------- ---------- ---------- -------- ---------- -------- ---------- ---------- ---------- ----------
>> 19522383       jcollas      admin      sleep        994        1          1        1        92K    203980K    40000Mn  COMPLETED
>> 19522384       jcollas      admin      sleep        994        1          1        1        92K    203980K    40000Mc  COMPLETED
>>
>> For these two jobs, I would have expected AllocCPUS to be 3.
>>
>> $ scontrol show conf
>> ...
>> DefMemPerNode           = UNLIMITED
>> MaxMemPerNode           = UNLIMITED
>> MemLimitEnforce         = Yes
>> SelectTypeParameters    = CR_CPU_MEMORY
>> ...
>> AccountingStorageBackupHost = (null)
>> AccountingStorageEnforce    = associations,limits
>> AccountingStorageHost       = stor089
>> AccountingStorageLoc        = N/A
>> AccountingStoragePort       = 6819
>> AccountingStorageTRES       = cpu,mem,energy,node
>> AccountingStorageType       = accounting_storage/slurmdbd
>> AccountingStorageUser       = N/A
>> AccountingStoreJobComment   = Yes
>> AcctGatherEnergyType        = acct_gather_energy/none
>> AcctGatherFilesystemType    = acct_gather_filesystem/none
>> AcctGatherInfinibandType    = acct_gather_infiniband/none
>> AcctGatherNodeFreq          = 0 sec
>> AcctGatherProfileType       = acct_gather_profile/none
>> JobAcctGatherFrequency      = 10
>> JobAcctGatherType           = jobacct_gather/cgroup
>> JobAcctGatherParams         = (null)
>> ...
>>
>> We are currently running version 16.05.6.
>>
>> Is there something I am missing?
>>
>> Regards,
>>
>> Julien
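For context, the behavior being described is that with a MaxMemPerCPU limit set on the partition, a --mem (or --mem-per-cpu) request above that limit should make Slurm raise the job's CPU allocation so that memory per CPU stays within the limit, i.e. ceil(mem / MaxMemPerCPU). A minimal sketch of that arithmetic (the function name `expected_cpus` is illustrative, not a Slurm API):

```python
import math

def expected_cpus(requested_mem_mb: int, max_mem_per_cpu_mb: int,
                  requested_cpus: int = 1) -> int:
    """Sketch of the expected behavior, not Slurm's actual code:
    the CPU count Slurm should allocate so that memory per CPU
    does not exceed MaxMemPerCPU."""
    needed = math.ceil(requested_mem_mb / max_mem_per_cpu_mb)
    return max(requested_cpus, needed)

# Julien's partition: MaxMemPerCPU=19000, srun --mem=40000
print(expected_cpus(40000, 19000))  # ceil(40000/19000) -> 3

# The small test partition: MaxMemPerCPU=200, srun --mem 600
print(expected_cpus(600, 200))      # ceil(600/200) -> 3
```

This matches the "AllocCPUS to be 3" expectation above: 40000 MB at 19000 MB per CPU needs 3 CPUs, yet sacct shows AllocCPUS=1 on 16.05.6.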