Are you sure it worked as you expected? I always think of CPUs & RAM as
independent things that need to be requested manually and independently.
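They are coupled through MaxMemPerCPU, though: as I understand the documented behavior, when a job's memory request exceeds the partition's MaxMemPerCPU, Slurm is supposed to raise the CPU count to ceil(mem / MaxMemPerCPU). With the numbers from the original report (--mem=40000, MaxMemPerCPU=19000) that works out to 3. A sketch of the arithmetic, plus the explicit request one could make instead (the srun line is illustrative, not something I ran):

```shell
# ceil(40000 / 19000) CPUs would be needed to cover the memory request;
# integer arithmetic: (mem + max_mem_per_cpu - 1) / max_mem_per_cpu
echo $(( (40000 + 19000 - 1) / 19000 ))   # prints 3

# Requesting both resources explicitly instead of relying on the
# automatic scaling (shown for illustration, not run here):
# srun --partition=short --cpus-per-task=3 --mem=40000 sleep 10
```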

On Tue, Feb 14, 2017 at 10:07 AM, Julien Collas <jul.col...@gmail.com> wrote:
> Hello,
>
> I ran some tests in a simple environment, and it seems that this
> functionality works fine up to and including version 15.08.13.
> With versions 16.05.6, 16.05.9, and 17.02.0-0rc1 I do not see the
> behavior I would expect.
>
> # scontrol show part
> PartitionName=short
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=YES QoS=N/A
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=dhcpvm4-191
>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
>    OverTimeLimit=NONE PreemptMode=OFF
>    State=UP TotalCPUs=8 TotalNodes=1 SelectTypeParameters=NONE
>    DefMemPerCPU=200 MaxMemPerCPU=200
>
> # srun --mem 600 sleep 5 && scontrol show job
> JobId=25 JobName=sleep
>    UserId=root(0) GroupId=root(0) MCS_label=N/A
>    Priority=4294901754 Nice=0 Account=(null) QOS=(null)
>    JobState=COMPLETED Reason=None Dependency=(null)
>    Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
>    RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A
>    SubmitTime=2017-02-14T15:38:41 EligibleTime=2017-02-14T15:38:41
>    StartTime=2017-02-14T15:38:41 EndTime=2017-02-14T15:38:46 Deadline=N/A
>    PreemptTime=None SuspendTime=None SecsPreSuspend=0
>    Partition=short AllocNode:Sid=dhcpvm4-174:5130
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=dhcpvm4-191
>    BatchHost=dhcpvm4-191
>    NumNodes=1 NumCPUs=1 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=1,mem=600M,node=1
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>    MinCPUsNode=1 MinMemoryNode=600M MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    Gres=(null) Reservation=(null)
>    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>    Command=sleep
>    WorkDir=/root
>    Power=
>
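A quick sanity check of what the pre-16.05 behavior would imply for the job output above: with MaxMemPerCPU=200 and --mem 600, the CPU count should be rounded up to cover the memory request, i.e. ceil(600/200). A small sketch of that arithmetic (the scaling rule is how MaxMemPerCPU is documented to behave; the variable names are just illustrative):

```shell
# Expected CPU count when MaxMemPerCPU caps memory per CPU:
# cpus = ceil(mem / max_mem_per_cpu), via integer arithmetic.
mem=600              # --mem of the job above, in MB
max_mem_per_cpu=200  # MaxMemPerCPU of the partition
echo $(( (mem + max_mem_per_cpu - 1) / max_mem_per_cpu ))   # prints 3
```

Yet the TRES line above reports cpu=1, not 3.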
> It doesn't help me a lot with my problem but ...
>
> Best regards,
>
> Julien
>
> 2017-02-02 8:53 GMT+01:00 Julien Collas <jul.col...@gmail.com>:
>>
>> Hi,
>>
>> It seems that my MaxMemPerCPU setting is not working as I would have
>> expected (the CPU count should increase if --mem or --mem-per-cpu
>> exceeds that limit).
>>
>> Here is my partition definition
>>
>> $ scontrol show part short
>> PartitionName=short
>>    AllowGroups=ALL DenyAccounts=data AllowQos=ALL
>>    AllocNodes=ALL Default=YES QoS=N/A
>>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>>    MaxNodes=UNLIMITED MaxTime=00:30:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
>>    Nodes=srv0029[73-80,87-95,98-99]
>>    PriorityJobFactor=1000 PriorityTier=1000 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=OFF
>>    State=UP TotalCPUs=560 TotalNodes=28 SelectTypeParameters=NONE
>>    DefMemPerCPU=19000 MaxMemPerCPU=19000
>>
>>
>> $ srun --partition=short --mem=40000 sleep 10
>> $ srun --partition=short --mem-per-cpu=40000 sleep 10
>> $ sacct
>>        JobID      User    Account    JobName   Priority   NTasks  AllocCPUS  ReqCPUS     MaxRSS  MaxVMSize     ReqMem      State
>> ------------ --------- ---------- ---------- ---------- -------- ---------- -------- ---------- ---------- ---------- ----------
>> 19522383      jcollas      admin      sleep        994        1          1        1        92K    203980K    40000Mn  COMPLETED
>> 19522384      jcollas      admin      sleep        994        1          1        1        92K    203980K    40000Mc  COMPLETED
>>
>> For these 2 jobs, I would have expected AllocCPUS to be 3.
>>
>> $ scontrol show conf
>> ...
>> DefMemPerNode               = UNLIMITED
>> MaxMemPerNode               = UNLIMITED
>> MemLimitEnforce             = Yes
>> SelectTypeParameters        = CR_CPU_MEMORY
>> ...
>> AccountingStorageBackupHost = (null)
>> AccountingStorageEnforce    = associations,limits
>> AccountingStorageHost       = stor089
>> AccountingStorageLoc        = N/A
>> AccountingStoragePort       = 6819
>> AccountingStorageTRES       = cpu,mem,energy,node
>> AccountingStorageType       = accounting_storage/slurmdbd
>> AccountingStorageUser       = N/A
>> AccountingStoreJobComment   = Yes
>> AcctGatherEnergyType        = acct_gather_energy/none
>> AcctGatherFilesystemType    = acct_gather_filesystem/none
>> AcctGatherInfinibandType    = acct_gather_infiniband/none
>> AcctGatherNodeFreq          = 0 sec
>> AcctGatherProfileType       = acct_gather_profile/none
>> JobAcctGatherFrequency      = 10
>> JobAcctGatherType           = jobacct_gather/cgroup
>> JobAcctGatherParams         = (null)
>> ...
>>
>> We are currently running version 16.05.6.
>>
>> Is there something I am missing?
>>
>>
>> Regards,
>>
>> Julien
>
>
