Hi all,
I'm fairly new to Slurm, so please forgive any ignorance.
We have a test cluster (three compute nodes) running Slurm 16.05.4 with the
‘multifactor’ priority plugin. We have set up slurmdbd and created associations
for our users on each partition of the cluster, as follows:
[root@ml43 ~]# sacctmgr show associations
(columns with no values trimmed for readability)
   Cluster    Account       User  Partition     Share MaxJobs       QOS
---------- ---------- ---------- ---------- --------- ------- ---------
ml-cluster       root                               1            normal
ml-cluster       root       root                    1            normal
ml-cluster         ml                               1            normal
ml-cluster         ml       alex  scavenger         1            normal
ml-cluster         ml       alex      batch         1            normal
ml-cluster         ml       alex       long         1       1    normal
ml-cluster         ml       iain  scavenger         1            normal
ml-cluster         ml       iain      batch         1            normal
ml-cluster         ml       iain       long         1            normal
As you may notice, we have set a “MaxJobs” limit of 1 for the ‘alex’ user on
the ‘long’ partition. What we want is to enforce a maximum of one running job
at a time per user on the ‘long’ partition. However, when ‘alex’ submitted
several jobs to this partition, as many started as there were free nodes
(three), and the fourth is pending with reason ‘Resources’ rather than because
of the limit:
[root@ml43 ~]# squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               324      long   tmp.sh     alex PD       0:00      1 (Resources)
               321      long   tmp.sh     alex  R       1:56      1 ml46
               323      long   tmp.sh     alex  R       0:33      1 ml53
               322      long   tmp.sh     alex  R       0:36      1 ml48
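To make the count explicit, here is a small sketch that feeds the squeue listing above into awk and counts running (ST = R) jobs per user in the ‘long’ partition. It assumes squeue's default column order (USER is field 4, ST is field 5). On a live system the same count could come straight from squeue, e.g. squeue -h -p long -t R -o %u | sort | uniq -c.

```shell
#!/bin/sh
# Count running jobs per user in the 'long' partition.
# The listing is the squeue output above, pasted as a heredoc;
# field 2 = PARTITION, field 4 = USER, field 5 = ST (default squeue layout).
running=$(awk '$2 == "long" && $5 == "R" { n[$4]++ }
               END { for (u in n) print u, n[u] }' <<'EOF'
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               324      long   tmp.sh     alex PD       0:00      1 (Resources)
               321      long   tmp.sh     alex  R       1:56      1 ml46
               323      long   tmp.sh     alex  R       0:33      1 ml53
               322      long   tmp.sh     alex  R       0:36      1 ml48
EOF
)
# Prints one "user count" line per user with running jobs in 'long'.
echo "$running"
```

With MaxJobs=1 enforced, the count for ‘alex’ should never exceed 1; here it is 3.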
The output of “sshare” confirms that the usage was charged to the ‘alex’/‘long’ association:
[root@ml43 ~]# sshare -am
             Account       User    Partition  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ------------ ---------- ----------- ----------- ------------- ----------
root                                                       1.000000        7977      1.000000   0.500000
 root                      root                       1    0.500000           0      0.000000   1.000000
 ml                                                   1    0.500000        7977      1.000000   0.250000
  ml                       alex    scavenger          1    0.083333           0      0.166667   0.250000
  ml                       alex        batch          1    0.083333           0      0.166667   0.250000
  ml                       alex         long          1    0.083333        7977      1.000000   0.000244
  ml                       iain    scavenger          1    0.083333           0      0.166667   0.250000
  ml                       iain        batch          1    0.083333           0      0.166667   0.250000
  ml                       iain         long          1    0.083333           0      0.166667   0.250000
Why doesn’t the “MaxJobs” limit stop this user from running more than one job
at a time?
Thanks,
Will
