Hi all,

Generally new to Slurm here, so please forgive any ignorance...

We have a test cluster (three compute nodes) running Slurm 16.05.4, with the ‘multifactor’ scheduler in use. We have set up slurmdbd and created associations for the users on the partitions of the cluster, as follows:

[root@ml43 ~]# sacctmgr show associations
   Cluster    Account       User  Partition     Share GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
ml-cluster       root                               1                                                                                                                                                  normal
ml-cluster       root       root                    1                                                                                                                                                  normal
ml-cluster         ml                               1                                                                                                                                                  normal
ml-cluster         ml       alex  scavenger         1                                                                                                                                                  normal
ml-cluster         ml       alex      batch         1                                                                                                                                                  normal
ml-cluster         ml       alex       long         1                                                                 1                                                                                normal
ml-cluster         ml       iain  scavenger         1                                                                                                                                                  normal
ml-cluster         ml       iain      batch         1                                                                                                                                                  normal
ml-cluster         ml       iain       long         1                                                                                                                                                  normal

As you may notice, we have set a “MaxJobs” limit of “1” for the ‘alex’ user on the ‘long’ partition. What we want to do is enforce a maximum of one running job at a time per user on the ‘long’ partition.
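
A limit like this is normally set per association with sacctmgr. The following is only a sketch under stated assumptions, not a transcript of what was run on this cluster: the modify command is left commented out because it needs a live slurmdbd, and `maxjobs_set` is a hypothetical helper name for listing which associations actually carry a MaxJobs value.

```shell
# Set MaxJobs=1 on the (user=alex, partition=long) association.
# Assumed invocation; requires a running slurmdbd, so it is commented out:
# sacctmgr modify user where name=alex partition=long set MaxJobs=1

# Hypothetical helper: reads "user|partition|maxjobs" lines, the shape produced by
#   sacctmgr -nP show associations format=user,partition,maxjobs
# and prints only the user associations that have a MaxJobs value set.
maxjobs_set() {
    awk -F'|' '$1 != "" && $3 != "" { print $1, $2, $3 }'
}
```

Piping the sacctmgr output above through `maxjobs_set` should list only the alex/long association if the limit is stored where we expect it.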

However, when the user ‘alex’ submitted a number of jobs to this partition, several of them ran at the same time; the only pending job is waiting on resources, not on the limit:

[root@ml43 ~]# squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               324      long   tmp.sh     alex PD       0:00      1 (Resources)
               321      long   tmp.sh     alex  R       1:56      1 ml46
               323      long   tmp.sh     alex  R       0:33      1 ml53
               322      long   tmp.sh     alex  R       0:36      1 ml48

From the output of “sshare” we verified that the usage was charged to the right association:

[root@ml43 ~]# sshare -am
             Account       User    Partition  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ------------ ---------- ----------- ----------- ------------- ----------
root                                                       1.000000        7977      1.000000   0.500000
root                       root                       1    0.500000           0      0.000000   1.000000
ml                                                    1    0.500000        7977      1.000000   0.250000
ml                         alex    scavenger          1    0.083333           0      0.166667   0.250000
ml                         alex        batch          1    0.083333           0      0.166667   0.250000
ml                         alex         long          1    0.083333        7977      1.000000   0.000244
ml                         iain    scavenger          1    0.083333           0      0.166667   0.250000
ml                         iain        batch          1    0.083333           0      0.166667   0.250000
ml                         iain         long          1    0.083333           0      0.166667   0.250000

Why doesn’t the “MaxJobs” limit prevent more than one job from running at a time for this user?

Thanks,
Will