Hi Paddy,

Why don't you add new QoS's and add them as partition QoS for each partition, and then set the defaults on those partition QoS?
Like

sacctmgr add qos cloud

and then in slurm.conf:

PartitionName=cloud Nodes=node[1-6] Default=YES MaxTime=30-0 DefaultTime=0:10:0 State=DOWN QoS=cloud

That way you could have different QoS names for all the partitions across all of your clusters, and set the limits on the QoS? (A fuller sketch, with the per-user limit attached to the QoS, is appended below the quoted mail.)

Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia


On Sat, 20 Jun 2020 at 07:24, Paddy Doyle <pa...@tchpc.tcd.ie> wrote:

> UoM notice: External email. Be cautious of links, attachments, or
> impersonation attempts.
>
> Hi all,
>
> I've been trying to understand how to properly set a limit on the number
> of cores a user (or an association; either is fine) can have in use at
> any one time.
>
> Ideally, I'd like to be able to set a default value once for the cluster,
> and then have it inherit down to lots of associations and users. And
> there are multiple clusters that need such a limit.
>
> Our setup has a single shared Slurmdbd, with multiple clusters connected
> back to it (I think that's important for QOS-based solutions).
>
> Most of the previous mails about this on the list (I know it's come up
> many times before) talk about QOS-based solutions, but the problem is
> that QOS limits are global across all clusters, so we can't use them
> like that.
>
> I've tried lots of different sacctmgr options on a test cluster, and
> can't seem to get it right. Any help would be really appreciated!
>
> I'll go through what I've tried:
>
> MaxJobs: this is not right, as it limits the number of jobs, not the
> number of cores, so a user can have lots of high-core-count jobs.
>
> sacctmgr update qos normal set maxtresperuser=cpu=32
>
> That will work... except that QOS is global across all of the
> slurmdbd-connected clusters. So unless every cluster is the same size
> and needs the same policies, it won't work in practice.
>
> sacctmgr update account cluster=C1 set MaxTRES=cpu=32 where account=A1
>
> That limit is per-job, not per-user.
>
> sacctmgr update account cluster=C1 set GrpTRES=cpu=32
>
> That caps the entire cluster at 32 cores in use, so that's not right
> either.
>
> sacctmgr update account cluster=C1 set GrpTRES=cpu=32 where account=A1
>
> That works alright for *that* account, but the idea of having to do this
> for many tens of accounts doesn't leave me too happy. We would have to
> make it part of the new-account workflow, and any future policy change
> would have to be re-applied individually to all existing accounts.
>
> Is there some other way that I've missed?
>
> Thanks!
>
> Paddy
>
> --
> Paddy Doyle
> Research IT / Trinity Centre for High Performance Computing,
> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> Phone: +353-1-896-3725
> https://www.tchpc.tcd.ie/
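For concreteness, here is a minimal sketch of the partition-QoS approach Sean suggests above. The QoS name ("cloud"), the 32-CPU per-user cap (matching the value in Paddy's examples), and the node list are placeholders, not confirmed settings; the sacctmgr commands are run once against the shared slurmdbd, and the slurm.conf line goes on the cluster that owns the partition:

# create a QoS for the partition and attach the per-user core limit to it
sacctmgr add qos cloud
sacctmgr modify qos cloud set MaxTRESPerUser=cpu=32

# in that cluster's slurm.conf, make it the partition QoS so the limit
# applies to every job submitted to that partition (values copied from
# Sean's example above)
PartitionName=cloud Nodes=node[1-6] Default=YES MaxTime=30-0 DefaultTime=0:10:0 State=DOWN QoS=cloud

# pick up the slurm.conf change
scontrol reconfigure

Because QoS records are global in the shared slurmdbd, giving each cluster's partitions their own QoS names (for example a hypothetical cloud_c1 vs cloud_c2) keeps the limits independent per cluster, while the policy still lives in one place per partition instead of being repeated on every account.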