Hi Paddy,

Why don't you add new QoSes, set one as the partition QoS for each
partition, and then set the default limits on those partition QoSes?

Like

sacctmgr add qos cloud

and then, in slurm.conf on that cluster:

PartitionName=cloud Nodes=node[1-6] Default=YES MaxTime=30-0
DefaultTime=0:10:0 State=DOWN QoS=cloud

That way you can have different QoS names for the partitions across all of
your clusters, and set the limits separately on each QoS.
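
For example, a rough sketch of setting that per-user core limit on one of
those QoSes (the "cloud" name and the cpu=32 value are just the examples
already in this thread):

sacctmgr modify qos cloud set MaxTRESPerUser=cpu=32

Since the QoS names differ from cluster to cluster, the limits can differ
too, even though the QoS definitions all live in the one shared slurmdbd.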

Sean

--
Sean Crosby | Senior DevOps/HPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia



On Sat, 20 Jun 2020 at 07:24, Paddy Doyle <pa...@tchpc.tcd.ie> wrote:

> Hi all,
>
> I've been trying to work out how to properly set a limit on the number of
> cores a user (or an association; either would be fine) can have in use at
> any one time.
>
> Ideally, I'd like to be able to set a default value once for the cluster,
> and then have it inherit down to lots of associations and users. And there
> are multiple clusters that need such a limit.
>
> Our setup has a single shared Slurmdbd, with multiple clusters connected
> back to it (I think that's important for QOS-based solutions).
>
> Most of the previous mails about this on the list (I know it's come up many
> times before) talk about QOS-based solutions, but the problem is that the
> QOS limits are global across all clusters, and so we can't use them like
> that.
>
> I've tried lots of different sacctmgr options on a test cluster, and can't
> seem to get it right. Any help would be really appreciated!
>
>
> I'll go through what I've tried:
>
>
> MaxJobs: this is not right, as it limits the number of running jobs, not
> the number of cores, so a user can still run lots of high-core-count jobs.
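>
> (For reference, the kind of setting I mean, with '10' just a placeholder
> value; it caps the number of jobs, not cores:)
>
>   sacctmgr update account cluster=C1 set MaxJobs=10 where account=A1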
>
>
>   sacctmgr update qos normal set maxtresperuser=cpu=32
>
> That would work, except that the QOS is global across all of the
> slurmdbd-connected clusters. So unless every cluster is the same size and
> needs the same policy, it won't work in practice.
>
>
>   sacctmgr update account cluster=C1 set MaxTRES=cpu=32 where account=A1
>
> That limit is per-job, not per user.
>
>
>   sacctmgr update account cluster=C1  set GrpTRES=cpu=32
>
> That caps usage at a maximum of 32 cores in use across the entire cluster,
> so that's not right either.
>
>
>   sacctmgr update account cluster=C1  set GrpTRES=cpu=32 where account=A1
>
> That will work alright for *that* account.
>
> But the idea of having to do this for many tens of accounts doesn't leave
> me too happy. We would also have to make it part of the new-account
> workflow, and any future policy change would have to be re-applied
> individually to every existing account.
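>
> (Presumably we'd end up scripting it; an untested sketch, looping over the
> accounts defined on one cluster:)
>
>   # cap each account on cluster C1, skipping root so we don't cap the whole cluster
>   for acct in $(sacctmgr -nP list associations cluster=C1 format=account | sort -u | grep -v '^root$'); do
>       sacctmgr -i update account cluster=C1 set GrpTRES=cpu=32 where account=$acct
>   done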
>
>
> Is there some other way that I've missed?
>
> Thanks!
>
> Paddy
>
> --
> Paddy Doyle
> Research IT / Trinity Centre for High Performance Computing,
> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> Phone: +353-1-896-3725
> https://www.tchpc.tcd.ie/
>
>
