Hi Amjad,

AccountingStorageUser is the user used to connect to the accounting database. 
If you have it defined in slurm.conf, it is ignored.

From the output you showed, it says the user cjr13geu in the cluster 
uea_cluster has access to the QoS.

How are you adding the QoS to other users? The way to do it would be

sacctmgr modify account <accountname> user=<username> set qos+=gpu-rtx-reserved

or

sacctmgr modify account <accountname> set qos+=gpu-rtx-reserved

if you want to give it to every user in <accountname>
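
One way to confirm the change took effect is to list the user's associations in parsable form and check the QoS column. The sketch below assumes the output of `sacctmgr -nP show assoc user=<username> format=user,account,qos` is piped in as `user|account|qos-list` lines; the helper name `has_qos` is hypothetical and not part of Slurm.

```shell
#!/bin/sh
# has_qos (hypothetical helper): reads "user|account|qos1,qos2,..." lines on
# stdin and succeeds if any line lists exactly the QoS given as $1.
has_qos() {
    awk -F'|' -v want="$1" '
        {
            # Split the third field (the QoS list) on commas and compare
            # each entry exactly, so "gpu-rtx" does not match "gpu-rtx-reserved".
            n = split($3, q, ",")
            for (i = 1; i <= n; i++) if (q[i] == want) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Usage on a cluster (<username> is a placeholder):
#   sacctmgr -nP show assoc user=<username> format=user,account,qos \
#       | has_qos gpu-rtx-reserved && echo "QoS is attached"
```

The exact comparison matters here because the QoS names in this thread share a prefix, so a plain grep for gpu-rtx would also match gpu-rtx-reserved.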

Sean
________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Amjad 
Syed <amjad...@gmail.com>
Sent: Tuesday, 31 August 2021 17:46
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] [EXT] User association with partition and Qos

External email: Please exercise caution

________________________________
Hi Sean

Here is the output for gpu-rtx-reserved qos


sacctmgr show account withassoc -p | grep gpu-rtx-reserved


default|default|default|uea_cluster||cjr13geu|1|||||||||||||||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|





scontrol show part gpu-rtx6000-2

PartitionName=gpu-rtx6000-2

   AllowGroups=ALL AllowAccounts=ALL AllowQos=gpu-rtx,gpu-rtx-reserved,jakeuea

   AllocNodes=ALL Default=NO QoS=N/A

   DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO

   MaxNodes=9 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED

   Nodes=g[15-29]

   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO

   OverTimeLimit=NONE PreemptMode=GANG,SUSPEND

   State=UP TotalCPUs=720 TotalNodes=15 SelectTypeParameters=NONE

   JobDefaults=(null)

   DefMemPerCPU=3996 MaxMemPerNode=UNLIMITED




On a different note, we have the following in slurm.conf:


AccountingStorageUser=slurm


But we have been adding QoS and assigning users as root. Could this be an issue?




Amjad

On Tue, Aug 31, 2021 at 8:22 AM Sean Crosby 
<scro...@unimelb.edu.au> wrote:
What does sacctmgr show for the user you added to have access to the QoS, and 
what does Slurm show for the partition config?

sacctmgr show account withassoc -p
scontrol show part gpu-rtx6000-2

Sean
________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Amjad 
Syed <amjad...@gmail.com>
Sent: Tuesday, 31 August 2021 17:03
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] [EXT] User association with partition and Qos

________________________________
Hello me again

I just found out that when our slurmctld restarts, all QoS are gone.

I mean that users who have an association with the QoS cannot submit jobs with 
sbatch; they get this error:

sbatch: error: Batch job submission failed: Invalid qos specification


Do we need to make any more changes in slurm.conf so that the QoS become permanent?

Amjad

On Fri, Aug 27, 2021 at 3:32 PM Amjad Syed 
<amjad...@gmail.com> wrote:
Hi Sean,

Thanks for the suggestion, seems to work now.

Majid

On Fri, Aug 27, 2021 at 12:56 PM Sean Crosby 
<scro...@unimelb.edu.au> wrote:
Hi Amjad,

Make sure you have qos in the config entry AccountingStorageEnforce

e.g.

AccountingStorageEnforce=associations,limits,qos,safe
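
To check what the running controller actually uses, the value can be read back from `scontrol show config`. The sketch below assumes that output contains a line of the form `AccountingStorageEnforce = associations,limits,qos,safe`; the helper name `enforces_qos` is hypothetical, and the key match is case-sensitive.

```shell
#!/bin/sh
# enforces_qos (hypothetical helper): reads `scontrol show config` output
# (or slurm.conf lines) on stdin and succeeds if the AccountingStorageEnforce
# value includes the "qos" option.
enforces_qos() {
    awk -F'=' '
        $1 ~ /^[ \t]*AccountingStorageEnforce[ \t]*$/ {
            gsub(/[ \t]/, "", $2)          # drop padding around the value
            n = split($2, opt, ",")
            for (i = 1; i <= n; i++) if (opt[i] == "qos") found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Usage on the controller host:
#   scontrol show config | enforces_qos && echo "QoS is enforced"
```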

Sean

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Amjad 
Syed <amjad...@gmail.com>
Sent: Friday, 27 August 2021 20:28
To: slurm-us...@schedmd.com <slurm-us...@schedmd.com>
Subject: [EXT] [slurm-users] User association with partition and Qos

________________________________
Hello all

We are having an issue understanding user associations and partitions.

Currently we have a partition with 30 GPU cards.

We have defined a QoS, gpu-rtx, that allows a user to reserve 2 cards:


sacctmgr show qos gpu-rtx format=MaxTRESPU%60

                                                   MaxTRESPU

       -----------------------------------------------------

                                           cpu=96,gres/gpu=2




We have defined a user, test, that is associated with this QoS:


sacctmgr show assoc user=test format=user,qos


Qos

gpu-rtx



Now we define another QoS, gpu-rtx-reserved, that allows gpu=8:


sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60

                                                   MaxTRESPU

       -----------------------------------------------------

                                           cpu=192,gres/gpu=8

User test is not associated with the gpu-rtx-reserved QoS, so he should not be able 
to use more than gpu=2.
Both of these QoS are now in slurm.conf for the partition:


PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9 
MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved



But we found out that even though the user is not associated with gpu-rtx-reserved, if 
he specifies gpu-rtx-reserved in his Slurm script, he can reserve 8 GPU cards.
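
For reference, a job script of the kind described might look like the following sketch. The partition and QoS names are the ones above; the job name and the echo line are placeholders, and it assumes Slurm exports CUDA_VISIBLE_DEVICES for GPU jobs, as it normally does.

```shell
#!/bin/bash
#SBATCH --job-name=qos-test            # placeholder name
#SBATCH --partition=gpu-rtx6000-2
#SBATCH --qos=gpu-rtx-reserved         # QoS the user is NOT associated with
#SBATCH --gres=gpu:8                   # 8 cards, above the gpu-rtx limit of 2

# Report which GPUs the job can see; falls back to "none" outside a GPU job.
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: ${gpus}"
```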


So our question is: can users associated with one QoS on a partition use the 
other QoS on that partition even if they are not associated with it? Or, in 
other words, can we only define one QoS per partition and not more than one?


I hope I was able to explain this clearly.


Any advice if we want the partition to use more than one QoS with different limits, 
and users associated with one QoS should not be able to use the other?


Majid


