Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-31 Thread Amjad Syed
Just a correction

We use
sacctmgr modify user= set qos+=gpu-rtx6000-2

Amjad


Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-31 Thread Amjad Syed
Hi Sean

We have been adding the QoS with the following command:

sacctmgr modify user set qos+=gpu-rtx-reserved

We have a single account that all our users are associated with, plus the root
account for admin.

Is that the issue? Do we need to associate the user with the account?



Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-31 Thread Sean Crosby
Hi Amjad,

AccountingStorageUser is the user used to connect to the accounting database. 
If you have it defined in slurm.conf, it is ignored.

From the output you showed, the user cjr13geu in the cluster uea_cluster
has access to the QoS.

How are you adding the QoS to other users? The way you would do it would be

sacctmgr modify account  user= set qos+=gpu-rtx-reserved

or

sacctmgr modify account  set qos+=gpu-rtx-reserved

if you want to give it to every user in 
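
The account and user names in the two commands above did not survive in this copy of the message. As a rough sketch only, the helper below prints (and does not run) commands of the same kind, written with the `where` keyword that the sacctmgr man page uses in its examples. The function name add_qos_cmd and the names default and test are illustrative assumptions; substitute your own account and user.

```shell
#!/bin/sh
# add_qos_cmd: hypothetical helper, not part of Slurm.
# It only prints the sacctmgr command (a dry run) so it can be reviewed first.
# $1 = account, $2 = user (empty string means the whole account), $3 = QoS name
add_qos_cmd() {
    if [ -n "$2" ]; then
        # Add the QoS to one user's association within the account.
        echo "sacctmgr modify user where name=$2 account=$1 set qos+=$3"
    else
        # Add the QoS at the account level.
        echo "sacctmgr modify account where name=$1 set qos+=$3"
    fi
}

# Illustrative names only (assumptions, not taken from a real site).
add_qos_cmd default test gpu-rtx-reserved
add_qos_cmd default "" gpu-rtx-reserved
```

Printing the command first makes it easy to check the account and user before anything is changed in the accounting database.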

Sean


Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-31 Thread Amjad Syed
Hi Sean

Here is the output for gpu-rtx-reserved qos

sacctmgr show account withassoc -p | grep gpu-rtx-reserved


default|default|default|uea_cluster||cjr13geu|1|||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|
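
A plain grep also matches QoS names that merely contain the search string, so an exact check can be sketched as below. This is a minimal sketch, not a Slurm tool: the function name assoc_has_qos is hypothetical, and the column index 10 is an assumption read off the sample line above; the column layout of sacctmgr output can differ between versions and format options.

```shell
#!/bin/sh
# assoc_has_qos: hypothetical helper, not part of Slurm.
# $1 = one pipe-delimited sacctmgr line
# $2 = QoS name to look for (exact match)
# $3 = 1-based index of the QOS column (10 in the sample line; an assumption)
assoc_has_qos() {
    printf '%s\n' "$1" | awk -F'|' -v qos="$2" -v col="$3" '
        {
            # The QOS column is a comma-separated list of QoS names.
            n = split($col, list, ",")
            for (i = 1; i <= n; i++)
                if (list[i] == qos) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Sample line copied from the output above.
line='default|default|default|uea_cluster||cjr13geu|1|||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|'

if assoc_has_qos "$line" gpu-rtx-reserved 10; then
    echo "cjr13geu has gpu-rtx-reserved"
fi
```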





scontrol show part gpu-rtx6000-2

PartitionName=gpu-rtx6000-2
   AllowGroups=ALL AllowAccounts=ALL AllowQos=gpu-rtx,gpu-rtx-reserved,jakeuea
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=9 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=g[15-29]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
   State=UP TotalCPUs=720 TotalNodes=15 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=3996 MaxMemPerNode=UNLIMITED




On a different note, we have the following in slurm.conf:

AccountingStorageUser=slurm

But we have been adding QoS and assigning users as root. Can this be an
issue?




Amjad



Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-31 Thread Sean Crosby
What does sacctmgr show for the user you added to have access to the QoS, and 
what does Slurm show for the partition config?

sacctmgr show account withassoc -p
scontrol show part gpu-rtx6000-2

Sean


Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-31 Thread Amjad Syed
Hello me again

Just found out that when our slurmctld restarts, all QoS are gone.

I mean that users who have an association with the QoS cannot submit jobs with
sbatch; they get this error:

sbatch: error: Batch job submission failed: Invalid qos specification


Do we need to make any more changes in slurm.conf so that the QoS becomes
permanent?

Amjad



Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-27 Thread Amjad Syed
Hi Sean,

Thanks for the suggestion, seems to work now.

Majid



Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-27 Thread Sean Crosby
Hi Amjad,

Make sure you have qos in the config entry AccountingStorageEnforce

e.g.

AccountingStorageEnforce=associations,limits,qos,safe
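
To confirm the setting on a given cluster, a small check along these lines may help. It is a sketch under assumptions: the function name enforce_has_qos is hypothetical, and it only inspects the text of one AccountingStorageEnforce line, however that line was obtained.

```shell
#!/bin/sh
# enforce_has_qos: hypothetical helper, not part of Slurm.
# $1 = one "AccountingStorageEnforce=..." line (spaces around "=" are tolerated).
# Succeeds (exit 0) if the comma-separated value list contains the token "qos".
enforce_has_qos() {
    value=${1#*=}    # drop everything up to and including the first "="
    # Remove spaces and lower-case the value so the match is case-insensitive.
    value=$(printf '%s' "$value" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
    case ",$value," in
        *,qos,*) return 0 ;;
        *)       return 1 ;;
    esac
}

if enforce_has_qos "AccountingStorageEnforce=associations,limits,qos,safe"; then
    echo "qos is enforced"
fi
```

On a live system the line could be taken from `scontrol show config | grep AccountingStorageEnforce` and passed to the function as its single argument.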

Sean


From: slurm-users  on behalf of Amjad 
Syed 
Sent: Friday, 27 August 2021 20:28
To: slurm-us...@schedmd.com 
Subject: [EXT] [slurm-users] User association with partition and Qos

External email: Please exercise caution


Hello all

We are having an issue understanding user association and partition.

Currently we have a partition with 30 GPU cards.

We have defined a QoS, gpu-rtx, that allows a user to reserve 2 cards:


sacctmgr show qos gpu-rtx format=MaxTRESPU%60

   MaxTRESPU

   -

   cpu=96,gres/gpu=2




We have defined a user, test, that is associated with this QoS:


sacctmgr show assoc user=test format=user,qos


Qos

gpu-rtx



Now we define another QoS, gpu-rtx-reserved, that allows gpu=8:


sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60

   MaxTRESPU

   -

   cpu=192,gres/gpu=8

User test is not associated with the gpu-rtx-reserved QoS, so he should not be
able to use more than gpu=2.
Both of these QoS are now in slurm.conf for the partition:


PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9 MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved



But we found out that even though the user is not associated with
gpu-rtx-reserved, if the user specifies gpu-rtx-reserved in his Slurm script,
he can reserve 8 GPU cards.


So our question is: can users associated with one partition QoS use the other
QoS in the partition even if they are not associated with it? Or, in other
words, can we define only one QoS per partition and not more than one?


Hope I was able to explain.


Any advice if we want a partition to use more than one QoS with different
limits, where users associated with one QoS should not be able to use the
other QoS?


Majid