Re: [slurm-users] [EXT] User association with partition and Qos
Just a correction: we use

    sacctmgr modify user= set qos+=gpu-rtx6000-2

Amjad

On Tue, Aug 31, 2021 at 10:17 AM Amjad Syed wrote:
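The grant above can be wrapped in a small Bash helper so the command is reviewed before it runs. This is a minimal sketch that assumes a standard `sacctmgr` on the PATH. The function name `grant_qos_cmd` and the `APPLY` variable are hypothetical. By default the helper only prints the command. The `-i` option tells `sacctmgr` to commit without its confirmation prompt. As we understand it, `modify user` without an account filter changes every association that user has.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build (and optionally run) the sacctmgr command that
# adds a QoS to the associations of one user.
# Usage: grant_qos_cmd <user> <qos>   (set APPLY=1 to actually run it)
grant_qos_cmd() {
  local user="$1" qos="$2"
  if [[ -z "$user" || -z "$qos" ]]; then
    echo "usage: grant_qos_cmd <user> <qos>" >&2
    return 2
  fi
  # -i commits immediately instead of prompting for confirmation.
  local cmd=(sacctmgr -i modify user where "name=$user" set "qos+=$qos")
  if [[ "${APPLY:-0}" == "1" ]]; then
    "${cmd[@]}"
  else
    # Dry run: print the command for review.
    echo "${cmd[*]}"
  fi
}
```

For example, `grant_qos_cmd test gpu-rtx` prints `sacctmgr -i modify user where name=test set qos+=gpu-rtx`, and `APPLY=1 grant_qos_cmd test gpu-rtx` runs it.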
Re: [slurm-users] [EXT] User association with partition and Qos
Hi Sean

We have been adding it using the following command:

    sacctmgr modify user set qos+=gpu-rtx-reserved

We have a single account that is associated with all our users, plus the root account for admin.

Is that the issue? Do we need to associate users with an account?

On Tue, Aug 31, 2021 at 9:38 AM Sean Crosby wrote:
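In Slurm, QoS access is stored on associations (cluster, account, user and optionally partition). One way to answer the question above is therefore to list the user's associations and the QoS attached to each. This is a minimal sketch that assumes the pipe-delimited output of `sacctmgr -P -n`. The name `summarise_assocs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise one user's associations.
# Expected input on stdin, e.g. from:
#   sacctmgr -P -n show assoc where user=test format=cluster,account,user,qos
# Each line is: cluster|account|user|qos-list
# Returns 1 if no user-level association is found.
summarise_assocs() {
  local found=0 cluster account user qos
  while IFS='|' read -r cluster account user qos; do
    [[ -z "$user" ]] && continue   # skip account-level rows (no user field)
    echo "user=$user account=$account cluster=$cluster qos=${qos:-<none>}"
    found=1
  done
  if (( found == 0 )); then
    echo "no user association found" >&2
    return 1
  fi
}
```

Usage: `sacctmgr -P -n show assoc where user=test format=cluster,account,user,qos | summarise_assocs`. An empty result means the user has no association to attach a QoS to.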
Re: [slurm-users] [EXT] User association with partition and Qos
Hi Amjad,

AccountingStorageUser is the user used to connect to the accounting database. If you have it defined in slurm.conf, it is ignored.

From the output you showed, the user cjr13geu in the cluster uea_cluster has access to the QoS.

How are you adding the QoS to other users? The way you would do it would be

    sacctmgr modify account user= set qos+=gpu-rtx-reserved

or

    sacctmgr modify account set qos+=gpu-rtx-reserved

if you want to give it to every user in

Sean

From: slurm-users on behalf of Amjad Syed
Sent: Tuesday, 31 August 2021 17:46
To: Slurm User Community List
Subject: Re: [slurm-users] [EXT] User association with partition and Qos
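To run the same check for every user at once, the parsable association output can be filtered for an exact QoS name. This is a minimal sketch, and the function name `users_with_qos` is hypothetical. It matches whole list entries, so `gpu-rtx` does not accidentally match `gpu-rtx-reserved` (a plain `grep gpu-rtx` would).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list users whose association QoS list contains an
# exact QoS name given as $1. Expected input on stdin, e.g. from:
#   sacctmgr -P -n show assoc format=cluster,account,user,qos
# Each line is: cluster|account|user|qos1,qos2,...
users_with_qos() {
  local want="$1" cluster account user qoslist q
  local -a qs
  while IFS='|' read -r cluster account user qoslist; do
    [[ -z "$user" || -z "$qoslist" ]] && continue  # account row or no QoS
    IFS=',' read -r -a qs <<< "$qoslist"            # split on commas
    for q in "${qs[@]}"; do
      if [[ "$q" == "$want" ]]; then  # whole-entry match, not a substring
        echo "$user ($account@$cluster)"
        break
      fi
    done
  done
}
```

Usage: `sacctmgr -P -n show assoc format=cluster,account,user,qos | users_with_qos gpu-rtx-reserved`.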
Re: [slurm-users] [EXT] User association with partition and Qos
Hi Sean

Here is the output for the gpu-rtx-reserved QoS:

    sacctmgr show account withassoc -p | grep gpu-rtx-reserved

    default|default|default|uea_cluster||cjr13geu|1|||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|

    scontrol show part gpu-rtx6000-2

    PartitionName=gpu-rtx6000-2
       AllowGroups=ALL AllowAccounts=ALL AllowQos=gpu-rtx,gpu-rtx-reserved,jakeuea
       AllocNodes=ALL Default=NO QoS=N/A
       DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
       MaxNodes=9 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
       Nodes=g[15-29]
       PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
       OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
       State=UP TotalCPUs=720 TotalNodes=15 SelectTypeParameters=NONE
       JobDefaults=(null)
       DefMemPerCPU=3996 MaxMemPerNode=UNLIMITED

On a different note, we have the following in slurm.conf:

    AccountingStorageUser=slurm

But we have been adding QoS and assigning users to them as root. Can this be an issue?

Amjad

On Tue, Aug 31, 2021 at 8:22 AM Sean Crosby wrote:
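The AllowQos value above lists which QoS may run jobs in the partition. As we understand it, AllowQos does not by itself check that the requesting user is associated with that QoS. That check comes from having qos in AccountingStorageEnforce. The following minimal sketch pulls AllowQos out of `scontrol show partition` output and tests one QoS against it. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the AllowQos entries from `scontrol show partition <name>` output read
# on stdin, one per line. Works for both the multi-line and the -o (one-line)
# output, since every Key=Value pair is split onto its own line first.
partition_allowqos() {
  local line
  line="$(tr -s '[:space:]' '\n' | grep '^AllowQos=')" || return 1
  tr ',' '\n' <<< "${line#AllowQos=}"
}

# Succeed if the QoS named in $1 may run on the partition described on stdin.
# AllowQos=ALL is treated as "no restriction".
partition_allows_qos() {
  local allowed
  allowed="$(partition_allowqos)" || return 1
  [[ "$allowed" == "ALL" ]] && return 0
  grep -qxF -- "$1" <<< "$allowed"
}
```

Usage: `scontrol show partition gpu-rtx6000-2 | partition_allows_qos gpu-rtx-reserved && echo allowed`.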
Re: [slurm-users] [EXT] User association with partition and Qos
What does sacctmgr show for the user you added to have access to the QoS, and what does Slurm show for the partition config?

    sacctmgr show account withassoc -p
    scontrol show part gpu-rtx6000-2

Sean

From: slurm-users on behalf of Amjad Syed
Sent: Tuesday, 31 August 2021 17:03
To: Slurm User Community List
Subject: Re: [slurm-users] [EXT] User association with partition and Qos
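Sean's two commands show two halves of the picture. The first gives the QoS on the user's association, and the second gives the QoS the partition accepts. The hypothetical helper `usable_qos` below intersects the two comma-separated lists. With QoS enforcement on, the result should be the set of QoS the user can actually request on that partition. This is a sketch under that assumption, not a replacement for Slurm's own checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the QoS names that appear both in a user's
# association QoS list ($1) and in a partition's AllowQos list ($2).
# Both are comma-separated, as printed by sacctmgr and scontrol.
# A partition list of "ALL" is treated as no restriction.
usable_qos() {
  local user_list="$1" part_list="$2" q
  local -a user_qos
  [[ -n "$user_list" ]] || return 0
  IFS=',' read -r -a user_qos <<< "$user_list"
  for q in "${user_qos[@]}"; do
    # Wrapping both sides in commas gives an exact-entry match.
    if [[ "$part_list" == "ALL" || ",$part_list," == *",$q,"* ]]; then
      echo "$q"
    fi
  done
}
```

With the values from this thread, `usable_qos gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos gpu-rtx,gpu-rtx-reserved,jakeuea` prints `gpu-rtx` and `gpu-rtx-reserved`.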
Re: [slurm-users] [EXT] User association with partition and Qos
Hello, me again.

I just found out that when our slurmctld restarts, all QoS are gone.

I mean that users who have an association with the QoS cannot submit jobs with sbatch. They get this error:

    sbatch: error: Batch job submission failed: Invalid qos specification

Do we need to make any more changes in slurm.conf so that the QoS become permanent?

Amjad

On Fri, Aug 27, 2021 at 3:32 PM Amjad Syed wrote:
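Whatever the root cause of the restart problem turns out to be, a first step is to confirm two things. The QoS should still exist in the accounting database, and they should still be on the users' associations. The following minimal sketch covers the first check. It reads QoS names from `sacctmgr -P -n show qos format=name` and reports any expected names that are missing. The function name `check_qos_defined` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that each QoS named on the command line is present
# in the list read from stdin, e.g. from:
#   sacctmgr -P -n show qos format=name
# Prints the missing names and returns 1 if any are absent.
check_qos_defined() {
  local defined missing=0 q
  defined="$(cat)"
  for q in "$@"; do
    if ! grep -qxF -- "$q" <<< "$defined"; then
      echo "missing QoS: $q"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `sacctmgr -P -n show qos format=name | check_qos_defined gpu-rtx gpu-rtx-reserved`.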
Re: [slurm-users] [EXT] User association with partition and Qos
Hi Sean,

Thanks for the suggestion. It seems to work now.

Majid

On Fri, Aug 27, 2021 at 12:56 PM Sean Crosby wrote:
Re: [slurm-users] [EXT] User association with partition and Qos
Hi Amjad,

Make sure you have qos in the config entry AccountingStorageEnforce, e.g.

    AccountingStorageEnforce=associations,limits,qos,safe

Sean

From: slurm-users on behalf of Amjad Syed
Sent: Friday, 27 August 2021 20:28
To: slurm-us...@schedmd.com
Subject: [EXT] [slurm-users] User association with partition and Qos

Hello all

We are having an issue understanding user associations and partitions.

Currently we have a partition with 30 GPU cards.

We have defined a QoS, gpu-rtx, that allows a user to reserve 2 cards:

    sacctmgr show qos gpu-rtx format=MaxTRESPU%60
    MaxTRESPU
    ---------
    cpu=96,gres/gpu=2

We have defined a user, test, that is associated with this QoS:

    sacctmgr show assoc user=test format=user,qos
    Qos
    gpu-rtx

Now we define another QoS, gpu-rtx-reserved, that allows gpu=8:

    sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60
    MaxTRESPU
    ---------
    cpu=192,gres/gpu=8

User test is not associated with the gpu-rtx-reserved QoS, so he should not be able to use more than gpu=2.

Both of these QoS are now in slurm.conf for the partition:

    PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9 MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved

But we found that even though the user is not associated with gpu-rtx-reserved, he can reserve 8 GPU cards if he specifies gpu-rtx-reserved in his Slurm script.

So our question is: can users associated with one of a partition's QoS use another QoS on that partition even if they are not associated with it? In other words, can we only define one QoS per partition and not more than one?

I hope I was able to explain.

Any advice? We want a partition to allow more than one QoS with different limits, and users associated with one QoS should not be able to use the other.

Majid
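Adding qos to AccountingStorageEnforce is the change that resolved the original problem here. With it set, Slurm is expected to reject jobs that request a QoS not attached to the user's association. The hypothetical helper `enforce_includes_qos` below is a minimal sketch for auditing a slurm.conf file. It is a plain text check: it skips commented-out lines but does not follow Include directives. If the key appears more than once, it looks at the last occurrence. On a live system, `scontrol show config | grep -i AccountingStorageEnforce` reports the value the running slurmctld is using.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether slurm.conf's AccountingStorageEnforce
# option list includes "qos". Usage: enforce_includes_qos /path/to/slurm.conf
# Plain text check only: ignores commented lines, does not follow Include.
enforce_includes_qos() {
  local conf="$1" value
  # Last uncommented AccountingStorageEnforce= line (key matched case-insensitively).
  value="$(grep -i '^[[:space:]]*AccountingStorageEnforce=' "$conf" | tail -n 1 || true)"
  value="${value#*=}"              # keep only the comma-separated option list
  value="${value%%#*}"             # drop a trailing comment, if any
  value="${value//[[:space:]]/}"   # strip whitespace
  if [[ ",${value,,}," == *",qos,"* ]]; then
    echo "qos enforced: $value"
  else
    echo "qos NOT enforced: ${value:-<unset>}"
    return 1
  fi
}
```

Usage: `enforce_includes_qos /etc/slurm/slurm.conf` (the path is an assumption; use wherever your slurm.conf lives).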