Re: [slurm-users] Drain a single user's jobs
Ah-ha! Figured out what I did wrong:

  sacctmgr modify user foo set qos=drain

This set the list of QOS available to the user. The user's jobs inherited a default QOS of "normal", which was no longer allowed, hence the InvalidQOS. I needed to override the default QOS for foo's jobs as well:

  sacctmgr modify user foo set qos=drain defaultqos=drain

and then update the QOS on all of foo's waiting jobs.

I'll be using David's GrpSubmitJobs=0 suggestion instead.

Thanks for everyone's help,

Mark

On Wed, 1 Apr 2020, Mark Dixon wrote:
> But does anyone know why a user association with my qos stopped jobs
> running with InvalidQOS?
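A minimal, untested sketch of the two-step fix Mark describes. It assumes the "drain" QOS from the original post already exists, and that "scontrol update ... QOS=drain" is how the waiting jobs get moved (Mark doesn't say which command he used). The helpers "run", "pending_jobs" and "drain_user_qos" and the DRY_RUN variable are hypothetical names for this illustration; with DRY_RUN=1 the sacctmgr/scontrol commands are printed instead of executed. Whether Slurm 16.05 accepts moving an already-pending job onto a QOS with MaxSubmitJobs=0 has not been checked here.

```shell
#!/usr/bin/env bash
# Sketch of Mark's "drain QOS" fix. Assumes the "drain" QOS already exists
# (MaxSubmitJobs=0, Flags=OverPartQOS). Helper names are hypothetical.
# With DRY_RUN=1 the Slurm commands are printed instead of executed.

run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Job IDs of the user's pending jobs, one per line (read-only, so it
# still runs under DRY_RUN).
pending_jobs() {
    squeue -h -u "$1" -t PENDING -o %i
}

drain_user_qos() {
    local user=$1 jobid
    # Allow "drain" AND make it the default, so jobs no longer default to
    # "normal" (no longer allowed once the QOS list is just "drain").
    run sacctmgr -i modify user "$user" set qos=drain defaultqos=drain
    # Move every already-pending job onto the new QOS (untested assumption).
    for jobid in $(pending_jobs "$user"); do
        run scontrol update JobId="$jobid" QOS=drain
    done
}
```

After sourcing it, "DRY_RUN=1 drain_user_qos foo" shows the commands for review before running them for real.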
Re: [slurm-users] Drain a single user's jobs
Hi Antony,

Thanks for the suggestion, but that doesn't do anything either, because our partition QOS also sets MaxJobs, and that value takes precedence.

Best,

Mark

On Wed, 1 Apr 2020, Antony Cleave wrote:
> why not just sacctmgr modify user foo set maxjobs=0
> existing running jobs will run to completion and pending jobs won't start
Re: [slurm-users] Drain a single user's jobs
Hi Ahmet,

Another way to do it! Many thanks, very useful :)

But does anyone know why a user association with my QOS stopped jobs running with InvalidQOS? I can imagine using a user QOS to override a partition QOS being useful for other things, so it would be nice to know what I've done wrong.

Best,

Mark

On Wed, 1 Apr 2020, mercan wrote:
> if job_desc.user_name == "baduser" then return 2045 end

--
Mark Dixon                       Tel: +44(0)191 33 41383
Advanced Research Computing (ARC), Durham University, UK
Re: [slurm-users] Drain a single user's jobs
Hi;

If you have a working job_submit.lua script, you can add a block inside the slurm_job_submit function that rejects new jobs from the specific user:

  if job_desc.user_name == "baduser" then
      return 2045
  end

That's all!

Regards;
Ahmet M.

On 1.04.2020 16:22, Mark Dixon wrote:
> Hi David,
>
> Thanks for this, it sounds like I've not been trying crazy methods - but
> they don't work for me:
Re: [slurm-users] Drain a single user's jobs
Why not just

  sacctmgr modify user foo set maxjobs=0

Existing running jobs will run to completion and pending jobs won't start.

Antony

On Wed, 1 Apr 2020 at 10:57, Mark Dixon wrote:
> I'd like to stop user foo from submitting new jobs but allow their
> existing jobs to run.
Re: [slurm-users] Drain a single user's jobs
Hi David,

Thanks for this. It sounds like I've not been trying crazy methods, but they don't work for me:

- "sacctmgr modify user foo set qos=drain" did set up the association ("sacctmgr show associations" showed that the QoS changed from "normal" to "drain"), but this is when foo's jobs refused to start with reason "InvalidQOS".
- "sacctmgr update user foo set maxsubmitjobs=0" was ignored because QOS were already set on the partitions.

But... good news! We hadn't used GrpSubmitJobs in any of our QOS, so "sacctmgr modify user foo set GrpSubmitJobs=0" isn't overridden anywhere, and the effect is exactly what I wanted. Thanks!

But if anyone knows why my attempt at using a "drain" QOS stopped foo's previously submitted jobs from running, I'd be very interested to hear about it.

Thanks again,

Mark

On Wed, 1 Apr 2020, David Rhey wrote:
> We had a "free period" for our clusters and once it was over we set the
> GrpSubmitJobs on an account to 0 which allowed in-flight jobs to continue
> but no new work to be submitted.
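The GrpSubmitJobs approach Mark settled on can be wrapped up as follows, as a sketch. The helper names and the DRY_RUN switch are hypothetical; the sacctmgr commands are the ones from this thread, plus "-i" (commit without the confirmation prompt) and a value of -1, which sacctmgr treats as clearing a previously set limit, for undoing the block later. David's account-wide variant would use "modify account <name>" in place of "modify user".

```shell
#!/usr/bin/env bash
# Sketch of the GrpSubmitJobs=0 approach from this thread. Helper names are
# hypothetical; with DRY_RUN=1 the commands are printed, not executed.

run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Refuse new submissions; jobs already queued or running are left alone.
block_submit() {
    run sacctmgr -i modify user "$1" set GrpSubmitJobs=0
}

# Lift the block again: -1 clears a previously set limit in sacctmgr.
unblock_submit() {
    run sacctmgr -i modify user "$1" set GrpSubmitJobs=-1
}

# Check what is now set on the user's associations.
show_limits() {
    run sacctmgr show assoc where user="$1" format=User,Account,Partition,QOS,GrpSubmitJobs
}
```

Because no partition QOS on Mark's cluster sets GrpSubmitJobs, nothing overrides this association limit, which is why it works where maxsubmitjobs=0 and maxjobs=0 did not.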
Re: [slurm-users] Drain a single user's jobs
Hi Mark,

I *think* you might need to update the user account to have access to that QoS (as part of their association), using sacctmgr modify user plus some additional arguments (they escape me at the moment).

Also, you *might* have been able to set MaxSubmitJobs to 0 at their account level and let their existing jobs run without having to take the QoS approach, but that's just a guess on my end based on how we've done some things here. We had a "free period" for our clusters, and once it was over we set GrpSubmitJobs on an account to 0, which allowed in-flight jobs to continue but no new work to be submitted.

HTH,
David

On Wed, Apr 1, 2020 at 5:57 AM Mark Dixon wrote:
> Hi all,
>
> I'm a slurm newbie who has inherited a working slurm 16.05.10 cluster.
>
> I'd like to stop user foo from submitting new jobs but allow their
> existing jobs to run.
>
> We have several partitions, each with its own qos and MaxSubmitJobs
> typically set to some value. These qos are stopping a "sacctmgr update user
> foo set maxsubmitjobs=0" from doing anything useful, as per the
> documentation.
>
> I've tried setting up a competing qos:
>
>   sacctmgr add qos drain
>   sacctmgr modify qos drain set MaxSubmitJobs=0
>   sacctmgr modify qos drain set flags=OverPartQOS
>   sacctmgr modify user foo set qos=drain
>
> This has successfully prevented the user from submitting new jobs, but
> their existing jobs aren't running. I'm seeing the reason code
> "InvalidQOS".
>
> Any ideas what I should be looking at, please?
>
> Thanks,
>
> Mark

--
David Rhey
---
Advanced Research Computing - Technology Services
University of Michigan
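To see the mismatch behind the InvalidQOS reason in the original post, one can compare the QOS each pending job requests with the QOS list on the user's associations. The sketch below does that under simplifying assumptions: it treats the union of all of the user's associations as "allowed" (Slurm checks the job's own association), and the helper names are hypothetical. After "sacctmgr modify user foo set qos=drain", jobs still requesting "normal" would be listed.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper names): list a user's pending jobs whose QOS is
# not in the QOS list of any of the user's associations. Simplification:
# Slurm itself checks only the job's own association.

# QOS list of each association, one comma-separated line per association.
# -n: no header, -P: parsable output.
assoc_qos() {
    sacctmgr -n -P show assoc where user="$1" format=QOS
}

# "<jobid> <qos>" for each of the user's pending jobs.
pending_job_qos() {
    squeue -h -u "$1" -t PENDING -o "%i %q"
}

# Print "<jobid> <qos>" for every pending job whose QOS is not allowed.
find_invalid_qos_jobs() {
    local user=$1 allowed jobid qos
    # Join all lists into one comma-delimited string such as ",drain,,"
    # so a whole-name match is a simple glob test.
    allowed=",$(assoc_qos "$user" | tr '\n' ','),"
    while read -r jobid qos; do
        [[ -z "$jobid" ]] && continue
        if [[ "$allowed" != *",$qos,"* ]]; then
            echo "$jobid $qos"
        fi
    done < <(pending_job_qos "$user")
}
```

Splitting the two Slurm queries into their own functions keeps the comparison logic separate, so it can be checked against canned output before pointing it at a live cluster.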