Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon

Ah-ha! Figured out what I did wrong:

  "sacctmgr modify user foo set qos=drain"

  This set the list of qos available to the user. The user's jobs still
  inherited a default qos of "normal", which was no longer in that list -
  hence the InvalidQOS.
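
A rough sketch of how this mismatch could be spotted: compare the qos list
and default qos stored on the user's association with the qos each waiting
job is carrying. The helper name show_cmds is hypothetical and only prints
the commands; the format fields and squeue format codes are taken from
current Slurm documentation and are assumed, not verified, to behave the
same on 16.05.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) commands for inspecting one user's
# association and waiting jobs. Review the output, then run the lines by hand.
show_cmds() {
    user="$1"
    # Allowed qos list and default qos stored on the user's association(s).
    printf 'sacctmgr show associations user=%s format=user,account,partition,qos,defaultqos\n' "$user"
    # Job id, qos and pending reason for each of the user's waiting jobs.
    printf 'squeue -h -u %s -t PENDING -o "%%i %%q %%r"\n' "$user"
}

show_cmds foo
```

If a waiting job shows a qos that is missing from the association's qos
list, that would be consistent with the InvalidQOS reason described here.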

I needed to override the default qos for foo's jobs:

  "sacctmgr modify user foo set qos=drain defaultqos=drain"

  And then update the qos on all of foo's waiting jobs.
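
The last step has no command in the post, so here is one possible sketch,
assuming the waiting jobs are moved with "scontrol update". The helper
emit_qos_updates is hypothetical: it reads job ids on stdin and prints one
command per job rather than running anything, so the list can be checked
first. The job ids in the example are made up.

```shell
#!/bin/sh
# Hypothetical helper: read job ids on stdin, print one scontrol command each.
# Nothing is executed; pipe the output to `sh` only after checking it.
emit_qos_updates() {
    qos="$1"
    while read -r jobid; do
        # Skip blank lines so stray whitespace does not produce a bad command.
        [ -n "$jobid" ] || continue
        printf 'scontrol update JobId=%s QOS=%s\n' "$jobid" "$qos"
    done
}

# Example with two made-up job ids; on a real cluster the ids could come from
#   squeue -h -u foo -t PENDING -o %i
printf '1001\n1002\n' | emit_qos_updates drain
```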

I'll be using David's GrpSubmitJobs=0 suggestion instead.

Thanks for everyone's help,

Mark

On Wed, 1 Apr 2020, Mark Dixon wrote:


Hi Ahmet,

Another way to do it! Many thanks - very useful :)

But does anyone know why a user association with my qos stopped jobs
from running, with reason InvalidQOS?


I can imagine using a user qos to override a partition qos being useful for 
other things, so would be nice to know what I've done wrong.


Best,

Mark

On Wed, 1 Apr 2020, mercan wrote:


 Hi;

 If you have a working job_submit.lua script, you can block new jobs from
 a specific user:

 if job_desc.user_name == "baduser" then
     return 2045
 end

 That's all!

 Regards;

 Ahmet M.


 On 1.04.2020 at 16:22, Mark Dixon wrote:

  Hi David,

  Thanks for this, it sounds like I've not been trying crazy methods - but
  they don't work for me:

  - "sacctmgr modify user foo set qos=drain" did set up the association
    ("sacctmgr show associations" showed that QoS changed from "normal" to
    "drain"), but this is when foo's jobs refused to start because of
    reason "InvalidQOS".

  - "sacctmgr update user foo set maxsubmitjobs=0" was ignored because qos
    were already set on the partitions.

  But... good news!

  We hadn't used GrpSubmitJobs in any of our qos, so "sacctmgr modify user
  foo set GrpSubmitJobs=0" isn't overridden anywhere, and the effect is
  exactly what I wanted - thanks!

  But if anyone knows why my attempt at using a "drain" qos stopped foo's
  previously submitted jobs from running, I'd be very interested to hear
  about it.

  Thanks again,

  Mark

  On Wed, 1 Apr 2020, David Rhey wrote:


  Hi Mark,

  I *think* you might need to update the user account to have access to
  that QoS (as part of their association). Using sacctmgr modify user  +
  some additional args (they escape me at the moment).

  Also, you *might* have been able to set the MaxSubmitJobs at their
  account level to 0 and have them run without having to do the QoS
  approach - but that's just a guess on my end based on how we've done
  some things here. We had a "free period" for our clusters and once it
  was over we set GrpSubmitJobs on an account to 0, which allowed
  in-flight jobs to continue but no new work to be submitted.

  HTH,

  David

  On Wed, Apr 1, 2020 at 5:57 AM Mark Dixon wrote:


  Hi all,

  I'm a slurm newbie who has inherited a working slurm 16.05.10 cluster.

  I'd like to stop user foo from submitting new jobs but allow their
  existing jobs to run.

  We have several partitions, each with its own qos and MaxSubmitJobs
  typically set to some value. These qos are stopping a "sacctmgr update
  user foo set maxsubmitjobs=0" from doing anything useful, as per the
  documentation.

  I've tried setting up a competing qos:

     sacctmgr add qos drain
     sacctmgr modify qos drain set MaxSubmitJobs=0
     sacctmgr modify qos drain set flags=OverPartQOS
     sacctmgr modify user foo set qos=drain

  This has successfully prevented the user from submitting new jobs, but
  their existing jobs aren't running. I'm seeing the reason code
  "InvalidQOS".

  Any ideas what I should be looking at, please?

  Thanks,

  Mark




  --
  David Rhey
  ---
  Advanced Research Computing - Technology Services
  University of Michigan





Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon

Hi Antony,

Thanks for the suggestion but that doesn't do anything either, because our 
partition qos also sets MaxJobs - that value takes precedence.


Best,

Mark

On Wed, 1 Apr 2020, Antony Cleave wrote:


why not just sacctmgr modify user foo set maxjobs=0

existing running jobs will run to completion and pending jobs won't start

Antony









Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon

Hi Ahmet,

Another way to do it! Many thanks - very useful :)

But does anyone know why a user association with my qos stopped jobs
from running, with reason InvalidQOS?


I can imagine using a user qos to override a partition qos being useful 
for other things, so would be nice to know what I've done wrong.


Best,

Mark








--
Mark Dixon  Tel: +44(0)191 33 41383
Advanced Research Computing (ARC), Durham University, UK

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread mercan

Hi;

If you have a working job_submit.lua script, you can block new jobs from
a specific user:


if job_desc.user_name == "baduser" then
    return 2045
end

That's all!

Regards;

Ahmet M.









Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Antony Cleave
why not just sacctmgr modify user foo set maxjobs=0

existing running jobs will run to completion and pending jobs won't start

Antony



Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon

Hi David,

Thanks for this, it sounds like I've not been trying crazy methods - but 
they don't work for me:


- "sacctmgr modify user foo set qos=drain" did set up the association
  ("sacctmgr show associations" showed that QoS changed from "normal" to
  "drain"), but this is when foo's jobs refused to start because of reason
  "InvalidQOS".

- "sacctmgr update user foo set maxsubmitjobs=0" was ignored because qos
  were already set on the partitions.

But... good news!

We hadn't used GrpSubmitJobs in any of our qos, so "sacctmgr modify user 
foo set GrpSubmitJobs=0" isn't overridden anywhere, and the effect is 
exactly what I wanted - thanks!
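
As a sketch of how this might be applied and later undone: the command for
blocking is the one quoted above, and the reversal assumes the sacctmgr
convention that a value of -1 clears a previously set limit. The helper
names are hypothetical and only print the commands.

```shell
#!/bin/sh
# Hypothetical helpers: print (do not run) the sacctmgr commands.
block_submit_cmd() {
    # GrpSubmitJobs=0 on the user's association: no new submissions accepted.
    printf 'sacctmgr modify user %s set GrpSubmitJobs=0\n' "$1"
}
allow_submit_cmd() {
    # A value of -1 clears a previously set limit in sacctmgr.
    printf 'sacctmgr modify user %s set GrpSubmitJobs=-1\n' "$1"
}

block_submit_cmd foo
allow_submit_cmd foo
```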


But if anyone knows why my attempt at using a "drain" qos stopped foo's 
previously submitted jobs from running, I'd be very interested to hear 
about it.


Thanks again,

Mark






Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread David Rhey
Hi Mark,

I *think* you might need to update the user account to have access to that
QoS (as part of their association). Using sacctmgr modify user  + some
additional args (they escape me at the moment).

Also, you *might* have been able to set the MaxSubmitJobs at their account
level to 0 and have them run without having to do the QoS approach - but
that's just a guess on my end based on how we've done some things here. We
had a "free period" for our clusters and once it was over we set
GrpSubmitJobs on an account to 0, which allowed in-flight jobs to continue
but no new work to be submitted.

HTH,

David


-- 
David Rhey
---
Advanced Research Computing - Technology Services
University of Michigan