Ah-ha! Figured out what I did wrong:

  "sacctmgr modify user foo set qos=drain"

  This set the list of qos available to the user. The user inherited a
  default qos job setting of "normal", which wasn't allowed - hence the
  InvalidQOS.

I needed to override the default qos for foo's jobs:

  "sacctmgr modify user foo set qos=drain defaultqos=drain"

  And then update the qos on all of foo's waiting jobs.
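
For anyone wanting to script the two steps above, here is a minimal sketch rather than a tested recipe. The user "foo" and the qos "drain" come from this thread; the run() helper, the DRY_RUN switch and the function names are hypothetical and only there so the commands can be previewed before touching the accounting database. It assumes pending jobs can be listed with squeue and changed with "scontrol update".

```shell
#!/bin/sh
# Sketch only. DRY_RUN, run(), fix_user_qos(), pending_jobs() and
# fix_job_qos() are illustrative names, not part of Slurm.
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Restrict the user to one qos and make it the default for their jobs,
# so already-queued jobs no longer point at a qos they may not use.
fix_user_qos() {
    run sacctmgr -i modify user "$1" set qos="$2" defaultqos="$2"
}

# List the ids of the user's pending jobs (needs a live cluster).
pending_jobs() {
    squeue -h -u "$1" -t PENDING -o %i
}

# Move one queued job onto the new qos.
fix_job_qos() {
    run scontrol update jobid="$1" qos="$2"
}

fix_user_qos foo drain

# Only query the queue when actually applying changes.
if [ "$DRY_RUN" != "1" ]; then
    for job in $(pending_jobs foo); do
        fix_job_qos "$job" drain
    done
fi
```

Running it unchanged only prints the sacctmgr command; setting DRY_RUN=0 applies the change and then walks the pending jobs.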

I'll be using David's GrpSubmitJobs=0 suggestion instead.
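
The GrpSubmitJobs route can be sketched the same way. This is an illustration under assumptions, not a verified procedure: run(), DRY_RUN, block_new_jobs() and allow_new_jobs() are hypothetical names, and clearing the limit with -1 relies on the usual sacctmgr convention for resetting a value.

```shell
#!/bin/sh
# Sketch only. The helper names below are illustrative, not part of Slurm.
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Stop new submissions for the user; jobs already in the queue are kept.
block_new_jobs() {
    run sacctmgr -i modify user "$1" set GrpSubmitJobs=0
}

# Lift the restriction again (assumes -1 clears the limit).
allow_new_jobs() {
    run sacctmgr -i modify user "$1" set GrpSubmitJobs=-1
}

block_new_jobs foo
```

Afterwards, "sacctmgr show associations" should show the new limit on foo's association.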

Thanks for everyone's help,

Mark

On Wed, 1 Apr 2020, Mark Dixon wrote:

Hi Ahmet,

Another way to do it! Many thanks - very useful :)

But does anyone know why a user association with my qos stopped jobs from running, with reason InvalidQOS?

I can imagine using a user qos to override a partition qos being useful for other things, so would be nice to know what I've done wrong.

Best,

Mark

On Wed, 1 Apr 2020, mercan wrote:

 Hi;

 If you have a working job_submit.lua script, you can block new jobs from a
 specific user:

 if job_desc.user_name == "baduser" then
     return 2045
 end

 That's all!

 Regards;

 Ahmet M.


 On 1.04.2020 at 16:22, Mark Dixon wrote:
  Hi David,

  Thanks for this, it sounds like I've not been trying crazy methods - but
  they don't work for me:

  - "sacctmgr modify user foo set qos=drain" did set up the association
    ("sacctmgr show associations" showed that QoS changed from "normal" to
    "drain"), but this is when foo's jobs refused to start because of
    reason "InvalidQOS".

  - "sacctmgr update user foo set maxsubmitjobs=0" was ignored because qos
    were already set on the partitions.

  But... good news!

  We hadn't used GrpSubmitJobs in any of our qos, so "sacctmgr modify user
  foo set GrpSubmitJobs=0" isn't overridden anywhere, and the effect is
  exactly what I wanted - thanks!

  But if anyone knows why my attempt at using a "drain" qos stopped foo's
  previously submitted jobs from running, I'd be very interested to hear
  about it.

  Thanks again,

  Mark

  On Wed, 1 Apr 2020, David Rhey wrote:

  Hi Mark,

  I *think* you might need to update the user account to have access to that
  QoS (as part of their association). Using sacctmgr modify user <foo> + some
  additional args (they escape me at the moment).

  Also, you *might* have been able to set the MaxSubmitJobs at their account
  level to 0 and have them run without having to do the QoS approach - but
  that's just a guess on my end based on how we've done some things here. We
  had a "free period" for our clusters and once it was over we set
  GrpSubmitJobs on an account to 0, which allowed in-flight jobs to continue
  but no new work to be submitted.

  HTH,

  David

  On Wed, Apr 1, 2020 at 5:57 AM Mark Dixon <mark.c.di...@durham.ac.uk>
  wrote:

  Hi all,

  I'm a slurm newbie who has inherited a working slurm 16.05.10 cluster.

  I'd like to stop user foo from submitting new jobs but allow their
  existing jobs to run.

  We have several partitions, each with its own qos and MaxSubmitJobs
  typically set to some value. These qos are stopping a "sacctmgr update user
  foo set maxsubmitjobs=0" from doing anything useful, as per the
  documentation.

  I've tried setting up a competing qos:

     sacctmgr add qos drain
     sacctmgr modify qos drain set MaxSubmitJobs=0
     sacctmgr modify qos drain set flags=OverPartQOS
     sacctmgr modify user foo set qos=drain

  This has successfully prevented the user from submitting new jobs, but
  their existing jobs aren't running. I'm seeing the reason code
  "InvalidQOS".

  Any ideas what I should be looking at, please?

  Thanks,

  Mark



  --
  David Rhey
  ---------------
  Advanced Research Computing - Technology Services
  University of Michigan

