Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon
Ah-ha! Figured out what I did wrong: "sacctmgr modify user foo set qos=drain" This only set the list of qos available to the user. The user still inherited a default qos of "normal" for their jobs, which was no longer in that list - hence the InvalidQOS. I needed to override the default qos for foo's jobs:

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon
Hi Antony, Thanks for the suggestion but that doesn't do anything either, because our partition qos also sets MaxJobs - that value takes precedence. Best, Mark On Wed, 1 Apr 2020, Antony Cleave wrote: why not just sacctmgr modify user foo set maxjobs=0 existing running jobs will run to

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon
Hi Ahmet, Another way to do it! Many thanks - very useful :) But does anyone know why a user association with my qos stopped jobs running with InvalidQOS? I can imagine using a user qos to override a partition qos being useful for other things, so would be nice to know what I've done

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread mercan
Hi; If you have a working job_submit.lua script, you can use it to block new jobs from a specific user: if job_desc.user_name == "baduser" then     return 2045 end That's all! Regards; Ahmet M. On 1.04.2020 at 16:22, Mark Dixon wrote: Hi David, Thanks for this, it sounds like
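
For context, the check quoted in the message above would normally sit inside the submit hook of job_submit.lua. The following is a minimal sketch only, not a tested script: it assumes the job_submit/lua plugin is enabled, that job_desc.user_name is available in the installed Slurm version as the message implies, and that "baduser" is a placeholder. The return value 2045 is copied unchanged from the message.

```lua
-- job_submit.lua: minimal sketch, untested against any particular Slurm release.
-- Assumptions: BLOCKED_USER is a placeholder name, and 2045 is the return
-- code quoted in the message above, reproduced as-is.

local BLOCKED_USER = "baduser"  -- hypothetical user name

-- Called by slurmctld for every new job submission.
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.user_name == BLOCKED_USER then
        -- Message shown to the submitting user.
        slurm.log_user("Job submission is currently disabled for this account")
        return 2045
    end
    return slurm.SUCCESS
end

-- Called when an existing job is modified; left as a pass-through here.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end

return slurm.SUCCESS
```

Because the hook only runs at submission time, jobs already queued or running for that user are not touched by it.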

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Antony Cleave
why not just sacctmgr modify user foo set maxjobs=0 existing running jobs will run to completion and pending jobs won't start Antony On Wed, 1 Apr 2020 at 10:57, Mark Dixon wrote: > Hi all, > > I'm a slurm newbie who has inherited a working slurm 16.05.10 cluster. > > I'd like to stop user

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon
Hi David, Thanks for this, it sounds like I've not been trying crazy methods - but they don't work for me: - "sacctmgr modify user foo set qos=drain" did set up the association ("sacctmgr show associations" showed that QoS changed from "normal" to "drain"), but this is when foo's jobs

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread David Rhey
Hi Mark, I *think* you might need to update the user account to have access to that QoS (as part of their association). Using sacctmgr modify user + some additional args (they escape me at the moment). Also, you *might* have been able to set the MaxSubmitJobs at their account level to 0 and

[slurm-users] Drain a single user's jobs

2020-04-01 Thread Mark Dixon
Hi all, I'm a slurm newbie who has inherited a working slurm 16.05.10 cluster. I'd like to stop user foo from submitting new jobs but allow their existing jobs to run. We have several partitions, each with its own qos and MaxSubmitJobs typically set to some value. These qos are stopping a

Re: [slurm-users] Job with srun is still RUNNING after node reboot

2020-04-01 Thread Yair Yarom
I've checked it now, it isn't listed as a runaway job. On Tue, Mar 31, 2020 at 5:24 PM David Rhey wrote: > Hi, Yair, > > Out of curiosity have you checked to see if this is a runaway job? > > David > > On Tue, Mar 31, 2020 at 7:49 AM Yair Yarom wrote: > >> Hi, >> >> We have an issue where

Re: [slurm-users] not allocating the node for job execution even resources are available.

2020-04-01 Thread navin srivastava
In addition to the above problem: oversubscription is set to NO, so according to the documentation, in this scenario it does not accept jobs from the other partition even if resources are available. I even set the same priority for both partitions, but it didn't help. Any suggestions? Slurm