Re: [slurm-users] [EXT] Is it possible to set a default QOS per partition?

2021-03-01 Thread Sean Crosby
I would have thought partition QoS is the way to do this. We add a partition QoS to our partition definitions, and implement quotas on usage as well.
    PartitionName=physical Nodes=... Default=YES MaxTime=30-0 DefaultTime=0:10:0 State=DOWN QoS=physical TRESBillingWeights=CPU=1.0,Mem=4.0G
We then

Re: [slurm-users] [External] Is it possible to set a default QOS per partition?

2021-03-01 Thread Loris Bennett
Stack Korora writes:
> On 3/1/21 4:26 PM, Prentice Bisbal wrote:
>> Two things:
>>
>> 1. So your users are okay with specifying a partition, but specifying a QOS is
>> a bridge too far?
>
> *sigh* Yeah. It's been requested several times. I can't defend it, but if it
> makes them happy...then

Re: [slurm-users] [External] Is it possible to set a default QOS per partition?

2021-03-01 Thread Prentice Bisbal
Two things:
1. So your users are okay with specifying a partition, but specifying a QOS is a bridge too far?
2. Have your job_submit.lua script filter the jobs into the correct QOS. You can check the partition and set the QOS accordingly. First, you need to have this set in your
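A minimal sketch of the job_submit.lua approach described above, assuming the Lua job_submit plugin is enabled (JobSubmitPlugins=lua in slurm.conf). The partition names compute, gpu and long come from the original question; the QOS names on the right-hand side are hypothetical, as is the helper qos_for_partition. This is not the script from the thread, only an illustration of "check the partition and set the QOS".

```lua
-- Sketch of a job_submit.lua that assigns a default QOS per partition.
-- The QOS names are hypothetical; use QOS entries that exist on your cluster.

-- `slurm` is provided by slurmctld. The fallback stub only lets this file be
-- loaded outside Slurm for a quick logic check.
local slurm = slurm or { SUCCESS = 0, log_info = function(...) end }

-- Partition name -> default QOS (right-hand values are assumptions).
local default_qos = {
    compute = "compute",
    gpu     = "gpu",
    long    = "long",
}

-- Hypothetical helper: return the default QOS for a partition, or nil.
function qos_for_partition(partition)
    if partition == nil then
        return nil
    end
    return default_qos[partition]
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Leave an explicitly requested QOS alone.
    if job_desc.qos == nil then
        local qos = qos_for_partition(job_desc.partition)
        if qos ~= nil then
            job_desc.qos = qos
            slurm.log_info("job_submit: set QOS %s for partition %s",
                           qos, job_desc.partition)
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    -- No changes on job modification in this sketch.
    return slurm.SUCCESS
end
```

The lookup is an exact match on the partition string, so a job submitted without a partition, or with a comma-separated list of partitions, is left untouched here and would need extra handling.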

[slurm-users] Is it possible to set a default QOS per partition?

2021-03-01 Thread Stack Korora
Greetings, We have different node classes that we've set up in different partitions. For example, we have our standard compute nodes in compute; our GPUs in a gpu partition; and jobs that need to run for months go into a long partition with a different set of machines. For each partition,

Re: [slurm-users] [External] Is it possible to set a default QOS per partition?

2021-03-01 Thread Stack Korora
On 3/1/21 4:26 PM, Prentice Bisbal wrote:
> Two things:
> 1. So your users are okay with specifying a partition, but specifying a QOS is a bridge too far?
*sigh* Yeah. It's been requested several times. I can't defend it, but if it makes them happy...then they will find something else to

Re: [slurm-users] [External] Preemption not working in 20.11

2021-03-01 Thread Prentice Bisbal
Thanks for the info and link to your bug report. Unfortunately, my GraceTime is already set to zero for that QOS:
$ sacctmgr show qos interruptible format=Name,gracetime
      Name  GraceTime
---------- ----------
interrupt+   00:00:00
On 2/26/21 3:58 PM, Michael Robbert wrote: We saw

[slurm-users] slurmctld segfaulting

2021-03-01 Thread Ransom, Geoffrey M.
Hello, I have a ticket open with SchedMD, but this may be an issue the community has seen and may have a quick response for. slurmctld segfaulted (signal 11) on us and now segfaults on restart. I'm not aware of an obvious trigger for this behavior. We upgraded this cluster from 20.02.5 to

Re: [slurm-users] fix missing accounting entries

2021-03-01 Thread Brian Andrus
Wow. I am definitely having a Monday. I used that all the time and just could not remember the word even to search. Thanks! Brian
On 3/1/2021 9:23 AM, Sarlo, Jeffrey S wrote:
> Were you thinking of this?
> * Report current jobs that have been orphaned on the local cluster and are now

Re: [slurm-users] fix missing accounting entries

2021-03-01 Thread Sarlo, Jeffrey S
Were you thinking of this?
* Report current jobs that have been orphaned on the local cluster and are now runaway: sacctmgr show RunawayJobs
From: slurm-users on behalf of Brian Andrus Sent: Monday, March 1, 2021 11:14 AM To:

[slurm-users] fix missing accounting entries

2021-03-01 Thread Brian Andrus
All, IIRC, there was a command that would repair the accounting tables when a job had no endtime. I can't seem to find the info for that. Does anyone recall such a thing? Brian Andrus