Hi Loris,

Not sure whether this has been answered before, but have you found a solution? We've also seen this, but never came up with a proper fix. After throwing a few scontrol requeue/hold/release/resume commands at the affected jobs in pseudo-random order, we get the system to reschedule them. I'm not sure whether that is because of our doing or because the scheduler is simply doing its job :)
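For what it's worth, the hold/release part of that sequence can be sketched as a small shell helper. This is only an illustration, not a known fix for 'BadConstraints': the function name nudge_job and the DRY_RUN variable are made up for this example, and with the default DRY_RUN=1 the scontrol commands are printed rather than executed.

```shell
# Hypothetical helper: hold and then release a pending job so that the
# scheduler has a reason to re-evaluate it. Nothing here is confirmed to
# clear a 'BadConstraints' reason; it only mirrors what we do by hand.
# With DRY_RUN=1 (the default) the commands are printed, not run.
nudge_job() {
    jobid="$1"
    for action in hold release; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scontrol $action $jobid"
        else
            scontrol "$action" "$jobid"
        fi
    done
}
```

Calling `nudge_job 12345` prints the two commands for a placeholder job ID; setting DRY_RUN=0 would actually run them, after which 'scontrol show job' shows whether the Reason field has changed.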
Cheers,
Miguel

> On 24 May 2016, at 08:46, Loris Bennett <[email protected]> wrote:
>
> Hi,
>
> The 'Reason' field for a pending job has changed from 'Priority' to
> 'BadConstraints'. This seems to be because the status of one of the
> nodes in the node list reported by 'scontrol show job' has changed to
> 'draining'. The job itself just specifies the number of tasks required,
> not specific nodes.
>
> Shouldn't the scheduler just be able to replace the draining node with
> another node in the projected node list? This is happening with version
> 15.08.8.
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email [email protected]

--
Miguel Gila
CSCS Swiss National Supercomputing Centre
HPC Operations
Via Trevano 131 | CH-6900 Lugano | Switzerland
mg [at] cscs.ch
