Hi Loris,
I reported a similar problem back in February.
https://groups.google.com/forum/#!topic/slurm-devel/kshbXbqpEIY
The problem is that if only the number of tasks is set then (I think due to the
TRES logic) any modification to the job (a manual change in priority, a node
becoming unavailable, etc.) causes MinCPUsNode to be set to the total size of
the job!
JobState=PENDING Reason=BadConstraints Dependency=(null)
...
NumNodes=64 NumCPUs=1536 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1536,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1536 MinMemoryNode=0 MinTmpDiskNode=0
The “BadConstraints” reason obviously comes from the fact that we have no nodes
with 1536 cores.
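As a rough illustration of the symptom, the check can be sketched in Python. This is only a sketch under assumptions: the function names `parse_job_fields` and `min_cpus_looks_inflated` are hypothetical, the sample text is the excerpt quoted above, and the heuristic (a multi-node job whose MinCPUsNode is at least its NumCPUs) is one reading of the symptom, not anything Slurm itself provides.

```python
# Excerpt of `scontrol show job` output, copied from the message above.
SAMPLE = """\
JobState=PENDING Reason=BadConstraints Dependency=(null)
NumNodes=64 NumCPUs=1536 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1536,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1536 MinMemoryNode=0 MinTmpDiskNode=0
"""


def parse_job_fields(text):
    """Collect Key=Value tokens from whitespace-separated job output."""
    fields = {}
    for token in text.split():
        # Split on the first '=' only, so TRES=cpu=1536,node=1 stays intact.
        key, sep, value = token.partition("=")
        if sep:
            fields.setdefault(key, value)
    return fields


def first_int(value):
    """Assumption: a value is either '64' or a range such as '64-64'."""
    return int(value.split("-")[0])


def min_cpus_looks_inflated(fields):
    """Heuristic: the per-node CPU minimum equals the whole job's CPU count."""
    num_nodes = first_int(fields["NumNodes"])
    num_cpus = first_int(fields["NumCPUs"])
    min_cpus_node = first_int(fields["MinCPUsNode"])
    return num_nodes > 1 and min_cpus_node >= num_cpus


if __name__ == "__main__":
    job = parse_job_fields(SAMPLE)
    print(job["Reason"], min_cpus_looks_inflated(job))
```

For the excerpt above the heuristic fires, because 64 nodes are requested while MinCPUsNode matches the full 1536 CPUs of the job.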
The only way to fix this seems to be to set the number of nodes, which is less
than ideal.
I haven’t had the time to see if a newer version of 15.08 still shows the same
behaviour.
Thanks
Ewan
--
Ecole Polytechnique Fédérale de Lausanne
Switzerland
On 6 June 2016, at 08:48, Miguel Gila
<[email protected]> wrote:
Hi Loris,
not sure this has been answered before, but have you found a solution to it?
We've also seen this, but never came up with the right solution. After throwing
a few scontrol requeue/hold/release/resume in a pseudo-random order, we get the
system to reschedule the jobs. Not sure whether that is because of our doing or
because of the scheduler doing its job :)
Cheers,
Miguel
On 24 May 2016, at 08:46, Loris Bennett
<[email protected]> wrote:
Hi,
The 'Reason' field for a pending job has changed from 'Priority' to
'BadConstraints'. This seems to be because the status of one of the
nodes in the node list reported by 'scontrol show job' has changed to
'draining'. The job itself just specifies the number of tasks required,
not specific nodes.
Shouldn't the scheduler just be able to replace the draining node with
another node in the projected node list? This is happening with version
15.08.8.
Cheers,
Loris
--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email
[email protected]
--
Miguel Gila
CSCS Swiss National Supercomputing Centre
HPC Operations
Via Trevano 131 | CH-6900 Lugano | Switzerland
mg [at] cscs.ch