Hi, after an all_nodes reservation for maintenance, a couple of jobs didn't start. Instead, they are stuck with Reason=BadConstraints. Since they are clones of jobs that ran perfectly before, this is puzzling.
Looking deeper into the job details, I believe I have found what causes this - but the deeper reason is still unclear:

# scontrol show job 12345
   JobState=PENDING Reason=BadConstraints Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
   NumNodes=2 NumCPUs=32 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=32,node=1
   MinCPUsNode=32 MinMemoryNode=0 MinTmpDiskNode=0

The user had requested 2 nodes (NumNodes) with a total of 32 cores (NumCPUs). This is OK, since the nodes have 16 cores each. The TRES part looks strange, though: the total CPU count is still correct, but the number of nodes has been set to 1 only. As a consequence, a matching node must have 32 cores (MinCPUsNode), which is impossible to fulfill.

Attempts to change the values failed:

# scontrol update job=12345 MinCPUsNode=16

returns without having changed anything, and TRES cannot be modified at all.

Is there a way to adjust the values to make the jobs runnable again? What may have caused Slurm (which had not been stopped during the reservation) to mangle these values?

Thanks,
 S

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1, D-14476 Potsdam-Golm, Germany
~~~
Fon: +49-331-567 7274
Fax: +49-331-567 7298
Mail: steffen.grunewald(at)aei.mpg.de
~~~
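PS: In case it helps anyone triage the same symptom, below is a minimal sketch that parses `scontrol show job` output in the form shown above and flags the inconsistency between NumNodes and the node count inside TRES. The sample text is embedded for illustration; in practice one would feed it the real command output (e.g. via `subprocess`). The parsing is my own quick hack, not any official Slurm interface.

```python
import re

# Captured `scontrol show job` output for the affected job (abridged).
SAMPLE = """JobState=PENDING Reason=BadConstraints Dependency=(null)
NumNodes=2 NumCPUs=32 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=32,node=1
MinCPUsNode=32 MinMemoryNode=0 MinTmpDiskNode=0"""

def fields(text):
    """Parse whitespace-separated Key=Value pairs into a dict.

    Keys may contain ':' or '/' (e.g. ReqB:S:C:T); values are taken
    greedily up to the next whitespace, so TRES=cpu=32,node=1 stays whole.
    """
    return dict(re.findall(r"(\w[\w:/]*)=(\S+)", text))

def tres_nodes(tres):
    """Extract the node count from a TRES string like 'cpu=32,node=1'."""
    m = re.search(r"node=(\d+)", tres)
    return int(m.group(1)) if m else None

f = fields(SAMPLE)
requested = int(f["NumNodes"])
in_tres = tres_nodes(f["TRES"])
if requested != in_tres:
    # prints: mismatch: NumNodes=2 but TRES node count=1
    print(f"mismatch: NumNodes={requested} but TRES node count={in_tres}")
```

Run against all pending jobs, this would show whether only these cloned jobs were mangled or whether other jobs touched by the reservation are affected too.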