is there any way, after a job starts, to determine why the scheduler
chose the series of nodes it did?

for some reason, on an empty cluster, when i spin up a large job it's
staggering the allocation across a seemingly random set of
nodes.

we're using backfill/cons_res + gres, and all the nodes are identical.

in the past it used to select the next node past a down node and then
allocate sequentially from there.

i haven't made (and am not aware of) any changes to the system, but
now it's skipping nodes that presumably should have been in the
allocation.
