[email protected] (Pär Lindfors) writes:

> We have just discovered a problem with 14.11.4 configured to run Prolog
> at job allocation. ( PrologFlags=Alloc )
>
> When a batch job starts the first time, the Prolog is executed on all
> nodes as expected.
>
> If the job is then requeued and restarted, the Prolog is only run on the
> first node, which runs the batch script, and on any node in the job
> allocation that was not allocated to the job the first time it ran.

Further testing shows that this bug does not depend on
PrologFlags=Alloc; the same thing happens without that config option.

When a job is restarted, Prolog is only run on the node running the
batch script, and on nodes that were not allocated to the job the last
time it ran.

On nodes that were allocated to the job the last time, slurmd does not
run Prolog at all, not even before running the first job step.
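
One way to observe this is to have Prolog record every invocation and
then compare the log after the first start and after the requeue. The
script below is only a sketch: the PROLOG_LOG variable and the log path
are made up for illustration, and it assumes slurmd sets SLURM_JOB_ID in
the Prolog environment.

```shell
#!/bin/sh
# Hypothetical Prolog script: append one line per invocation so that
# nodes where Prolog did not run can be spotted afterwards.
# PROLOG_LOG is an assumed name for this sketch, not a Slurm variable.
PROLOG_LOG="${PROLOG_LOG:-/tmp/prolog-runs.log}"

log_prolog_run() {
    # Record epoch time, job id (if set) and node name.
    printf '%s job=%s host=%s\n' \
        "$(date +%s)" "${SLURM_JOB_ID:-unknown}" "$(uname -n)" >> "$PROLOG_LOG"
}

# Never let a logging failure make Prolog return non-zero.
log_prolog_run || true
```

With this in place as Prolog, submitting a multi-node batch job and then
requeueing it (for example with "scontrol requeue <jobid>") should, if
the behaviour described above is present, leave only one log line for
that job on the nodes that were also in the previous allocation, apart
from the batch node.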

I have not done any detailed analysis of this case, but I would guess
something similar is causing this.

Regards,
Pär Lindfors, NSC
