Occasionally, a node will be correctly set to drain on failure of the prolog 
and the job will be requeued in HOLD state.

However occasionally, if the node state is set back to state=resume without 
doing anything to the node itself, subsequent jobs submitted to the node which 
also result in failure of the prolog will run.

Details can be found in Slurm Bugzilla Bug 2254[1]

If anybody else has any bright ideas I'd sure be grateful for your input while 
waiting for SchedMD to respond.

Thanks,

-- Trevor

[1] - http://bugs.schedmd.com/show_bug.cgi?id=2254

Reply via email to