Hi,
After upgrading from 14.03 to 14.11, our users started to complain that
jobs randomly fail with the following reason:
slurmstepd: error: get_exit_code task 0 died by signal
The culprit is a change in how squeue handles --node=localhost, which
breaks the squeue call in the example epilog script:
https://github.com/SchedMD/slurm/blob/master/etc/slurm.epilog.clean
squeue --noheader --format=%A --user=991 --node=localhost
squeue: error: Invalid node name localhost
A possible workaround is to pass the real hostname instead of localhost:
job_host=`hostname`
job_list=`${SLURM_BIN}squeue --noheader --format=%A --user=$SLURM_UID --node=$job_host`
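The same workaround as a slightly more defensive sketch, in case it helps
anyone patching their own epilog. SLURM_BIN and SLURM_UID are used as in the
lines above; the function names and the JOB_HOST override are hypothetical
and only there for illustration. It assumes the Slurm node name matches the
output of hostname, which may not hold on every cluster.

```shell
# Sketch only, not the upstream fix. SLURM_BIN is assumed to be empty or to
# end with a slash, as in the workaround above.
SLURM_BIN="${SLURM_BIN:-}"

# Print the IDs of jobs owned by UID $1 on node $2.
jobs_on_node() {
    "${SLURM_BIN}squeue" --noheader --format=%A --user="$1" --node="$2"
}

# Resolve this node's name. JOB_HOST is a hypothetical override for sites
# where the Slurm node name differs from what hostname prints.
local_node() {
    if [ -n "${JOB_HOST:-}" ]; then
        printf '%s\n' "$JOB_HOST"
    else
        hostname
    fi
}

# Usage inside the epilog:
#   job_list=$(jobs_on_node "$SLURM_UID" "$(local_node)")
```

Quoting the arguments keeps the call intact if a variable is ever empty, and
the override avoids editing the script on nodes with unusual names.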
Regards,
Tommi