This is fixed in commit:
https://github.com/SchedMD/slurm/commit/637c297d24b764b3ca9875e164dc2eaaece439fc
Thanks,
Brian
On 12/08/2014 03:19 AM, Tommi Tervo wrote:
Hi,
After the 14.03 -> 14.11 upgrade, our users started to complain that jobs
are randomly failing with the reason: slurmstepd: error: get_exit_code
task 0 died by signal
The culprit is a change in the squeue command, which is called from this epilog script:
https://github.com/SchedMD/slurm/blob/master/etc/slurm.epilog.clean
squeue --noheader --format=%A --user=991 --node=localhost
squeue: error: Invalid node name localhost
Possible workaround:
job_host=`hostname`
job_list=`${SLURM_BIN}squeue --noheader --format=%A --user=$SLURM_UID --node=$job_host`
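
The same idea can be wrapped in a small shell function so the node name is resolved in one place. This is only a sketch: list_jobs_on_node is a hypothetical helper name, not something shipped with Slurm, and it assumes that the output of hostname matches the node name Slurm knows for this machine.

```shell
#!/bin/sh
# Sketch only: list_jobs_on_node is a hypothetical helper, not part of Slurm.
# SLURM_BIN is treated as a path prefix ending in "/" (it may be empty),
# as in the workaround above.

list_jobs_on_node() {
    uid=$1
    # Default to this machine's hostname instead of "localhost",
    # which squeue in 14.11 rejects as an invalid node name.
    node=${2:-$(hostname)}
    "${SLURM_BIN:-}squeue" --noheader --format=%A --user="$uid" --node="$node"
}
```

In an epilog script it could be called as job_list=$(list_jobs_on_node "$SLURM_UID"). If the nodes are registered under short names while hostname returns a fully qualified name, hostname -s may be needed instead.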
Regards,
Tommi