Paul is correct.
Before 14.03.5 Slurm didn't follow the POSIX convention, but now it does.
Basically, if the job was terminated by a signal, the exit code is the
signal number plus 128 to show this is the case.
As an example on the command line, if I run a simple sleep and hit
ctrl-C, the exit code is 130 (128 + 2, since ctrl-C sends SIGINT,
which is signal 2):
    $ sleep 1000
    ^C
    $ echo $?
    130
Before 14.03.5 srun would return just the bare signal number in a case
like this (15 for a SIGTERM, for instance), but we wanted to be POSIX
compliant, so we modified it to add 128 to the exit code as the
convention requires.
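
The 128 + signal convention can be reproduced without a terminal or a
Slurm job. This is only a minimal sketch, assuming a POSIX sh: a child
shell kills itself with SIGKILL (signal 9) and the parent inspects the
status. The variable name "status" is just for illustration.

```shell
# A child shell sends SIGKILL (signal 9) to itself. The "|| status=$?"
# form captures the status even if the caller runs with "set -e".
status=0
sh -c 'kill -KILL $$' || status=$?

echo "exit status:   $status"             # 128 + 9 = 137
echo "signal number: $((status - 128))"   # 9
```

The shell may also print a "Killed" notice on stderr; that comes from
the shell itself, not from the commands above.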
What does sacct tell you about the jobs? For the exit code of 137
(128 + 9) I would expect an ExitCode of 0:9, meaning the exit code was 0
but the job was terminated by signal 9 (SIGKILL). For 139 (128 + 11) I
would expect 0:11, meaning a segmentation fault (SIGSEGV) happened, just
as Paul said.
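
To make the arithmetic concrete, here is a rough sketch in POSIX sh of
decoding both forms. The helper names decode_status and split_exitcode
are made up for this example and are not part of Slurm; the only
assumption about sacct is that its ExitCode field has the form
"code:signal" as described above.

```shell
# decode_status (hypothetical helper): interpret a shell-style status,
# where values above 128 mean "terminated by signal (status - 128)".
decode_status() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        # "kill -l N" prints the name of signal N, e.g. KILL for 9.
        echo "$code = 128 + $sig (SIG$(kill -l "$sig"))"
    else
        echo "$code = ordinary exit code, no signal"
    fi
}

# split_exitcode (hypothetical helper): split an ExitCode field such as
# "0:9" into its exit-code and signal parts.
split_exitcode() {
    field=$1
    echo "exit code ${field%%:*}, signal ${field##*:}"
}

decode_status 137
decode_status 139
split_exitcode "0:9"
split_exitcode "0:11"
```

On a real system the field would come from something like
"sacct -j <jobid> --format=JobID,State,ExitCode", with <jobid> replaced
by the job in question.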
Danny
On 07/25/2014 03:06 PM, Bill Wichser wrote:
I can't find a clear explanation of job exit codes in the
documentation. I have a user experiencing exit codes of 137 and 139.
Can anyone help me work out what this 8-bit unsigned integer
represents?
Thanks,
Bill