Paul is correct.

Before 14.03.5 Slurm didn't follow the POSIX convention for exit codes, but now it does.

Basically, if the job was terminated by a signal, the exit code is reported as 128 plus the signal number to show this is the case.
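To make the arithmetic concrete, here is a minimal sketch of decoding such a status in a POSIX shell. The function name decode_exit is just something I made up for illustration, and it is only a heuristic, since a program can also call exit() with a value above 128 on its own.

```shell
# Hypothetical helper (not part of Slurm): decode a shell-style exit status.
# Assumes the "128 + signal number" convention described above.
decode_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # Anything above 128 is treated as "terminated by a signal".
        sig=$((status - 128))
        echo "killed by signal $sig"
    else
        echo "exited normally with code $status"
    fi
}

decode_exit 137   # killed by signal 9
decode_exit 139   # killed by signal 11
decode_exit 1     # exited normally with code 1
```

In bash, kill -l 137 should also print the signal name (KILL) directly.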

As an example on the command line, if I run a simple sleep and hit Ctrl-C, the exit code is 130 (128 + 2, since Ctrl-C sends SIGINT, which is signal 2):

sleep 1000
^C
echo $?
130

Before 14.03.5 srun would return just 15 in this case, but we wanted to be POSIX compliant, so we modified it to add 128 to the exit code as the convention requires.

What does sacct tell you about the jobs? For the exit code of 137 (128 + 9) I would expect an ExitCode of 0:9, meaning the exit code was 0 but the job was signaled with SIGKILL. For the 139 (128 + 11) I would expect 0:11, meaning a segmentation fault (SIGSEGV) happened, just as Paul said.
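Something like sacct -j <jobid> --format=JobID,State,ExitCode should show that field (the job ID is a placeholder). As a rough sketch, the code:signal pair can be mapped back to the number the shell reports like this; split_exitcode is a made-up name for illustration, and it assumes the field always has the form code:signal.

```shell
# Hypothetical helper (not part of Slurm): split a "code:signal" string,
# as shown in sacct's ExitCode column, and print the shell-style status.
split_exitcode() {
    code=${1%%:*}   # part before the colon: the exit code
    sig=${1##*:}    # part after the colon: the signal number
    if [ "$sig" -ne 0 ]; then
        echo "signal $sig (shell-style status $((128 + sig)))"
    else
        echo "exit code $code"
    fi
}

split_exitcode 0:9    # signal 9 (shell-style status 137)
split_exitcode 0:11   # signal 11 (shell-style status 139)
split_exitcode 1:0    # exit code 1
```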

Danny

On 07/25/2014 03:06 PM, Bill Wichser wrote:

I can't find a clear explanation of job exit codes in the documentation. I have a user experiencing exit codes of 137 and 139. Can anyone help me find out what this 8-bit unsigned integer refers to?

Thanks,
Bill
