Thanks. I knew our PBS implementation had always worked this way, but
nothing in the Slurm docs indicated that the same convention (subtract
128 and read the lower 7 bits as the signal number) also applied to Slurm.
For these jobs, sacct always reports exit codes of 137:0 and 139:0.
Bill
On 7/25/2014 6:22 PM, Danny Auble wrote:
Paul is correct,
Before 14.03.5 Slurm didn't follow the POSIX convention, but now it does.
Basically, if the job was terminated by a signal, the exit code is
the signal number plus 128 to show this is the case.
As an example on the command line, if I run a simple sleep and Ctrl-C
it, the exit code is 130 (128 + 2, where 2 is SIGINT):
sleep 1000
^C
echo $?
130
Before 14.03.5 srun would return just 15 in this case, but we wanted
to be POSIX compliant, so we modified it to add 128 to the exit code
when the job is signaled.
What does sacct tell you about the jobs? For the exit code of 137 I
would expect an ExitCode of 0:9, meaning an exit code of 0 but the
job was signaled with SIGKILL (9). For the 139 I would expect 0:11,
meaning a segmentation fault (SIGSEGV, 11) happened, just as Paul said.
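The arithmetic above can be sketched in shell. This is only a minimal
sketch, not anything Slurm ships: decode_status is a made-up helper
name, and it assumes any status above 128 came from a signal, which is
not true for a program that deliberately exits with such a value.

```shell
# decode_status is a hypothetical helper, not a Slurm command.
# It treats any status above 128 as "128 + signal number".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l N prints the name of signal N without the SIG prefix
        echo "signal $sig ($(kill -l "$sig"))"
    else
        echo "exit $status"
    fi
}

decode_status 137   # signal 9 (KILL)
decode_status 139   # signal 11 (SEGV)
```

Running it on the status from the sleep example (130) should report
signal 2, the SIGINT sent by Ctrl-C.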
Danny
On 07/25/2014 03:06 PM, Bill Wichser wrote:
I can't find a clear explanation of job exit codes anywhere in the
documentation. I have a user whose jobs are ending with exit codes of
137 and 139. Can anyone help me find out what this 8-bit unsigned
integer means?
Thanks,
Bill