Hi guys,

Where can I find information on Slurm exit codes? I can't seem to find what exit code 38 means. I have a job that died with exit code 38, and going through the logs we are not able to figure out what might have gone wrong. Here are the logs, in case we are missing something:
--
..snip..
[301] energycounted = 0
[301] getjoules_task energy = 0
[301] Job 301 memory used:588956 limit:4194304 KB
[301] getjoules_task energy = 0
[301] removing task 0 pid 63380 from jobacct
[301] task 0 (63380) exited with exit code 38.
[301] task_p_post_term: 301.4294967294, task 0
[301] cpu_freq_reset: #cpus reset = 0
[301] Aggregated 1 task exit messages
[301] sending task exit msg for 1 tasks status 9728
[301] Before call to spank_fini()
[301] After call to spank_fini()
[301] job 301 completed with slurm_rc = 0, job_rc = 9728
[301] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 9728
[301] Called _msg_socket_readable
[301] false, shutdown
[301] Message thread exited
[301] done with job
debug: task_p_slurmd_release_resources: 301
debug3: state for jobid 301: ctime:1459111532 revoked:0 expires:0
debug: credential for job 301 revoked
debug3: Step from other job: jobid=49298 (this jobid=301)
debug2: No steps in jobid 301 to send signal 999
debug3: Step from other job: jobid=49298 (this jobid=301)
debug2: No steps in jobid 301 to send signal 18
debug3: Step from other job: jobid=49298 (this jobid=301)
debug2: No steps in jobid 301 to send signal 15
debug2: set revoke expiration for jobid 301 to 1459889405 UTS
debug3: state for jobid 301: ctime:1459111532 revoked:1459888205 expires:1459888205
debug3: destroying job 301 state
..snip..

# sjobexitmod -l 301
       JobID    Account   NNodes        NodeList      State ExitCode DerivedExitCode        Comment
------------ ---------- -------- --------------- ---------- -------- --------------- --------------
301                prod        1         node234     FAILED     38:0             0:0
--
--
Thanks.
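
A note on how the numbers in the log relate to each other: the "status 9728" and "job_rc = 9728" values look like a raw wait()-style status, where the exit code sits in the high byte and the terminating signal in the low bits. Under that assumption, 9728 decodes to exit code 38 with signal 0, which matches both the "exited with exit code 38" line and the 38:0 (exit code:signal) shown by sjobexitmod. The sketch below is only an illustration of that arithmetic; decode_wait_status is a made-up helper name, not part of Slurm.

```python
def decode_wait_status(status):
    """Split a wait()-style status into (exit_code, signal).

    Assumes the conventional layout: exit code in bits 8-15,
    terminating signal in the low 7 bits.
    """
    exit_code = (status >> 8) & 0xFF
    signal = status & 0x7F
    return exit_code, signal


# Value taken from the "status 9728" lines in the log above.
exit_code, signal = decode_wait_status(9728)
print(f"{exit_code}:{signal}")  # prints "38:0"
```

If that reading is right, the job was not killed by a signal, and the 38 is most likely the value returned by the batch script or the program it ran rather than a code defined by Slurm, so the application's own output or documentation is the place to look for its meaning.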