Hi Guys,

Where can I find information on the Slurm exit codes?  I can't seem to
find what exit code 38 means.  I have a job that died with exit code 38, and
after going through the logs we are unable to figure out what went
wrong.  Here are the logs in case we are missing something:

--
..snip..
[301] energycounted = 0
[301] getjoules_task energy = 0
[301] Job 301 memory used:588956 limit:4194304 KB
[301] getjoules_task energy = 0
[301] removing task 0 pid 63380 from jobacct
[301] task 0 (63380) exited with exit code 38.
[301] task_p_post_term: 301.4294967294, task 0
[301] cpu_freq_reset: #cpus reset = 0
[301] Aggregated 1 task exit messages
[301] sending task exit msg for 1 tasks status 9728
[301] Before call to spank_fini()
[301] After call to spank_fini()
[301] job 301 completed with slurm_rc = 0, job_rc = 9728
[301] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 9728
[301] Called _msg_socket_readable
[301]   false, shutdown
[301] Message thread exited
[301] done with job
debug:  task_p_slurmd_release_resources: 301
debug3: state for jobid 301: ctime:1459111532 revoked:0 expires:0
debug:  credential for job 301 revoked
debug3: Step from other job: jobid=49298 (this jobid=301)
debug2: No steps in jobid 301 to send signal 999
debug3: Step from other job: jobid=49298 (this jobid=301)
debug2: No steps in jobid 301 to send signal 18
debug3: Step from other job: jobid=49298 (this jobid=301)
debug2: No steps in jobid 301 to send signal 15
debug2: set revoke expiration for jobid 301 to 1459889405 UTS
debug3: state for jobid 301: ctime:1459111532 revoked:1459888205 expires:1459888205
debug3: destroying job 301 state
..snip..

# sjobexitmod -l 301
       JobID    Account   NNodes        NodeList      State ExitCode DerivedExitCode        Comment
------------ ---------- -------- --------------- ---------- -------- --------------- --------------
301          prod        1        node234     FAILED     38:0 0:0                --
--
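
One thing we did check: the "status 9728" in the log looks like the raw
wait status for the same exit code 38 (38 * 256 = 9728), so it does not seem
to carry any extra information. The ExitCode column "38:0" reads as exit
code 38, signal 0. A minimal sketch of that decoding, assuming the usual
Linux wait-status layout (exit code in the high byte, signal number in the
low 7 bits); decode_wait_status is just a helper name made up here, not a
Slurm function:

```python
def decode_wait_status(status):
    """Split a raw wait-style status into (exit_code, signal).

    Assumes the conventional Linux encoding: exit code in bits 8-15,
    terminating signal number in bits 0-6.
    """
    exit_code = (status >> 8) & 0xFF
    signal = status & 0x7F
    return exit_code, signal


# job_rc from the slurmd log above
print(decode_wait_status(9728))  # prints (38, 0)
```

If that reading is right, 38 is whatever the batch script or the program it
ran returned, rather than a code assigned by Slurm itself.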

Thanks.
