Hello,
Could someone in the Slurm community please advise me on outputting data (job
stats) at the end of a job? I'm currently using the Epilog (slurm.epilog.clean)
to print out a report at the end of each Slurm job. This works, but it is far
from ideal: slurm.epilog.clean can be executed multiple times (once on each
allocated node), so I test whether I'm on the "batchhost" and only then write
out the job stats. That is...
hostnode=$(hostname)
batchhost=$(/local/software/slurm/default/bin/scontrol show job ${SLURM_JOB_ID} \
    | grep BatchHost | cut -f2 -d'=')
if [ "$hostnode" = "$batchhost" ]; then
    # The first field of the matching line is "SubmitTime=<stamp>";
    # strip the key to keep just the timestamp.
    submittime=$(/local/software/slurm/default/bin/scontrol show job ${SLURM_JOB_ID} \
        | grep SubmitTime | awk '{print $1}' | cut -f2 -d'=')
    printf "Submit time : %s\n" "$submittime" >> "$stdout"
    # ... etc ...
fi
This does work, but it strikes me that using EpilogSlurmctld might be better.
The issue, of course, is that EpilogSlurmctld is executed by the slurm user,
so how can that script be made to write to the stdout file of a job?
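One thing I have noticed, for what it's worth, is that scontrol show job reports
a StdOut field, so an EpilogSlurmctld script could in principle look the output
file up and append to it (permissions for the slurm user permitting). A minimal
sketch of just the extraction step, parsing a canned sample line in place of the
real scontrol call (the path below is made up):

```shell
#!/bin/sh
# Sketch only: extract the StdOut path the way an EpilogSlurmctld script might.
# A real script would run:  scontrol show job "$SLURM_JOB_ID"
# Here a canned sample line stands in for that output (path is hypothetical).
sample='   StdOut=/home/david/slurm-1234.out'
stdout_path=$(printf '%s\n' "$sample" | grep StdOut | cut -f2 -d'=')
echo "$stdout_path"
# A real EpilogSlurmctld script would then append the report, e.g.:
#   printf 'Submit time : %s\n' "$submittime" >> "$stdout_path"
```

Whether appending to the file as the slurm user is safe (ownership, quota,
jobs that redirect stdout elsewhere) is exactly what I'm unsure about.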
I am, by the way, grep'ing the output of scontrol (for submit time, etc.) and
sacct (for memory usage, etc.) to generate my report. Does this approach make
sense, or are there better alternatives? Here's an example of the data printed
out by my epilog script...
Submit time : 2016-10-12T09:47:03
Start time : 2016-10-12T09:47:03
End time : 2016-10-12T09:47:17
Elapsed time : 00:00:14 (Timelimit=02:00:00)
   JobName     MaxRSS    Elapsed
---------- ---------- ----------
 slurm.mpi              00:00:14
     batch      1244K   00:00:14
       cpi     28224K   00:00:01
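In case it helps the discussion: rather than grep'ing the fixed-width table,
sacct has a parsable mode (sacct -j "$SLURM_JOB_ID" --format=JobName,MaxRSS,Elapsed -P)
that emits pipe-delimited rows, which are easier to post-process. A sketch of
reformatting such output, using a canned sample in place of the real sacct call
(values copied from the report above):

```shell
#!/bin/sh
# Sketch: format pipe-delimited sacct output into columns.
# A canned sample stands in for:
#   sacct -j "$SLURM_JOB_ID" --format=JobName,MaxRSS,Elapsed -P
sample='JobName|MaxRSS|Elapsed
batch|1244K|00:00:14
cpi|28224K|00:00:01'
report=$(printf '%s\n' "$sample" \
    | awk -F'|' 'NR > 1 { printf "%-10s %10s %10s\n", $1, $2, $3 }')
printf '%s\n' "$report"
```

The -P (--parsable2) flag avoids having to guess at column widths when the
format list changes.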
Best regards,
David