I'm using Slurm 2.6.2 and have run into the following issue:
Under some circumstances, the job name is NOT available immediately after
the job completes. However, if I wait about 30 seconds, the job name becomes
available. I've set up job completion triggers, which do get fired, and to
debug the problem I'm logging the command output. The following commands are
executed upon job completion:
job_data=$(sacct --format=JobName%16,ExitCode -Xn -j $JOB_ID)
logger "sacct: $job_data"
/var/log/messages has the following entries:
failure case: Jan 22 18:28:16 ts-isbeng-tms logger: sacct: allocation 1:0
successful case: Jan 22 18:27:28 ts-isbeng-tms logger: sacct: CopyJob432 1:0
Does anyone know why I'm getting "allocation" as the job name, and how I
can fix this problem?
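
As a stopgap, and assuming the correct name really does show up after a
short delay as described above, the trigger script could poll sacct until the
name is no longer "allocation". This is only a rough sketch:
wait_for_job_name is a hypothetical helper, and the retry count and delay
are guesses on my part, not values taken from Slurm.

```shell
# Hypothetical helper: poll sacct until JobName is no longer "allocation".
# Usage: wait_for_job_name JOB_ID [MAX_TRIES] [DELAY_SECONDS]
# The body runs in a subshell ( ... ) so its variables do not leak out.
wait_for_job_name() (
    job_id=$1
    max_tries=${2:-10}   # assumed default, tune as needed
    delay=${3:-5}        # assumed default, in seconds
    job_data=""
    i=1
    while [ "$i" -le "$max_tries" ]; do
        job_data=$(sacct --format=JobName%16,ExitCode -Xn -j "$job_id")
        # The first whitespace-separated field of the first line is the name.
        job_name=$(printf '%s\n' "$job_data" | awk 'NR == 1 { print $1 }')
        if [ -n "$job_name" ] && [ "$job_name" != "allocation" ]; then
            printf '%s\n' "$job_data"
            return 0
        fi
        # Sleep only if another attempt remains.
        if [ "$i" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    # Gave up: print the last result so the caller can still log it.
    printf '%s\n' "$job_data"
    return 1
)
```

In the trigger, job_data=$(wait_for_job_name "$JOB_ID") would then replace
the direct sacct call. This only works around the delay; it does not explain
why sacct reports "allocation" in the first place.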
Thanks,
-aamir