Look for the SLURM_RESTART_COUNT environment variable. It is only set
if the job has been restarted.
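
As a rough illustration, a job could check the variable along these lines. This is only a sketch: the helper names and the MAX_RESTARTS limit are made up for the example, and it assumes the variable, when present, holds an integer count. An unset variable is treated as "never restarted".

```python
import os
import sys

# Hypothetical limit chosen for this example; it is not a SLURM setting.
MAX_RESTARTS = 3


def restart_count(env=None):
    """Return how many times the job was restarted, or 0 if the variable is unset."""
    env = os.environ if env is None else env
    raw = env.get("SLURM_RESTART_COUNT")
    if raw is None:
        # The variable is only set once the job has been restarted.
        return 0
    try:
        return int(raw)
    except ValueError:
        # Assumption: treat an unparsable value as "not restarted".
        return 0


def should_give_up(env=None, max_restarts=MAX_RESTARTS):
    """True once the job has been restarted max_restarts times or more."""
    return restart_count(env) >= max_restarts


if __name__ == "__main__":
    count = restart_count()
    if should_give_up():
        print(f"restarted {count} times, giving up", file=sys.stderr)
        sys.exit(1)
    print(f"restart count: {count}")
```

Running this at the start of a batch job would let it stop early once it has been requeued too many times, instead of looping on a bad node.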
Quoting Chris Harwell <[email protected]>:
I don't find anything with grep -r RESTARTED slurm-2.2.7/
Did this not make it in?
What are people currently doing in the case where a job ends with
status=NODE_FAIL and gets requeued?
On Fri, Jan 23, 2009 at 11:05 AM, <[email protected]> wrote:
Hi,
Is there an environment variable visible to the running job which
indicates that a slurm job has been restarted or requeued? Or is there
some other way for the job to determine this? Perhaps some
squeue/scontrol invocation? I've poked around a bit but have been
unable to find anything in the sbatch, srun, or scontrol manpages or in the
environment of requeued jobs.
I know grid engine uses the RESTARTED environment variable (from man
qsub):
RESTARTED This variable is set to 1 if a job was restarted
either after a system crash or after a migration in case of a
checkpointing job. The variable has the value 0 otherwise.
I do see the requeue messages in slurmctld.log - those are very nice.
We are using slurm 1.3.12 currently. Hopefully there is a way and I
have just missed it; if not, could you please add this if it isn't too
much trouble? It seems that an improvement on the above {1,0} method
would be to increment the value upon each rerun so that a job could
give up after some number of times.
Thanks,
Chris
Chris,
Although this is a good idea, SLURM does not provide this information
today. Since changes to the RPCs would be required to implement this,
it will need to wait for the next major release of SLURM. We plan to
release version 1.4 in May, and I'll plan to add a SLURM_RESTARTED
environment variable with a counter per your suggestion.
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Morris "Moe" Jette [email protected] 925-423-4856
Integrated Computational Resource Management Group fax 925-423-6961
Livermore Computing Lawrence Livermore National Laboratory
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++