I can't find anything with grep -r RESTARTED slurm-2.2.7/.

Did this not make it in?

What are people currently doing when a job ends with status=NODE_FAIL
and gets requeued?
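One possible stopgap, since 2.2.7 doesn't appear to set any such variable, is for the job to keep its own counter. Below is a minimal Python sketch of that idea. It assumes the job's working directory is on a filesystem that survives a requeue, and that a requeued job keeps the same SLURM_JOB_ID. The function bump_restart_count and the restart-count.<jobid> file name are hypothetical, not anything SLURM provides:

```python
import os
from pathlib import Path


def bump_restart_count(state_dir: str = ".") -> int:
    """Return how many times this job has started before, then record this start.

    Hypothetical workaround: a per-job counter file in state_dir, which is
    assumed to live on a filesystem that persists across a requeue. Assumes
    a requeued job keeps the same SLURM_JOB_ID.
    """
    job_id = os.environ.get("SLURM_JOB_ID", "unknown")
    counter = Path(state_dir) / f"restart-count.{job_id}"  # hypothetical file name
    previous = int(counter.read_text()) if counter.exists() else 0
    counter.write_text(str(previous + 1))  # record the current start
    return previous  # 0 on the first run, 1 after the first requeue, ...
```

Call it once near the top of the job script. A nonzero return value means the job has been started before, for example after a NODE_FAIL requeue.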

On Fri, Jan 23, 2009 at 11:05 AM,  <[email protected]> wrote:
>> Hi,
>>
>> Is there an environment variable visible to the running job which
>> indicates that a slurm job has been restarted or requeued? Or is there
>> some other way for the job to determine this? Perhaps some
>> squeue/scontrol invocation?  I've poked around a bit but have been
>> unable to find anything in the sbatch, srun, or scontrol manpages or in the
>> environment of requeued jobs.
>>
>> I know grid engine uses the RESTARTED environment variable (from man
>> qsub):
>>  RESTARTED      This variable is set to 1 if a job was restarted
>> either after a system crash or after a migration in case of a
>> checkpointing job. The variable has the value 0 otherwise.
>>
>> I do see the requeue messages in slurmctld.log - those are very nice.
>> We are using slurm 1.3.12 currently. Hopefully there is a way and I
>> have just missed it; if not, could you please add this if it isn't too
>> much trouble? It seems that an improvement on the above {1,0} method
>> would be to increment the value upon each rerun so that a job could
>> give up after some number of times.
>>
>> Thanks,
>> Chris
>
>
> Chris,
>
> Although this is a good idea, SLURM does not provide this information
> today. Since changes to the RPCs would be required to implement this,
> it will need to wait for the next major release of SLURM. We plan to
> release version 1.4 in May, and I'll plan to add a SLURM_RESTARTED
> environment variable with a counter per your suggestion.
> --
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Morris "Moe" Jette       [email protected]                 925-423-4856
> Integrated Computational Resource Management Group   fax 925-423-6961
> Livermore Computing            Lawrence Livermore National Laboratory
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> "The problem with the world is that we draw the circle of our family
>  too small."  - Mother Teresa
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
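Chris's counter proposal (0 on the first run, incremented on each rerun) would let a job script give up after a fixed number of reruns. Below is a minimal Python sketch of that check. It assumes the counter arrives in an environment variable. The default name SLURM_RESTARTED is the one planned above and is not confirmed to exist in any release, so the name is a parameter. The function should_give_up is hypothetical:

```python
import os
from typing import Mapping


def should_give_up(max_restarts: int,
                   env: Mapping[str, str] = os.environ,
                   var_name: str = "SLURM_RESTARTED") -> bool:
    """Return True once the job has been rerun more than max_restarts times.

    Assumes var_name holds a restart counter as proposed in this thread:
    unset or "0" on the first run, incremented on each rerun. The default
    variable name is an assumption taken from the planned feature.
    """
    try:
        restarts = int(env.get(var_name, "0"))
    except ValueError:
        restarts = 0  # treat a malformed value as a first run
    return restarts > max_restarts
```

At the top of a job script, something like `if should_give_up(3): sys.exit(1)` would allow up to three reruns and then stop.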
