Hello all,

As a workaround, I ended up using an Epilog script, set in slurm.conf, to archive the jobs.
The script runs:
scontrol show job -d $SLURM_JOB_ID >> $JOBS_FILE
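
For anyone wanting to search that archive afterwards, here is a minimal sketch of how the restarted jobs could be pulled out of it. It assumes the file is just the concatenated output of `scontrol show job -d`, with `JobId=` and `Restarts=` appearing as whitespace-separated fields as in the output quoted below. The function name `list_restarted_jobs` is hypothetical, not something Slurm provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): print "JobId Restarts" for every
# archived record whose Restarts counter is greater than 0.
# Assumption: $1 is the archive file, i.e. the concatenated output of
# `scontrol show job -d`.
list_restarted_jobs() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                # Remember the most recent job id seen.
                if ($i ~ /^JobId=/) { jobid = substr($i, 7) }
                # Report the job when its Restarts counter is positive.
                if ($i ~ /^Restarts=/) {
                    n = substr($i, 10) + 0
                    if (n > 0) print jobid, n
                }
            }
        }
    ' "$1"
}
```

Usage would be something like `list_restarted_jobs "$JOBS_FILE" | sort -u`; the `sort -u` is there because the same job may well be archived more than once.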



On 09/02/2018 at 17:58, Henry Gérard wrote:
Hello all,
We have Slurm 15.08 with preemption configured. Sometimes jobs are killed and then restarted, and the "Restarts" attribute becomes > 0:
[gerard.henry@xcluster ~]$ scontrol show job 157945
JobId=157945 JobName=sleep
    UserId=gerard.henry(1016) GroupId=grp1(1002)
    Priority=1053144 Nice=0 Account=u_recover QOS=defaultqos
    JobState=PENDING Reason=BeginTime Dependency=(null)
    Requeue=1 Restarts=2 BatchFlag=1 Reboot=0 ExitCode=0:0

Is there a way to retrieve all the jobs (after they have completed) where Restarts is > 0?
I found no such thing in sacct.
I tried -D (duplicates), but with thousands of jobs the output is unusable.
Is this information stored in the accounting database?

Thanks in advance for help,

Gérard Henry
RSI Irstea Aix en Provence
