We're using the job completion script to insert tracking records into
a database for the jobs run on our machine. Every once in a while I
see a wrong start time recorded for cancelled jobs: it comes out later
than the job's end time.

Here are some examples:

jobid=11721
jobstate=cancelled
submit=1300487648
start=1331416200
end=1300487677

jobid=11722
jobstate=cancelled
submit=1300550901
start=1332074346
end=1300550922

jobid=11723
jobstate=cancelled
submit=1300582908
start=1332112169
end=1300582920
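All three examples show the same pattern. The start time is later than the end time, and it is almost a year after the submit time. As a rough aid for spotting such rows, here is a minimal Python sketch. It assumes the records arrive as blank-line-separated key=value blocks exactly as pasted above. `JobRecord`, `parse_records`, `find_start_after_end` and `utc` are hypothetical names and are not part of Slurm.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


# Hypothetical record type mirroring the fields in the pasted examples.
@dataclass
class JobRecord:
    jobid: int
    jobstate: str
    submit: int  # Unix epoch seconds
    start: int   # Unix epoch seconds
    end: int     # Unix epoch seconds


def parse_records(text: str) -> List[JobRecord]:
    """Parse blank-line-separated blocks of key=value lines (assumed format)."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:  # ignore lines that have no '='
                fields[key] = value
        records.append(JobRecord(
            jobid=int(fields["jobid"]),
            jobstate=fields["jobstate"],
            submit=int(fields["submit"]),
            start=int(fields["start"]),
            end=int(fields["end"]),
        ))
    return records


def find_start_after_end(records: List[JobRecord]) -> List[JobRecord]:
    """Return the records whose start time is later than their end time."""
    return [r for r in records if r.start > r.end]


def utc(ts: int) -> str:
    """Render an epoch timestamp as a readable UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC")


# The three records from the post, verbatim.
SAMPLE = """
jobid=11721
jobstate=cancelled
submit=1300487648
start=1331416200
end=1300487677

jobid=11722
jobstate=cancelled
submit=1300550901
start=1332074346
end=1300550922

jobid=11723
jobstate=cancelled
submit=1300582908
start=1332112169
end=1300582920
"""

if __name__ == "__main__":
    for r in find_start_after_end(parse_records(SAMPLE)):
        print(f"job {r.jobid} ({r.jobstate}): start {utc(r.start)} "
              f"is after end {utc(r.end)}")
```

All three sample records are flagged. The same `start > end` test could be applied inside the completion script before the insert, so that suspicious rows get logged rather than stored silently.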

I haven't been able to figure out how to reproduce this behavior,
though. We have a longish slurmctld-prolog script, and I suspect
someone is srun'ing a job and cancelling it before or during the
prolog. None of these jobs shows an assigned node in the output, so
perhaps our users are srun'ing and cancelling jobs with rather speedy
fingers, before Slurm has a chance to do anything.

Can anyone confirm this as a bug?
