The job's start time is set by the backfill scheduler to be the
expected start time of the job. I'd guess someone is cancelling
a pending job after that time is set. Since the job never actually
started, SLURM should avoid passing the start time in the environment
variable to the job completion script. That's probably a trivial patch.
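
Until that is patched, the completion script could guard against the
bogus value itself. Below is a minimal Python sketch, assuming the script
has already collected the fields into a dict keyed as in the examples
quoted below (jobstate, submit, start, end). The function name
sanitize_start and the rule "a cancelled job whose start is later than
its end never ran" are assumptions for illustration, not SLURM behavior.

```python
def sanitize_start(record):
    """Return a copy of record with start=None when the job never ran.

    Assumption: a cancelled job whose recorded start time is later than
    its end time only ever had an *expected* start time (set by the
    backfill scheduler), so the value should not be stored as a real start.
    """
    cleaned = dict(record)
    # Timestamps arrive as strings of epoch seconds; normalize to int.
    for key in ("submit", "start", "end"):
        cleaned[key] = int(cleaned[key])

    never_started = (
        cleaned["jobstate"].lower() == "cancelled"
        and cleaned["start"] > cleaned["end"]
    )
    if never_started:
        cleaned["start"] = None  # insert as NULL in the tracking database
    return cleaned


# Values taken from job 11721 in the quoted message.
job = {
    "jobid": "11721",
    "jobstate": "cancelled",
    "submit": "1300487648",
    "start": "1331416200",
    "end": "1300487677",
}
print(sanitize_start(job)["start"])
```

With the record above the start field comes back as None, so the
database row can carry a NULL start instead of a time in the future.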

Quoting Michael Di Domenico <[email protected]>:

We're using the job completion script to insert tracking records into
a database for the jobs run on our machine. Every once in a while I
see a wrong start time (later than the end time) listed for cancelled jobs.

Here are some examples:

jobid=11721
jobstate=cancelled
submit=1300487648
start=1331416200
end=1300487677

jobid=11722
jobstate=cancelled
submit=1300550901
start=1332074346
end=1300550922

jobid=11723
jobstate=cancelled
submit=1300582908
start=1332112169
end=1300582920

I haven't been able to figure out how to reproduce this behavior,
though. We have a longish slurmctld-prolog script, and I think
someone is srun'ing the job and cancelling it before or during the
prolog. The jobs don't show an assigned node in the output, so perhaps
our users are srun'ing and cancelling the job with rather speedy
fingers, before SLURM has a chance to do anything.

Can anyone confirm this as a bug?




