The job's start time is set by the backfill scheduler to be the expected start time of the job. I'd guess someone is cancelling a pending job after that time is set. Since the job never actually started, SLURM should avoid passing the start time in the environment variable to the job completion script. That's probably a trivial patch.
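
Until that is patched, the tracking script could guard against these records itself. Here is a minimal sketch in Python, assuming each record arrives as space-separated key=value pairs with the field names shown in the quoted examples. The helper name sanitize_record is hypothetical and not part of SLURM.

```python
def sanitize_record(line):
    """Parse 'key=value' fields and drop a start time that cannot be real."""
    # Assumption: fields are whitespace-separated and each contains one '='.
    rec = dict(field.split("=", 1) for field in line.split())
    for key in ("submit", "start", "end"):
        rec[key] = int(rec[key])
    # A cancelled job that never ran may still carry the backfill scheduler's
    # expected start time, which lies after the end time. Treat it as unset.
    if rec["jobstate"] == "cancelled" and rec["start"] > rec["end"]:
        rec["start"] = None
    return rec
```

The database insert could then store NULL for the start time whenever the value comes back as None.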
Quoting Michael Di Domenico <[email protected]>:
We're using the job completion script to insert tracking records into a database for the jobs run on our machine. Every once in a while I see an error with the start time listed for cancelled jobs.

Here are some examples:

jobid=11721 jobstate=cancelled submit=1300487648 start=1331416200 end=1300487677
jobid=11722 jobstate=cancelled submit=1300550901 start=1332074346 end=1300550922
jobid=11723 jobstate=cancelled submit=1300582908 start=1332112169 end=1300582920

I haven't been able to figure out how to reproduce this behavior, though. We have a longish slurmctld-prolog script, and I think someone is srun'ing the job and cancelling it before or during the prolog. The jobs don't show an assigned node in the output, so perhaps our users are srun'ing and cancelling the job with rather speedy fingers, before slurm has a chance to do anything.

Can anyone confirm this as a bug?
