Not setting the time here does not mean there is no start time, only  
that any time set in the previous execution of the backfill scheduler  
is unchanged, which is probably no longer correct.

Note there is a variable "later_start" that attempts to start jobs at  
later times based upon when other jobs complete. That logic currently  
does not try to start jobs upon completion of a reservation. Probably
what needs to happen is to consider these reservations and advance  
"later_start" to either the time of the next job ending OR the time  
when the next reservation ends, whichever comes first. That should  
generate a correct start time for pending jobs affected by reservations.
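
A minimal sketch of that idea, for illustration only. This is not the code in backfill.c: the function names and the plain arrays of end times are hypothetical, and the real scheduler keeps this information in its own job and reservation structures.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not part of backfill.c: return the earliest time in
 * times[0..n-1] that is strictly after "now", or 0 if there is none. */
time_t earliest_after(time_t now, const time_t *times, size_t n)
{
	time_t best = 0;

	for (size_t i = 0; i < n; i++) {
		if (times[i] > now && (best == 0 || times[i] < best))
			best = times[i];
	}
	return best;
}

/* Pick the next candidate for "later_start": the time of the next job
 * ending OR the time the next reservation ends, whichever comes first.
 * Returns 0 if nothing ends after "now". */
time_t next_later_start(time_t now,
			const time_t *job_end, size_t n_jobs,
			const time_t *resv_end, size_t n_resv)
{
	time_t job_next = earliest_after(now, job_end, n_jobs);
	time_t resv_next = earliest_after(now, resv_end, n_resv);

	if (job_next == 0)
		return resv_next;
	if (resv_next == 0)
		return job_next;
	return (job_next < resv_next) ? job_next : resv_next;
}
```

In the case quoted below, with no running jobs and a reservation ending at 2012-08-02T09:00:00, a selection like this would pick the reservation end time as the next point at which to test whether the pending job can start.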

This shouldn't be too difficult and I'll add that to my to-do list,  
but I don't know when I will have time to work on it.

Moe


Quoting Mark Nelson <[email protected]>:

>
> Hi All,
>
> I've noticed that when using the backfill scheduler on SLURM 2.4 (latest
> git), if a job cannot start because its resources are unavailable (for
> example because of an upcoming reservation that will begin before the
> job will end) its start time gets set to "now" + whatever the backfill
> window is set to.
>
> For example, there is a whole system reservation for tomorrow morning,
> starting 9am (so it will begin in ~17 hours):
>
> ~$ scontrol show res
> ReservationName=testres StartTime=2012-08-01T09:00:00
> EndTime=2012-08-02T09:00:00 Duration=1-00:00:00
>     Nodes=bgp[000x011] NodeCnt=2048 Features=(null) PartitionName=(null)
> Flags=MAINT,SPEC_NODES
>     Users=root Accounts=(null) Licenses=(null) State=INACTIVE
>
> We have a job that wants half of the machine for one day, so it cannot
> start until after the reservation is complete:
>
> ~$ cat long-waiter
> #!/bin/sh
> #SBATCH --time=1-0
> #SBATCH --nodes=2048
>
> /bin/hostname
> sleep 1200
> /usr/bin/uptime
> ~$
>
> Prior to commit b86bc225f56ec8524243ae848b934844ced83e9c (Add backfill
> scheduler resolution parameter), this job was never given a start time.
> Part of this commit made the following change (to backfill.c):
>
> @@ -634,7 +646,9 @@ static int _attempt_backfill(void)
>                               job_ptr->start_time = 0;
>                               goto TRY_LATER;
>                       }
> +                     /* Job can not start until too far in the future */
>                       job_ptr->time_limit = orig_time_limit;
> +                     job_ptr->start_time = sched_start + backfill_window;
>                       continue;
>               }
>
> (this code is now on line 740 of backfill.c if anyone wants to see the
> whole of _attempt_backfill() )
>
> Thus, I'm now seeing a start time for jobs that previously had no start
> time, and that I'm not sure should have a start time set:
>
> ~$ squeue --start
>    JOBID PARTITION     NAME     USER  ST           START_TIME  NODES MIDPLANELIST(REASON)
>       27      main long-wai    markn  PD  2012-08-02T16:00:08     2K (Resources)
>
>
> What do others think?
> And also, why is it necessary to set start_time to be just outside the
> backfill window (which above is 2 days)?
>
> Many thanks!
> Mark
>
