[slurm-dev] Re: Scheduling weirdness

Douglas Jacobsen Fri, 16 Jun 2017 07:54:46 -0700

I typically recommend that bf_window be roughly 2x the max wall time; this
allows for planning beyond the edge of the window. You may need to
increase bf_resolution (it should be fine for almost all cases to go up to
300s), and potentially increase bf_interval to ensure there is enough time
for the backfill scheduler to get through your workload.
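A minimal Python sketch of that rule of thumb. It assumes Slurm's usual time-limit formats ("minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds"). The slurm.conf documentation gives bf_window in minutes and bf_resolution in seconds, so the helper emits plain integers. `slurm_time_to_minutes` and `suggest_backfill_params` are hypothetical names, and 300 s is simply the upper value mentioned above, not a computed optimum. bf_interval is left out because no formula for it is given here.

```python
def slurm_time_to_minutes(spec: str) -> int:
    """Convert a Slurm time string such as "7-0", "6-21", "30" or
    "6-19:07:07" to whole minutes (leftover seconds are truncated)."""
    days = 0
    if "-" in spec:
        day_part, spec = spec.split("-", 1)
        days = int(day_part)
        # After "days-", the fields are hours[:minutes[:seconds]]; pad with zeros.
        fields = [int(f) for f in spec.split(":")] + [0, 0]
        hours, minutes, seconds = fields[0], fields[1], fields[2]
    else:
        fields = [int(f) for f in spec.split(":")]
        # Without a days prefix: minutes, minutes:seconds, or hours:minutes:seconds.
        if len(fields) == 1:
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:
            hours, minutes, seconds = 0, fields[0], fields[1]
        else:
            hours, minutes, seconds = fields
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total_seconds // 60


def suggest_backfill_params(max_time: str, resolution_s: int = 300) -> str:
    """Hypothetical helper: bf_window (minutes) of ~2x the max wall time,
    plus a coarser bf_resolution (seconds)."""
    window_min = 2 * slurm_time_to_minutes(max_time)
    return f"bf_window={window_min},bf_resolution={resolution_s}"
```

For a 7-0 maximum, `suggest_backfill_params("7-0")` returns `bf_window=20160,bf_resolution=300`, which could be merged into an existing SchedulerParameters line.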


Note that things like completing nodes can cause jobs to continuously move
back in time if they are considered for planning. An unkillable job step
script that can down nodes after some longish period of time (15 minutes,
perhaps), if need be (and if nothing else is running), can help with some
level of automation here.
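A hypothetical sketch of the decision such a script might make, using the 15-minute threshold suggested above. How the script learns when the node started completing, and whether other jobs are still running there, is assumed and left out. `should_down_node` and `down_command` are illustrative names. The command it builds uses the standard `scontrol update NodeName=... State=DOWN Reason=...` form and is only returned, not executed.

```python
from datetime import datetime, timedelta
from typing import List

# Threshold from the suggestion above ("15 minutes, perhaps"); an assumption, tune locally.
COMPLETING_LIMIT = timedelta(minutes=15)


def should_down_node(completing_since: datetime, now: datetime,
                     other_running_jobs: int) -> bool:
    """True if the node has been stuck completing past the limit and nothing
    else is running on it (both inputs are assumed to be gathered elsewhere)."""
    return other_running_jobs == 0 and now - completing_since >= COMPLETING_LIMIT


def down_command(node: str, reason: str) -> List[str]:
    """Build, but do not run, the scontrol call that marks the node DOWN."""
    return ["scontrol", "update", f"NodeName={node}", "State=DOWN",
            f"Reason={reason}"]
```

Passing the result to `subprocess.run(..., check=True)` would actually down the node; keeping the decision and the command separate makes the threshold logic easy to test without touching the cluster.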

-Doug

----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
[email protected]

------------- __o
---------- _ '\<,_
----------(_)/  (_)__________________________


On Fri, Jun 16, 2017 at 5:21 AM, Robbert Eggermont <[email protected]>
wrote:

>
> On 16-06-17 11:42, TO_Webmaster wrote:
>
>> What are the values of bf_window and bf_resolution in your configuration?
>>
>> From the documentation of bf_window: "The default value is 1440
>> minutes (one day). A value at least as long as the highest allowed
>> time limit is generally advisable to prevent job starvation."
>
> SchedulerType=sched/backfill
> SchedulerParameters=bf_window=7-0,defer
> SelectType=select/cons_res
> SelectTypeParameters=CR_CORE_MEMORY
>
> bf_resolution should be the default (60s).
>
> Our maximum time-limit is 7-0. Any ideas on what would be the optimal
> bf_window value for this?
>
> In relation to the problem: most running jobs had a time limit of 6-21,
> and the highest priority job was scheduled to start within the 7-0
> bf_window. Does that rule out the bf_window as a factor in the problem, or
> not?
>
> In the schedule iteration at 2017-06-15T23:22:10, the highest priority job
> was scheduled to start at 2017-06-22T18:29:17, 6-19:07:07 ahead, so I don't
> understand why, in the same iteration, the lower priority job with a longer
> 6-21 time limit was immediately started on the same node.
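The arithmetic in the question can be checked directly. This sketch takes the 6-21 limit as exactly 6 days and 21 hours (an assumption; the backfilled job's exact limit isn't given) and computes how long that job would still be running after job A's planned start if it started in that iteration. `overrun` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def overrun(started: datetime, limit: timedelta, planned_start: datetime) -> timedelta:
    """How long a job started at `started` with time limit `limit` would still
    be running after `planned_start` (zero if it finishes in time)."""
    return max(started + limit - planned_start, timedelta(0))


iteration = datetime.fromisoformat("2017-06-15T23:22:10")
planned_start_a = datetime.fromisoformat("2017-06-22T18:29:17")

lead = planned_start_a - iteration
print(lead)  # 6 days, 19:07:07
print(overrun(iteration, timedelta(days=6, hours=21), planned_start_a))  # 1:52:53
```

Any positive overrun means the backfilled job could still hold resources on that node when job A was meant to start there.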
>
> Using defer was an attempt to get more optimal scheduling, but
> unfortunately it didn't change anything for this problem.
>
> Robbert
>
> 2017-06-16 1:16 GMT+02:00 Robbert Eggermont <[email protected]>:
>>
>>>
>>> Hello,
>>>
>>> In our Slurm setup (now 17.02.4) I've noticed several times now that
>>> backfilled jobs push back the start time of the highest priority job.
>>> I'm not sure if this is due to a configuration error or a scheduler
>>> error, and since I'm having a hard time diagnosing what's happening, I
>>> was hoping for some insightful tips.
>>>
>>> What happens is that when the highest priority pending job needs a lot of
>>> resources (CPUs, ...), slurm will backfill lower priority jobs with smaller
>>> requirements but with a longer time limit than the currently running jobs.
>>>
>>> For example, the highest priority job needs a full node, and the first
>>> node will become available in 6 days; our slurm will happily backfill all
>>> pending lower priority 2-CPU 7-day jobs on every possible node in the
>>> cluster, thus pushing back the highest priority job 1 day.
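A toy model of the rule one would expect backfill to enforce in this example: a lower priority job may start now only if it fits in the currently free resources and, on the node reserved for the full-node job, finishes before that job's planned start. This is a deliberate simplification (one reserved node, times in hours from now, CPUs only), not Slurm's actual backfill implementation; the `Node` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    free_cpus: int      # CPUs idle right now
    free_all_at: float  # hours from now until every running job on the node ends


def reserve_full_node(nodes: List[Node]) -> Node:
    """The full-node job is planned on whichever node empties first."""
    return min(nodes, key=lambda n: n.free_all_at)


def can_backfill(node: Node, reserved: Node, cpus: int, limit_h: float) -> bool:
    """Start a small job now only if it fits and cannot delay the reservation."""
    if cpus > node.free_cpus:
        return False
    if node.name == reserved.name and limit_h > reserved.free_all_at:
        return False  # it would still be running when the full-node job should start
    return True


nodes = [Node("n1", 2, 144.0), Node("n2", 2, 160.0)]
reserved = reserve_full_node(nodes)                 # n1, empty in 6 days (144 h)
print(can_backfill(nodes[0], reserved, 2, 168.0))   # False: 7-day job on n1
print(can_backfill(nodes[1], reserved, 2, 168.0))   # True: n2 is not needed
```

Under this model, the 1-day delay described above corresponds to a 7-day job being admitted on the reserved node itself.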
>>>
>>> Looking into the scheduler debugging info, I noticed some things I can't
>>> explain:
>>> 1) the highest priority job ("A") is not always scheduled to start on the
>>> first node ("1") that will become available;
>>> 2) in the same iteration, the backfill logic will start another, lower
>>> priority, smaller job with a time limit that extends past the expected
>>> start time of job "A" on the same node "1";
>>> 3) when "A" is scheduled to start on another node, the scheduled starting
>>> time remains the same (i.e. it is not updated to the time that the new
>>> node becomes available);
>>> 4) the scheduled starting time of the highest priority job ("A") is
>>> sometimes later than the time that the node becomes available.
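For observations 1, 3 and 4, the expected behaviour can be written down for the simplest case: a full-node job can start on a node once the last job running there ends, and the best node is the one that empties first. This sketch ignores memory, other planned jobs, reservations and bf_resolution rounding, any of which can make Slurm's real answer later; the function name and input layout are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def earliest_full_node_start(
    running_ends: Dict[str, List[datetime]], now: datetime
) -> Tuple[str, datetime]:
    """running_ends maps node name -> end times of the jobs currently on it.
    Returns the node that becomes completely free first, and when."""
    # An idle node (no running jobs) is free right now.
    free_at = {node: max(ends, default=now) for node, ends in running_ends.items()}
    best = min(free_at, key=free_at.get)
    return best, free_at[best]
```

Fed only parzen's longest job end time of 2017-06-22T15:27:08 (from points 3 and 4), this gives 15:27:08 on parzen rather than the logged 18:29:17, which is the gap being asked about.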
>>>
>>> See below for some log entries for these events.
>>>
>>> Does anybody have an idea what's going on here, and how we can fix it?
>>>
>>> Robbert
>>>
>>>
>>> 1)
>>> JobID=1315558 has a scheduled start time on node maxwell of
>>> 2017-06-22T19:11:16; forcing it to another node (by draining maxwell)
>>> reduces the start time to 2017-06-22T16:47:43.
>>> (But slurm is consistent: when maxwell is resumed, the job is scheduled
>>> there again, with the later start time.)
>>>
>>>>
>>>> [2017-06-15T22:11:26.688] backfill: beginning
>>>> [2017-06-15T22:11:26.693] backfill test for JobID=1315558 Prio=22703485 Partition=general
>>>> [2017-06-15T22:11:26.694] Job 1315558 to start at 2017-06-22T19:11:16, end at 2017-06-27T07:11:00 on maxwell
>>>> [2017-06-15T22:11:26.694] backfill: reached end of job queue
>>>> [2017-06-15T22:11:52.223] update_node: node maxwell state set to DRAINING
>>>> [2017-06-15T22:11:56.695] backfill: beginning
>>>> [2017-06-15T22:11:56.701] backfill test for JobID=1315558 Prio=22703485 Partition=general
>>>> [2017-06-15T22:11:56.703] Job 1315558 to start at 2017-06-22T16:47:43, end at 2017-06-27T04:47:00 on parzen
>>>> [2017-06-15T22:11:56.703] backfill: reached end of job queue
>>>>
>>>
>>> ...
>>>
>>>>
>>>> [2017-06-15T22:34:46.036] backfill: beginning
>>>> [2017-06-15T22:34:46.039] =========================================
>>>> [2017-06-15T22:34:46.039] Begin:2017-06-15T22:34:46 End:2017-06-15T22:41:46 Nodes:100plus,gauss,hopper,markov,neumann,parzen,sanger,viterbi,watson
>>>> [2017-06-15T22:34:46.039] =========================================
>>>> [2017-06-15T22:34:46.040] backfill test for JobID=1315558 Prio=22776428 Partition=general
>>>> [2017-06-15T22:34:46.040] Test job 1315558 at 2017-06-15T22:34:46 on hopper,parzen,viterbi
>>>> [2017-06-15T22:34:46.040] Job 1315558 to start at 2017-06-22T16:47:43, end at 2017-06-27T04:47:00 on parzen
>>>> [2017-06-15T22:34:46.040] backfill: reached end of job queue
>>>> [2017-06-15T22:34:55.236] update_node: node maxwell state set to ALLOCATED
>>>> [2017-06-15T22:35:16.041] backfill: beginning
>>>> [2017-06-15T22:35:16.045] =========================================
>>>> [2017-06-15T22:35:16.045] Begin:2017-06-15T22:35:16 End:2017-06-15T22:42:16 Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
>>>> [2017-06-15T22:35:16.045] =========================================
>>>> [2017-06-15T22:35:16.045] backfill test for JobID=1315558 Prio=22776428 Partition=general
>>>> [2017-06-15T22:35:16.045] Test job 1315558 at 2017-06-15T22:35:16 on hopper,maxwell,parzen,viterbi
>>>> [2017-06-15T22:35:16.046] Job 1315558 to start at 2017-06-22T19:11:16, end at 2017-06-27T07:11:00 on maxwell
>>>> [2017-06-15T22:35:16.046] backfill: reached end of job queue
>>>>
>>>
>>>
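To follow how job 1315558's plan moves between iterations, the "Job N to start at ..., end at ... on ..." lines can be pulled out of the slurmctld log. A minimal sketch, assuming the unwrapped one-line format shown in these excerpts (raw log lines, not the quoted copies in this email); the regex is written against these lines only and `planned_starts` is a hypothetical name.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, Tuple

# One unwrapped slurmctld backfill planning line, e.g.
# [2017-06-15T22:11:26.694] Job 1315558 to start at 2017-06-22T19:11:16, end at 2017-06-27T07:11:00 on maxwell
PLAN_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] Job (?P<job>\d+) to start at (?P<start>\S+), "
    r"end at (?P<end>\S+) on (?P<nodes>\S+)$"
)


def planned_starts(lines: Iterable[str], job_id: int) -> Iterator[Tuple[str, datetime, str]]:
    """Yield (log timestamp, planned start, node list) for each plan logged for job_id."""
    for line in lines:
        m = PLAN_RE.match(line.strip())
        if m and int(m["job"]) == job_id:
            yield m["ts"], datetime.fromisoformat(m["start"]), m["nodes"]
```

Iterating over the open log file (the one named by SlurmctldLogFile) with job ID 1315558 would list each iteration's planned start and node, making jumps such as maxwell at 19:11:16 versus parzen at 16:47:43 easy to spot.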
>>> 2)
>>> Our highest prio job is now scheduled to start on maxwell at
>>> 2017-06-22T18:29:17, but then another job is backfilled on maxwell with
>>> EndTime=2017-06-22T21:02:10??
>>>
>>>>
>>>> [2017-06-15T23:22:10.227] backfill: beginning
>>>> [2017-06-15T23:22:10.231] =========================================
>>>> [2017-06-15T23:22:10.231] Begin:2017-06-15T23:22:10 End:2017-06-15T23:29:10 Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
>>>> [2017-06-15T23:22:10.231] =========================================
>>>> [2017-06-15T23:22:10.231] backfill test for JobID=1315558 Prio=22938897 Partition=general
>>>> [2017-06-15T23:22:10.231] Test job 1315558 at 2017-06-15T23:22:10 on hopper,maxwell,parzen,viterbi
>>>> [2017-06-15T23:22:10.232] Job 1315558 to start at 2017-06-22T18:29:17, end at 2017-06-27T06:29:00 on maxwell
>>>> [2017-06-15T23:22:10.232] backfill test for JobID=1314524 Prio=10000002 Partition=general
>>>> [2017-06-15T23:22:10.233] Test job 1314524 at 2017-06-15T23:22:10 on gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
>>>> [2017-06-15T23:22:10.233] backfill: Started JobId=1314524 in general on maxwell
>>>> [2017-06-15T23:22:10.237] backfill: reached end of job queue
>>>>
>>>
>>>
>>> 3 and 4)
>>> Our job is immediately scheduled on another node; however, the scheduled
>>> starting time 2017-06-22T18:29:17 remains the same, even though the
>>> current longest job on parzen has EndTime=2017-06-22T15:27:08.
>>> Almost an hour later the start time is updated (note: the job completion
>>> should be unrelated since that job ran on maxwell, right?).
>>> Then 40 minutes later the start time is updated again and pushed back
>>> (while no new jobs were started on parzen in the meantime, and we don't
>>> suspend; there was, however, another job completion on maxwell?).
>>>
>>>>
>>>> [2017-06-15T23:22:40.237] backfill: beginning
>>>> [2017-06-15T23:22:40.241] =========================================
>>>> [2017-06-15T23:22:40.242] Begin:2017-06-15T23:22:40 End:2017-06-15T23:29:40 Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
>>>> [2017-06-15T23:22:40.242] =========================================
>>>> [2017-06-15T23:22:40.242] backfill test for JobID=1315558 Prio=22938897 Partition=general
>>>> [2017-06-15T23:22:40.242] Test job 1315558 at 2017-06-15T23:22:40 on hopper,maxwell,parzen,viterbi
>>>> [2017-06-15T23:22:40.243] Job 1315558 to start at 2017-06-22T18:29:17, end at 2017-06-27T06:29:00 on parzen
>>>> [2017-06-15T23:22:40.243] backfill: reached end of job queue
>>>>
>>>
>>> ...
>>>
>>>>
>>>> [2017-06-16T00:16:06.161] backfill: beginning
>>>> [2017-06-16T00:16:06.165] =========================================
>>>> [2017-06-16T00:16:06.165] Begin:2017-06-16T00:16:06 End:2017-06-16T00:23:06 Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
>>>> [2017-06-16T00:16:06.165] =========================================
>>>> [2017-06-16T00:16:06.165] backfill test for JobID=1315558 Prio=23087721 Partition=general
>>>> [2017-06-16T00:16:06.165] Test job 1315558 at 2017-06-16T00:16:06 on hopper,maxwell,parzen,viterbi
>>>> [2017-06-16T00:16:06.166] Job 1315558 to start at 2017-06-22T18:29:17, end at 2017-06-27T06:29:00 on parzen
>>>> [2017-06-16T00:16:06.167] backfill: reached end of job queue
>>>> [2017-06-16T00:16:12.763] job_complete: JobID=1316201 State=0x8003 NodeCnt=1 done
>>>> [2017-06-16T00:16:36.167] backfill: beginning
>>>> [2017-06-16T00:16:36.170] =========================================
>>>> [2017-06-16T00:16:36.170] Begin:2017-06-16T00:16:36 End:2017-06-16T00:23:36 Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
>>>> [2017-06-16T00:16:36.170] =========================================
>>>> [2017-06-16T00:16:36.170] backfill test for JobID=1315558 Prio=23101442 Partition=general
>>>> [2017-06-16T00:16:36.170] Test job 1315558 at 2017-06-16T00:16:36 on hopper,maxwell,parzen,viterbi
>>>> [2017-06-16T00:16:36.171] Job 1315558 to start at 2017-06-22T16:41:17, end at 2017-06-27T04:41:00 on parzen
>>>> [2017-06-16T00:16:36.171] backfill: reached end of job queue
>>>>
>>>
>>> ...
>>>
>>>>
>>>> [2017-06-16T00:55:41.642] backfill: beginning
>>>> [2017-06-16T00:55:41.646] =========================================
>>>> [2017-06-16T00:55:41.646] Begin:2017-06-16T00:55:41 End:2017-06-16T01:02:41 Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
>>>> [2017-06-16T00:55:41.646] =========================================
>>>> [2017-06-16T00:55:41.646] backfill test for JobID=1315558 Prio=23186871 Partition=general
>>>> [2017-06-16T00:55:41.646] Test job 1315558 at 2017-06-16T00:55:41 on hopper,maxwell,parzen,viterbi
>>>> [2017-06-16T00:55:41.647] Job 1315558 to start at 2017-06-22T16:41:17, end at 2017-06-27T04:41:00 on parzen
>>>> [2017-06-16T00:55:41.694] backfill: reached end of job queue
>>>> [2017-06-16T00:55:47.592] job_complete: JobID=1316353 State=0x1 NodeCnt=1 WEXITSTATUS 0
>>>> [2017-06-16T00:55:47.598] job_complete: JobID=1316353 State=0x8003 NodeCnt=1 done
>>>> [2017-06-16T00:56:11.695] backfill: beginning
>>>> [2017-06-16T00:56:11.701] =========================================
>>>> [2017-06-16T00:56:11.701] Begin:2017-06-16T00:56:11 End:2017-06-16T01:03:11 Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
>>>> [2017-06-16T00:56:11.701] =========================================
>>>> [2017-06-16T00:56:11.701] backfill test for JobID=1315558 Prio=23186871 Partition=general
>>>> [2017-06-16T00:56:11.701] Test job 1315558 at 2017-06-16T00:56:11 on hopper,maxwell,parzen,viterbi
>>>> [2017-06-16T00:56:11.703] Job 1315558 to start at 2017-06-22T18:01:33, end at 2017-06-27T06:01:00 on parzen
>>>> [2017-06-16T00:56:11.703] backfill: reached end of job queue
>>>>
>>>
>
> --
> Robbert Eggermont                                  Intelligent Systems
> [email protected]         Electr.Eng., Mathematics & Comp.Science
> +31 15 27 83234                         Delft University of Technology
>
