[slurm-dev] RE: Scheduling weirdness

2017-06-21 Thread Skouson, Gary B
I've set our bf_window to 7 days and our bf_window_linear=10.  Most of our jobs 
are constrained to a couple of days, so we generally aren't having jobs reserve 
5 or 6 days in the future. We generally have fewer than a few hundred jobs 
running at a time, so the performance penalty for using bf_window_linear isn't 
a big deal in our case. Setting bf_window_linear to 1 would get you closest, 
but there may be a performance penalty, depending on how many jobs it has to 
go through.
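
For reference, the relevant slurm.conf line ends up looking something like this 
(10080 minutes = 7 days; the exact values are just what we happen to use):

SchedulerParameters=bf_window=10080,bf_window_linear=10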

-
Gary Skouson


-Original Message-
From: Robbert Eggermont [mailto:r.eggerm...@tudelft.nl] 
Sent: Wednesday, June 21, 2017 7:36 AM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] RE: Scheduling weirdness


Hi Gary,

Thanks, that might explain at least part of the problem. It looks like 
that patch won't be included until 17.11 (at least not in 17.02.5) so 
I'll need to find some spare time to try it.

What would be a reasonable number for bf_window_linear for our 
bf_window=7-0? (300/600/900s?)

Robbert

On 19-06-17 18:34, Skouson, Gary B wrote:
> If you're using the cons_res select plugin, the backfill algorithm uses a 
> doubling factor on the backfill time window, as it goes through the running 
> jobs. The further out you get, the wider the window will be.  In your case, 
> being out 6 days means that the granularity for the backfill window may be 
> more than a day wide, so your high priority jobs that can't start right away 
> will seem to be pushed out further into the distance.
> 
> I saw the same problem with large parallel job reservations continuing to 
> drift into the future.  There's a patch to fix this, but it isn't in the 
> 17.02 tarball.  Take a look at 
> https://github.com/SchedMD/slurm/commit/3f7e10f868145a505b1dad6a69b040a167eaa541
> 
> -
> Gary Skouson
> 
> 
> -Original Message-
> From: Robbert Eggermont [mailto:r.eggerm...@tudelft.nl]
> Sent: Thursday, June 15, 2017 4:15 PM
> To: slurm-dev <slurm-dev@schedmd.com>
> Subject: [slurm-dev] Scheduling weirdness
> 
> 
> Hello,
> 
> In our Slurm setup (now 17.02.4) I've noticed several times now that
> backfilled jobs push back the start time of the highest priority job.
> I'm not sure if this is due to a configuration error or a scheduler
> error, and since I'm having a hard time diagnosing what's happening, I
> was hoping for some insightful tips.
> 
> What happens is that when the highest priority pending job needs a lot
> of resources (CPUs, ...), slurm will backfill lower priority jobs with
> smaller requirements but with a higher timelimit than the currently running
> jobs.
> 
> For example, the highest priority job needs a full node, and the first
> node will become available in 6 days; our slurm will happily backfill
> all pending lower priority 2-CPU 7-day jobs on every possible node in
> the cluster, thus pushing back the highest priority job 1 day.
> 
> Looking into the scheduler debugging info, I noticed some things I can't
> explain:
> 1) the highest priority job ("A") is not always scheduled to start on
> the first node ("1") that will become available;
> 2) in the same iteration, the backfill logic will start another, lower
> priority, smaller job with a timelimit longer than the expected start
> time of job "A" on the same node "1";
> 3) when "A" is scheduled to start on another node, the scheduled
> starting time remains the same (i.e. it is not updated to the time that
> the new node becomes available).
> 4) the scheduled starting time of the highest priority job ("A") is
> sometimes later than the time that the node becomes available;
> 
> See below for some log entries for these events.
> 
> Does anybody have an idea what's going on here, and how we can fix it?
> 
> Robbert
> 
> 
> 1)
> JobID=1315558 has a scheduled start time on node maxwell of
> 2017-06-22T19:11:16; forcing it to another node (by draining maxwell)
> reduces the start time to 2017-06-22T16:47:43.
> (But slurm is consistent: when maxwell is resumed, the job is scheduled
> there again, with the later start time.)
>> [2017-06-15T22:11:26.688] backfill: beginning
>> [2017-06-15T22:11:26.693] backfill test for JobID=1315558 Prio=22703485 
>> Partition=general
>> [2017-06-15T22:11:26.694] Job 1315558 to start at 2017-06-22T19:11:16, end 
>> at 2017-06-27T07:11:00 on maxwell
>> [2017-06-15T22:11:26.694] backfill: reached end of job queue
>> [2017-06-15T22:11:52.223] update_node: node maxwell state set to DRAINING
>> [2017-06-15T22:11:56.695] backfill: beginning
>> [2017-06-15T22:11:56.701] backfill test for JobID=1315558 Prio=22703485 
>> Partition=general
>> [2017-06-15T22:11:56.703] Job 1315558 to start at 2

[slurm-dev] RE: Scheduling weirdness

2017-06-21 Thread Robbert Eggermont


Hi Gary,

Thanks, that might explain at least part of the problem. It looks like 
that patch won't be included until 17.11 (at least not in 17.02.5) so 
I'll need to find some spare time to try it.


What would be a reasonable number for bf_window_linear for our 
bf_window=7-0? (300/600/900s?)


Robbert

On 19-06-17 18:34, Skouson, Gary B wrote:

If you're using the cons_res select plugin, the backfill algorithm uses a 
doubling factor on the backfill time window, as it goes through the running 
jobs. The further out you get, the wider the window will be.  In your case, 
being out 6 days means that the granularity for the backfill window may be more 
than a day wide, so your high priority jobs that can't start right away will 
seem to be pushed out further into the distance.

I saw the same problem with large parallel job reservations continuing to drift 
into the future.  There's a patch to fix this, but it isn't in the 17.02 
tarball.  Take a look at 
https://github.com/SchedMD/slurm/commit/3f7e10f868145a505b1dad6a69b040a167eaa541

-
Gary Skouson


-Original Message-
From: Robbert Eggermont [mailto:r.eggerm...@tudelft.nl]
Sent: Thursday, June 15, 2017 4:15 PM
To: slurm-dev 
Subject: [slurm-dev] Scheduling weirdness


Hello,

In our Slurm setup (now 17.02.4) I've noticed several times now that
backfilled jobs push back the start time of the highest priority job.
I'm not sure if this is due to a configuration error or a scheduler
error, and since I'm having a hard time diagnosing what's happening, I
was hoping for some insightful tips.

What happens is that when the highest priority pending job needs a lot
of resources (CPUs, ...), slurm will backfill lower priority jobs with
smaller requirements but with a higher timelimit than the currently running
jobs.

For example, the highest priority job needs a full node, and the first
node will become available in 6 days; our slurm will happily backfill
all pending lower priority 2-CPU 7-day jobs on every possible node in
the cluster, thus pushing back the highest priority job 1 day.

Looking into the scheduler debugging info, I noticed some things I can't
explain:
1) the highest priority job ("A") is not always scheduled to start on
the first node ("1") that will become available;
2) in the same iteration, the backfill logic will start another, lower
priority, smaller job with a timelimit longer than the expected start
time of job "A" on the same node "1";
3) when "A" is scheduled to start on another node, the scheduled
starting time remains the same (i.e. it is not updated to the time that
the new node becomes available).
4) the scheduled starting time of the highest priority job ("A") is
sometimes later than the time that the node becomes available;

See below for some log entries for these events.

Does anybody have an idea what's going on here, and how we can fix it?

Robbert


1)
JobID=1315558 has a scheduled start time on node maxwell of
2017-06-22T19:11:16; forcing it to another node (by draining maxwell)
reduces the start time to 2017-06-22T16:47:43.
(But slurm is consistent: when maxwell is resumed, the job is scheduled
there again, with the later start time.)

[2017-06-15T22:11:26.688] backfill: beginning
[2017-06-15T22:11:26.693] backfill test for JobID=1315558 Prio=22703485 
Partition=general
[2017-06-15T22:11:26.694] Job 1315558 to start at 2017-06-22T19:11:16, end at 
2017-06-27T07:11:00 on maxwell
[2017-06-15T22:11:26.694] backfill: reached end of job queue
[2017-06-15T22:11:52.223] update_node: node maxwell state set to DRAINING
[2017-06-15T22:11:56.695] backfill: beginning
[2017-06-15T22:11:56.701] backfill test for JobID=1315558 Prio=22703485 
Partition=general
[2017-06-15T22:11:56.703] Job 1315558 to start at 2017-06-22T16:47:43, end at 
2017-06-27T04:47:00 on parzen
[2017-06-15T22:11:56.703] backfill: reached end of job queue

...

[2017-06-15T22:34:46.036] backfill: beginning
[2017-06-15T22:34:46.039] =
[2017-06-15T22:34:46.039] Begin:2017-06-15T22:34:46 End:2017-06-15T22:41:46 
Nodes:100plus,gauss,hopper,markov,neumann,parzen,sanger,viterbi,watson
[2017-06-15T22:34:46.039] =
[2017-06-15T22:34:46.040] backfill test for JobID=1315558 Prio=22776428 
Partition=general
[2017-06-15T22:34:46.040] Test job 1315558 at 2017-06-15T22:34:46 on 
hopper,parzen,viterbi
[2017-06-15T22:34:46.040] Job 1315558 to start at 2017-06-22T16:47:43, end at 
2017-06-27T04:47:00 on parzen
[2017-06-15T22:34:46.040] backfill: reached end of job queue
[2017-06-15T22:34:55.236] update_node: node maxwell state set to ALLOCATED
[2017-06-15T22:35:16.041] backfill: beginning
[2017-06-15T22:35:16.045] =
[2017-06-15T22:35:16.045] Begin:2017-06-15T22:35:16 End:2017-06-15T22:42:16 
Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
[2017-06-15T22:35:16.045] =

[slurm-dev] RE: Scheduling weirdness

2017-06-19 Thread Skouson, Gary B
If you're using the cons_res select plugin, the backfill algorithm uses a 
doubling factor on the backfill time window, as it goes through the running 
jobs. The further out you get, the wider the window will be.  In your case, 
being out 6 days means that the granularity for the backfill window may be more 
than a day wide, so your high priority jobs that can't start right away will 
seem to be pushed out further into the distance.
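
(As a rough illustration of the scale, and assuming the window step starts at 
the one-minute default bf_resolution and doubles with each running job 
considered: after ten jobs the step is already 2^10 minutes, roughly 17 hours, 
and one more doubling puts it well past a day. The real code is more involved, 
but that is the general effect.)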

I saw the same problem with large parallel job reservations continuing to drift 
into the future.  There's a patch to fix this, but it isn't in the 17.02 
tarball.  Take a look at 
https://github.com/SchedMD/slurm/commit/3f7e10f868145a505b1dad6a69b040a167eaa541
 

-
Gary Skouson


-Original Message-
From: Robbert Eggermont [mailto:r.eggerm...@tudelft.nl] 
Sent: Thursday, June 15, 2017 4:15 PM
To: slurm-dev 
Subject: [slurm-dev] Scheduling weirdness


Hello,

In our Slurm setup (now 17.02.4) I've noticed several times now that 
backfilled jobs push back the start time of the highest priority job.
I'm not sure if this is due to a configuration error or a scheduler 
error, and since I'm having a hard time diagnosing what's happening, I 
was hoping for some insightful tips.

What happens is that when the highest priority pending job needs a lot 
of resources (CPUs, ...), slurm will backfill lower priority jobs with 
less requirements but with a higher timelimit than the currently running 
jobs.

For example, the highest priority job needs a full node, and the first 
node will become available in 6 days; our slurm will happily backfill 
all pending lower priority 2-CPU 7-day jobs on every possible node in 
the cluster, thus pushing back the highest priority job 1 day.

Looking into the scheduler debugging info, I noticed some things I can't 
explain:
1) the highest priority job ("A") is not always scheduled to start on 
the first node ("1") that will become available;
2) in the same iteration, the backfill logic will start another, lower 
priority, smaller job with a timelimit longer than the expected start 
time of job "A" on the same node "1";
3) when "A" is scheduled to start on another node, the scheduled 
starting time remains the same (i.e. it is not updated to the time that 
the new node becomes available).
4) the scheduled starting time of the highest priority job ("A") is 
sometimes later than the time that the node becomes available;

See below for some log entries for these events.

Does anybody have an idea what's going on here, and how we can fix it?

Robbert


1)
JobID=1315558 has a scheduled start time on node maxwell of 
2017-06-22T19:11:16; forcing it to another node (by draining maxwell) 
reduces the start time to 2017-06-22T16:47:43.
(But slurm is consistent: when maxwell is resumed, the job is scheduled 
there again, with the later start time.)
> [2017-06-15T22:11:26.688] backfill: beginning
> [2017-06-15T22:11:26.693] backfill test for JobID=1315558 Prio=22703485 
> Partition=general
> [2017-06-15T22:11:26.694] Job 1315558 to start at 2017-06-22T19:11:16, end at 
> 2017-06-27T07:11:00 on maxwell
> [2017-06-15T22:11:26.694] backfill: reached end of job queue
> [2017-06-15T22:11:52.223] update_node: node maxwell state set to DRAINING
> [2017-06-15T22:11:56.695] backfill: beginning
> [2017-06-15T22:11:56.701] backfill test for JobID=1315558 Prio=22703485 
> Partition=general
> [2017-06-15T22:11:56.703] Job 1315558 to start at 2017-06-22T16:47:43, end at 
> 2017-06-27T04:47:00 on parzen
> [2017-06-15T22:11:56.703] backfill: reached end of job queue
...
> [2017-06-15T22:34:46.036] backfill: beginning
> [2017-06-15T22:34:46.039] =
> [2017-06-15T22:34:46.039] Begin:2017-06-15T22:34:46 End:2017-06-15T22:41:46 
> Nodes:100plus,gauss,hopper,markov,neumann,parzen,sanger,viterbi,watson
> [2017-06-15T22:34:46.039] =
> [2017-06-15T22:34:46.040] backfill test for JobID=1315558 Prio=22776428 
> Partition=general
> [2017-06-15T22:34:46.040] Test job 1315558 at 2017-06-15T22:34:46 on 
> hopper,parzen,viterbi
> [2017-06-15T22:34:46.040] Job 1315558 to start at 2017-06-22T16:47:43, end at 
> 2017-06-27T04:47:00 on parzen
> [2017-06-15T22:34:46.040] backfill: reached end of job queue
> [2017-06-15T22:34:55.236] update_node: node maxwell state set to ALLOCATED
> [2017-06-15T22:35:16.041] backfill: beginning
> [2017-06-15T22:35:16.045] =
> [2017-06-15T22:35:16.045] Begin:2017-06-15T22:35:16 End:2017-06-15T22:42:16 
> Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
> [2017-06-15T22:35:16.045] =
> [2017-06-15T22:35:16.045] backfill test for JobID=1315558 Prio=22776428 
> Partition=general
> [2017-06-15T22:35:16.045] Test job 1315558 at 2017-06-15T22:35:16 on 
> hopper,maxwell,parzen,viterbi
> [2017-06-15T22:35:16.046] Job 1315558 to start at 2017-06-22T19:11:16, 

[slurm-dev] Re: Scheduling weirdness

2017-06-16 Thread Douglas Jacobsen
I typically recommend that bf_window be roughly 2x the max wall time; this
allows for planning beyond the edge of the window. You may need to increase
bf_resolution (it should be fine in almost all cases to go up to 300s), and
potentially increase bf_interval to ensure there is enough time for the
backfill scheduler to get through your workload.
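
As a rough sketch (values are only illustrative, not tuned for your site), for
a 7-day maximum wall time that would look something like:

SchedulerParameters=bf_window=20160,bf_resolution=300,bf_interval=60

where 20160 minutes is 2 x 7 days, and bf_resolution/bf_interval are in seconds.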

Note that things like completing nodes can cause jobs to continuously move
back in time if they are considered for planning. An unkillable-job-step
script that can down nodes after some longish period of time (15 minutes,
perhaps), if need be (and nothing else is running on them), can help with
some level of automation here.
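
A very rough sketch of that kind of automation (the script path and its exact
logic are hypothetical, so treat this purely as a starting point):

# slurm.conf: run a handler once a job step has been unkillable for 15 minutes
UnkillableStepProgram=/usr/local/sbin/down_unkillable_node.sh
UnkillableStepTimeout=900

# inside the (hypothetical) script, after checking nothing else runs there
# (assumes node names match short hostnames):
scontrol update NodeName=$(hostname -s) State=DOWN Reason="unkillable job step"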

-Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov

- __o
-- _ '\<,_
--(_)/  (_)__


On Fri, Jun 16, 2017 at 5:21 AM, Robbert Eggermont wrote:

>
> On 16-06-17 11:42, TO_Webmaster wrote:
>
>> What are the values of bf_window and bf_resolution in your configuration?
>>
>> > From the documentation of bf_window: "The default value is 1440
> > minutes (one day). A value at least as long as the highest allowed
> > time limit is generally advisable to prevent job starvation."
>
> SchedulerType=sched/backfill
> SchedulerParameters=bf_window=7-0,defer
> SelectType=select/cons_res
> SelectTypeParameters=CR_CORE_MEMORY
>
> bf_resolution should be the default (60s).
>
> Our maximum time-limit is 7-0. Any ideas on what would be the optimal
> bf_window value for this?
>
> In relation to the problem: most running jobs had a time limit of 6-21,
> and the highest priority job was scheduled to start within the 7-0
> bf_window. Does that rule out the bf_window as a factor in the problem, or
> not?
>
> In the schedule iteration at 2017-06-15T23:22:10, the highest priority job
> was scheduled to start at 2017-06-22T18:29:17, 6-19:07:07 ahead, so I don't
> understand why in the same iteration the lower priority job with a longer
> 6-21 time limit was immediately started on the same node?
>
> Using defer was an attempt to get more optimal scheduling, but
> unfortunately it didn't change anything for this problem.
>
> Robbert
>
> 2017-06-16 1:16 GMT+02:00 Robbert Eggermont :
>>
>>>
>>> Hello,
>>>
>>> In our Slurm setup (now 17.02.4) I've noticed several times now that
>>> backfilled jobs push back the start time of the highest priority job.
>>> I'm not sure if this is due to a configuration error or a scheduler
>>> error,
>>> and since I'm having a hard time diagnosing what's happening, I was
>>> hoping
>>> for some insightful tips.
>>>
>>> What happens is that when the highest priority pending job needs a lot of
>>> resources (CPUs, ...), slurm will backfill lower priority jobs with
>>> smaller requirements but with a higher timelimit than the currently running jobs.
>>>
>>> For example, the highest priority job needs a full node, and the first
>>> node
>>> will become available in 6 days; our slurm will happily backfill all
>>> pending
>>> lower priority 2-CPU 7-day jobs on every possible node in the cluster,
>>> thus
>>> pushing back the highest priority job 1 day.
>>>
>>> Looking into the scheduler debugging info, I noticed some things I can't
>>> explain:
>>> 1) the highest priority job ("A") is not always scheduled to start on the
>>> first node ("1") that will become available;
>>> 2) in the same iteration, the backfill logic will start another, lower
>>> priority, smaller job with a timelimit longer than the expected start
>>> time
>>> of job "A" on the same node "1";
>>> 3) when "A" is scheduled to start on another node, the scheduled starting
>>> time remains the same (i.e. it is not updated to the time that the new
>>> node
>>> becomes available).
>>> 4) the scheduled starting time of the highest priority job ("A") is
>>> sometimes later than the time that the node becomes available;
>>>
>>> See below for some log entries for these events.
>>>
>>> Does anybody have an idea what's going on here, and how we can fix it?
>>>
>>> Robbert
>>>
>>>
>>> 1)
>>> JobID=1315558 has a scheduled start time on node maxwell of
>>> 2017-06-22T19:11:16; forcing it to another node (by draining maxwell)
>>> reduces the start time to 2017-06-22T16:47:43.
>>> (But slurm is consistent: when maxwell is resumed, the job is scheduled
>>> there again, with the later start time.)
>>>

 [2017-06-15T22:11:26.688] backfill: beginning
 [2017-06-15T22:11:26.693] backfill test for JobID=1315558 Prio=22703485
 Partition=general
 [2017-06-15T22:11:26.694] Job 1315558 to start at 2017-06-22T19:11:16,
 end
 at 2017-06-27T07:11:00 on maxwell
 [2017-06-15T22:11:26.694] backfill: reached end of job queue
 [2017-06-15T22:11:52.223] update_node: node maxwell state set to
 DRAINING
 [2017-06-15T22:11:56.695] backfill: beginning
 

[slurm-dev] Re: Scheduling weirdness

2017-06-16 Thread Robbert Eggermont


On 16-06-17 11:42, TO_Webmaster wrote:

What are the values of bf_window and bf_resolution in your configuration?


> From the documentation of bf_window: "The default value is 1440
> minutes (one day). A value at least as long as the highest allowed
> time limit is generally advisable to prevent job starvation."

SchedulerType=sched/backfill
SchedulerParameters=bf_window=7-0,defer
SelectType=select/cons_res
SelectTypeParameters=CR_CORE_MEMORY

bf_resolution should be the default (60s).
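
(To double-check what the controller is actually running with, something like
"scontrol show config | grep -i schedulerparam" should show the effective
SchedulerParameters line.)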

Our maximum time-limit is 7-0. Any ideas on what would be the optimal 
bf_window value for this?


In relation to the problem: most running jobs had a time limit of 6-21, 
and the highest priority job was scheduled to start within the 7-0 
bf_window. Does that rule out the bf_window as a factor in the problem, 
or not?


In the schedule iteration at 2017-06-15T23:22:10, the highest priority 
job was scheduled to start at 2017-06-22T18:29:17, 6-19:07:07 ahead, so 
I don't understand why in the same iteration the lower priority job with 
a longer 6-21 time limit was immediately started on the same node?


Using defer was an attempt to get more optimal scheduling, but 
unfortunately it didn't change anything for this problem.


Robbert


2017-06-16 1:16 GMT+02:00 Robbert Eggermont :


Hello,

In our Slurm setup (now 17.02.4) I've noticed several times now that
backfilled jobs push back the start time of the highest priority job.
I'm not sure if this is due to a configuration error or a scheduler error,
and since I'm having a hard time diagnosing what's happening, I was hoping
for some insightful tips.

What happens is that when the highest priority pending job needs a lot of
resources (CPUs, ...), slurm will backfill lower priority jobs with smaller
requirements but with a higher timelimit than the currently running jobs.

For example, the highest priority job needs a full node, and the first node
will become available in 6 days; our slurm will happily backfill all pending
lower priority 2-CPU 7-day jobs on every possible node in the cluster, thus
pushing back the highest priority job 1 day.

Looking into the scheduler debugging info, I noticed some things I can't
explain:
1) the highest priority job ("A") is not always scheduled to start on the
first node ("1") that will become available;
2) in the same iteration, the backfill logic will start another, lower
priority, smaller job with a timelimit longer than the expected start time
of job "A" on the same node "1";
3) when "A" is scheduled to start on another node, the scheduled starting
time remains the same (i.e. it is not updated to the time that the new node
becomes available).
4) the scheduled starting time of the highest priority job ("A") is
sometimes later than the time that the node becomes available;

See below for some log entries for these events.

Does anybody have an idea what's going on here, and how we can fix it?

Robbert


1)
JobID=1315558 has a scheduled start time on node maxwell of
2017-06-22T19:11:16; forcing it to another node (by draining maxwell)
reduces the start time to 2017-06-22T16:47:43.
(But slurm is consistent: when maxwell is resumed, the job is scheduled
there again, with the later start time.)


[2017-06-15T22:11:26.688] backfill: beginning
[2017-06-15T22:11:26.693] backfill test for JobID=1315558 Prio=22703485
Partition=general
[2017-06-15T22:11:26.694] Job 1315558 to start at 2017-06-22T19:11:16, end
at 2017-06-27T07:11:00 on maxwell
[2017-06-15T22:11:26.694] backfill: reached end of job queue
[2017-06-15T22:11:52.223] update_node: node maxwell state set to DRAINING
[2017-06-15T22:11:56.695] backfill: beginning
[2017-06-15T22:11:56.701] backfill test for JobID=1315558 Prio=22703485
Partition=general
[2017-06-15T22:11:56.703] Job 1315558 to start at 2017-06-22T16:47:43, end
at 2017-06-27T04:47:00 on parzen
[2017-06-15T22:11:56.703] backfill: reached end of job queue


...


[2017-06-15T22:34:46.036] backfill: beginning
[2017-06-15T22:34:46.039] =
[2017-06-15T22:34:46.039] Begin:2017-06-15T22:34:46
End:2017-06-15T22:41:46
Nodes:100plus,gauss,hopper,markov,neumann,parzen,sanger,viterbi,watson
[2017-06-15T22:34:46.039] =
[2017-06-15T22:34:46.040] backfill test for JobID=1315558 Prio=22776428
Partition=general
[2017-06-15T22:34:46.040] Test job 1315558 at 2017-06-15T22:34:46 on
hopper,parzen,viterbi
[2017-06-15T22:34:46.040] Job 1315558 to start at 2017-06-22T16:47:43, end
at 2017-06-27T04:47:00 on parzen
[2017-06-15T22:34:46.040] backfill: reached end of job queue
[2017-06-15T22:34:55.236] update_node: node maxwell state set to ALLOCATED
[2017-06-15T22:35:16.041] backfill: beginning
[2017-06-15T22:35:16.045] =
[2017-06-15T22:35:16.045] Begin:2017-06-15T22:35:16
End:2017-06-15T22:42:16
Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson

[slurm-dev] Re: Scheduling weirdness

2017-06-16 Thread TO_Webmaster

What are the values of bf_window and bf_resolution in your configuration?

From the documentation of bf_window: "The default value is 1440
minutes (one day). A value at least as long as the highest allowed
time limit is generally advisable to prevent job starvation."
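
(For a 7-day maximum time limit that would mean at least 7*24*60 = 10080
minutes.)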

2017-06-16 1:16 GMT+02:00 Robbert Eggermont :
>
> Hello,
>
> In our Slurm setup (now 17.02.4) I've noticed several times now that
> backfilled jobs push back the start time of the highest priority job.
> I'm not sure if this is due to a configuration error or a scheduler error,
> and since I'm having a hard time diagnosing what's happening, I was hoping
> for some insightful tips.
>
> What happens is that when the highest priority pending job needs a lot of
> resources (CPUs, ...), slurm will backfill lower priority jobs with smaller
> requirements but with a higher timelimit than the currently running jobs.
>
> For example, the highest priority job needs a full node, and the first node
> will become available in 6 days; our slurm will happily backfill all pending
> lower priority 2-CPU 7-day jobs on every possible node in the cluster, thus
> pushing back the highest priority job 1 day.
>
> Looking into the scheduler debugging info, I noticed some things I can't
> explain:
> 1) the highest priority job ("A") is not always scheduled to start on the
> first node ("1") that will become available;
> 2) in the same iteration, the backfill logic will start another, lower
> priority, smaller job with a timelimit longer than the expected start time
> of job "A" on the same node "1";
> 3) when "A" is scheduled to start on another node, the scheduled starting
> time remains the same (i.e. it is not updated to the time that the new node
> becomes available).
> 4) the scheduled starting time of the highest priority job ("A") is
> sometimes later than the time that the node becomes available;
>
> See below for some log entries for these events.
>
> Does anybody have an idea what's going on here, and how we can fix it?
>
> Robbert
>
>
> 1)
> JobID=1315558 has a scheduled start time on node maxwell of
> 2017-06-22T19:11:16; forcing it to another node (by draining maxwell)
> reduces the start time to 2017-06-22T16:47:43.
> (But slurm is consistent: when maxwell is resumed, the job is scheduled
> there again, with the later start time.)
>>
>> [2017-06-15T22:11:26.688] backfill: beginning
>> [2017-06-15T22:11:26.693] backfill test for JobID=1315558 Prio=22703485
>> Partition=general
>> [2017-06-15T22:11:26.694] Job 1315558 to start at 2017-06-22T19:11:16, end
>> at 2017-06-27T07:11:00 on maxwell
>> [2017-06-15T22:11:26.694] backfill: reached end of job queue
>> [2017-06-15T22:11:52.223] update_node: node maxwell state set to DRAINING
>> [2017-06-15T22:11:56.695] backfill: beginning
>> [2017-06-15T22:11:56.701] backfill test for JobID=1315558 Prio=22703485
>> Partition=general
>> [2017-06-15T22:11:56.703] Job 1315558 to start at 2017-06-22T16:47:43, end
>> at 2017-06-27T04:47:00 on parzen
>> [2017-06-15T22:11:56.703] backfill: reached end of job queue
>
> ...
>>
>> [2017-06-15T22:34:46.036] backfill: beginning
>> [2017-06-15T22:34:46.039] =
>> [2017-06-15T22:34:46.039] Begin:2017-06-15T22:34:46
>> End:2017-06-15T22:41:46
>> Nodes:100plus,gauss,hopper,markov,neumann,parzen,sanger,viterbi,watson
>> [2017-06-15T22:34:46.039] =
>> [2017-06-15T22:34:46.040] backfill test for JobID=1315558 Prio=22776428
>> Partition=general
>> [2017-06-15T22:34:46.040] Test job 1315558 at 2017-06-15T22:34:46 on
>> hopper,parzen,viterbi
>> [2017-06-15T22:34:46.040] Job 1315558 to start at 2017-06-22T16:47:43, end
>> at 2017-06-27T04:47:00 on parzen
>> [2017-06-15T22:34:46.040] backfill: reached end of job queue
>> [2017-06-15T22:34:55.236] update_node: node maxwell state set to ALLOCATED
>> [2017-06-15T22:35:16.041] backfill: beginning
>> [2017-06-15T22:35:16.045] =
>> [2017-06-15T22:35:16.045] Begin:2017-06-15T22:35:16
>> End:2017-06-15T22:42:16
>> Nodes:100plus,gauss,hopper,markov,maxwell,neumann,parzen,sanger,viterbi,watson
>> [2017-06-15T22:35:16.045] =
>> [2017-06-15T22:35:16.045] backfill test for JobID=1315558 Prio=22776428
>> Partition=general
>> [2017-06-15T22:35:16.045] Test job 1315558 at 2017-06-15T22:35:16 on
>> hopper,maxwell,parzen,viterbi
>> [2017-06-15T22:35:16.046] Job 1315558 to start at 2017-06-22T19:11:16, end
>> at 2017-06-27T07:11:00 on maxwell
>> [2017-06-15T22:35:16.046] backfill: reached end of job queue
>
>
> 2)
> Our highest prio job is now scheduled to start on maxwell at
> 2017-06-22T18:29:17, but then another job is backfilled on maxwell with
> EndTime=2017-06-22T21:02:10??
>>
>> [2017-06-15T23:22:10.227] backfill: beginning
>> [2017-06-15T23:22:10.231] =
>> [2017-06-15T23:22:10.231] Begin:2017-06-15T23:22:10
>>