Hi Bill,

I think Moe gave you the right answer, but it was so concise that it can
easily be misunderstood.

If we analyze the situation you describe purely from the point of view
of the backfilling algorithm, the answer is that job 300 should be
scheduled without any impact on jobs 201 and 202. However, I think what
Moe was trying to say is that there are other details to take into
account, not just the total number of free cores. Those cores could be
genuinely free but unusable because of, for example, per-node memory
requirements. Or you may have reservations holding some cores, which you
cannot see just by looking at free cores. Or you may have license or
partition limits. Or your system may not allow nodes to be shared, so
free cores do not necessarily mean you can use them. All of this assumes
you do not have other pending jobs between job 201 and job 300. There is
a backfilling parameter, max_job_bf, which limits the number of jobs
processed by the algorithm; the default is 50, so a job queued behind
more pending jobs than that may never be examined. Also, because
backfilling is so demanding, it is suspended after some time. If
something has changed in the system before it resumes, the backfilling
algorithm starts again from scratch. You can avoid this with the
bf_continue parameter.
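
To make the "simple analysis" concrete, here is a minimal sketch in
Python of the core-count check only, using the numbers from Bill's
message. The names (Job, delays_reserved_job, releases) are made up for
this illustration and this is not how Slurm implements backfill. It
deliberately ignores memory, reservations, licenses, partitions and node
sharing, which is exactly why the real scheduler can reach a different
decision.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    start: float    # planned start time, in hours from now
    runtime: float  # requested runtime, in hours


def delays_reserved_job(candidate, reserved, free_now, releases):
    """Return True if starting `candidate` right now would leave
    `reserved` short of cores at its planned start time.

    `releases` is a list of (hours_from_now, cores_freed) pairs for
    running jobs that are expected to finish. Only cores are counted.
    """
    if candidate.cores > free_now:
        raise ValueError("candidate does not fit in the free cores")
    # The candidate starts at t=0, so it ends at t=runtime. If it is
    # gone before the reserved job starts, it cannot interfere.
    if candidate.runtime <= reserved.start:
        return False
    # Cores free at the reserved job's start if the candidate were not
    # running: what is free now plus what has been released by then.
    free_at_start = free_now + sum(
        cores for when, cores in releases if when <= reserved.start
    )
    return free_at_start - candidate.cores < reserved.cores


# Numbers from the thread: 60 free cores now, 252 more freed in 1 hour.
free_now = 60
releases = [(1, 252)]
job201 = Job("201", cores=200, start=1, runtime=24)
job202 = Job("202", cores=250, start=5, runtime=24)
job300 = Job("300", cores=30, start=0, runtime=2)

print(delays_reserved_job(job300, job201, free_now, releases))
print(delays_reserved_job(job300, job202, free_now, releases))
```

With these numbers, job 300 still leaves 282 cores for job 201 at the
1 hour mark, and it has finished before job 202 is due to start, so on
core counts alone neither job is delayed.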

As you can see, there are many details that could have an impact. We
have run into this situation in the past, and the reason behind a
scheduling decision is not always easy to see. I added extra debug
information to the backfilling algorithm to see how resources were being
reserved by pending jobs, and it was helpful. It might be useful to have
some way of finding out why a job cannot be scheduled. Other resource
managers give this detailed information, but it would have a cost, of
course.
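
For reference, the backfill parameters mentioned above go in slurm.conf.
The fragment below is only a sketch: the value 200 is an arbitrary
example rather than a recommendation, and option names can differ
between Slurm versions (newer releases may call the queue-depth option
bf_max_job_test), so please check the slurm.conf man page for your
installation. DebugFlags=Backfill is the stock way to get backfill
messages in the slurmctld log; it is not the extra debug output I
described adding myself.

```
# slurm.conf (fragment, illustrative values)
SchedulerType=sched/backfill
SchedulerParameters=bf_continue,max_job_bf=200
DebugFlags=Backfill
```

After editing the file, "scontrol reconfigure" should make slurmctld
re-read it.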

On 02/21/2014 12:45 AM, Bill Wichser wrote:
>
> Moe,
>
> That's quite an obfuscated answer!  I was looking for a "yes, this is
> the expected behavior" or "no, something is amiss."
>
> In the case presented, again I'll say, it is clearly evident that the
> job waiting, number 300, can run.  It has free cores, the job
> currently waiting will have plenty of cores available when the job it
> is waiting on finishes, yet it does not start simply because the time
> it requires would interfere with the current start time of the
> currently waiting job, #201.
>
> But the assertion that job 201 would be held up by starting job 300 is
> completely incorrect in this case.
>
> Now if this is the way the scheduler works, by being simple-minded
> about time constraints, then it is what it is.  I'm asking only if
> this behavior is the expected behavior.  I think you are trying to say
> that indeed this is the case.
>
> Sincerely,
> Bill
>
>
> On 2/20/2014 1:21 PM, Moe Jette wrote:
>>
>> Slurm uses what is known as a conservative backfill scheduling
>> algorithm. No job will be started that adversely impacts the expected
>> start time of _any_ higher priority job. The scheduling can also be
>> affected by a job's requirements for memory, generic resources,
>> licenses, and resource limits.
>>
>> Moe Jette
>> SchedMD LLC
>>
>>
>> Quoting Bill Wichser <[email protected]>:
>>
>>>
>>> Just a question on expected behavior of the backfill scheduler. This
>>> is an SMP machine if that matters.  Scheduler is backfill with no
>>> preemption.
>>>
>>> I have a number of jobs queued.  There are three which matter,
>>> ordered by priority.  In the current state I have 60 free cores.
>>>
>>> job 201 needs 200 cores and will start in 1 hour requiring 24 hours
>>> of runtime
>>> job 202 needs 250 cores and will start in 5 hours requiring 24 hours
>>> of runtime
>>> ...
>>> job 300 needs 30 cores and will start in 300 hours requiring 2 hours
>>> of runtime
>>>
>>> The job completing in 1 hour will free 252 cores.
>>>
>>> Clearly, starting job 300 will not impact job 201's start time in
>>> any way.  Yet it will not start since the time overlaps the expected
>>> 1 hour start time of job 201.  Is this the expected behavior?  I
>>> haven't yet checked the source code to verify that this just looks
>>> at the trivial impact on the next job but I'd expect the scheduler
>>> to be able to look a little deeper than this.
>>>
>>> Bill
>>>
>>


