Hi Moe,

I believe there's another place where time_limit needs to be changed
back, as the patch I submitted shows. When yields_lock is called, there's
a chance that the job currently being processed by the backfilling
algorithm is started by the normal scheduler with the wrong
time_limit. Since time_limit needs to be changed again if
yields_lock returns 0, it is better to use an extra variable to save
the time_limit value. Otherwise you need to check the qos flags or
the time_min parameter again.
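
A minimal sketch of the save-and-restore idea follows. It is not the
actual backfill.c code: struct job, yield_locks_stub() and
backfill_one() are hypothetical stand-ins, and the stub only records
which time_limit the normal scheduler would see while the locks are
released.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for a job record. */
struct job {
    uint32_t time_limit;  /* minutes */
    int no_reserve;       /* 1 if the job's qos has NoReserve set */
};

/*
 * Hypothetical stand-in for yields_lock. While the locks are released
 * the normal scheduler may start the job, so record the time_limit it
 * would see. Returns 0 when the backfill loop can continue.
 */
static int yield_locks_stub(const struct job *job, uint32_t *limit_seen)
{
    *limit_seen = job->time_limit;
    return 0;
}

/* Returns the time_limit that was visible while the locks were yielded. */
static uint32_t backfill_one(struct job *job)
{
    /* Save both values once, so nothing has to be recomputed later. */
    uint32_t orig_time_limit = job->time_limit;
    uint32_t comp_time_limit = job->no_reserve ? 1 : orig_time_limit;
    uint32_t seen = 0;

    job->time_limit = comp_time_limit;   /* temporary backfill value */

    /* Restore the real limit before giving up the locks. */
    job->time_limit = orig_time_limit;
    if (yield_locks_stub(job, &seen) == 0) {
        /* Re-apply the saved value without re-checking qos flags. */
        job->time_limit = comp_time_limit;
    }

    /* Restore again before leaving the backfill pass for this job. */
    job->time_limit = orig_time_limit;
    assert(job->time_limit == orig_time_limit);
    return seen;
}
```

With a 120-minute job from a NoReserve qos, the stub sees 120 rather
than 1, and the job leaves backfill_one() with its original limit.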

With your documentation update about the NoReserve flag, it now makes
sense to me to change time_limit for those jobs, since you want them to be
executed in short intervals and then preempted. But I think this flag
could also be used to improve backfilling, so that jobs belonging to a
higher-priority qos get reservations and the rest do not. This would
make the algorithm faster and allow more jobs to be checked.

Maybe a new qos flag like "PREEMPTED" could be added: the combination of
this flag and NoReserve would select the first mode, and NoReserve alone
would select the other mode.
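
The flag combination could be sketched as below. The bit values, the
enum, and backfill_mode_for() are all hypothetical and not taken from
the Slurm headers. Treating "PREEMPTED" alone as a normal job is also
an assumption, since that case is not covered above.

```c
#include <stdint.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define QOS_NO_RESERVE 0x1u
#define QOS_PREEMPTED  0x2u   /* the proposed new flag */

enum backfill_mode {
    MODE_RESERVE,        /* job gets a reservation as usual */
    MODE_SHORT_PREEMPT,  /* first mode: shortened time_limit, then preempted */
    MODE_NO_RESERVATION  /* other mode: original time_limit, no reservation */
};

static enum backfill_mode backfill_mode_for(uint32_t qos_flags)
{
    if (!(qos_flags & QOS_NO_RESERVE))
        return MODE_RESERVE;        /* assumption: PREEMPTED alone changes nothing */
    if (qos_flags & QOS_PREEMPTED)
        return MODE_SHORT_PREEMPT;  /* NoReserve + PREEMPTED */
    return MODE_NO_RESERVATION;     /* NoReserve only */
}
```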


On 09/30/2011 12:18 AM, Moe Jette wrote:
> Alex,
>
> There are definitely a couple of bugs with respect to the NoReserve
> QOS. I believe the attached patch will fix these problems (based upon
> your patch plus moving where the qos_ptr is set). I updated the
> documentation too.
>
> Thanks,
> Moe
>
>
> Quoting Alejandro Lucero Palau <[email protected]>:
>
>> Hi,
>>
>> Running the simulator with a long trace shows a bug in the backfilling
>> code. Although I'm using version 2.2.6, the bug seems to remain in 2.3.
>>
>> Line 570 of plugins/sched/backfill/backfill.c checks whether a job
>> belongs to a qos with the NoReserve flag set, but the qos_ptr variable
>> is only updated at the end of the loop, so when it is used it points to
>> the wrong job. The attached patch does not include a fix for this.
>>
>> So, line 571 sets time_limit for a job to 1 minute. I cannot
>> understand why this is done, since it can lead to a job from a NoReserve
>> qos overtaking a higher-priority job. Maybe there's a reason for this,
>> but I cannot see it.
>>
>> This modification needs to be changed back to avoid a job running with
>> the wrong time_limit value, but this is not done in all the places.
>>
>> The attached patch solves this problem.
>>
>>
>> WARNING / LEGAL TEXT: This message is intended only for the use of the
>> individual or entity to which it is addressed and may contain
>> information which is privileged, confidential, proprietary, or exempt
>> from disclosure under applicable law. If you are not the intended
>> recipient or the person responsible for delivering the message to the
>> intended recipient, you are strictly prohibited from disclosing,
>> distributing, copying, or in any way using this message. If you have
>> received this communication in error, please notify the sender and
>> destroy and delete any copies you may have received.
>>
>> http://www.bsc.es/disclaimer.htm
>
>

