Hi Moe,

I believe there is another place where time_limit needs to be changed back, as the patch I submitted shows. When yields_lock is called, there is a chance that the job currently being processed by the backfill algorithm gets started by the normal scheduler with the wrong time_limit. Since time_limit has to be modified again when yields_lock returns 0, it is better to use an extra variable to save the original time_limit value. Otherwise you need to check the QOS flags or the time_min parameter again.
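To make the idea concrete, here is a minimal sketch of the save/restore pattern I have in mind. It is not the code in backfill.c: struct job, QOS_FLAG_NO_RESERVE, backfill_try_job and the two yield callbacks are simplified, hypothetical stand-ins, and I am assuming that a nonzero return from the yield call means this job should be abandoned for the current pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real Slurm structures. */
#define QOS_FLAG_NO_RESERVE 0x1u

struct job {
    uint32_t time_limit;          /* minutes */
    uint32_t qos_flags;
    bool     started;             /* set if the "normal scheduler" ran it */
    uint32_t started_with_limit;  /* illustration only: limit seen at start */
};

typedef int (*yield_fn)(struct job *job);

/* Simulates the normal scheduler starting the job while locks are released. */
int yield_and_start(struct job *job)
{
    job->started = true;
    job->started_with_limit = job->time_limit;
    return 1;   /* assumption: nonzero means "give up on this job for now" */
}

/* Simulates a yield during which nothing happened to the job. */
int yield_quiet(struct job *job)
{
    (void)job;
    return 0;
}

int backfill_try_job(struct job *job, yield_fn yield_locks)
{
    assert(job != NULL);

    /* Save the real limit once; no need to re-check QOS flags later. */
    uint32_t orig_time_limit = job->time_limit;
    uint32_t test_time_limit = orig_time_limit;

    if (job->qos_flags & QOS_FLAG_NO_RESERVE)
        test_time_limit = 1;      /* temporary 1-minute limit */

    job->time_limit = test_time_limit;
    /* ... test whether the job fits with the temporary limit ... */

    /* Restore before releasing locks, so nobody sees the temporary value. */
    job->time_limit = orig_time_limit;
    if (yield_locks(job) != 0)
        return -1;

    /* Yield returned 0: keep working on this job with the temporary limit. */
    job->time_limit = test_time_limit;
    /* ... continue the backfill test ... */

    /* Restore on the way out as well. */
    job->time_limit = orig_time_limit;
    return 0;
}
```

With the extra variable, every exit path restores the saved value, and re-applying the temporary limit after the yield is a single assignment.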
With your documentation update about the NoReserve flag, it now makes sense to me to change time_limit for those jobs, since you want them to run in short intervals and then be preempted. But I think this flag could also be used to improve backfilling, so that jobs belonging to a higher-priority QOS get reservations and the rest do not. This would make the algorithm faster and let it check more jobs. Maybe a new QOS flag such as "PREEMPTED" could be added: the combination of it and NoReserve would give the first mode, and NoReserve alone would give the other mode.

On 09/30/2011 12:18 AM, Moe Jette wrote:
> Alex,
>
> There are definitely a couple of bugs with respect to the NoReserve
> QOS. I believe the attached patch will fix these problems (based upon
> your patch plus moving where the qos_ptr is set). I updated the
> documentation too.
>
> Thanks,
> Moe
>
>
> Quoting Alejandro Lucero Palau <[email protected]>:
>
>> Hi,
>>
>> Running simulator with a long trace shows a bug in the backfilling code.
>> Although I'm using a 2.2.6 version it seems it remains in 2.3.
>>
>> Line number 570 at plugins/sched/backfill/backfill.c checks for a job
>> being from a qos with NoReserve flags on, but qos_ptr variable is
>> updated just at the end of the loop so when used this is pointing to a
>> wrong job. I do not add any code for solving this in the patch attached.
>>
>> So, line 571 modifies time_limit for a job to 1 minute. I can not
>> understand why this is done since it can lead to a job from a NoReserve
>> qos overtaking a more priority job. Maybe there's a reason for this but
>> I can not see it.
>>
>> This modification needs to be changed back to avoid a job running with
>> the wrong time_limit value, but this is not done in all the places.
>>
>> Patch attached solves this problem.
>>
>>
>> WARNING / LEGAL TEXT: This message is intended only for the use of the
>> individual or entity to which it is addressed and may contain
>> information which is privileged, confidential, proprietary, or exempt
>> from disclosure under applicable law. If you are not the intended
>> recipient or the person responsible for delivering the message to the
>> intended recipient, you are strictly prohibited from disclosing,
>> distributing, copying, or in any way using this message. If you have
>> received this communication in error, please notify the sender and
>> destroy and delete any copies you may have received.
>>
>> http://www.bsc.es/disclaimer.htm
