On 17.04.2013 at 15:34, Arnau Bria wrote:

> On Tue, 16 Apr 2013 17:40:12 +0200
> Reuti Reuti wrote:
> 
> Hi Reuti,
> 
> I think I'm starting to understand how it works. :-) 
> (As queuewise preemption doesn't fit my needs, I've moved to slotwise
> preemption).
> My problem was the node complex_values (slots & virtual_free)
> definitions. My setup used to rely on them, and I saw no reference to
> them in any example.
> Once I had added preemption, I was not aware that when complex_value
> limits are reached, preemption is not evaluated.

The "problem" is that a suspension is triggered only once the limit in 
subordinate_list is exceeded. But if the additional job can't start at all, the 
subordination will never happen. There is no "look ahead" feature in SGE that 
would let the scheduler realize that starting the high-priority job wouldn't 
exceed the overall limit once the low-priority job got suspended.


> That's what was happening in my previous conf. (only 8 slots were
> allowed on my node, so new jobs could never start because of that). So
> I must relax those limits or remove them (at least at host level).
> Thanks Reuti, without your answers I'd be _more_ lost (if possible).
> 
> 
> Conclusion: my current conf must change if I want to start using
> preemption.
> 
> So I've moved the node slots complex to the queue definition:
> 
> high-queue:
> slots                 1,[aracne13=8]
> subordinate_list      slots=4(low-el6:1:sr)
> 
> 
> low-queue:
> slots                 4
> 
> 
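(A side note on the slot-wise syntax, going by queue_conf(5): in

subordinate_list      slots=4(low-el6:1:sr)

the 4 is the threshold of occupied slots on the host above which suspension 
kicks in, low-el6 is the subordinated queue, 1 is its sequence number among 
several subordinated queues, and the action "sr" suspends the shortest-running 
job in it - "lr" would pick the longest-running one.)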
> I submit 16 jobs in low, wait till 4 start, submit 16 in high, and 4
> from low get suspended while 8 from high start. Great!
> 
> 
> But I'm facing 2 problems when changing values...
> 
> 
> 1.-) The subordinate_list slots value must be the same number as the
> low-queue slots. If not, I get confusing behaviour:
> 
> I submit 16 jobs in low and none in high, and I get 4 running and 2
> suspended:
> 
> 475878 0.06387 low        abria        r     04/17/2013 14:54:23 [email protected]      1
> 475879 0.06382 low        abria        r     04/17/2013 14:54:23 [email protected]      1
> 475880 0.06378 low        abria        r     04/17/2013 14:54:23 [email protected]      1
> 475881 0.06373 low        abria        r     04/17/2013 14:54:23 [email protected]      1
> 475882 0.06368 low        abria        S     04/17/2013 14:54:23 [email protected]      1
> 475883 0.06364 low        abria        S     04/17/2013 14:54:23 [email protected]      1

This is strange. The overall limit is "slots 4" according to your low-queue 
definition above - hence 6 jobs won't fit at all. Those two additional jobs 
should never have started. I don't see this in 6.2u5; maybe Rayson can make a 
statement about OGS's fork.
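For cross-checking such listings, a throwaway script like the following can 
tally the state column of qstat's default output (a sketch, not part of SGE; 
the job IDs and the queue instance name below are made up for illustration):

```python
from collections import Counter

# Sample qstat lines in the default column layout (job-ID, prio, name,
# user, state, date, time, queue instance, slots). IDs and the queue
# instance "low-el6@aracne13" are invented for this example.
qstat_output = """\
475878 0.06387 low abria r 04/17/2013 14:54:23 low-el6@aracne13 1
475879 0.06382 low abria r 04/17/2013 14:54:23 low-el6@aracne13 1
475880 0.06378 low abria r 04/17/2013 14:54:23 low-el6@aracne13 1
475881 0.06373 low abria r 04/17/2013 14:54:23 low-el6@aracne13 1
475882 0.06368 low abria S 04/17/2013 14:54:23 low-el6@aracne13 1
475883 0.06364 low abria S 04/17/2013 14:54:23 low-el6@aracne13 1
"""

def count_states(text):
    """Tally the 5th whitespace-separated column (the job state)."""
    states = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            states[fields[4]] += 1
    return states

print(count_states(qstat_output))  # Counter({'r': 4, 'S': 2})
```

With the queue's "slots 4" limit, anything beyond r=4 in such a tally points at 
the oddity described above.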


> 4 are able to start, and 2 start as suspended....
> those 2 suspended jobs should never have started.
> 
> 
> 
> 2.-) Requeueing jobs (using transparent checkpointing). I'm facing the
> same problem as
> http://www.mentby.com/Group/grid-engine/another-slotwise-preemption-question.html
> that John had in 2010.
> 
> Suspended (requeued) jobs get rescheduled every scheduler cycle. It's
> more or less what happens in case 1: the system detects "empty" slots
> and tries to push a job there, but then it sees that it's a subordinate
> slot and suspends (checkpoint-requeues) the job.

Yep.
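Until you run a version with the fix, one way to damp that reschedule loop is 
to drive the full queue into an alarm state, since an alarmed queue receives no 
further jobs - a sketch of the workaround I suggested, with an illustrative 
queue name and value:

qconf -mq low-el6
load_thresholds       slots=4

The price is that the alarm also blocks legitimate job starts until it clears 
again.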


> So, in that thread it is said that it should be solved in version
> 6.2u6. I'm running gridengine-qmaster-2011.11p1-2 (and I don't really
> know which upstream version that corresponds to), so my question: am I
> affected by the bug, or have I misconfigured something?
> I could try adding a slots complex to the queue and sending it into
> alarm (as you suggested), but if an upgrade solves the issue, I'll go
> there.
> 
> 
> Then I'll have to play with virtual_free, because I will face the same
> issue as when I define host slots.
> 
> 
> *Reuti, do you know where the docs you talk about in the above link are?

http://arc.liv.ac.uk/SGE/howto/

-- Reuti


> http://gridengine.sunsource.net/howto/APSTC-TB-2004-005.pdf 
> http://gridengine.sunsource.net/howto/checkointing.html 
> 
> TIA,
> Arnau
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

