OK, I see what is going on now. Using "subordinate_list free=1" will suspend the entire node as soon as a job using one or more cores enters the node.

This raises the question: with all of the fancy OGE programmable options, why can't one specify a one-to-one suspension? If an 8-core (8-slot) job goes into a 64-core node, why not suspend only 8 slots instead of the entire node (all 64 cores)?

On 6/24/2012 2:27 AM, Reuti wrote:
> Hi,
>
> On 24.06.2012 at 11:20, Joseph A. Farran wrote:
>
>> Hi Reuti.
>>
>> Are you able to give more details? I don't follow.
>>
>> I set up 2 nodes at 8 cores each with 16 1-core jobs. When I set suspend_thresholds to .9, all 16 1-core jobs get suspended instead of 8 cores.
>>
>> Or are you saying that a parallel job using 8 cores cannot suspend 8 1-core jobs?
>
> Correct. At least it's not working for slotwise subordination; for the traditional format it should work. If you specify just:
>
>     subordinate_list free=1
>
> it should suspend all jobs in "free" as soon as one slot is used in the superordinated queue.
>
> -- Reuti
>
>> Joseph
>>
>> On 6/22/2012 5:43 AM, Reuti wrote:
>>> Hi,
>>>
>>> On 22.06.2012 at 05:17, Joseph Farran wrote:
>>>
>>>> I am playing with subordinate queues. I have defined an "owner" queue and a "free" queue.
>>>>
>>>> The owner queue has:
>>>>
>>>>     # qconf -sq owner | grep subordinate
>>>>     subordinate_list      slots=8(free:0:sr)
>>>>
>>>> If I submit a 1-core job to the owner queue, OGE suspends one 1-core (1-slot) job from the free queue. If I submit another 1-core job to the owner queue, OGE suspends another 1-core job from the free queue, and so on. All is well.
>>>>
>>>> Now, if I submit a parallel job that uses 8 cores, OGE will still suspend only one 1-core job and not eight 1-core jobs, so the node is now oversubscribed.
>>>>
>>>> What is the trick in telling OGE to suspend 8 slots if an 8-slot job is submitted?
>>>
>>> Yes, it's not working with slotwise subordination. If you use the original syntax by specifying a threshold, all will be suspended as expected.
>>>
>>> -- Reuti
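
For reference, here are the two subordinate_list forms discussed in this thread, side by side, as they would appear in the "owner" queue configuration (the output of qconf -sq owner). The "#" lines are annotations for the reader, not part of the configuration. They only restate the behavior reported in the messages above and have not been checked against any particular Grid Engine release.

```
# Slotwise form (from the original post). Reported behavior: each job
# entering "owner" suspends one job in "free", so an 8-slot parallel job
# still suspends only one 1-slot job and the node ends up oversubscribed.
subordinate_list      slots=8(free:0:sr)

# Classic threshold form (suggested by Reuti). Reported behavior: all jobs
# in "free" on the node are suspended as soon as one slot is in use in
# "owner", regardless of how many slots the owner job occupies.
subordinate_list      free=1
```

Neither form gives the one-to-one suspension asked about at the top of this message (8 slots suspended for an 8-slot job).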
