OK, I see what is going on now. So using "subordinate_list free=1" will suspend the entire node as soon as one or more owner-queue jobs enter the node.

So this begs the question: with all of the fancy OGE programmable options, why can't one specify a one-to-one suspension? That is, if an 8-core (slots) job lands on a 64-core node, suspend only 8 slots and not the entire node (all 64 cores)?


On 6/24/2012 2:27 AM, Reuti wrote:
Hi,

On 24.06.2012, at 11:20, Joseph A. Farran wrote:

Hi Reuti.

Are you able to give more details?   I don't follow.

I set up 2 nodes with 8 cores each, running 16 1-core jobs. When I set suspend_thresholds to 0.9, all 16 1-core jobs get suspended instead of only 8.

Or are you saying that a parallel job using 8-cores cannot suspend 8 1-core jobs?
Correct. At least it's not working with slotwise subordination; with the traditional format it should work. If you specify just:

subordinate_list      free=1

it should suspend all jobs in "free" as soon as one slot is used in the superordinate queue.
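For reference, a minimal sketch of setting the traditional (queue-wise) form with qconf, assuming the "owner" and "free" queue names from this thread:

```shell
# Traditional (queue-wise) subordination: suspend ALL jobs in "free"
# on a node as soon as at least 1 slot is in use in "owner" there.
qconf -mattr queue subordinate_list free=1 owner

# Verify the setting:
qconf -sq owner | grep subordinate
```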

-- Reuti


Joseph


On 6/22/2012 5:43 AM, Reuti wrote:
Hi,

On 22.06.2012, at 05:17, Joseph Farran wrote:


I am playing with subordinate queues. I have defined an "owner" queue and a "free" queue.

The owner queue has:

# qconf -sq owner | grep subordinate
subordinate_list      slots=8(free:0:sr)

If I submit a 1-core job to the owner queue, OGE suspends one 1-core (slot) job from the free queue. If I submit another 1-core job to the owner queue, OGE suspends another 1-core job from the free queue, and so on. All is well.

Now, if I submit a parallel job that uses 8 cores, OGE will still suspend only one 1-core job, not 8 1-core jobs, so the node is now oversubscribed. What is the trick to tell OGE to suspend 8 slots when an 8-slot job is submitted?

Yes, it's not working with slotwise subordination. If you use the original syntax and specify a threshold, all will be suspended as expected.
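To illustrate the two formats side by side (a sketch of queue_conf fragments, using the queue names from this thread):

```
# Slotwise subordination: per-slot suspension, but each busy slot in
# the superordinate queue suspends only ONE subordinate job, even if
# the superordinate job itself occupies many slots.
subordinate_list      slots=8(free:0:sr)

# Classic threshold form: once the threshold is reached (here: 1 slot
# used in "owner"), ALL jobs in "free" on that node are suspended.
subordinate_list      free=1
```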

-- Reuti


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
