On 09.01.2013 at 16:26, Arnau Bria wrote:

> Hi all,
> 
> I have a question about how to define (if needed) queue slots when you
> already have host slots defined as complex_values.
> 
> Following some guides, I had some queues where I defined slots as:
> 
> [...]
> slots                 0,[@xe=16]
> [...]
> 
> I'm also configuring each host's slots as a complex_value, to avoid
> over-subscription.
> 
> So, from my understanding, I'm telling the system twice that the queue
> has X slots.

If you have only one queue: yes, you are stating the same information twice.

But if you have more than one queue, the host complex limits the total slot 
count across all queues on that host. You could then set the queue-level value 
to something arbitrary like 100, since the host limit caps it anyway, but the 
`qstat -f` output would be confusing: it would list something like 0/8/100, 
i.e. 8 used out of 100. So it's better, as you suggest below, to define two 
hostgroups and set the correct value at the queue level too.
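
As a sketch of the two-hostgroup layout: the hostgroup name @amd and its slot 
count of 8 are made up for illustration (only @xe=16 comes from your post), and 
the lines starting with # are annotations, not part of the configuration. The 
relevant lines might look like this:

```
# queue configuration, as shown by `qconf -sq <queue>`
slots                 0,[@xe=16],[@amd=8]

# exec host configuration, as shown by `qconf -se <host>`, for a host in @xe
complex_values        slots=16

# ... and for a host in the hypothetical @amd group
complex_values        slots=8
```

With more than one queue, each queue carries its own slots line, while the 
host's complex_values caps the total across all of them.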


> My concern with the above conf is that it requires a hostgroup
> (@xe) definition with all the hosts that have the same number of
> cpus. So, if that hostgroup mixes hosts with different numbers of slots,
> I should define 2 groups, and so on.

Yes, see above.


> I've been doing some tests removing slots values in queues, and OGS
> behaves as desired: not allocating too many jobs on one node even if
> the queue has free slots, or if the node is already running enough jobs.
> But, as I found no examples of this conf, I'm wondering if it is
> correct to define queue slots as a maximum and then define hosts with
> slots complex_values.

You mean you lowered the number of slots at the queue level for testing purposes?


> ** I'm planning to start using queue preemption (subordination), and 
> this guide https://blogs.oracle.com/templedf/entry/better_preemption 
> talks about queue slots in some way. As I've not tested preemption, I'm
> not sure if removing queue slots will cause any problem (from what I
> understand it won't, because you publish how many slots are available
> for preemption... but that's just my understanding).

In this case, don't limit the slot count at the host level. Suspension is a 
consequence of a superordinated job starting. Hence, if all slots in the 
subordinated queue are filled and, in addition, the host-level slot count is 
limited to the same value, the superordinated job will never start. (In 
principle you could define slots=n+1 at the host level, so that the brief 
oversubscription allows the superordinated job to start and suspend the 
subordinated job.)
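
To illustrate the n+1 variant, assume n=16 and two hypothetical queues named 
low.q (subordinated) and high.q (superordinated). The queue names and the 
threshold of 1 are assumptions for the example, the # lines are annotations 
only, and this is a sketch rather than a tested setup:

```
# high.q, from `qconf -sq high.q`: suspend low.q on a host as soon as
# one slot of high.q is in use there
subordinate_list      low.q=1

# low.q, from `qconf -sq low.q`
slots                 0,[@xe=16]

# exec host in @xe, from `qconf -se <host>`: n+1 instead of n
complex_values        slots=17
```

With all 16 low.q slots in use, the host limit of 17 still leaves one slot 
free for a high.q job to start and trigger the suspension.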

-- Reuti


> Many thanks in advance.
> 
> Cheers,
> Arnau
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

