On 11 October 2011 12:55, Reuti <[email protected]> wrote:
> On 10.10.2011 at 20:46, Gerald Ragghianti wrote:
>
>> We have a cluster consisting of 48-core compute nodes where we need to run 
>> parallel (MPI) jobs across nodes.  There is a hardware limitation on the QDR 
>> Infiniband cards that limits the available hardware contexts to 16 per card. 
>>  We have to ensure that we don't over-subscribe these hardware contexts 
>> because parallel jobs without available contexts will crash.  The difficulty 
>> is that the contexts needed for a job are a function of the number of 
>> compute nodes the job uses, not the number of job slots.
>
> If I understand you correctly, you are looking for something like a complex
> with "consumable HOST" (instead of JOB or YES), i.e. one that is consumed
> once on each exechost used, independent of the total number of slots granted
> on that machine. Unfortunately, this has been discussed before but is not
> implemented yet.
>
>
I don't think per-host consumables would be needed.  With a later
version of Grid Engine, two queues should be sufficient: one queue
with an exclusive resource and multi-node PEs, and one queue with
neither.  You'd have to add a slots resource at the host level to stop
the host being overloaded across the two queues, and possibly use a
JSV to ensure all jobs are directed to the appropriate queue.
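
To make the routing idea concrete, here is a rough sketch in Python of the
decision a JSV could make.  It is not a working JSV and uses no Grid Engine
API.  The queue names and the function are made up for illustration.  It
also assumes that a job fitting on one 48-core node is placed on a single
node (for example through a PE that keeps all slots on one host), which the
setup above does not guarantee by itself.

```python
# Hypothetical sketch of the routing decision only; not a real JSV.
# Queue names and route_job() are illustrative assumptions.

CORES_PER_NODE = 48  # node size from the original question

MULTI_NODE_QUEUE = "multinode.q"    # assumed: exclusive resource + multi-node PEs
SINGLE_NODE_QUEUE = "singlenode.q"  # assumed: neither of those


def route_job(slots):
    """Return (queue_name, request_exclusive) for a job asking for `slots` slots."""
    if slots < 1:
        raise ValueError("slots must be >= 1")
    # Assumption: a job that fits on one node is kept on one node,
    # so it does not need exclusive access to several hosts.
    if slots <= CORES_PER_NODE:
        return SINGLE_NODE_QUEUE, False
    # Anything larger must span nodes, so it goes to the queue whose
    # exclusive resource stops other jobs sharing those hosts.
    return MULTI_NODE_QUEUE, True
```

With this sketch, route_job(48) picks the single-node queue, while
route_job(96) picks the multi-node queue and asks for exclusive access.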

Unfortunately I don't think 6.1 supports exclusive resources.

William

