On 11 October 2011 12:55, Reuti <[email protected]> wrote:
> On 10.10.2011 at 20:46, Gerald Ragghianti wrote:
>
>> We have a cluster consisting of 48-core compute nodes where we need to run
>> parallel (MPI) jobs across nodes. There is a hardware limitation on the QDR
>> Infiniband cards that limits the available hardware contexts to 16 per card.
>> We have to ensure that we don't over-subscribe these hardware contexts
>> because parallel jobs without available contexts will crash. The difficulty
>> is that the contexts needed for a job are a function of the number of
>> compute nodes the job uses, not the number of job slots.
>
> If I understand you correctly, you are looking for something like a complex
> with "consumable HOST" (instead of JOB or YES, i.e. consume it one time on
> each used exechost independent from the total number of slots granted on
> this machine). Unfortunately it was discussed before but not implemented yet.

I don't think per-host consumables would be needed. With a later version of Grid Engine, two queues should be sufficient: one queue with an exclusive resource and multi-node PEs, and one queue without either of those. You'd have to add a slots resource at the host level to stop the host being overloaded, and possibly use a JSV to ensure all jobs are appropriately directed.
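
To make the routing idea concrete, here is a rough Python sketch of the decision such a JSV might make. It models only the decision and does not speak the real JSV protocol. The queue names, the PE name and the `route_job` helper are all hypothetical and would have to match your own configuration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names for illustration only; they are not from this thread
# and are not Grid Engine defaults.
MULTI_NODE_QUEUE = "multinode.q"    # queue with the exclusive resource
SINGLE_NODE_QUEUE = "single.q"      # queue without exclusive or multi-node PEs
MULTI_NODE_PES = frozenset({"mpi"})  # PEs allowed to span several hosts


@dataclass(frozen=True)
class Routing:
    queue: str       # queue the job should be directed to
    exclusive: bool  # whether the job should request the exclusive resource


def route_job(pe_name: Optional[str]) -> Routing:
    """Pick a queue from the requested PE (None means no PE was requested)."""
    if pe_name in MULTI_NODE_PES:
        # Multi-node jobs go to the exclusive queue, so they do not share
        # a host with other jobs.
        return Routing(MULTI_NODE_QUEUE, True)
    # Everything else stays in the ordinary queue.
    return Routing(SINGLE_NODE_QUEUE, False)
```

A real JSV would read the PE from the job's submit parameters and rewrite the queue and resource requests accordingly. The function above only shows the rule it would apply.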
Unfortunately, I don't think 6.1 supports exclusive resources.

William
