[gridengine users] Parallel GE jobs on 48-way nodes

Gerald Ragghianti Mon, 10 Oct 2011 11:47:45 -0700

We have a cluster consisting of 48-core compute nodes where we need to run parallel (MPI) jobs across nodes. There is a hardware limitation on the QDR Infiniband cards that limits the available hardware contexts to 16 per card. We have to ensure that we don't over-subscribe these hardware contexts because parallel jobs without available contexts will crash. The difficulty is that the contexts needed for a job are a function of the number of compute nodes the job uses, not the number of job slots.

We don't want to make each node dedicated to a single job because we also want to be able to run smaller multi-threaded and single-slot jobs. If we assume (for now) that we allow each parallel job to use all 16 contexts on each compute node, how can we ensure that no other parallel jobs will be allocated to these nodes?
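
To make the constraint concrete, here is a minimal Python sketch of the check the scheduler would have to enforce. It is purely illustrative and is not Grid Engine configuration: the function name, the job names, the node names, and the placement layout are all hypothetical, and it relies on the simplifying assumption above that a parallel job takes all 16 contexts on every node it touches.

```python
CONTEXTS_PER_CARD = 16  # hardware contexts per QDR card, as stated above


def oversubscribed_nodes(jobs, contexts_per_job_per_node=CONTEXTS_PER_CARD):
    """Return the nodes whose context demand exceeds the per-card limit.

    jobs maps a (hypothetical) parallel job name to a dict of
    node name -> slots used on that node. Context demand depends only on
    whether a job touches a node, not on how many slots it holds there.
    """
    demand = {}
    for placement in jobs.values():
        for node, slots in placement.items():
            if slots > 0:
                demand[node] = demand.get(node, 0) + contexts_per_job_per_node
    return sorted(node for node, used in demand.items() if used > CONTEXTS_PER_CARD)


# Two parallel jobs share node2. Only 32 of its 48 slots are in use,
# yet it needs 32 contexts and the card only has 16.
jobs = {
    "mpi_a": {"node1": 24, "node2": 24},
    "mpi_b": {"node2": 8, "node3": 8},
}
print(oversubscribed_nodes(jobs))  # ['node2']
```

The example shows why slot counts alone cannot protect the contexts: node2 still has free slots, so a slot-based limit would accept the second job.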


GE version: 6.1u5

--
Gerald Ragghianti



_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
