Hi, I have recently been trying to run Open MPI jobs spanning several nodes on our small cluster. However, it seems that the sub-jobs launched with qrsh -inherit (by Open MPI's tight integration) get killed at a memory limit of h_vmem, instead of h_vmem times the number of slots allocated on that node. Is there any way to get the correct allocation on the sub-nodes? I have a vague memory of having read something about this. As it behaves now, it is impossible for us to run large-memory MPI jobs. Would making h_vmem a per-job consumable, rather than per-slot, give different behaviour?
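For reference, the per-slot vs. per-job behaviour is controlled by the consumable column of the complex configuration (edited with qconf -mc). The lines below are a sketch of a typical h_vmem definition; the exact columns and defaults may differ on your cluster, so check qconf -sc first:

```
# Show the current definition (a common default looks like this):
#   qconf -sc | grep h_vmem
# name    shortcut  type    relop  requestable  consumable  default  urgency
# h_vmem  h_vmem    MEMORY  <=     YES          YES         0        0
#
# With consumable=YES the request is counted once per slot for scheduling,
# but the limit enforced on each process is still the per-slot value.
# Changing consumable to JOB (supported in newer Grid Engine versions,
# which should include GE2011.11) makes the request count once per job:
#   qconf -mc      # edit the h_vmem line to read:
# h_vmem  h_vmem    MEMORY  <=     YES          JOB         0        0
#
# A job would then request its total memory once, e.g. (hypothetical
# PE name "orte"):
#   qsub -pe orte 16 -l h_vmem=32G myjob.sh
```

Note this changes the accounting semantics for all jobs requesting h_vmem, not just MPI jobs, so it is worth testing on a single queue first.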
We are using OGS GE2011.11. Thanks for any hints on this issue,

Mikael
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
