Dear all,
Since a week ago we have had the h_vmem consumable enabled in the cluster. Now suddenly some massively parallel jobs are being killed because of memory allocation failures. Users are increasing the value of h_vmem until their jobs run stably. The effect is too many "wasted" slots, because our machines have a limited amount of RAM.
I found the reason for this in
http://gridengine.org/pipermail/users/2011-September/001636.html
The virtual memory overhead on the first node is: overhead_vmem = bash_vmem + mpirun_vmem + (nodes - 1) * qrsh_vmem
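To make the effect concrete, here is a small worked example of that formula. All of the per-process numbers (bash_vmem, mpirun_vmem, qrsh_vmem) are made-up values for illustration, not measurements from any particular cluster:

```shell
# Hypothetical per-process vmem footprints on the master node of a 4-node job.
# The actual values must be measured in your own environment.
bash_vmem=100    # MB, the job's login shell
mpirun_vmem=150  # MB, the mpirun process itself
qrsh_vmem=50     # MB, one "qrsh -inherit" per remote node
nodes=4

# Extra vmem the master node needs on top of the workers' own usage:
overhead_vmem=$(( bash_vmem + mpirun_vmem + (nodes - 1) * qrsh_vmem ))
echo "${overhead_vmem} MB"
```

With these numbers the master node needs 400 MB more than any worker, which is exactly the headroom users end up adding to h_vmem on every slot.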
I checked this for our environment, and I am tending toward enforcing
#$ -l exclusive=true together with requesting a multiple of the available slots per machine. Has anybody had the same experience and found a flexible solution? I would wish for, or can imagine, a kind of complex variable "h_vmem_master" for the master node.
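A sketch of what such a submit script could look like. The PE name "mpi", the 16 slots per machine, and the h_vmem value are assumptions for illustration only:

```shell
#!/bin/sh
# Hypothetical submit script: request whole machines so the master node's
# extra vmem does not collide with other jobs on the same host.
#$ -pe mpi 64          # 64 = 4 * 16, a multiple of the (assumed) 16 slots/node
#$ -l exclusive=true   # no other jobs share the allocated nodes
#$ -l h_vmem=3.5G      # per-slot limit sized for the workers, not the master
mpirun -np $NSLOTS ./my_app
```

This trades flexibility for safety: whole nodes are reserved even when the job could share, which is exactly why a separate per-master limit like "h_vmem_master" would be nicer.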

Udo
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
