Hi guys,

A typical node on our cluster has 64 cores and 512GB of memory, so about 8GB/core. Occasionally we have jobs that use only 1 core but 400-500GB of memory, which annoys a lot of users. So I am looking for a way to force jobs to stay at or below the 8GB/core ratio, or else be killed.
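To make the ratio concrete, here is a rough Python sketch of the arithmetic I have in mind. It is not Grid Engine code; the names `min_slots` and `violates_ratio` are made up for illustration, and the node figures are the ones quoted above.

```python
import math

# Node shape described above: 64 cores, 512GB of memory.
NODE_CORES = 64
NODE_MEM_GB = 512
GB_PER_CORE = NODE_MEM_GB / NODE_CORES  # 8.0 GB per core


def min_slots(mem_gb: float, gb_per_slot: float = GB_PER_CORE) -> int:
    """Smallest slot count that keeps a job at or below the per-slot ratio."""
    return max(1, math.ceil(mem_gb / gb_per_slot))


def violates_ratio(slots: int, mem_gb: float,
                   gb_per_slot: float = GB_PER_CORE) -> bool:
    """True if the job uses more memory than its slots entitle it to."""
    return mem_gb > slots * gb_per_slot


if __name__ == "__main__":
    print(min_slots(500))          # 63
    print(violates_ratio(1, 500))  # True: 1 slot with 500GB should be killed
    print(violates_ratio(64, 500)) # False: 64 slots cover up to 512GB
```

Strictly, 500GB needs at least 63 slots at 8GB each, so a request for the full 64-core node satisfies the ratio.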
For example, the above job should have to ask for 64 cores in order to use 500GB of memory (we have a user quota for slots). I have been experimenting with h_vmem, setting it to consumable and configuring this RQS:

    {
       name         max_user_vmem
       enabled      true
       description  "Each user can utilize more than 8GB/slot"
       limit        users {bad_user} to h_vmem=8g
    }

but it seems to set the total vmem that bad_user can use per job, rather than a per-slot limit. I would prefer to set the limit on users rather than on queues or hosts, because we have applications that use the same set of nodes, and those applications should remain unlimited.

Thanks
Derrick
_______________________________________________ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users