Hi guys,

A typical node on our cluster has 64 cores and 512GB of memory, so about
8GB/core. Occasionally we get jobs that use only 1 core but 400-500GB of
memory, which annoys a lot of users. So I am looking for a way to force jobs
to stay at or below the 8GB/core ratio, or else be killed.

For example, the above job should have to ask for 64 cores in order to use
500GB of memory (we have a per-user quota on slots).
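
To make the arithmetic concrete, here is a rough Python sketch of the rule I
have in mind. The function name slots_needed and the constant GB_PER_SLOT are
made up for illustration; they are not anything from Grid Engine. Strictly,
500GB works out to 63 slots at 8GB each, and 64 slots cover the node's full
512GB.

```python
import math

# Assumption: 512 GB / 64 cores on a typical node, i.e. 8 GB per slot.
GB_PER_SLOT = 8


def slots_needed(mem_gb, gb_per_slot=GB_PER_SLOT):
    """Return the smallest slot count that keeps a job at or below
    gb_per_slot of memory per slot (never fewer than one slot)."""
    return max(1, math.ceil(mem_gb / gb_per_slot))


print(slots_needed(500))  # 63
print(slots_needed(512))  # 64
```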

I have been playing around with h_vmem: I set it to consumable and
configured an RQS:

{
        name    max_user_vmem
        enabled true
        description     "Each user can utilize no more than 8GB/slot"
        limit   users {bad_user} to h_vmem=8g
}

but that seems to cap the total vmem bad_user can use per job, rather than
the amount per slot.

I would prefer to set this on users rather than on queues or hosts, because
we have applications that use the same set of nodes, and those should stay
unlimited.

Thanks
Derrick
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users