On 08.04.2014, at 16:02, Fan Dong wrote:

> Thanks for the help. I guess part of my original question was 'will h_vmem
> help the scheduler to hold off the job if the node does not have enough
> h_vmem left?'
>
> Say, we have
> • a consumable h_vmem (qconf -mc) with a default value of 4GB,
> • the exec hosts h1 and h2 both with h_vmem = 32GB (qconf -me),
> • the queue a.q configured with 18GB h_vmem (qconf -mq).
>
> What happens when a user sends 3 jobs to a.q, assuming there are more than
> two slots on each of the hosts? -- will
> • 3 jobs get to run simultaneously?
Yes. 4 GB times 3 will fit into the available 32 GB. In case the user requests
more memory, like 18 GB for each of them, it will of course be different.

-- Reuti

> • or does one job have to be held off? (because h_vmem on each of the
> hosts would decrease to 32-18=14GB, not enough for the third job)
>
> On 04/07/2014 11:29 AM, Reuti wrote:
>> Hi,
>>
>> On 07.04.2014, at 17:10, Fan Dong wrote:
>>
>>> I am a little confused about the consumable h_vmem setup on the node and
>>> the queue. Let's say we have one queue, called a.q, spanning two hosts,
>>> h1 and h2. h1 has 32GB of RAM and h2 has 128GB.
>>>
>>> I attached h_vmem to both hosts, using the value of the actual physical RAM,
>>
>> You defined this value with `qconf -me ...` => "complex_values"?
>>
>>> also a.q has a default h_vmem value of 18GB, which is the peak memory usage
>>> of the job.
>>
>> Yes, the setting in the queue is per job, while in the exechost definition
>> it's across all jobs.
>>
>>> Here is how I understand the way h_vmem works. When the first job in a.q
>>> is sent to node h1, the h_vmem on the node will decrease to 32-18=14GB,
>>
>> Did you make the "h_vmem" complex consumable in `qconf -mc`? What is the
>> default value specified there for it?
>>
>> You checked with `qhost -F h_vmem` and the values are not right?
>>
>>> the h_vmem attached to the queue will make sure that the job won't use
>>> more than 18GB of memory. When the second job comes in, it will be sent to
>>> node h2 because there is not enough h_vmem left on node h1.
>>
>> ...as the value was subtracted on a host level.
>>
>>> I am not sure if I am correct about h_vmem, as I have the impression that
>>> h_vmem won't stop jobs from being sent to a node but virtual_free does.
>>> Any suggestions?
>>
>> Keep in mind that "h_vmem" is a hard limit, while "virtual_free" is a hint
>> for SGE on how to distribute jobs, while still allowing them to consume more
>> than requested. It depends on the workflow which fits best.
>>
>> -- Reuti
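
For reference, a minimal sketch of the setup discussed above, using the names and
values from the thread (h1, h2, a.q, a 4G default, 32G per host, 18G per job); the
exact `qconf -mc` column layout can differ between SGE versions:

    # complex definition (qconf -mc): make h_vmem a consumable with a 4G default
    #name     shortcut   type     relop   requestable  consumable  default  urgency
    h_vmem    h_vmem     MEMORY   <=      YES          YES         4G       0

    # per-host capacity (qconf -me h1, qconf -me h2): total h_vmem the scheduler may book
    complex_values        h_vmem=32G

    # per-job limit in the queue (qconf -mq a.q)
    h_vmem                18G

With this in place, a job submitted without an explicit request is booked with the
4G default, so three such jobs fit into the 32G of one host. Three jobs submitted
with `qsub -l h_vmem=18G` would each be booked with 18G against a host's complex
value, so only one fits per host and the third has to wait.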

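A few commands for checking how the bookkeeping works out at run time; `job.sh` and
`<job_id>` are placeholders:

    # submit with the 4G default, and with an explicit 18G request
    qsub job.sh
    qsub -l h_vmem=18G job.sh

    # remaining h_vmem per host and per queue instance after the scheduler's bookkeeping
    qhost -F h_vmem
    qstat -F h_vmem

    # limits explicitly requested for a job (jobs without a request are booked with the 4G default)
    qstat -j <job_id> | grep h_vmem

Because h_vmem is enforced as a hard limit, a job that tries to use more than its
booked value runs into that limit, whereas a plain virtual_free request only steers
where the job is started and does not stop it from using more.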