On 08.04.2014 at 16:02, Fan Dong wrote:

> Thanks for the help.  I guess part of my original question was 'will h_vmem 
> help the scheduler to hold off a job if the node does not have enough 
> h_vmem left?'
> 
> Say, we have 
>       • a consumable h_vmem (qconf -mc) with a default value of 4GB, 
>       • the exec hosts h1 and h2 both have h_vmem = 32GB (qconf -me), 
>       • the queue a.q is configured with 18GB h_vmem (qconf -mq).
> 
> What happens if a user sends 3 jobs to a.q, assuming there are more than two 
> slots on each host? -- will 
>       • 3 jobs get to run simultaneously?  

Yes. 4 GB times 3 will fit into the available 32 GB.

If the user requests more memory, like 18 GB for each of them, it will be 
different of course.
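
Just as a sketch (job.sh is only a placeholder here), the two cases at submit 
time would be:

    # rely on the 4 GB default from the complex definition:
    qsub job.sh                  # 3 x 4 GB = 12 GB <= 32 GB, all three can start

    # request the 18 GB peak explicitly:
    qsub -l h_vmem=18G job.sh    # 32 GB - 18 GB = 14 GB left on the host,
                                 # so only one such job fits per host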

-- Reuti


>       • or does one of the jobs have to be held off?  (because h_vmem on each 
> host would decrease to 32-18=14G, not enough for the third job)
> 
> 
>  
> 
> On 04/07/2014 11:29 AM, Reuti wrote:
>> Hi,
>> 
>> On 07.04.2014 at 17:10, Fan Dong wrote:
>> 
>> 
>>> I am a little confused about the consumable h_vmem setup on the node and 
>>> the queue.  Let's say we have one queue, called a.q, spanning two hosts, h1 
>>> and h2.  h1 has 32GB of RAM and h2 has 128GB.
>>> 
>>> I attached h_vmem to both hosts, using the value of the actual physical RAM,
>>> 
>> You defined this value with `qconf -me ...` => "complex_values"?
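>> 
>> Just as a sketch (32G being the example value for h1), the exec host entry 
>> would then contain a line like:
>> 
>>     complex_values        h_vmem=32G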
>> 
>> 
>> 
>>> also a.q has a default h_vmem value of 18GB, which is the peak memory usage 
>>> of the job.
>>> 
>> Yes, the setting in the queue is per job, while in the exechost definition 
>> it's across all jobs.
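>> 
>> As a sketch with your example values: the queue part is the plain limit in 
>> the queue configuration,
>> 
>>     # qconf -mq a.q  -> applied to each single job in this queue
>>     h_vmem                18G
>> 
>> while the host part is the complex_values pool shown above, which all jobs 
>> running on that host draw from.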
>> 
>> 
>> 
>>> Here is how I understand the way h_vmem works.  When the first job in a.q 
>>> is sent to node h1, the h_vmem on the node will decrease to 32-18=14GB, 
>>> 
>> Did you make the "h_vmem" complex consumable in `qconf -mc`? What is the 
>> default value specified there for it?
>> 
>> You check with `qhost -F h_vmem` and the values are not right?
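>> 
>> As a sketch, the consumable line in `qconf -mc` would look roughly like this 
>> (the 4G default is only an example value):
>> 
>>     #name     shortcut   type      relop   requestable consumable default  urgency
>>     h_vmem    h_vmem     MEMORY    <=      YES         YES        4G       0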
>> 
>> 
>> 
>>> the h_vmem attached to the queue will make sure that the job won't use more 
>>> than 18GB of memory.  When the second job comes in, it will be sent to node 
>>> h2 because there is not enough h_vmem left on node h1.
>>> 
>> ...as the value was subtracted on a host level.
>> 
>> 
>> 
>>> I am not sure if I am correct about h_vmem, as I have the impression that 
>>> h_vmem won't stop jobs from being sent to a node but virtual_free does.  
>>> Any suggestions?
>>> 
>> Keep in mind that "h_vmem" is a hard limit, while "virtual_free" is a hint 
>> for SGE on how to distribute jobs and still allows a job to consume more than 
>> requested. Which fits best depends on the workflow.
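>> 
>> As a sketch (job.sh is only a placeholder), the difference at submit time:
>> 
>>     # hard limit: the job gets killed if it tries to use more than 18 GB
>>     qsub -l h_vmem=18G job.sh
>> 
>>     # hint only: influences where the job is scheduled, but the job may 
>>     # still use more than the requested 18 GB
>>     qsub -l virtual_free=18G job.sh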
>> 
>> -- Reuti
>> 
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
