On 08.04.2014, at 16:45, Fan Dong wrote:
>
> On 04/08/2014 10:41 AM, Reuti wrote:
>> On 08.04.2014, at 16:34, Fan Dong wrote:
>>
>>> On 04/08/2014 10:17 AM, Reuti wrote:
>>>> On 08.04.2014, at 16:02, Fan Dong wrote:
>>>>
>>>>> Thanks for the help. I guess part of my original question was: will
>>>>> h_vmem help the scheduler hold off a job if the node does not have
>>>>> enough h_vmem left?
>>>>>
>>>>> Say we have:
>>>>> • a consumable h_vmem (qconf -mc) with a default value of 4 GB,
>>>>> • the exec hosts h1 and h2, both with h_vmem = 32 GB (qconf -me),
>>>>> • the queue a.q configured with an h_vmem of 18 GB (qconf -mq).
>>>>>
>>>>> What happens if a user sends 3 jobs to a.q, assuming there are more than
>>>>> two slots on each of the hosts? Will
>>>>> • all 3 jobs get to run simultaneously?
>>>> Yes. 4 GB times 3 will fit into the available 32 GB.
>>>>
>>> Then what is the use of the h_vmem setup in the queue? h_vmem has the
>>> value of 18 GB in a.q; how does that come into play?
>> It is the maximum a user can request per job.
>>
>> They get 4 GB by default, but they can request more - up to 18 GB for a
>> particular job. In case they request more than 18 GB, the job will never
>> start.
>>
>> Nevertheless, the overall consumption of memory will be restricted by the
>> definition on the host level, i.e. all jobs in total on an exechost will
>> never exceed 32 GB.
> Excellent! But just to double-check: if a user does not explicitly request
> h_vmem with qsub -l in the submission script, the default 4 GB will be used.
> Is that correct?
As long as the "h_vmem" complex is assigned to each exechost with a suitable value: yes.

-- Reuti

>
>> -- Reuti
>>
>>> Shouldn't the h_vmem in the queue override the default global consumable
>>> value? You suggested earlier that the h_vmem attached to the queue is
>>> enforced per job, but your calculation "4 GB times 3" seems to ignore the
>>> h_vmem in the queue. Could you please clarify? Thank you.
>>>
>>>> In case the user requests more memory, like 18 GB for each of them, it
>>>> will be different of course.
>>>>
>>>> -- Reuti
>>>>
>>>>> • or does one job have to be held off? (Because h_vmem on each of the
>>>>> hosts would decrease to 32 - 18 = 14 GB, not enough for the third job.)
>>>>>
>>>>> On 04/07/2014 11:29 AM, Reuti wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 07.04.2014, at 17:10, Fan Dong wrote:
>>>>>>
>>>>>>> I am a little confused about the consumable h_vmem setup on the node
>>>>>>> and the queue. Let's say we have one queue, called a.q, spanning two
>>>>>>> hosts, h1 and h2. h1 has 32 GB of RAM and h2 has 128 GB.
>>>>>>>
>>>>>>> I attached h_vmem to both hosts, using the value of the actual
>>>>>>> physical RAM,
>>>>>> You defined this value with `qconf -me ...` => "complex_values"?
>>>>>>
>>>>>>> also a.q has a default h_vmem value of 18 GB, which is the peak memory
>>>>>>> usage of the job.
>>>>>> Yes, the setting in the queue is per job, while in the exechost
>>>>>> definition it's across all jobs.
>>>>>>
>>>>>>> Here is how I understand the way h_vmem works. When the first job in
>>>>>>> a.q is sent to node h1, the h_vmem on the node will decrease to
>>>>>>> 32 - 18 = 14 GB,
>>>>>> Did you make the "h_vmem" complex consumable in `qconf -mc`? What is
>>>>>> the default value specified there for it?
>>>>>>
>>>>>> You checked with `qhost -F h_vmem` and the values are not right?
>>>>>>
>>>>>>> and the h_vmem attached to the queue will make sure that the job won't
>>>>>>> use more than 18 GB of memory. When the second job comes in, it will
>>>>>>> be sent to node h2 because there is not enough h_vmem left on node h1.
>>>>>> ...as the value was subtracted on the host level.
>>>>>>
>>>>>>> I am not sure if I am correct about h_vmem, as I have the impression
>>>>>>> that h_vmem won't stop jobs from being sent to a node but virtual_free
>>>>>>> does. Any suggestions?
>>>>>> Keep in mind that "h_vmem" is a hard limit, while "virtual_free" is a
>>>>>> hint for SGE on how to distribute jobs and still allows a job to
>>>>>> consume more than requested. It depends on the workflow which one fits
>>>>>> best.
>>>>>>
>>>>>> -- Reuti

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
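[Editor's note: for readers following along, a minimal sketch of the setup discussed in this thread. The non-interactive `qconf -mattr` calls are assumed shortcuts equivalent to the interactive `qconf -mc` / `qconf -me` / `qconf -mq` edits the thread mentions; host names h1/h2, queue a.q, and the 4 GB / 32 GB / 128 GB / 18 GB values come from the thread itself.]

# 1) Make h_vmem a consumable complex with a per-job default of 4G.
#    "qconf -mc" opens an editor; the h_vmem line should end up roughly as:
#
#    h_vmem  h_vmem  MEMORY  <=  YES  YES  4G  0
#
qconf -sc | grep h_vmem          # verify the current definition

# 2) Book the total memory available on the host level
#    (equivalent to editing "complex_values" in "qconf -me <host>"):
qconf -mattr exechost complex_values h_vmem=32G  h1
qconf -mattr exechost complex_values h_vmem=128G h2

# 3) Cap what a single job may request in a.q at 18G
#    (equivalent to editing the h_vmem field in "qconf -mq a.q"):
qconf -mattr queue h_vmem 18G a.q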
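[Editor's note: and a sketch of how one might verify the bookkeeping Reuti describes. The job ID 4711 and the script name job.sh are placeholders, not taken from the thread.]

# Remaining bookable h_vmem per host; "hc:" marks a host-level consumable.
# Each running job subtracts its request (the 4G default unless overridden).
qhost -F h_vmem

# What a particular job actually requested (job ID is illustrative):
qstat -j 4711 | grep h_vmem

# Request more than the default, up to the 18G per-job cap of a.q:
qsub -q a.q -l h_vmem=18G job.sh

With 18 GB requests, only one such job fits on a 32 GB host at a time (32 - 18 = 14 GB left), which is exactly the hold-off scenario discussed above; with the 4 GB default, all three jobs fit.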
