On 09.04.2014 at 15:47, Fan Dong wrote:

> # qconf -se comp92
>
> ...
> load_scaling          NONE
> complex_values        slots=8,num_proc=8,h_vmem=32g

g ≠ G

Please check `man sge_types`.

-- Reuti
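(For reference: in `man sge_types` the lowercase multipliers are decimal, so 32g means 32 * 1000^3 bytes, while the uppercase ones are binary, so a job's 16G request means 16 * 1024^3 bytes. A rough sketch of the arithmetic behind the hc:h_vmem=13.802G quoted below, assuming the single running job requested h_vmem=16G as described there:

  $ echo "scale=3; (32*10^9 - 16*2^30) / 2^30" | bc
  13.802                # matches the hc:h_vmem value reported by qhost

So with complex_values h_vmem=32g, one 16G job already leaves less than the 16G a second job asks for. After changing the entry to h_vmem=32G via `qconf -me comp92`, exactly 16 GiB would remain for the second slot.)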
> load_values           arch=linux-x64,num_proc=8,mem_total=32108.289062M, \
>                       swap_total=1999.992188M,virtual_total=34108.281250M, \
>                       load_avg=1.000000,load_short=1.000000, \
>                       load_medium=1.000000,load_long=1.000000, \
>                       mem_free=30152.914062M,swap_free=1999.992188M, \
>                       virtual_free=32152.906250M,mem_used=1955.375000M, \
>                       swap_used=0.000000M,virtual_used=1955.375000M, \
>                       cpu=12.500000,m_topology=SCTTSCTTSCTTSCTT, \
>                       m_topology_inuse=SCTTSCTTSCTTSCTT,m_socket=4,m_core=4, \
>                       np_load_avg=0.125000,np_load_short=0.125000, \
>                       np_load_medium=0.125000,np_load_long=0.125000
> ...
>
> On 04/09/2014 09:45 AM, Reuti wrote:
>> On 09.04.2014 at 15:43, Fan Dong wrote:
>>
>>> Here comes another question. comp92 is one of our exec nodes; it has
>>> 32GB of memory. bigmem_16.q has 2 slots on this node. A bunch of jobs
>>> are submitted with -l h_vmem=16G. For the first few hours both slots on
>>> comp92 were used just as expected, until this morning I found the
>>> following numbers that I do not understand. First off, only one slot on
>>> comp92 is used (the same is true for the rest of our nodes similar to
>>> comp92). The h_vmem on comp92 shows 13.802G, and that's why jobs are
>>> held in the queue: they request h_vmem=16G. What might be the cause for
>>> this 'weird' h_vmem value?
>>>
>>> This is taken from running 'qhost -F h_vmem':
>>>
>>> HOSTNAME        ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>> comp92          linux-x64     8  1.00   31.4G    1.9G    2.0G     0.0
>>>    Host Resource(s):      hc:h_vmem=13.802G
>>>
>>> This is taken from running 'qstat -f -q bigmem_16.q':
>>>
>>> queuename            qtype resv/used/tot. load_avg arch        states
>>> ---------------------------------------------------------------------------------
>>> [email protected]   BI    0/1/2          1.00     linux-x64
>>>    44946 0.55500 pipeline_w training   r   04/09/2014 07:19:30     1
>>
>> What was defined in `qconf -se comp92` for "h_vmem"?
>>
>> -- Reuti
>>
>>> On 04/08/2014 12:13 PM, Reuti wrote:
>>>
>>>> On 08.04.2014 at 16:45, Fan Dong wrote:
>>>>
>>>>> On 04/08/2014 10:41 AM, Reuti wrote:
>>>>>
>>>>>> On 08.04.2014 at 16:34, Fan Dong wrote:
>>>>>>
>>>>>>> On 04/08/2014 10:17 AM, Reuti wrote:
>>>>>>>
>>>>>>>> On 08.04.2014 at 16:02, Fan Dong wrote:
>>>>>>>>
>>>>>>>>> Thanks for the help. I guess part of my original question was:
>>>>>>>>> 'will h_vmem help the scheduler to hold off a job if the node
>>>>>>>>> does not have enough h_vmem left?'
>>>>>>>>>
>>>>>>>>> Say we have
>>>>>>>>> • a consumable h_vmem (qconf -mc) with a default value of 4GB,
>>>>>>>>> • the exec hosts h1 and h2, both with h_vmem = 32GB (qconf -me),
>>>>>>>>> • the queue a.q configured with 18GB h_vmem (qconf -mq).
>>>>>>>>>
>>>>>>>>> What happens if a user sends 3 jobs to a.q, assuming there are
>>>>>>>>> more than two slots on each of the hosts? -- will
>>>>>>>>> • all 3 jobs get to run simultaneously?
>>>>>>>>
>>>>>>>> Yes. 4 GB times 3 will fit into the available 32 GB.
>>>>>>>
>>>>>>> Then what is the use of the h_vmem setup in the queue??? h_vmem has
>>>>>>> the value of 18GB in a.q, how does that come into play?
>>>>>>
>>>>>> It is the maximum a user can request per job.
>>>>>>
>>>>>> They get 4 GB by default, but they can request more - up to 18 GB
>>>>>> for a particular job. In case they request more than 18 GB, the job
>>>>>> will never start.
>>>>>>
>>>>>> Nevertheless, the overall consumption of memory will be restricted
>>>>>> by the definition on the host level, i.e. all jobs in total on an
>>>>>> exechost will never exceed 32 GB.
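(To make the three levels concrete, the setup described above would look roughly like this - illustrative values taken from the example, and the exact column layout of `qconf -sc` may differ between Grid Engine versions:

  $ qconf -sc | grep h_vmem        # complex: consumable, 4G charged if not requested
  h_vmem   h_vmem   MEMORY   <=   YES   YES   4G   0

  $ qconf -se h1 | grep complex_values    # host: total pool all jobs draw from
  complex_values   h_vmem=32G

  $ qconf -sq a.q | grep h_vmem           # queue: per-job ceiling
  h_vmem   18G

Three jobs submitted without any -l h_vmem request are charged 3 x 4 GB = 12 GB against the 32 GB host pool, so all of them can start; a single job may request at most 18 GB, and a request above that can never be dispatched to a.q.)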
>>>>>
>>>>> Excellent! But just to double check - if a user does not explicitly
>>>>> use qsub -l h_vmem in the submission script, the default 4GB will be
>>>>> used. Is that correct?
>>>>
>>>> As long as the "h_vmem" complex is assigned to each exechost with a
>>>> suitable value: yes.
>>>>
>>>> -- Reuti
>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>> Shouldn't the h_vmem in the queue override the default global
>>>>>>> consumable value?? You suggested earlier that the h_vmem attached
>>>>>>> to the queue is enforced per job, but your calculation '4GB times
>>>>>>> 3' seems to ignore the h_vmem in the queue. Could you please
>>>>>>> clarify? Thank you.
>>>>>>>
>>>>>>>> In case the user requests more memory, like 18 GB for each of
>>>>>>>> them, it will be different of course.
>>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>>> • or does one job have to be held off? (because h_vmem on each
>>>>>>>>>   of the hosts will decrease to 32-18=14GB, not enough for the
>>>>>>>>>   third job)
>>>>>>>>>
>>>>>>>>> On 04/07/2014 11:29 AM, Reuti wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 07.04.2014 at 17:10, Fan Dong wrote:
>>>>>>>>>>
>>>>>>>>>>> I am a little confused about the consumable h_vmem setup on the
>>>>>>>>>>> node and the queue. Let's say we have one queue, called a.q,
>>>>>>>>>>> that spans two hosts, h1 and h2. h1 has 32GB of RAM and h2 has
>>>>>>>>>>> 128GB.
>>>>>>>>>>>
>>>>>>>>>>> I attached h_vmem to both hosts, using the value of the actual
>>>>>>>>>>> physical RAM,
>>>>>>>>>>
>>>>>>>>>> You defined this value in `qconf -me ...` => "complex_values"?
>>>>>>>>>>
>>>>>>>>>>> also a.q has a default h_vmem value of 18GB, which is the peak
>>>>>>>>>>> memory usage of the job.
>>>>>>>>>>
>>>>>>>>>> Yes, the setting in the queue is per job, while in the exechost
>>>>>>>>>> definition it's across all jobs.
>>>>>>>>>>
>>>>>>>>>>> Here is how I understand the way h_vmem works. When the first
>>>>>>>>>>> job in a.q is sent to node h1, the h_vmem on the node will
>>>>>>>>>>> decrease to 32-18=14GB,
>>>>>>>>>>
>>>>>>>>>> Did you make the "h_vmem" complex consumable in `qconf -mc`?
>>>>>>>>>> What is the default value specified there for it?
>>>>>>>>>>
>>>>>>>>>> You check with `qhost -F h_vmem` and the values are not right?
>>>>>>>>>>
>>>>>>>>>>> the h_vmem attached to the queue will make sure that the job
>>>>>>>>>>> won't use more than 18GB of memory. When the second job comes
>>>>>>>>>>> in, it will be sent to node h2 because there is not enough
>>>>>>>>>>> h_vmem left on node h1.
>>>>>>>>>>
>>>>>>>>>> ...as the value was subtracted on a host level.
>>>>>>>>>>
>>>>>>>>>>> I am not sure if I am correct about h_vmem, as I have the
>>>>>>>>>>> impression that h_vmem won't stop jobs from being sent to a
>>>>>>>>>>> node but virtual_free does. Any suggestions?
>>>>>>>>>>
>>>>>>>>>> Keep in mind: "h_vmem" is a hard limit, while "virtual_free" is
>>>>>>>>>> a hint for SGE how to distribute jobs and allows them to consume
>>>>>>>>>> more than requested. It depends on the workflow what fits best.
>>>>>>>>>>
>>>>>>>>>> -- Reuti
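(Expressed as requests - job.sh is just a placeholder script name:

  $ qsub -l h_vmem=16G job.sh
        # counted against the host's h_vmem pool and enforced as a hard
        # limit on the job's address space; allocations beyond 16 GiB
        # fail, which usually kills the job
  $ qsub -l virtual_free=16G job.sh
        # used for placement only, compared against the host's reported
        # or consumable value; nothing prevents the job from using more
        # memory at run time

Which of the two fits depends, as said above, on whether jobs should be stopped when they exceed their request or merely spread sensibly across the hosts.)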
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
