On 12.12.2012 at 02:17, Jake Carroll wrote:

> Cool.
>
> Thanks for the response guys. See in line:
>
>
> On 12/12/12 6:45 AM, "Reuti" <[email protected]> wrote:
>
>> On 11.12.2012 at 21:32, Gowtham wrote:
>>
>>> I second Alex's thoughts. In all our clusters, we only use h_vmem
>>
>> The difference is that virtual_free is only guidance for SGE, but
>> h_vmem will also be enforced. Which one to prefer depends on the
>> working style of the users/groups. If only one group is using a
>> cluster, I prefer virtual_free, as they are checking their results
>> and the accuracy of their memory requests, but with many groups in a
>> cluster, enforcing h_vmem might be more suitable to avoid
>> oversubscription.
>
> That is indeed our situation. It's very much a multi-tenancy environment
> [probably about 50 or 60 users and 10 groups therein]. So, to that end,
> should I enable/allow users to make h_vmem requestable and set it as a
> consumable?
Yes, and you also need to attach an initial value to each exechost, set to
its installed physical RAM.

-- Reuti

> Cheers.
>
> --JC
>
>
>>
>> -- Reuti
>>
>>
>>> (to indicate the hard cap per job) and mem_free (a suggestion to
>>> the scheduler as to which node the job should be started on).
>>>
>>> Best regards,
>>> g
>>>
>>> --
>>> Gowtham
>>> Information Technology Services
>>> Michigan Technological University
>>>
>>> (906) 487/3593
>>> http://www.it.mtu.edu/
>>>
>>>
>>> On Tue, 11 Dec 2012, Alex Chekholko wrote:
>>>
>>> | Hi Jake,
>>> |
>>> | You can do 'qhost -F h_vmem,mem_free,virtual_free', that might be a
>>> | useful view for you.
>>> |
>>> | In general, I've only ever used one of the three complexes above.
>>> |
>>> | Which one(s) do you have defined for the execution hosts? e.g.
>>> | qconf -se compute-1-7
>>> |
>>> | h_vmem will map to 'ulimit -v'
>>> | mem_free just tracks 'free'
>>> | virtual_free I'm not sure, I'd have to search the mailing list
>>> | archives.
>>> |
>>> | I recommend you just use one of those three complexes. If you want
>>> | to set a hard memory limit for jobs, use h_vmem. If you want to just
>>> | suggest to the scheduler, use mem_free; it will use the current
>>> | instantaneous mem_free level during job scheduling (well, the lower
>>> | of the consumable mem_free (if you have that defined) and the actual
>>> | current mem_free).
>>> |
>>> | What is the compelling reason to use virtual_free? I guess it
>>> | includes swap?
>>> |
>>> | Regards,
>>> | Alex
>>> |
>>> |
>>> | On 12/7/12 2:31 AM, Jake Carroll wrote:
>>> | > Hi all.
>>> | >
>>> | > We've got some memory allocation/memory contention issues our users
>>> | > are complaining about. Many are saying they can't get their jobs to
>>> | > run because of memory resource issues.
>>> | >
>>> | > An example:
>>> | >
>>> | > scheduling info:
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-2-3.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-12.local" because it offers only hc:virtual_free=12.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-6.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-10.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-11.local" because it offers only hc:virtual_free=2.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-9.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-2-1.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-3.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-0.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-4.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-14.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-8.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-1-6.local" because it offers only hc:virtual_free=5.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-2-2.local" because it offers only hc:virtual_free=12.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-5.local" because it offers only hc:virtual_free=4.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-1-3.local" because it offers only hc:virtual_free=5.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-0-7.local" because it offers only hc:virtual_free=12.000G
>>> | > (-l h_vmem=24G,virtual_free=24G) cannot run at host "compute-1-5.local" because it offers only hc:virtual_free=5.000G
>>> | >
>>> | > Another example, of a user whose job is successfully running:
>>> | >
>>> | > hard resource_list:         mem_free=100G
>>> | > mail_list:                  xyz
>>> | > notify:                     FALSE
>>> | > job_name:                   mlmassoc_GRMi
>>> | > stdout_path_list:           NONE:NONE:/commented.out
>>> | > jobshare:                   0
>>> | > env_list:
>>> | > script_file:                /commented.out
>>> | > usage    1:                 cpu=2:08:09:22, mem=712416.09719 GBs, io=0.59519, vmem=3.379G, maxvmem=4.124G
>>> | >
>>> | > If I look at the qhost outputs:
>>> | >
>>> | > [root@cluster ~]# qhost
>>> | > HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>> | > -------------------------------------------------------------------------------
>>> | > global                  -               -     -       -       -       -       -
>>> | > compute-0-0             lx26-amd64     24  6.49   94.6G    5.5G     0.0     0.0
>>> | > compute-0-1             lx26-amd64     24 10.71   94.6G    5.9G     0.0     0.0
>>> | > compute-0-10            lx26-amd64     24  6.09   94.6G    5.1G     0.0     0.0
>>> | > compute-0-11            lx26-amd64     24  6.10   94.6G    5.5G     0.0     0.0
>>> | > compute-0-12            lx26-amd64     24  6.12   94.6G    8.1G     0.0     0.0
>>> | > compute-0-13            lx26-amd64     24  8.41   94.6G    5.3G     0.0     0.0
>>> | > compute-0-14            lx26-amd64     24  7.32   94.6G    7.6G     0.0     0.0
>>> | > compute-0-15            lx26-amd64     24 10.42   94.6G    6.3G     0.0     0.0
>>> | > compute-0-2             lx26-amd64     24  9.67   94.6G    5.5G     0.0     0.0
>>> | > compute-0-3             lx26-amd64     24  7.17   94.6G    5.5G     0.0     0.0
>>> | > compute-0-4             lx26-amd64     24  6.13   94.6G    4.0G  996.2M   27.5M
>>> | > compute-0-5             lx26-amd64     24  6.36   94.6G    5.4G     0.0     0.0
>>> | > compute-0-6             lx26-amd64     24  6.35   94.6G    6.4G     0.0     0.0
>>> | > compute-0-7             lx26-amd64     24  8.08   94.6G    6.0G     0.0     0.0
>>> | > compute-0-8             lx26-amd64     24  6.12   94.6G    8.4G     0.0     0.0
>>> | > compute-0-9             lx26-amd64     24  6.12   94.6G    5.9G     0.0     0.0
>>> | > compute-1-0             lx26-amd64     80 30.13  378.7G   36.2G     0.0     0.0
>>> | > compute-1-1             lx26-amd64     80 28.93  378.7G   21.8G  996.2M  168.1M
>>> | > compute-1-2             lx26-amd64     80 29.84  378.7G   23.2G  996.2M   46.8M
>>> | > compute-1-3             lx26-amd64     80 27.03  378.7G   24.4G  996.2M   39.3M
>>> | > compute-1-4             lx26-amd64     80 28.05  378.7G   23.2G  996.2M  122.0M
>>> | > compute-1-5             lx26-amd64     80 27.47  378.7G   23.5G  996.2M  161.4M
>>> | > compute-1-6             lx26-amd64     80 25.07  378.7G   25.6G  996.2M   91.5M
>>> | > compute-1-7             lx26-amd64     80 26.98  378.7G   22.8G  996.2M  115.9M
>>> | > compute-2-0             lx26-amd64     32 11.03   47.2G    2.6G 1000.0M   67.1M
>>> | > compute-2-1             lx26-amd64     32  8.35   47.2G    3.7G 1000.0M   11.4M
>>> | > compute-2-2             lx26-amd64     32 10.10   47.2G    1.7G 1000.0M  126.5M
>>> | > compute-2-3             lx26-amd64     32  7.02   47.2G    3.4G 1000.0M   11.3M
>>> | >
>>> | > So, it would seem to me we've got _plenty_ of actual resources free,
>>> | > but our virtual_free complex seems to be doing something
>>> | > funny/misguided?
>>> | >
>>> | > I'm worried that our virtual_free complex might actually be doing
>>> | > more harm than good here.
>>> | >
>>> | > Here is an example of some qhost -F output on two different node types:
>>> | >
>>> | > compute-2-3             lx26-amd64     32  7.00   47.2G    3.4G 1000.0M   11.3M
>>> | >    hl:arch=lx26-amd64
>>> | >    hl:num_proc=32.000000
>>> | >    hl:mem_total=47.187G
>>> | >    hl:swap_total=999.992M
>>> | >    hl:virtual_total=48.163G
>>> | >    hl:load_avg=7.000000
>>> | >    hl:load_short=7.000000
>>> | >    hl:load_medium=7.000000
>>> | >    hl:load_long=7.060000
>>> | >    hl:mem_free=43.788G
>>> | >    hl:swap_free=988.703M
>>> | >    hc:virtual_free=4.000G
>>> | >    hl:mem_used=3.398G
>>> | >    hl:swap_used=11.289M
>>> | >    hl:virtual_used=3.409G
>>> | >    hl:cpu=6.400000
>>> | >    hl:m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT
>>> | >    hl:m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT
>>> | >    hl:m_socket=2.000000
>>> | >    hl:m_core=16.000000
>>> | >    hl:np_load_avg=0.218750
>>> | >    hl:np_load_short=0.218750
>>> | >    hl:np_load_medium=0.218750
>>> | >    hl:np_load_long=0.220625
>>> | >
>>> | > compute-1-7             lx26-amd64     80 27.83  378.7G   22.8G  996.2M  115.9M
>>> | >    hl:arch=lx26-amd64
>>> | >    hl:num_proc=80.000000
>>> | >    hl:mem_total=378.652G
>>> | >    hl:swap_total=996.207M
>>> | >    hl:virtual_total=379.624G
>>> | >    hl:load_avg=27.830000
>>> | >    hl:load_short=29.050000
>>> | >    hl:load_medium=27.830000
>>> | >    hl:load_long=27.360000
>>> | >    hl:mem_free=355.814G
>>> | >    hl:swap_free=880.266M
>>> | >    hc:virtual_free=13.000G
>>> | >    hl:mem_used=22.838G
>>> | >    hl:swap_used=115.941M
>>> | >    hl:virtual_used=22.951G
>>> | >    hl:cpu=33.600000
>>> | >    hl:m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT
>>> | >    hl:m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT
>>> | >    hl:m_socket=4.000000
>>> | >    hl:m_core=40.000000
>>> | >    hl:np_load_avg=0.347875
>>> | >    hl:np_load_short=0.363125
>>> | >    hl:np_load_medium=0.347875
>>> | >    hl:np_load_long=0.342000
>>> | >
>>> | > Our virtual_free complex is designated as a memory complex, relation <=,
>>> | > is requestable, is set as a consumable, and has a default of 2.
>>> | >
>>> | > I guess what I'd like to aim for is some sane memory management and a
>>> | > way of setting up some "rules" for my users so they can allocate
>>> | > sensible amounts of RAM that reflect what the hosts/execution nodes
>>> | > are really capable of.
>>> | >
>>> | > I've got (unfortunately!) three types of nodes in the one queue. One
>>> | > type has 384GB of RAM. One type has 96GB of RAM. One type has 48GB of RAM.
>>> | >
>>> | > Are my users just expecting too much? Are there some caps/resource
>>> | > limits I should put in place to manage expectations, or should I simply
>>> | > invest in some "big memory" nodes for really large jobs and make a
>>> | > separate highmem.q for such tasks? You'll see above that some users
>>> | > have tried asking for 100GB via the mem_free complex.
>>> | >
>>> | > Thoughts/experiences/ideas?
>>> | >
>>> | > Thanks for your time, all.
>>> | >
>>> | > --JC

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
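
To make the advice in this thread concrete, here is a minimal sketch of the
setup Reuti and Alex describe: h_vmem defined as a requestable consumable and
seeded on each exechost with its installed RAM. The default value, the example
host, the script name and the memory figures are illustrative, not taken from
this thread; see complex(5) and qconf(1) for the exact syntax of your SGE
version.

  # 1. In "qconf -mc", mark h_vmem as requestable and consumable
  #    (columns: name  shortcut  type  relop  requestable  consumable  default  urgency):
  h_vmem    h_vmem    MEMORY    <=    YES    YES    1G    0

  # 2. Attach an initial value equal to the node's physical RAM to every
  #    exechost, e.g. for one of the 96 GB nodes (opens the host config in $EDITOR):
  qconf -me compute-0-0
  #    ...and set:   complex_values   h_vmem=94G

  # 3. Jobs then request memory at submit time; the scheduler decrements the
  #    host's h_vmem consumable, and the same value is enforced as the job's
  #    address-space limit ('ulimit -v', as Alex notes above):
  qsub -l h_vmem=24G myjob.sh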
