> Am 27.06.2019 um 13:46 schrieb Dan Whitehouse <d.whiteho...@qmul.ac.uk>:
> 
> First off, I am running UGE as opposed to SGE.
> We've got a couple of systems, one running 8.5.4 and the other 8.6.5.
> Users request memory resources in their job scripts by passing:
> 
> "-l h_vmem=1G" (for example).
> 
> We make use of a JSV and when this is set, what actually gets passed to 
> the scheduler is:
> 
> "h_vmem=1G,m_mem_free=1G"
> 
> We set "h_vmem_limit=true" in cgroups_params so it is enforced by cgroups.
> 
> The thing that I am not entirely sure about is what we are actually 
> limiting here!
> 
> If I write a program to malloc memory in a loop, then cgroups kills it 
> when it has allocated over 400G of ram (on a machine with about 24G).
> 
> Looking at the output of qacct, it has used ~1G to do so. So my 
> assumption here is that cgroups is killing on memory used as opposed to 
> virtual memory allocated. Which of the two settings (h_vmem / 
> m_mem_free) is responsible for this, and what is the other one for?
> 
> I'm sure this isn't the first time this has been asked, and for that I 
> apologise but I can't seem to find a clear explanation of this.

We don't use cgroups for now. But the allocation of memory is often delayed 
until you really access the allocated space (even without cgroups in 
place). You could fill the allocated area with data and test what happens then.
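To see the delayed allocation in action, here is a minimal Linux-only sketch (an assumption: it reads VmRSS from /proc/self/status, and the `rss_kib`/`demo` names and the 256 MiB size are just illustrative). It reserves an anonymous mapping, which barely changes the resident memory, and then writes one byte per page, which forces the kernel to back every page with real RAM:

```python
import mmap


def rss_kib() -> int:
    """Resident set size of this process in KiB (Linux /proc only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


def demo(size_mib: int = 256) -> tuple[int, int, int]:
    size = size_mib * 1024 * 1024
    before = rss_kib()
    # Anonymous private mapping: address space is reserved, no pages touched yet.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    after_map = rss_kib()
    # Write one byte into every page so the kernel has to back it with RAM.
    for off in range(0, size, mmap.PAGESIZE):
        buf[off] = 1
    after_touch = rss_kib()
    buf.close()
    return before, after_map, after_touch


if __name__ == "__main__":
    b, m, t = demo()
    print(f"RSS before: {b} KiB, after mmap: {m} KiB, after touching: {t} KiB")
```

A malloc loop that never writes to its buffers behaves like the "after mmap" step: the virtual size grows, the resident memory hardly does.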

h_vmem will be enforced by the kernel, even without cgroups. If you make 
h_vmem consumable and attach a sensible value to each exechost, SGE will also 
keep track of it and will not schedule further jobs to a host once its h_vmem is used up.
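A rough sketch of the kernel side, assuming h_vmem ends up as a per-process address-space limit (RLIMIT_AS, as set by `ulimit -v`); the child command and the `run_with_as_limit` helper are hypothetical. Note that this limit counts reserved address space, so the child fails even though it never touches the 512 MiB it maps:

```python
import resource
import subprocess
import sys

# Child program: map (but never touch) 512 MiB of anonymous memory.
CHILD = "import mmap; m = mmap.mmap(-1, 512 * 1024 * 1024); print('mapped')"


def run_with_as_limit(limit_bytes: int) -> subprocess.CompletedProcess:
    def set_limit() -> None:
        # Runs in the child just before exec: cap its virtual address space.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(
        [sys.executable, "-c", CHILD],
        preexec_fn=set_limit,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    small = run_with_as_limit(256 * 1024 * 1024)
    large = run_with_as_limit(4 * 1024 ** 3)
    print("256 MiB limit:", small.returncode, small.stderr.strip().splitlines()[-1:])
    print("4 GiB limit:  ", large.returncode, large.stdout.strip())
```

Since the limit is set per process, each process of a job gets its own cap; nothing here adds up the usage of several processes.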

With h_vmem requested at job submission, either the kernel or SGE will then 
notice that your job, i.e. the sum of all of its processes, passed this limit 
and kill the job. SGE can do this by using the additional group ID which is 
attached to all processes of a particular job, while the kernel might only 
watch a single process.
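The idea of tracking a whole job through its additional group ID can be sketched like this. It is only an illustration, not SGE's actual accounting code: it assumes Linux /proc, treats the job's extra GID as a supplementary group listed on the `Groups:` line, and the helper names and the GID you pass in are hypothetical:

```python
import os


def parse_status(text: str) -> tuple[set[int], int]:
    """Return (supplementary group IDs, VmSize in KiB) from /proc/<pid>/status text."""
    groups: set[int] = set()
    vmsize = 0  # kernel threads have no VmSize line
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Groups":
            groups = {int(g) for g in value.split()}
        elif key == "VmSize":
            vmsize = int(value.split()[0])
    return groups, vmsize


def job_vmem_kib(gid: int, proc: str = "/proc") -> int:
    """Sum VmSize over all processes carrying the given (job) group ID."""
    total = 0
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                groups, vmsize = parse_status(f.read())
        except OSError:
            continue  # process exited meanwhile or is not readable
        if gid in groups:
            total += vmsize
    return total
```

Comparing `job_vmem_kib(job_gid)` against the requested h_vmem is what catches a job whose processes are each small but together too large.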

With an assigned cgroup, even the kernel can keep track of the overall 
memory consumption of several processes in this cgroup.
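To check what the kernel is accounting for a job's cgroup, a small sketch like the following could help. It assumes the usual mount point /sys/fs/cgroup and the standard file names (memory.usage_in_bytes for cgroup v1, memory.current for v2); the paths inside a container or on a particular UGE installation may differ:

```python
from pathlib import Path


def memory_cgroup(cgroup_text: str) -> tuple[int, str]:
    """Return (cgroup version, path) of the memory controller from /proc/<pid>/cgroup text."""
    v2_path = None
    for line in cgroup_text.splitlines():
        if not line:
            continue
        hier_id, controllers, path = line.split(":", 2)
        if "memory" in controllers.split(","):
            return 1, path  # cgroup v1: memory has its own hierarchy
        if hier_id == "0" and controllers == "":
            v2_path = path  # cgroup v2 unified hierarchy
    if v2_path is None:
        raise ValueError("no memory cgroup found")
    return 2, v2_path


def cgroup_memory_usage(pid: str = "self", root: str = "/sys/fs/cgroup") -> int:
    """Bytes currently charged to the memory cgroup of the given process."""
    version, path = memory_cgroup(Path(f"/proc/{pid}/cgroup").read_text())
    rel = path.lstrip("/")
    if version == 1:
        usage = Path(root, "memory", rel, "memory.usage_in_bytes")
    else:
        usage = Path(root, rel, "memory.current")
    return int(usage.read_text())
```

The value covers all processes in that cgroup together, which is why a cgroup limit can act on the job as a whole rather than on one process.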

-- Reuti


> Thanks!
> 
> -- 
> Dan Whitehouse
> Research Systems Administrator, IT Services
> Queen Mary University of London
> Mile End
> E1 4NS
> 
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
