> On 27.06.2019 at 13:46, Dan Whitehouse <d.whiteho...@qmul.ac.uk> wrote:
>
> First off, I am running UGE as opposed to SGE.
> We've got a couple of systems, one running 8.5.4 and the other 8.6.5.
> Users request memory resources in their job scripts by passing:
>
> "-l h_vmem=1G" (for example).
>
> We make use of a JSV and when this is set, what actually gets passed to
> the scheduler is:
>
> "h_vmem=1G,m_mem_free=1G"
>
> We set "h_vmem_limit=true" in cgroups_params so it is enforced by cgroups.
>
> The thing that I am not entirely sure about is what we are actually
> limiting here!
>
> If I write a program to malloc memory in a loop, then cgroups kills it
> when it has allocated over 400G of RAM (on a machine with about 24G).
>
> Looking at the output of qacct, it has used ~1G to do so. So my
> assumption here is that cgroups is killing on memory used as opposed to
> virtual memory allocated. Which of the two settings (h_vmem /
> m_mem_free) is responsible for this, and what is the other one for?
>
> I'm sure this isn't the first time this has been asked, and for that I
> apologise, but I can't seem to find a clear explanation of this.
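The gap between memory that has been allocated and memory that is actually in use, as described in the quoted test, can be observed with a small script. What follows is only a minimal sketch, assuming a Linux host with /proc and Python 3. It uses a private anonymous mmap as a stand-in for the malloc loop, and the helper names (parse_status_kb, read_usage, demo) are made up for illustration. It prints the virtual size (VmSize) and resident size (VmRSS) before the mapping, after the mapping, and after writing to every page.

```python
import mmap


def parse_status_kb(status_text, field):
    """Return the kB value of a field such as 'VmSize' or 'VmRSS' taken
    from the text of /proc/<pid>/status, or None if the field is absent."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            # A line looks like "VmRSS:      10240 kB"
            return int(line.split()[1])
    return None


def read_usage():
    """Return (virtual_kb, resident_kb) for the current process (Linux only)."""
    with open("/proc/self/status") as f:
        text = f.read()
    return parse_status_kb(text, "VmSize"), parse_status_kb(text, "VmRSS")


def demo(size_mb=64):
    """Map size_mb of anonymous memory, then touch every page of it."""
    size = size_mb * 1024 * 1024
    print("start          (VmSize, VmRSS) kB:", read_usage())

    # Assumption: a private anonymous mapping stands in for malloc here.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    print("after mapping  (VmSize, VmRSS) kB:", read_usage())

    # Write one byte per page so each page has to be backed by real memory.
    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1
    print("after touching (VmSize, VmRSS) kB:", read_usage())

    buf.close()


if __name__ == "__main__":
    demo()
```

On a typical Linux system one would expect VmSize to grow right after the mapping and VmRSS to grow only after the pages have been written. Running the script inside a job with a small memory request should show which of the two figures the limit reacts to.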
We don't use cgroups for now. But the allocation of memory is often delayed until you actually access the allocated space (even without cgroups in place). You could fill the allocated area with data and test what happens then.

h_vmem will be enforced by the kernel, even without cgroups. If you make h_vmem consumable and attach a sensible value to each exechost, SGE will also keep track of this and disallow further jobs on a host once the value is used up.

With h_vmem requested at job submission, either the kernel or SGE will then notice that your job, that is, the sum of all of its processes, has passed this limit and kill the job. SGE can do this by using the additional group ID that is attached to all processes of a particular job, while the kernel might only watch a single process. With an assigned cgroup, even the kernel can keep track of the overall memory consumption of several processes in that cgroup.

-- Reuti

> Thanks!
>
> --
> Dan Whitehouse
> Research Systems Administrator, IT Services
> Queen Mary University of London
> Mile End
> E1 4NS
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users