Thanks, I haven't done any JSV work but I will give it a try. I am actually thinking of just giving each user a quota of 3 nodes' worth of h_vmem (e.g. h_vmem=3*512G=1536G). Together with the slot quota, that seems to do the job. Or am I being too naive?
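
Something along these lines is what I have in mind - untested, the rule name is invented, and it assumes h_vmem is already set up as a consumable complex (192 slots = 3 x 64 cores, 1536G = 3 x 512G):

{
   name         max_user_three_nodes
   description  "cap each user at 3 nodes worth of slots and memory"
   enabled      true
   limit        users {*} to slots=192,h_vmem=1536G
}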
Cheers,
D

Sent from my iPad

> On 1 Jul 2014, at 6:54 am, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>
> If that's the case, why not craft a JSV that takes the amount of
> requested RAM (again, assuming it is a requestable and consumable
> resource) and figures out how many cores it equates to, thus rewriting
> the job with the number of cores also consumed.
>
> For example, if the user requests 400GB of RAM, the JSV will perform
> the 400/8 = 50 cores calculation, and then rewrite it as a request for
> 50 cores as well. This will decrease the user's available slots to 142.
>
> Ian
>
>> On Mon, Jun 30, 2014 at 1:48 PM, Derrick Lin <klin...@gmail.com> wrote:
>> Hi guys,
>>
>> Thanks for the replies. I think what really matters is the per-user
>> RQS. Currently we only set a quota of 192 slots per user, equivalent
>> to 3 nodes. So a user can run 192 such big-memory jobs and occupy
>> 192 nodes.
>>
>> So my original idea doesn't help to improve resource utilisation; it
>> really just prevents a user from using more than 3 entire nodes.
>>
>> Maybe there is some sort of resource equivalency between slots and
>> memory that can achieve that?
>>
>> Thanks
>> D
>>
>> Sent from my iPad
>>
>>> On 1 Jul 2014, at 5:57 am, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>>
>>> I don't get the problem here.
>>>
>>> If a single-core job (let's assume it cannot easily be parallelized)
>>> consumes 400-500GB of RAM, leaving only a little left over for
>>> others to use, what's the issue? Any jobs launched will be limited by
>>> how much RAM is available (assuming it is a consumable), and any job
>>> that cannot run in whatever amount of RAM is left is either run on
>>> another node, or queued up until a node with sufficient resources is
>>> available. Forcing the user to use, say, 50 cores for a 400GB job,
>>> even though it is single-threaded, would have the same end result:
>>> 400GB is in use (and 50 cores are also "in use", even though 49 are
>>> idle), and other jobs either run somewhere else, or queue up.
>>>
>>> Ian
>>>
>>> On Mon, Jun 30, 2014 at 12:01 PM, Michael Stauffer <mgsta...@gmail.com> wrote:
>>>>> Date: Mon, 30 Jun 2014 11:53:12 +0200
>>>>> From: Txema Heredia <txema.llis...@gmail.com>
>>>>> To: Derrick Lin <klin...@gmail.com>, SGE Mailing List <users@gridengine.org>
>>>>> Subject: Re: [gridengine users] Enforce users to use specific amount of memory/slot
>>>>>
>>>>> Hi Derrick,
>>>>>
>>>>> You could either set h_vmem as a consumable (consumable=yes)
>>>>> attribute and set a default value of 8GB for it. This way, whenever
>>>>> a job doesn't request any amount of h_vmem, it will automatically
>>>>> request 8GB per slot. This will affect all types of jobs.
>>>>>
>>>>> You could also define a JSV script that checks the username and
>>>>> forces -l h_vmem=8G for his/her jobs
>>>>> ( jsv_sub_add_param('l_hard','h_vmem','8G') ). This will affect all
>>>>> jobs for that user, but could turn into a pain to manage.
>>>>>
>>>>> Or, you could set a different policy and allow all users to request
>>>>> the amount of memory they really need, trying to best fit the node.
>>>>> What is the point of forcing a user to reserve 63 additional cores
>>>>> when they only need 1 core and 500GB of memory? You could fit one
>>>>> such job on that node plus, say, two 30-core/6GB-memory jobs.
>>>>>
>>>>> Txema
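
(Coming back to Ian's JSV suggestion above: a rough client-side JSV doing that rewrite might look like the sketch below. Completely untested; "smp" is an invented PE name, it only handles h_vmem given in whole gigabytes, and if h_vmem is a per-slot consumable the memory request itself would also need scaling down, which I skip here.)

#!/bin/sh
# Untested sketch: derive a slot count from the requested h_vmem at an
# 8GB-per-core ratio and rewrite the job to consume that many cores.

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   mem=`jsv_sub_get_param l_hard h_vmem`
   case "$mem" in
      *G|*g)
         gb=`echo "$mem" | tr -d 'Gg'`
         # round up, e.g. 400G -> 50 cores, 401G -> 51 cores
         cores=`expr \( $gb + 7 \) / 8`
         if [ "$cores" -gt 1 ]; then
            # "smp" is a placeholder PE name
            jsv_set_param pe_name smp
            jsv_set_param pe_min "$cores"
            jsv_set_param pe_max "$cores"
            # NB: with a per-slot consumable h_vmem, the h_vmem request
            # itself should also be divided by $cores - omitted here
            jsv_correct "rewritten: $mem at 8G/core => $cores slots"
            return
         fi
         ;;
   esac
   jsv_accept "job accepted"
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main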
>>>>>
>>>>> On 30/06/14 08:55, Derrick Lin wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> A typical node on our cluster has 64 cores and 512GB of memory, so
>>>>>> it's about 8GB/core. Occasionally we have jobs that utilize only 1
>>>>>> core but 400-500GB of memory, which annoys lots of users. So I am
>>>>>> seeking a way to force jobs to run strictly below the 8GB/core
>>>>>> ratio, or else be killed.
>>>>>>
>>>>>> For example, the above job should ask for 64 cores in order to use
>>>>>> 500GB of memory (we have a per-user quota for slots).
>>>>>>
>>>>>> I have been trying to play around with h_vmem, setting it to
>>>>>> consumable and configuring an RQS:
>>>>>>
>>>>>> {
>>>>>>    name         max_user_vmem
>>>>>>    enabled      true
>>>>>>    description  "Each user cannot utilize more than 8GB/slot"
>>>>>>    limit        users {bad_user} to h_vmem=8g
>>>>>> }
>>>>>>
>>>>>> but it seems to set the total vmem bad_user can use per job.
>>>>>>
>>>>>> I would love to set it on users instead of queues or hosts, because
>>>>>> we have applications that utilize the same set of nodes, and those
>>>>>> apps should be unlimited.
>>>>>>
>>>>>> Thanks
>>>>>> Derrick
>>>>
>>>> I've been dealing with this too. I'm using h_vmem to kill processes
>>>> that go above the limit, and s_vmem set slightly lower by default to
>>>> give well-behaved processes a chance to exit gracefully first.
>>>>
>>>> The issue is that these use virtual memory, which is (always, more or
>>>> less) greater than resident memory, i.e. the actual RAM usage. And
>>>> with Java-based apps like Matlab, the amount of virtual memory
>>>> reserved/used is HUGE compared to resident - 10x, give or take. So it
>>>> actually makes this really impractical. However, so far I've just set
>>>> the default h_vmem and s_vmem values high enough to accommodate JVM
>>>> apps, and increased the per-host consumable appropriately. We don't
>>>> get fine-grained memory control, but it definitely reins in
>>>> out-of-control users/procs that might otherwise gobble up enough RAM
>>>> to slow down the entire node.
>>>>
>>>> We may switch to UVE just for this reason, to get memory limits based
>>>> on resident memory, if it seems worth it in the end.
>>>>
>>>> -M
>>>
>>> --
>>> Ian Kaufman
>>> Research Systems Administrator
>>> UC San Diego, Jacobs School of Engineering  ikaufman AT ucsd DOT edu
>
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering  ikaufman AT ucsd DOT edu
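
PS for the archives: if I understand Michael's setup right, it boils down to something like this - syntax from memory and values invented, so double-check against complex(5) and queue_conf(5). If I remember the man pages right, a job exceeding s_vmem gets a catchable SIGXCPU while one exceeding h_vmem gets a SIGKILL, which is what gives well-behaved jobs the chance to exit cleanly:

# qconf -mc: make both limits requestable consumables, with defaults
# left high enough to absorb JVM-style virtual memory overhead
#name     shortcut  type    relop  requestable  consumable  default  urgency
s_vmem    s_vmem    MEMORY  <=     YES          YES         30G      0
h_vmem    h_vmem    MEMORY  <=     YES          YES         32G      0

# qconf -me <hostname>: cap the per-host consumable at physical RAM
complex_values        h_vmem=512G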