Thanks, I haven't done any JSV work yet, but I will give it a try.

I'm actually thinking of just giving each user three nodes' worth of h_vmem 
quota (e.g. h_vmem=512*3G, i.e. 1536G). Together with the slot quota that 
seems to do the job. Or am I being too naive?
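
For what it's worth, a minimal per-user RQS along those lines might look
like this (untested sketch; the rule name is illustrative and 1536G = 3 x
512GB):

    {
       name         max_user_h_vmem
       enabled      TRUE
       description  "Cap each user at three nodes' worth of h_vmem"
       limit        users {*} to h_vmem=1536G
    }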

Cheers,
D

Sent from my iPad

> On 1 Jul 2014, at 6:54 am, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
> 
> If that's the case, why not craft a JSV that takes the amount of
> requested RAM (again, assuming it is a requestable and consumable
> resource), figures out how many cores it equates to, and rewrites
> the job so that that number of cores is also consumed.
> 
> For example, if the user requests 400GB of RAM, the JSV computes
> 400/8 = 50 cores and rewrites the job to request 50 cores as well.
> This will decrease the user's available slots to 142.
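> 
> Roughly, such a JSV might be sketched as follows (untested and purely
> illustrative; it assumes the standard jsv_include.sh shell helpers, a PE
> named "smp", an h_vmem request given in whole GB, and that h_vmem is a
> per-slot consumable):
> 
>    #!/bin/sh
>    . $SGE_ROOT/util/resources/jsv/jsv_include.sh
> 
>    jsv_on_verify()
>    {
>       mem=`jsv_sub_get_param l_hard h_vmem`      # e.g. "400G"
>       gb=`echo $mem | sed 's/[Gg]$//'`
>       cores=`expr \( $gb + 7 \) / 8`             # round up to 8GB/core
>       jsv_set_param pe_name smp
>       jsv_set_param pe_min $cores
>       jsv_set_param pe_max $cores
>       jsv_sub_add_param l_hard h_vmem 8G         # per-slot request
>       jsv_correct "Slot count raised to match h_vmem request"
>    }
> 
>    jsv_main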
> 
> Ian
> 
>> On Mon, Jun 30, 2014 at 1:48 PM, Derrick Lin <klin...@gmail.com> wrote:
>> Hi guys,
>> 
>> Thanks for the replies. I think what really matters is the per-user RQS; 
>> currently we only set a quota of 192 slots per user, equivalent to 3 
>> nodes. So a user can still run 192 such big-memory jobs and occupy 192 nodes.
>> 
>> So my original idea doesn't improve resource utilisation; it really just 
>> prevents a user from using more than 3 entire nodes.
>> 
>> Maybe some sort of resource equivalency between slots and memory could 
>> achieve that?
>> 
>> Thanks
>> D
>> 
>> Sent from my iPad
>> 
>>> On 1 Jul 2014, at 5:57 am, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>> 
>>> I don't get the problem here.
>>> 
>>> If a single-core job (let's assume it cannot easily be parallelized)
>>> consumes 400-500 GB of RAM, leaving only a little left over for
>>> others to use, what's the issue? Any jobs launched will be limited by
>>> how much RAM is available (assuming it is a consumable), and any job
>>> that cannot run in whatever amount of RAM is left is either run on
>>> another node, or queued up until a node with sufficient resources is
>>> available. Forcing the user to use, say, 50 cores for a 400GB job,
>>> even though it is single-threaded, would have the same end result:
>>> 400GB is in use (and 50 cores are also "in use" even though 49 are
>>> idle), and other jobs either run somewhere else or queue up.
>>> 
>>> Ian
>>> 
>>> On Mon, Jun 30, 2014 at 12:01 PM, Michael Stauffer <mgsta...@gmail.com> 
>>> wrote:
>>>>> Message: 4
>>>>> Date: Mon, 30 Jun 2014 11:53:12 +0200
>>>>> From: Txema Heredia <txema.llis...@gmail.com>
>>>>> To: Derrick Lin <klin...@gmail.com>, SGE Mailing List
>>>>>       <users@gridengine.org>
>>>>> Subject: Re: [gridengine users] Enforce users to use specific amount
>>>>>       of      memory/slot
>>>>> Message-ID: <53b13388.5060...@gmail.com>
>>>>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>>>> 
>>>>> 
>>>>> Hi Derrick,
>>>>> 
>>>>> You could either set h_vmem as a consumable (consumable=yes) attribute
>>>>> and set a default value of 8GB for it. This way, whenever a job doesn't
>>>>> request any amount of h_vmem, it will automatically request 8GB per
>>>>> slot. This will affect all types of jobs.
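>>>>> 
>>>>> In qconf -mc terms that would be roughly the following line (columns:
>>>>> name, shortcut, type, relop, requestable, consumable, default, urgency;
>>>>> details may vary between Grid Engine versions):
>>>>> 
>>>>>    h_vmem   h_vmem   MEMORY   <=   YES   YES   8G   0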
>>>>> 
>>>>> You could also define a JSV script that checks the username, and forces
>>>>> a -l h_vmem=8G for his/her jobs (
>>>>> jsv_sub_add_param('l_hard','h_vmem','8G') ). This will affect all jobs
>>>>> for that user, but could turn into a pain to manage.
>>>>> 
>>>>> Or, you could set a different policy and allow all users to request the
>>>>> amount of memory they really need, trying to fit best the node. What is
>>>>> the point of forcing the user to reserve 63 additional cores when they
>>>>> only need 1 core and 500GB of memory? You could fit in that node one job
>>>>> like this, and, say, two 30-core-6GB-memory jobs.
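>>>>> 
>>>>> For example, with h_vmem consumable per slot and a PE named "smp"
>>>>> (names illustrative), those two kinds of jobs would be submitted as:
>>>>> 
>>>>>    qsub -pe smp 1  -l h_vmem=500G big_memory_job.sh
>>>>>    qsub -pe smp 30 -l h_vmem=6G   parallel_job.sh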
>>>>> 
>>>>> Txema
>>>>> 
>>>>> 
>>>>> 
>>>>> On 30/06/14 08:55, Derrick Lin wrote:
>>>>> 
>>>>>> Hi guys,
>>>>>> 
>>>>>> A typical node on our cluster has 64 cores and 512GB of memory, so it's
>>>>>> about 8GB/core. Occasionally we have jobs that utilize only 1 core but
>>>>>> 400-500GB of memory, which annoys lots of users. So I am seeking a way
>>>>>> to force jobs to run strictly below the 8GB/core ratio, or be killed.
>>>>>> 
>>>>>> For example, the above job should ask for 64 cores in order to use
>>>>>> 500GB of memory (we have user quota for slots).
>>>>>> 
>>>>>> I have been trying to play around h_vmem, set it to consumable and
>>>>>> configure RQS
>>>>>> 
>>>>>> {
>>>>>>       name    max_user_vmem
>>>>>>       enabled true
>>>>>>       description     "Each user cannot utilize more than 8GB/slot"
>>>>>>       limit   users {bad_user} to h_vmem=8g
>>>>>> }
>>>>>> 
>>>>>> but it seems to be setting a total vmem bad_user can use per job.
>>>>>> 
>>>>>> I would love to set it on users instead of queue or hosts because we
>>>>>> have applications that utilize the same set of nodes and app should be
>>>>>> unlimited.
>>>>>> 
>>>>>> Thanks
>>>>>> Derrick
>>>> 
>>>> 
>>>> I've been dealing with this too. I'm using h_vmem to kill processes that go
>>>> above the limit, with s_vmem set slightly lower by default to give
>>>> well-behaved processes a chance to exit gracefully first.
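>>>> 
>>>> Concretely, those defaults can be set in the complex configuration
>>>> (qconf -mc), e.g. something along these lines (values illustrative):
>>>> 
>>>>    h_vmem   h_vmem   MEMORY   <=   YES   YES   16G     0
>>>>    s_vmem   s_vmem   MEMORY   <=   YES   YES   15.5G   0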
>>>> 
>>>> The issue is that these use virtual memory, which is (always, more or less)
>>>> greater than resident memory, i.e. the actual RAM usage. And with Java apps
>>>> like Matlab, the amount of virtual memory reserved/used is HUGE compared to
>>>> resident, by 10x give or take, which makes it really impractical in practice.
>>>> However, so far I've just set the default h_vmem and s_vmem values high
>>>> enough to accommodate JVM apps, and increased the per-host consumable
>>>> appropriately. We don't get fine-grained memory control, but it definitely
>>>> controls out-of-control users/procs that otherwise might gobble up enough
>>>> RAM to slow down the entire node.
>>>> 
>>>> We may switch to UVE just for this reason, to get memory limits based on
>>>> resident memory, if it seems worth it in the end.
>>>> 
>>>> -M
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> users@gridengine.org
>>>> https://gridengine.org/mailman/listinfo/users
>>> 
>>> 
>>> 
>>> --
>>> Ian Kaufman
>>> Research Systems Administrator
>>> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
> 
> 
> 
> -- 
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
