On 29.08.2012 at 17:21, Brian Smith wrote:

> We use the mem_free variable as a consumable.  Then, we use a cronjob called 
> memkiller that terminates jobs if they go over their requested (or default) 
> memory allocation and

It would be more straightforward to use h_vmem directly. This limit is enforced 
by SGE, and a job exceeding it will be killed by SGE. If you also define it as 
a consumable at the exechost level, it can be set to the host's installed 
physical memory.
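
A minimal sketch of how the per-host value could be derived, assuming the
node exposes /proc/meminfo and that the result is pasted into the host's
complex_values line (e.g. via qconf -me) after h_vmem has been marked as
consumable in the complex configuration. The function names are
hypothetical and only for illustration; nothing here is part of SGE itself.

```python
def mem_total_kb(meminfo_text: str) -> int:
    """Return the MemTotal figure (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:       32768000 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def h_vmem_complex_value(meminfo_text: str) -> str:
    """Format physical memory as an h_vmem entry for complex_values.

    Assumption: the value is given in units of 1024*1024 bytes using
    the uppercase 'M' suffix.
    """
    mib = mem_total_kb(meminfo_text) // 1024
    return f"h_vmem={mib}M"


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        print(h_vmem_complex_value(fh.read()))
```

Jobs would then request their share with something like
"qsub -l h_vmem=4G job.sh", and the scheduler subtracts that from the
host's remaining h_vmem.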

Was there any reason to use mem_free?

-- Reuti


> 
> 1. Swap space on node is used
> 2. Swap rate is greater than 100 I/Os per second
> 
> The user gets emailed with a report if this happens.
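
The kill criteria quoted above can be sketched roughly as follows. This is
not Brian's memkiller script, which is not shown in the thread; the names
are hypothetical, and the sketch assumes that both swap conditions must
hold together with the job being over its allocation.

```python
from dataclasses import dataclass

# Threshold taken from the quoted description: 100 swap I/Os per second.
SWAP_IO_LIMIT = 100.0


@dataclass
class JobSample:
    used_bytes: int        # memory the job currently uses
    requested_bytes: int   # requested (or default) allocation


def swap_io_rate(prev_ops: int, curr_ops: int, interval_s: float) -> float:
    """Swap I/Os per second between two counter readings.

    Assumption: the counters are cumulative swap-in plus swap-out
    operations sampled interval_s seconds apart.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return (curr_ops - prev_ops) / interval_s


def should_terminate(job: JobSample, swap_used_bytes: int,
                     swap_ios_per_s: float) -> bool:
    """True if the job is over its allocation and the node is swapping."""
    over_allocation = job.used_bytes > job.requested_bytes
    swapping = swap_used_bytes > 0 and swap_ios_per_s > SWAP_IO_LIMIT
    return over_allocation and swapping
```

A cron-driven wrapper would collect the samples, call should_terminate()
for each job, and then signal the job and mail the user.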
> 
> This has made dealing with the oom killer a thing of the past in our shop.
> 
> We manage memory on the principle that swap should NEVER be used.  If you're 
> hitting oom killer, you're pretty far beyond that in terms of memory 
> utilization; if performance is a consideration, MHO is you should be looking 
> to schedule your memory usage accordingly.  Oom killer shouldn't be a factor 
> if memory is handled as a scheduler consideration.
> 
> -Brian
> 
> Brian Smith
> Sr. System Administrator
> Research Computing, University of South Florida
> 4202 E. Fowler Ave. SVC4010
> Office Phone: +1 813 974-1467
> Organization URL: http://rc.usf.edu
> 
> On 08/29/2012 11:02 AM, Ben De Luca wrote:
>> I was wondering how people deal with OOM conditions on their clusters.
>> We constantly have machines that die because the OOM killer takes out
>> critical system services.
>> 
>> Does anyone have experience with the oom_adj proc value, or with a patch
>> to Grid Engine to support it?
>> 
>> 
>>  /proc/[pid]/oom_adj (since Linux 2.6.11)
>>      This file can be used to adjust the score used to select which
>>      process should be killed in an out-of-memory (OOM) situation.
>>      The kernel uses this value for a bit-shift operation of the
>>      process's oom_score value: valid values are in the range -16
>>      to +15, plus the special value -17, which disables OOM-killing
>>      altogether for this process.  A positive score increases the
>>      likelihood of this process being killed by the OOM-killer; a
>>      negative score decreases the likelihood.  The default value
>>      for this file is 0; a new process inherits its parent's
>>      oom_adj setting.  A process must be privileged
>>      (CAP_SYS_RESOURCE) to update this file.
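
As an illustration of the interface described in the man page excerpt, a
small sketch that writes an oom_adj value for a given PID. The helper name
and the proc_root parameter are hypothetical (the latter only exists so the
function can be tried against a scratch directory). Note that newer kernels
deprecate oom_adj in favour of oom_score_adj, so this applies to the
interface as quoted here.

```python
import os

OOM_DISABLE = -17  # special value: never select this process


def set_oom_adj(pid: int, value: int, proc_root: str = "/proc") -> None:
    """Write an oom_adj value for the given process.

    Valid values are -16..+15 plus the special value -17.
    Writing to the real /proc requires CAP_SYS_RESOURCE (e.g. root).
    """
    if not (-17 <= value <= 15):
        raise ValueError(f"oom_adj out of range: {value}")
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as fh:
        fh.write(f"{value}\n")


if __name__ == "__main__":
    # Example: protect the current process. Children inherit the
    # setting, so a daemon would normally reset it before starting jobs.
    set_oom_adj(os.getpid(), OOM_DISABLE)
```

Because child processes inherit the setting, protecting a system daemon
this way also protects everything it starts unless the value is reset to 0
in the child before the job is executed.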
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>> 