Thanks a lot for your suggestions. I'm going to put them into practice, and I'll let you know if something else comes up.
On Sun, May 13, 2012 at 10:56 PM, kuilin lu <[email protected]> wrote:
> Dear Iván José Pulido Sánchez,
>
> I have used the OOM killer before and have a little experience with
> this (kernel 2.6.18, CentOS 5 kernels). I found that when one sets
> vm.overcommit_ratio too high, say 100 or 90, the OOM killer ends up
> killing sshd and many system services. So I keep vm.overcommit_ratio
> at 80 at most. In my experience, the OOM killer always needs careful
> tuning if one wants it to work properly.
>
> I found a possible answer to your "-17" question.
> http://lwn.net/Articles/317814/ says:
> "The more memory the process uses, the higher the score. The
> longer a process is alive in the system, the smaller the score"
> Since siesta must be the longest-running process in the system, it
> gets "-17" for oom_adj.
>
> Possible solutions are mentioned in http://lwn.net/Articles/317814/ too :)
>
> Best Wishes,
> Kuilin
>
>
> On 5/13/12, Nick Papior Andersen <[email protected]> wrote:
> > I have no real experience with the oom-killer. However, it is not set
> > intrinsically by siesta. I believe it must come from Open MPI or from
> > the compiler itself. I have just checked my oom_adj values, and they
> > are all 0 (also Debian 6). So maybe you have changed a system setting.
> >
> > Please check whether it is siesta that is set to -17, just to be sure
> > of its origin.
> >
> > Nevertheless, as a workaround, you can catch the PID when you start
> > siesta and set the oom_adj value manually. Do:
> >
> >     echo 0 > /proc/<pid>/oom_adj
> >
> > Then you should be just fine.
> >
> > Kind regards, Nick
> >
> >
> > 2012/5/10 Iván Pulido Sanchez <[email protected]>
> >
> >> Hello,
> >>
> >> I've been having problems with Siesta on Linux (Debian) when it needs
> >> a lot of RAM, especially when it takes all the available memory of the
> >> system (RAM + swap). The problem is that when it does that, the Linux
> >> oom-killer gets invoked and starts killing processes. The following
> >> are the relevant lines in the kernel log (kern.log) explaining this:
> >>
> >> May 8 21:00:16 nodo4 kernel: [8217579.288562] siesta invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=-17
> >> ...
> >> May 8 21:00:16 nodo4 kernel: [8217579.306967] Out of memory: kill process 1975 (dbus-daemon) score 646 or a child
> >> ...
> >> May 8 21:00:16 nodo4 kernel: [8217579.856575] rsyslogd invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
> >> ...
> >> May 8 21:00:16 nodo4 kernel: [8217579.875127] Out of memory: kill process 1599 (rpc.statd) score 399 or a child
> >> May 8 21:00:16 nodo4 kernel: [8217579.875934] Killed process 1599 (rpc.statd)
> >> May 9 11:48:08 nodo4 kernel: imklog 4.6.2, log source = /proc/kmsg started.
> >>
> >> And that's how it ends (as you can see from the jump in the date).
> >>
> >> There is something that worries me, and that's the first of the lines
> >> above. It basically says that siesta is running with an oom_adj value
> >> of -17, meaning that it can't be killed by the oom-killer. I don't know
> >> why Siesta is running with this value for oom_adj.
> >>
> >> Processes like ssh or rpc.statd (NFS) shouldn't get killed before
> >> Siesta; this is why the node "dies" when this happens.
> >>
> >> Here is my siesta version and/or configuration:
> >>
> >> Siesta Version: siesta-3.1
> >> Architecture  : x86_64-debian6
> >> Compiler flags: mpif90 -g -O2
> >> PARALLEL version
> >>
> >> Any ideas, help or suggestions are very much appreciated.
> >>
> >> Thanks
> >>
> >> --
> >> Iván José Pulido Sánchez
> >> Physics Student
> >> Universidad Nacional de Colombia

--
Iván José Pulido Sánchez
Physics Student
Universidad Nacional de Colombia
