Re: Linux machines dieing in swap storms
Hi,

It seems very similar to my case, where it is very easy to completely trash the computer in 30 seconds.

Conf: Core 2 CPU, 2 GB memory, 2 GB swap, 64-bit OS.
Software: latest stable xorg, firefox2, latest gnash (all from gutsy)

Go to the site http://www.epl.ee/ and almost immediately you lose control of your computer - the mouse gets very jerky, clicks aren't registered, switching consoles doesn't work. If I'm quick enough, I can log on remotely over ssh and kill all the gnash processes. But a few moments later even that is not possible. It just thrashes the hard drive.

Yes, one could set up different limits and whatnot ... but my point is - it is too easy to kill your Linux computer (or your friend's server). This should be changed.

With best,
Lenar

Richard Purdie wrote:
> I've got a problem I keep running into. My computers have buggy software which can sometimes run out of control. Two specific examples:
>
> Evolution: Sometimes its memory usage suddenly grows out of control. It usually idles at around 300MB; you can watch it in top, doubling, trebling and ending up going past the 1200MB mark. My system has 1.5GB of RAM and you notice it swapping heavily past, say, 800MB.
>
> Spamassassin: If my mail server log files hit the 2GB file size limit of the filesystem, something strange happens and for whatever reason spamd suddenly starts growing in memory usage until it uses up all available system memory.
>
> Arguably both pieces of software are buggy, I accept that, fine.
>
> On both machines, in totally different circumstances, what happens next is bad. The systems swap more and more heavily trying to cope with these out-of-control processes. Network interactivity stops. The swap storm gets so bad you can't log onto the console any more. I've left machines in this state for 1-2 hours and they don't come back. Watching the console, the OOM killer does kick in but it never kills the problem process (both spamd and evolution are long-running processes that have suddenly gone out of control).
>
> In the end, you have to hit the reset switch :(. This happened to my desktop once again about 10 minutes ago and it's *extremely* frustrating. Sometimes I can catch and kill the offending process but I shouldn't have to.
>
> This isn't a new problem. My mail server used to be running an ancient 2.6.12 kernel and I upgraded it to 2.6.22.X in an effort to solve this problem, with no change. My desktop shows exactly the same kind of OOM swap storm behaviour (2.6.20 based).
>
> I realise that tuning the OOM killer is a really tricky problem but something needs improving as the current user experience is broken.
>
> I'm seriously tempted to add a "kill the process using the most memory" key combination into SysRq which might let me save the desktop but won't help with my remote server. I could also just disable swap I guess.
>
> Advice on solving this welcome, preferably in mainline, but I'll happily hack my kernels with a workaround if need be.
>
> Cheers,
>
> Richard

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Linux machines dieing in swap storms
On Fri, 26 Oct 2007 05:56:49 +0200 Bodo Eggert <[EMAIL PROTECTED]> wrote:

> Rik van Riel <[EMAIL PROTECTED]> wrote:
> > On Thu, 25 Oct 2007 16:20:41 +0100 Richard Purdie <[EMAIL PROTECTED]> wrote:
> > > Advice on solving this welcome preferably in mainline but I'll happily hack my kernels with a workaround if need be.
> >
> > I can't see any easy hacks or workarounds to fix the issue in the current MM, except maybe activate the OOM killer if the amount of page cache and buffer cache is really low and swap is full...
> >
> > In the longer run, I'm working on:
> >
> > http://linux-mm.org/PageReplacementDesign
>
> What about only reclaiming cache if the cache has grown beyond a watermark and only reclaiming non-cache if it's below another watermark? I can imagine it will solve my diskcache-pushes-out-mousehandler problem, and I'm pretty sure having very low file cache is bad for performance, too.

There are much better ways to determine such thresholds than requiring the sysadmin to set them by hand. I have described one on the page linked above.

> Another thing I can imagine is to detect thrashing conditions and to change scheduling in order to increase the likelihood of cache hits and thereby progress: If an application just got a page, keep it running for a while (accumulating negative credits).

If the process needs another page after the page it just got (very likely), you cannot "keep it running".

--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan
Re: Linux machines dieing in swap storms
Rik van Riel <[EMAIL PROTECTED]> wrote:
> On Thu, 25 Oct 2007 16:20:41 +0100 Richard Purdie <[EMAIL PROTECTED]> wrote:
> > Advice on solving this welcome preferably in mainline but I'll happily hack my kernels with a workaround if need be.
>
> I can't see any easy hacks or workarounds to fix the issue in the current MM, except maybe activate the OOM killer if the amount of page cache and buffer cache is really low and swap is full...
>
> In the longer run, I'm working on:
>
> http://linux-mm.org/PageReplacementDesign

What about only reclaiming cache if the cache has grown beyond a watermark and only reclaiming non-cache if it's below another watermark? I can imagine it will solve my diskcache-pushes-out-mousehandler problem, and I'm pretty sure having very low file cache is bad for performance, too.

Another thing I can imagine is to detect thrashing conditions and to change scheduling in order to increase the likelihood of cache hits and thereby progress: If an application just got a page, keep it running for a while (accumulating negative credits).

--
"Of course, as admin, I can read all your email. But I am not THAT bored!" -- unknown author in comp.unix.aix
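As a rough illustration of the two-watermark idea in the message above, the decision could be sketched as a small shell function. The name `reclaim_target`, the page counts, and the "either" outcome for the band between the two watermarks are assumptions for illustration; the message does not say what should happen there, and nothing here reflects actual kernel code.

```shell
#!/bin/sh
# Decide what to reclaim from, given the current cache size and two
# watermarks (all in pages). Hypothetical sketch, not kernel behaviour.
reclaim_target() {
    cache=$1
    low=$2
    high=$3
    if [ "$cache" -gt "$high" ]; then
        echo cache      # cache has grown beyond the upper watermark
    elif [ "$cache" -lt "$low" ]; then
        echo anon       # cache is already small: take non-cache pages
    else
        echo either     # assumption: no preference between the marks
    fi
}

# Illustrative values only: a large cache, then a nearly empty one.
reclaim_target 90000 10000 50000
reclaim_target 4000 10000 50000
```

The point of the lower watermark is the one made in the message: a floor under the file cache keeps things like the mouse handler's pages from being pushed out by a burst of disk activity.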
Re: Linux machines dieing in swap storms
Richard Purdie wrote:
> I've got a problem I keep running into. My computers have buggy software which can sometimes run out of control.

Ulimit them.
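A minimal sketch of what "ulimit them" could look like in practice, assuming a shell whose `ulimit` builtin accepts `-v` (virtual address space, in KiB), as bash and dash do. The helper name `run_capped` and the 1 GiB figure in the usage comment are illustrative and do not come from the thread.

```shell
#!/bin/sh
# Run a command with a cap on its virtual address space.
# run_capped is a hypothetical helper name; the limit is given in KiB.
run_capped() {
    limit_kib=$1
    shift
    (
        # The subshell keeps the limit from leaking into the caller.
        ulimit -v "$limit_kib" || exit 125
        exec "$@"
    )
}

# Example: start evolution with at most 1 GiB of address space.
# run_capped 1048576 evolution
```

With such a cap in place, an allocation beyond the limit fails inside the runaway process instead of pushing the whole machine into swap. As noted elsewhere in the thread, this only helps if you know in advance which programs to wrap.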
Re: Linux machines dieing in swap storms
On 25/10/07 16:20, Richard Purdie wrote:
> This isn't a new problem. My mail server used to be running an ancient 2.6.12 kernel and I upgraded it to 2.6.22.X in an effort to solve this problem, with no change. My desktop shows exactly the same kind of OOM swap storm behaviour (2.6.20 based).
>
> I realise that tuning the OOM killer is a really tricky problem but something needs improving as the current user experience is broken.
>
> I'm seriously tempted to add a "kill the process using the most memory" key combination into SysRq which might let me save the desktop but won't help with my remote server. I could also just disable swap I guess.

I have no swap. If I accidentally start The GIMP and load a very large image, everything just freezes and I have to reboot - the OOM killer doesn't appear to care.

--
Simon Arlott
Re: Linux machines dieing in swap storms
On Thu, 25 Oct 2007 16:20:41 +0100 Richard Purdie <[EMAIL PROTECTED]> wrote:

> Advice on solving this welcome preferably in mainline but I'll happily hack my kernels with a workaround if need be.

I can't see any easy hacks or workarounds to fix the issue in the current MM, except maybe activate the OOM killer if the amount of page cache and buffer cache is really low and swap is full...

In the longer run, I'm working on:

http://linux-mm.org/PageReplacementDesign

--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan
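The condition suggested above (page cache and buffer cache really low, swap full) can be approximated from userspace by reading `/proc/meminfo`. This is only a sketch of the check, not of anything in the kernel: the function name `memory_exhausted` and both 5% thresholds are assumptions chosen for illustration.

```shell
#!/bin/sh
# Userspace approximation of the heuristic: succeed when swap is (nearly)
# full and page cache plus buffers make up only a sliver of RAM.
# The function name and both 5% thresholds are illustrative assumptions.
memory_exhausted() {
    awk '
        $1 == "MemTotal:"  { total = $2 }
        $1 == "Buffers:"   { buffers = $2 }
        $1 == "Cached:"    { cached = $2 }
        $1 == "SwapTotal:" { swap_total = $2 }
        $1 == "SwapFree:"  { swap_free = $2 }
        END {
            cache_low = ((buffers + cached) * 100 < total * 5)
            swap_full = (swap_free * 100 <= swap_total * 5)
            # Exit status 0 (success) only when both conditions hold.
            exit !(cache_low && swap_full)
        }
    ' "${1:-/proc/meminfo}"
}

if memory_exhausted; then
    echo "cache nearly gone and swap full: time to kill something"
fi
```

A machine with no swap at all counts as "swap full" here, which matches the swapless freeze reported elsewhere in the thread. A watchdog built on this would still have to get scheduled during the storm, which is exactly what the thread says stops happening.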
Re: Linux machines dieing in swap storms
On Thu, 2007-10-25 at 17:13 +0100, Alan Cox wrote:
> > I'm seriously tempted to add a "kill the process using the most memory" key combination into SysRq which might let me save the desktop but won't help with my remote server. I could also just disable swap I guess.
>
> For specific applications you can set resource limits, you can also set OOM priorities in current kernels to pick who dies.

I couldn't seem to find much documentation on this. For the archive and to confirm we're talking about the same thing, you mean:

    echo 10 > /proc/PID/oom_adj

(and ulimit/setrlimit for the resource limits)?

This assumes I know in advance which processes are likely to go mad, which isn't ideal, although it could solve my immediate problem.

> Finally you can disable overcommit and go for a rigid "no overcommit" policy where the system will fail any memory allocation which might lead to out of memory situations later.

It's certainly another option, but other processes then suffer because certain applications have bugs in them?

Thanks,

Richard
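For the archive, the `oom_adj` command above can be wrapped with a range check. On kernels of this era the file accepts values from -17 (never kill this process) to 15 (kill it first); later kernels added `/proc/PID/oom_score_adj` with a -1000 to 1000 range. The helper name `set_oom_adj` and the `PROC_ROOT` override (there only so the function can be tried against a scratch directory) are assumptions of this sketch.

```shell
#!/bin/sh
# Mark a process as a preferred (or protected) OOM-killer victim.
# set_oom_adj and PROC_ROOT are hypothetical names for this sketch.
# oom_adj accepts -17 (never kill) through 15 (kill first).
set_oom_adj() {
    pid=$1
    adj=$2
    root=${PROC_ROOT:-/proc}
    if [ "$adj" -lt -17 ] || [ "$adj" -gt 15 ]; then
        echo "oom_adj must be between -17 and 15" >&2
        return 1
    fi
    echo "$adj" > "$root/$pid/oom_adj"
}

# Example: make the runaway-prone spamd an early candidate.
# set_oom_adj "$(pidof -s spamd)" 10
```

Raising the value needs no special privilege for your own processes; lowering it generally requires root.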
Re: Linux machines dieing in swap storms
> I'm seriously tempted to add a "kill the process using the most memory" key combination into SysRq which might let me save the desktop but won't help with my remote server. I could also just disable swap I guess.

For specific applications you can set resource limits; you can also set OOM priorities in current kernels to pick who dies.

Finally, you can disable overcommit and go for a rigid "no overcommit" policy where the system will fail any memory allocation which might lead to out of memory situations later.

Alan
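To put a number on the "no overcommit" policy: with `vm.overcommit_memory=2` the kernel refuses allocations once total committed memory would exceed swap plus `vm.overcommit_ratio` percent of RAM (50 by default). The sketch below computes that ceiling, ignoring huge pages. The helper name `commit_limit_kib` is hypothetical, and the 1 GiB swap size in the example is an assumption, since the thread does not state it for the 1.5 GB desktop.

```shell
#!/bin/sh
# Estimate the commit limit under strict accounting (vm.overcommit_memory=2):
# swap plus overcommit_ratio percent of RAM, ignoring huge pages.
# commit_limit_kib is a hypothetical helper; all sizes are in KiB.
commit_limit_kib() {
    mem_kib=$1
    swap_kib=$2
    ratio=${3:-50}   # kernel default for vm.overcommit_ratio
    echo $(( swap_kib + mem_kib * ratio / 100 ))
}

# 1.5 GB of RAM as in the original report; 1 GiB of swap is assumed.
commit_limit_kib 1572864 1048576

# To switch policy (as root):
#   sysctl -w vm.overcommit_memory=2
#   sysctl -w vm.overcommit_ratio=50
```

The kernel reports the value it actually uses as `CommitLimit` in `/proc/meminfo`, next to `Committed_AS`, so the estimate can be compared with the real figure once the policy is active.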
Linux machines dieing in swap storms
I've got a problem I keep running into. My computers have buggy software which can sometimes run out of control. Two specific examples:

Evolution: Sometimes its memory usage suddenly grows out of control. It usually idles at around 300MB; you can watch it in top, doubling, trebling and ending up going past the 1200MB mark. My system has 1.5GB of RAM and you notice it swapping heavily past, say, 800MB.

Spamassassin: If my mail server log files hit the 2GB file size limit of the filesystem, something strange happens and for whatever reason spamd suddenly starts growing in memory usage until it uses up all available system memory.

Arguably both pieces of software are buggy, I accept that, fine.

On both machines, in totally different circumstances, what happens next is bad. The systems swap more and more heavily trying to cope with these out-of-control processes. Network interactivity stops. The swap storm gets so bad you can't log onto the console any more. I've left machines in this state for 1-2 hours and they don't come back. Watching the console, the OOM killer does kick in but it never kills the problem process (both spamd and evolution are long-running processes that have suddenly gone out of control).

In the end, you have to hit the reset switch :(. This happened to my desktop once again about 10 minutes ago and it's *extremely* frustrating. Sometimes I can catch and kill the offending process but I shouldn't have to.

This isn't a new problem. My mail server used to be running an ancient 2.6.12 kernel and I upgraded it to 2.6.22.X in an effort to solve this problem, with no change. My desktop shows exactly the same kind of OOM swap storm behaviour (2.6.20 based).

I realise that tuning the OOM killer is a really tricky problem but something needs improving as the current user experience is broken.

I'm seriously tempted to add a "kill the process using the most memory" key combination into SysRq which might let me save the desktop but won't help with my remote server. I could also just disable swap I guess.

Advice on solving this welcome, preferably in mainline, but I'll happily hack my kernels with a workaround if need be.

Cheers,

Richard
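The "kill the process using the most memory" idea can at least be tried from userspace before touching SysRq. The sketch below only identifies the candidate. The helper names `biggest_rss` and `top_memory_pid` are hypothetical, and it assumes a procps-style `ps` that accepts `-eo pid=,rss=,comm=`.

```shell
#!/bin/sh
# Pick the process with the largest resident set size.
# Input lines are "PID RSS_KIB COMMAND"; output is the PID of the largest.
# biggest_rss and top_memory_pid are hypothetical helper names.
biggest_rss() {
    awk '$2 + 0 > max { max = $2 + 0; pid = $1 }
         END { if (pid != "") print pid }'
}

# Feed the helper from the live process table (assumes procps ps).
top_memory_pid() {
    ps -eo pid=,rss=,comm= | biggest_rss
}

# Show the current top consumer without killing anything:
# top_memory_pid
# To actually kill it (destructive, so left commented out):
# kill -9 "$(top_memory_pid)"
```

Note that RSS counts only resident pages, so a process that has already been mostly swapped out can look smaller than it is. As with any userspace approach, it also depends on getting a shell to run at all once the storm has started.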