Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Wed, Oct 11, 2000 at 11:08:41AM +0200, Helge Hafting wrote:
> Nothing wrong with a big init - the problem is a memory-leaking init.
> That one will die anyway, whether it dies early from an OOM-killer
> or later when all other processes are gone don't really matter.

Indeed.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Andrea Arcangeli wrote:
>
> On Tue, Oct 10, 2000 at 09:06:49AM +0200, Helge Hafting wrote:
> > If you want init to live - prove that it don't eat too much memory.
>
> I don't see why the machine should be stable only if init is small.
> My kernel won't be stable only if init is small since it doesn't cost
> anything to handle correctly the big init case.
>
Nothing wrong with a big init - the problem is a memory-leaking init.
That one will die anyway, whether it dies early from an OOM-killer
or later when all other processes are gone don't really matter.

Helge Hafting
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Olaf Titz wrote:
>
> > > Still, it would be nice to recover that 4 MB when the system
> > > doesn't have any memory left.
> > Yup. The X server could give back the memory for some cases like the
> > background without too much hackery.
>
> Then Linux only needs to implement SIGDANGER, which has been talked
> about for years...
>
> X would be a good candidate to implement a handler for it. Others are
> Emacs, Mozilla or JVMs - basically everything which has a GC of some
> sort. It could even be used to implement a configurable user mode OOM
> killer.

It would be good to talk to the KDE and Gnome folks about this as well.
I am pretty sure they have large blocks of memory that could be flushed
or freed in a low-memory or OOM condition.

Miles
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, Rogier Wolff wrote:
>
> So if Netscape can "pump" 40 extra megabytes of memory out of X, this
> can be exploited.
>
> Now we're back to the point that a heuristic can never be right all
> the time..

I agree. In fact, we never left that.

Nothing is perfect. In fact, a lot of engineering is _recognizing_ that
you can never achieve "perfect", and you're much better off not even
trying - and having a simple system that is "good enough".

This is the old adage of "perfect is the enemy of good" - trying too hard
is actually _detrimental_ in 99% of all cases. We should have simple
heuristics that work most of the time, instead of trying to cajole a
complex system like X to help us do some complicated resource management
system.

Complexity will just result in the OOM killer failing in surprising ways.
A simple heuristic will mean that the OOM killer will still fail, but at
least it won't be in subtle and surprising ways.

Linus
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, Oct 10, 2000 at 12:30:51PM -0300, Rik van Riel wrote:
> Not killing init when we "should" definitely prevents
> embedded systems from auto-rebooting when they should
> do so.
>
> (OTOH, I don't think embedded systems will run into
> this OOM issue too much)

but when they do, they're hard to fix. Think about an elevator control
system with a single process that happens to implement a somewhat broken
version of the elevator algorithm ;)

> > that's what I said. we need to be sure to _get_ a panic() though.
>
> I believe the kernel automatically panic()s when init
> dies ... from kernel/exit.c::do_exit()
>
>         if (tsk->pid == 1)
>                 panic("Attempted to kill init!");

guess who added that code. We still kill init with SIGTERM which doesn't
seem to work though.
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote:
> > On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> > > > > The algorithm you posted on the list in this thread will kill
> > > > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > > > and you execute a task that grows over 1M.
> > > >
> > > > This sounds suspiciously like the description of a DEAD system ;)
> > >
> > > But wouldn't a watchdog daemon which doesn't allocate any memory
> > > still get run ?
> >
> > Indeed, it would. It would also /prevent/ the system
> > from automatically rebooting itself into a usable state ;)
>
> So it's not dead in the "oh, it'll be back in 30 seconds" sense.
> So our behaviour is broken (more so than random process
> killing).

*nod*

Not killing init when we "should" definitely prevents
embedded systems from auto-rebooting when they should
do so.

(OTOH, I don't think embedded systems will run into
this OOM issue too much)

> > > You care about getting an automatic reboot. So you need to be sure the
> > > watchdog daemon gets killed first or you panic() after some time.
> >
> > echo 30 > /proc/sys/kernel/panic
>
> that's what I said. we need to be sure to _get_ a panic() though.

I believe the kernel automatically panic()s when init
dies ... from kernel/exit.c::do_exit()

        if (tsk->pid == 1)
                panic("Attempted to kill init!");

[which will make our system auto-reboot and be back on
its feet in a healthy state again soon]

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> > > > The algorithm you posted on the list in this thread will kill
> > > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > > and you execute a task that grows over 1M.
> > >
> > > This sounds suspiciously like the description of a DEAD system ;)
> >
> > But wouldn't a watchdog daemon which doesn't allocate any memory
> > still get run ?
>
> Indeed, it would. It would also /prevent/ the system
> from automatically rebooting itself into a usable state ;)

So it's not dead in the "oh, it'll be back in 30 seconds" sense.
So our behaviour is broken (more so than random process killing).

> > You care about getting an automatic reboot. So you need to be sure the
> > watchdog daemon gets killed first or you panic() after some time.
>
> echo 30 > /proc/sys/kernel/panic

that's what I said. we need to be sure to _get_ a panic() though.
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, Philipp Rumpf wrote:

> > > The algorithm you posted on the list in this thread will kill
> > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > and you execute a task that grows over 1M.
> >
> > This sounds suspiciously like the description of a DEAD system ;)
>
> But wouldn't a watchdog daemon which doesn't allocate any memory
> still get run ?

Indeed, it would. It would also /prevent/ the system
from automatically rebooting itself into a usable state ;)

> > (in which case you simply don't care if init is being killed or not)
>
> You care about getting an automatic reboot. So you need to be sure the
> watchdog daemon gets killed first or you panic() after some time.

echo 30 > /proc/sys/kernel/panic

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Linus Torvalds wrote:
> Basically, the only thing _I_ think X can do is to really say "oh, please
> don't count my memory, because everything I do I do for my clients, not
> for myself".
>
> THAT is my argument. Basically there is nothing we can reliably account.
>
> So we might as well fall back on just saying "X is more important than
> some random client", and have a mm niceness level. Which right now is
> obviously approximated by the IO capabilities tests etc.

FYI:

I ran my machine out of memory (without crashing by the way) this
weekend by loading a whole bunch of large images into netscape.

I noticed not being able to open more windows when I saw my swapspace
exhausted. I noticed the large netscape, and killed it. At that moment
my X was still taking 80Mb of RAM. I manually killed it and restarted
it to get rid of that memory.

So if Netscape can "pump" 40 extra megabytes of memory out of X, this
can be exploited.

Now we're back to the point that a heuristic can never be right all
the time..

Roger.

--
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* Common sense is the collection of
* prejudices acquired by age eighteen. -- Albert Einstein
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, Oct 10, 2000 at 09:06:49AM +0200, Helge Hafting wrote:
> If you want init to live - prove that it don't eat too much memory.

I don't see why the machine should be stable only if init is small.
My kernel won't be stable only if init is small since it doesn't cost
anything to handle correctly the big init case.

Andrea
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, Oct 10, 2000 at 04:38:02AM +0100, Philipp Rumpf wrote:
> Init should never die. If we get to do_exit in init we'll panic which is
> the right thing to do (reboot on critical systems).

If the page fault can fail with OOM on init, init will get a SIGSEGV while
running a signal handler (copy-user will return -EFAULT regardless it was an
oom or a real segfault) and it _won't_ panic and the system is unusable.

Andrea
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Linus Torvalds wrote:
> On Mon, 9 Oct 2000, Rik van Riel wrote:
> > > I'd prefer just X having a higher "mm nice level" or something.
> >
> > Which it has, because:
> >
> > 1) CAP_RAW_IO
> > 2) p->euid == 0
>
> Oh, I agree, but we might want to generalize this a bit so that root could
> say "this process is important" and then drop root privileges and still
> get "credited" for the fact that it's important.
>
> It's not a big deal. It works for X right now.

How about using p->rlim[RLIMIT_AS].rlim_cur to weight the badness point
for a process? On my system, a 128MB RAM + 256MB swap, it defaults to
some (insane?) value:

bash$ ulimit -vH -vS
virtual memory (kbytes) 4194302
virtual memory (kbytes) 2105343

for every process, which just means it is unused.

The idea is:
1) set default for rlim[RLIMIT_AS].rlim_max to a saner value;
2) processes with higher rlim[RLIMIT_AS].rlim_cur get lower badness.

This way, the badness of a process is not proportional to its absolute
size, but to the fraction of allowed AS it is using.
Processes that are capable(CAP_SYS_RESOURCE) can set RLIMIT_AS to a very high
value, so they get less badness point. X is a perfect candidate.
User's runaway processes (netscape) will have lower
rlim[RLIMIT_AS].rlim_cur, thus will get higher badness.

Something like:

-	points = p->mm->total_vm;
+	points = p->mm->total_vm / (p->rlim[RLIMIT_AS].rlim_cur << AS_FACTOR);

with
#define AS_FACTOR 30
maybe? (this is Rik's call, he knows better than me how to balance it...)

It's simple, it's configurable. 1) may be enforced by the kernel,
or completely left to user space. On my system, in its default configuration
(no use of RLIMIT_AS), it has no impact at all (all processes have the
same limit).

Sounds good or am I missing something?

--
Marco Colombo
Technical Manager
ESI s.r.l.
[EMAIL PROTECTED]
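The weighting idea in the message above can be tried out in user space. What follows is only a sketch under stated assumptions, not the patch as posted and not kernel code: the function name as_weighted_badness, the BADNESS_SCALE constant and the choice of kilobytes as the unit are all hypothetical. Taken literally, the posted expression divides by rlim_cur shifted left by AS_FACTOR, which does not seem to yield a useful value in integer arithmetic, so the sketch scales the numerator instead while keeping the stated intent: badness follows the fraction of the allowed address space in use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scale: the score is "parts per 1024" of the limit in use. */
#define BADNESS_SCALE 1024u

/*
 * Score a process by the fraction of its RLIMIT_AS soft limit it uses,
 * rather than by absolute size. Both arguments are in kilobytes.
 * A process with a generous limit (X) scores low; a process of the
 * same size with a tight limit (a runaway client) scores high.
 */
static uint64_t as_weighted_badness(uint64_t total_vm_kb, uint64_t rlim_cur_kb)
{
    /* A zero limit would make the fraction undefined. */
    assert(rlim_cur_kb > 0);
    return (total_vm_kb * BADNESS_SCALE) / rlim_cur_kb;
}
```

With these made-up numbers, an 80 MB process under a 2 GB limit gets as_weighted_badness(81920, 2097152), while the same 80 MB under a 128 MB limit gets as_weighted_badness(81920, 131072), a score sixteen times higher. If every process shares one limit, the ordering is the same as ordering by size, which matches the "no impact in the default configuration" remark.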
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
--On 09 October 2000, 17:40 -0300 Rik van Riel <[EMAIL PROTECTED]> wrote:

> On Mon, 9 Oct 2000, James Sutherland wrote:
>> On Mon, 9 Oct 2000, Ingo Molnar wrote:
>> > On Mon, 9 Oct 2000, Rik van Riel wrote:
>> >
>> > > > so dns helper is killed first, then netscape. (my idea might not
>> > > > make sense though.)
>> > >
>> > > It makes some sense, but I don't think OOM is something that
>> > > occurs often enough to care about it /that/ much...
>> >
>> > i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup
>> > case, with 4MB RAM and no swap, where the admin tries to exec a 2MB
>> > process. I think it's a legitimate concern - i cannot know in advance
>> > whether a freshly started process would trigger an OOM or not.
>>
>> Shouldn't the runtime factor handle this, making sure the new
>> process is killed? (Maybe not if you're almost OOM right from
>> the word go, and run this process straight off... Hrm.)
>
> It should.
>
> Also, the example is a tad unrealistic since init seems to be
> around 70 kB in size on my systems ;)

In extreme cases, though, you could arrange things so the machine only has
100K of RAM when it loads init, at which point init tries running, say,
rc.sysinit - and everything goes bang. Of course, a machine like that won't
be very much use anyway...

More realistically, though, I could be running with something like
init=/bin/sash - does your statically linked sash binary fit in 70K? :-)

James.
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Andreas Dilger wrote:
> Having a SIGDANGER handler is good for 2 reasons:
> 1) Lets processes know when memory is short so they can free needless cache.
> 2) Mark process with a SIGDANGER handler as "more important" than those
>    without. Most people won't care about this, but init, and X, and
>    long-running simulations might.

For point 1, it would be much nicer to have user processes participate
in memory balancing _before_ getting anywhere near an OOM state.

A nice way is to send SIGDANGER with siginfo saying how much memory the
kernel wants back (or how fast). Applications that don't know to use
that info, but do have a SIGDANGER handler, will still react just rather
more severely.

-- Jamie
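What a handler for such a signal might look like from user space can be sketched with existing POSIX calls. Everything here is an assumption made for illustration: Linux has no SIGDANGER, so SIGUSR1 stands in for it; the kernel is simulated by the process queueing the signal to itself with sigqueue(); and the "memory wanted back" figure travels in the siginfo value field in kilobytes, which is a made-up convention. The names SIGDANGER_STANDIN, wanted_kb, install_danger_handler and request_memory_back are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Stand-in only: Linux does not define SIGDANGER. */
#define SIGDANGER_STANDIN SIGUSR1

/* Last amount (in kB) the "kernel" asked for; read by the main loop. */
static volatile sig_atomic_t wanted_kb;

/* Handler: only records the request; the real freeing belongs in the
 * main loop, outside signal context. */
static void on_danger(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    wanted_kb = info->si_value.sival_int;
}

/* Install the handler with SA_SIGINFO so the payload is delivered. */
static int install_danger_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_danger;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGDANGER_STANDIN, &sa, NULL);
}

/* Simulate the kernel asking this process to give back `kb` kilobytes. */
static int request_memory_back(int kb)
{
    union sigval v;
    assert(kb >= 0);
    v.sival_int = kb;
    return sigqueue(getpid(), SIGDANGER_STANDIN, v);
}
```

A program would call install_danger_handler() at startup and have its main loop check wanted_kb, dropping caches until roughly that much is released. A program that ignores the payload can still treat the bare signal as "free everything you can", which is the more severe reaction the message describes.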
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Albert D. Cahalan wrote:
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.

Haven't we already had this discussion? Quite a lot of programs have
cached data (X fonts, Netscape (lots!)), GC-able data (Emacs, Java
etc.), data that can simply be discarded (X window backing stores), or
data that can be written to disk on demand (Netscape again).

-- Jamie
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 06:34:29PM -0300, Rik van Riel wrote:
> On Mon, 9 Oct 2000, Ingo Molnar wrote:
> > On Mon, 9 Oct 2000, Rik van Riel wrote:
> >
> > > Would this complexity /really/ be worth it for the twice-yearly OOM
> > > situation?
> >
> > the only reason i suggested this was the init=/bin/bash, 4MB
> > RAM, no swap emergency-bootup case. We must not kill init in
> > that case - if the current code doesn't then great and none of
> > this is needed.

perhaps a boot time option oom=0 ? since oom is such a rare case, this
wouldn't impact normal usage...

--
john slee <[EMAIL PROTECTED]>
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> > Still, it would be nice to recover that 4 MB when the system
> > doesn't have any memory left.
> Yup. The X server could give back the memory for some cases like the
> background without too much hackery.

Then Linux only needs to implement SIGDANGER, which has been talked
about for years...

X would be a good candidate to implement a handler for it. Others are
Emacs, Mozilla or JVMs - basically everything which has a GC of some
sort. It could even be used to implement a configurable user mode OOM
killer.

Olaf
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Andrea Arcangeli wrote:
>
> On Mon, Oct 09, 2000 at 08:42:26PM +0200, Ingo Molnar wrote:
> > ignoring the kill would just preserve those bugs artificially.
>
> If the oom killer kills a thing like init by mistake or init has a memleak
> you'll notice both problems regardless of having a magic for init in a _very_
> slow path so I don't buy your point.
>
> For correctness init must not be killed ever, period.
>
> So you have two choices:
>
> o       math proof that the current algorithm without the magic can't end
>         killing init (and I should be able to proof the other way around
>         instead)
>
> o       have a magic check for init
>
> So the magic is _strictly_ necessary at the moment.

A well-written init will be saved by being the oldest process around.

A memory-leaking init _will_ be killed even with your magic
test, when the kernel eventually gets stuck OOM and init
is the only process left (all the other have been OOM-killed before.)

A deadlocked kernel don't schedule any processes, so they are
all dead.

If you want init to live - prove that it don't eat too much memory.

Helge Hafting
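The claim that an old init is protected by its age, and the counter-case of a freshly booted init=/bin/bash system, can both be seen in a toy model. This is a sketch under assumptions, not the scoring in the patch under discussion: the thread only says a "runtime factor" exists, and dividing size by the integer square root of the run time is a weighting chosen here purely for illustration. The names isqrt_u64 and aged_badness, and the 4 kB page size used in the numbers, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Integer square root by simple iteration; adequate for a sketch. */
static uint64_t isqrt_u64(uint64_t n)
{
    uint64_t r = 0;
    while ((r + 1) * (r + 1) <= n)
        r++;
    assert(r * r <= n);  /* r is the largest integer whose square fits in n */
    return r;
}

/*
 * Toy badness: size in pages, discounted by the square root of the
 * run time in seconds. Older processes score lower for the same size.
 */
static uint64_t aged_badness(uint64_t total_vm_pages, uint64_t run_time_s)
{
    uint64_t age = isqrt_u64(run_time_s);
    if (age == 0)
        age = 1;  /* brand-new process: no discount */
    return total_vm_pages / age;
}
```

With 4 kB pages, a 3 MB init (768 pages) that has run for an hour scores aged_badness(768, 3600), far below a 1 MB task (256 pages) started one second ago, so the newcomer is chosen. Four seconds after boot the same init scores aged_badness(768, 4), which is higher than the new task, and that is the emergency-boot case raised earlier in the thread as the argument for an explicit check.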
Re: [PATCH] VM fix for 2.4.0-test9 OOM handler
Andrea Arcangeli wrote: On Mon, Oct 09, 2000 at 08:42:26PM +0200, Ingo Molnar wrote: ignoring the kill would just preserve those bugs artificially. If the oom killer kills a thing like init by mistake or init has a memleak you'll notice both problems regardless of having a magic for init in a _very_ slow path so I don't buy your point. . For corretness init must not be killed ever, period. So you have two choices: o math proof that the current algorithm without the magic can't end killing init (and I should be able to proof the other way around instead) o have a magic check for init So the magic is _strictly_ necessary at the moment. A well-written init will be saved by being the oldest process around. A memory-leaking init _will_ be killed even whith your magic test, when the kernel eventually gets stuck OOM and init is the only process left (all the other have been OOM-killed before.) A deadlocked kernel don't schedule any processes, so they are all dead. If you want init to live - prove that it don't eat too much memory. Helge Hafting - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 OOM handler
Albert D. Cahalan wrote: X, and any other big friendly processes, could participate in memory balancing operations. X could be made to clean out a font cache when the kernel signals that memory is low. When the situation becomes serious, X could just mmap /dev/zero over top of the background image. Haven't we already had this discussion? Quite a lot of programs have cached data (X fonts, Netscape (lots!)), GC-able data (Emacs, Java etc.), data that can simply be discarded (X window backing stores), or data that can be written to disk on demand (Netscape again). -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 OOM handler
Andreas Dilger wrote: Having a SIGDANGER handler is good for 2 reasons: 1) Lets processes know when memory is short so they can free needless cache. 2) Mark process with a SIGDANGER handler as "more important" than those without. Most people won't care about this, but init, and X, and long-running simulations might. For point 1, it would be much nicer to have user processes participate in memory balancing _before_ getting anywhere near an OOM state. A nice way is to send SIGDANGER with siginfo saying how much memory the kernel wants back (or how fast). Applications that don't know to use that info, but do have a SIGDANGER handler, will still react just rather more severely. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 OOM handler
--On 09 October 2000, 17:40 -0300 Rik van Riel [EMAIL PROTECTED] wrote: On Mon, 9 Oct 2000, James Sutherland wrote: On Mon, 9 Oct 2000, Ingo Molnar wrote: On Mon, 9 Oct 2000, Rik van Riel wrote: so dns helper is killed first, then netscape. (my idea might not make sense though.) It makes some sense, but I don't think OOM is something that occurs often enough to care about it /that/ much... i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case, with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I think it's a legitimate concern - i cannot know in advance whether a freshly started process would trigger an OOM or not. Shouldn't the runtime factor handle this, making sure the new process is killed? (Maybe not if you're almost OOM right from the word go, and run this process straight off... Hrm.) It should. Also, the example is a tad unrealistic since init seems to be around 70 kB in size on my systems ;) In extreme cases, though, you could arrange things so the machine only has 100K of RAM when it loads init, at which point init tries running, say, rc.sysinit - and everything goes bang. Of course, a machine like that won't be very much use anyway... More realistically, though, I could be running with something like init=/bin/sash - does your statically linked sash binary fit in 70K? :-) James. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 OOM handler
On Mon, 9 Oct 2000, Linus Torvalds wrote: On Mon, 9 Oct 2000, Rik van Riel wrote: I'd prefer just X having a higher "mm nice level" or something. Which it has, because: 1) CAP_RAW_IO 2) p-euid == 0 Oh, I agree, but we might want to generalize this a bit so that root could say "this process is important" and then drop root privileges and still get "credited" for the fact that it's important. It's not a big deal. It works for X right now. How about using p-rlim[RLIMIT_AS].rlim_cur to weight the badness point for a process? On my system, a 128MB RAM + 256MB swap, it defaults to some (insane?) value: bash$ ulimit -vH -vS virtual memory (kbytes) 4194302 virtual memory (kbytes) 2105343 for every process, which just means it is unused. The idea is: 1) set default for rlim[RLIMIT_AS].rlim_max to a saner value; 2) processes with higher rlim[RLIMIT_AS].rlim_cur get lower badness. This way, the badness of a process is not proportional to its absolute size, but to the fraction of allowed AS it is using. Processes that are capable(CAP_SYS_RESOURCE) can set RLIMIT_AS to a very high value, so they get less badness point. X is a perfect candidate. User's runaway processes (netscape) will have lower rlim[RLIMIT_AS].rlim_cur, thus will get higher badness. Something like: - points = p-mm-total_vm; + points = p-mm-total_vm / (p-rlim[RLIMIT_AS].rlim_cur AS_FACTOR); with #define AS_FACTOR 30 maybe? (this is Rik's call, he knows better than me how to balance it...) It's simple, it's configurable. 1) may be enforced by the kernel, or completely left to user space. On my system, in its default configuration (no use of RLIMIT_AS), it has no impact at all (all processes have the same limit). Sounds good or am I missing something? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/ .TM. -- / / / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. 
_/ _/ _/ [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 OOM handler
On Tue, Oct 10, 2000 at 04:38:02AM +0100, Philipp Rumpf wrote: Init should never die. If we get to do_exit in init we'll panic which is the right thing to do (reboot on critical systems). If the page fault can fail with OOM on init, init will get a SIGSEGV while running a signal handler (copy-user will return -EFAULT regardless it was an oom or a real segfault) and it _won't_ panic and the system is unusable. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 OOM handler
On Tue, Oct 10, 2000 at 09:06:49AM +0200, Helge Hafting wrote: If you want init to live - prove that it don't eat too much memory. I don't see why the machine should be stable only if init is small. My kernel won't be stable only if init is small since it doesn't cost anything to handle correctly the big init case. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 OOM handler
Linus Torvalds wrote: Basically, the only thing _I_ think X can do is to really say "oh, please don't count my memory, because everything I do I do for my clients, not for myself". THAT is my argument. Basically there is nothing we can reliably account. So we might as well fall back on just saying "X is more important than some random client", and have a mm niceness level. Which right now is obviously approximated by the IO capabilities tests etc. FYI: I ran my machine out of memory (without crashing by the way) this weekend by loading a whole bunch of large images into netscape. I noticed not being able to open more windows when I saw my swapspace exhausted. I noticed the large netscape, and killed it. At that moment my X was still taking 80Mb of RAM. I manually killed it and restarted it to get rid of that memory. So if Netscape can "pump" 40 extra megabytes of memory out of X, this can be exploited. Now we're back to the point that a heuristic can never be right all the time.. Roger. -- ** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 ** *-- BitWizard writes Linux device drivers for any device you may have! --* * Common sense is the collection of* ** prejudices acquired by age eighteen. -- Albert Einstein - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 OOM handler
On Tue, 10 Oct 2000, Philipp Rumpf wrote: The algorithm you posted on the list in this thread will kill init if on 4Mbyte machine without swap init is large 3 Mbytes and you execute a task that grows over 1M. This sounds suspiciously like the description of a DEAD system ;) But wouldn't a watchdog daemon which doesn't allocate any memory still get run ? Indeed, it would. It would also /prevent/ the system from automatically rebooting itself into a usable state ;) (in which case you simply don't care if init is being killed or not) You care about getting an automatic reboot. So you need to be sure the watchdog daemon gets killed first or you panic() after some time. echo 30 /proc/sys/kernel/panic regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 OOM handler
On Tue, 10 Oct 2000, Philipp Rumpf wrote: On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote: On Tue, 10 Oct 2000, Philipp Rumpf wrote: The algorithm you posted on the list in this thread will kill init if on 4Mbyte machine without swap init is large 3 Mbytes and you execute a task that grows over 1M. This sounds suspiciously like the description of a DEAD system ;) But wouldn't a watchdog daemon which doesn't allocate any memory still get run ? Indeed, it would. It would also /prevent/ the system from automatically rebooting itself into a usable state ;) So it's not dead in the "oh, it'll be back in 30 seconds" sense. So our behaviour is broken (more so than random process killing). *nod* Not killing init when we "should" definately prevents embedded systems from auto-rebooting when they should do so. (OTOH, I don't think embedded systems will run into this OOM issue too much) You care about getting an automatic reboot. So you need to be sure the watchdog daemon gets killed first or you panic() after some time. echo 30 /proc/sys/kernel/panic that's what I said. we need to be sure to _get_ a panic() though. I believe the kernel automatically panic()s when init dies ... from kernel/exit.c::do_exit() if (tsk-pid == 1) panic("Attempted to kill init!"); [which will make our system auto-reboot and be back on its feet in a healty state again soon] regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, Oct 10, 2000 at 12:30:51PM -0300, Rik van Riel wrote:
> Not killing init when we "should" definitely prevents
> embedded systems from auto-rebooting when they should do so.
>
> (OTOH, I don't think embedded systems will run into this
> OOM issue too much)

but when they do, they're hard to fix.  Think about an elevator control
system with a single process that happens to implement a somewhat broken
version of the elevator algorithm ;)

> > that's what I said.  we need to be sure to _get_ a panic() though.
>
> I believe the kernel automatically panic()s when init
> dies ...  from kernel/exit.c::do_exit()
>
>         if (tsk->pid == 1)
>                 panic("Attempted to kill init!");

guess who added that code.  We still kill init with SIGTERM which doesn't
seem to work though.
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, Rogier Wolff wrote:
>
> So if Netscape can "pump" 40 extra megabytes of memory out of X, this
> can be exploited.
>
> Now we're back to the point that a heuristic can never be right all
> the time..

I agree. In fact, we never left that.

Nothing is perfect. In fact, a lot of engineering is _recognizing_ that
you can never achieve "perfect", and you're much better off not even
trying - and having a simple system that is "good enough".

This is the old adage of "perfect is the enemy of good" - trying too hard
is actually _detrimental_ in 99% of all cases. We should have simple
heuristics that work most of the time, instead of trying to cajole a
complex system like X to help us do some complicated resource management
system.

Complexity will just result in the OOM killer failing in surprising ways.

A simple heuristic will mean that the OOM killer will still fail, but at
least it won't be in subtle and surprising ways.

		Linus
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Olaf Titz wrote:

> > > Still, it would be nice to recover that 4 MB when the system
> > > doesn't have any memory left.
> >
> > Yup. The X server could give back the memory for some cases like
> > the background without too much hackery.
>
> Then Linux only needs to implement SIGDANGER, which has been talked
> about for years... X would be a good candidate to implement a handler
> for it. Others are Emacs, Mozilla or JVMs - basically everything which
> has a GC of some sort. It could even be used to implement a
> configurable user mode OOM killer.

It would be good to talk to the KDE and Gnome folks about this as well.
I am pretty sure they have large blocks of memory that could be flushed
or freed in a low-memory or OOM condition.

Miles
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Andreas Dilger wrote:
> Albert D. Cahalan wrote:
> > X, and any other big friendly processes, could participate in
> > memory balancing operations.  X could be made to clean out a
>
> Gerrit Huizenga wrote:
> > Anyway, there is/was an API in PTX to say (either from in-kernel or through
> > some user machinations) "I Am a System Process".  Turns on a bit in the
>
> On AIX there is a signal called SIGDANGER, which is basically what you
> are looking for.  By default it is ignored, but for processes that care
> (e.g. init, X, whatever) they can register a SIGDANGER handler.  At an
> "urgent" (as opposed to "critical") OOM situation, all processes get a
> SIGDANGER sent to them.  Most will ignore it, but ones with handlers
> can free caches, try to do a clean shutdown, whatever.  Any process with
> a SIGDANGER handler gets a reduction of "badness" (as the OOM killer calls
> it) when looking for processes to kill.
>
> Having a SIGDANGER handler is good for 2 reasons:
> 1) Lets processes know when memory is short so they can free needless cache.
> 2) Mark process with a SIGDANGER handler as "more important" than those
>    without.  Most people won't care about this, but init, and X, and
>    long-running simulations might.

Is there any reason why we can't do something like this for 2.5?

-d

--
"There is a natural aristocracy among men. The grounds of this are virtue
and talents", Thomas Jefferson [1742-1826], 3rd US President
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> Rik van Riel wrote:
> > > How about SIGTERM a bit before SIGKILL then re-evaluate the OOM
> > > N usecs later?
> >
> > And run the risk of having to kill /another/ process as well ?
> >
> > I really don't know if that would be a wise thing to do
> > (but feel free to do some tests to see if your idea would
> > work ... I'd love to hear some test results with your idea).

David Ford writes:
> I was thinking (dangerous) about an urgent v.s. critical OOM.  urgent could
> trigger a SIGTERM which would give advance notice to the offending process.
> I don't think we have a signal method of notifying processes when resources
> are critically low, feel free to correct me.
>
> Is there a signal that -might- be used for this?

Albert D. Cahalan wrote:
> X, and any other big friendly processes, could participate in
> memory balancing operations.  X could be made to clean out a
> font cache when the kernel signals that memory is low.  When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.
>
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Gerrit Huizenga wrote:
> Anyway, there is/was an API in PTX to say (either from in-kernel or through
> some user machinations) "I Am a System Process".  Turns on a bit in the
> proc struct (task struct) that made it exempt from death from a variety
> of sources, e.g. OOM, generic user signals, portions of system shutdown,
> etc.
>
> Then, the code looking for things to kill simply skips those that are
> intelligently marked, taking most of the decision making/policy making
> out of the scheduler/memory manager.

On AIX there is a signal called SIGDANGER, which is basically what you
are looking for.  By default it is ignored, but for processes that care
(e.g. init, X, whatever) they can register a SIGDANGER handler.  At an
"urgent" (as opposed to "critical") OOM situation, all processes get a
SIGDANGER sent to them.  Most will ignore it, but ones with handlers
can free caches, try to do a clean shutdown, whatever.  Any process with
a SIGDANGER handler gets a reduction of "badness" (as the OOM killer calls
it) when looking for processes to kill.

Having a SIGDANGER handler is good for 2 reasons:
1) Lets processes know when memory is short so they can free needless cache.
2) Mark process with a SIGDANGER handler as "more important" than those
   without.  Most people won't care about this, but init, and X, and
   long-running simulations might.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
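What a SIGDANGER-aware process might look like can be sketched in user space. Linux has no SIGDANGER, so this sketch assumes SIGUSR1 as a stand-in; the cache, fill_cache() and service_lowmem() are hypothetical names made up for the illustration. The handler only sets a flag, because free() is not async-signal-safe, and the main loop does the actual release.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>

/* Assumption: Linux has no SIGDANGER, so SIGUSR1 stands in for it here. */
#define SIG_LOWMEM SIGUSR1

static volatile sig_atomic_t lowmem_pending = 0;
static void *cache = NULL;      /* hypothetical droppable cache */
static size_t cache_size = 0;

/* Signal handler: only record the event, do no real work here. */
static void lowmem_handler(int signo)
{
    (void)signo;
    lowmem_pending = 1;
}

/* Register the low-memory handler; returns 0 on success. */
int install_lowmem_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = lowmem_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIG_LOWMEM, &sa, NULL);
}

/* Replace the cache with a fresh buffer; returns 0 on success. */
int fill_cache(size_t bytes)
{
    void *p = malloc(bytes);

    if (!p)
        return -1;
    free(cache);
    cache = p;
    cache_size = bytes;
    return 0;
}

/*
 * Called from the program's main loop.  If a low-memory signal arrived,
 * drop the cache and report how many bytes were given back.
 */
size_t service_lowmem(void)
{
    size_t freed = 0;

    if (lowmem_pending) {
        lowmem_pending = 0;
        free(cache);
        cache = NULL;
        freed = cache_size;
        cache_size = 0;
    }
    return freed;
}
```

A process that never calls install_lowmem_handler() would be terminated by SIGUSR1, which is the opposite of the AIX default of ignoring SIGDANGER; a real implementation would need the signal to be ignored by default.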
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> If init dies the kernel hangs solid anyway

Init should never die. If we get to do_exit in init we'll panic which is
the right thing to do (reboot on critical systems).
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> (but I'd be curious if somebody actually manages to
> trick the OOM killer into killing init ... please
> test a bit more to see if this really happens ;))

In a non-real-world situation, yes.  (mem=3500k, many drivers,
init=/bin/bash, tried to enter a command).  Since the process in question
(bash) ignores SIGTERM, I actually got a hard hang.

We really should turn this into a panic() (panic means your elevator
control system reboots and maybe misses the right floor.  hard hang means
you need to reboot manually).
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
"Albert D. Cahalan" <[EMAIL PROTECTED]> writes:
> Date: Mon, 9 Oct 2000 19:13:25 -0400 (EDT)
>
> >> From: Linus Torvalds <[EMAIL PROTECTED]>
>
> >> One of the biggest bitmaps is the background bitmap. So you have a
> >> client that uploads it to X and then goes away. There's nobody to
> >> un-count to by the time X decides to switch to another background.
> >
> > Actually, the big offenders are things other than the background
> > bitmap: things like E do absolutely insane things, you would not
> > believe (or maybe you would).  The background pixmap is generally
> > in the worst case typically no worse than 4 megabytes (for those
> > people who are crazy enough to put images up as their root window
> > on 32 bit deep displays, at 1kX1k resolution).
>
> Still, it would be nice to recover that 4 MB when the system
> doesn't have any memory left.
>

Yup. The X server could give back the memory for some cases like the
background without too much hackery.

> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.

I agree in principle, though the problem is difficult, as the memory pool
may get fragmented...  Most memory usage is less monolithic than the
background pixmap.  And maintaining separate memory pools often wastes
more memory than it saves.

>
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Netscape 4.x is hopeless; it is leakier than the Titanic.  There is hope
for Mozilla.
				- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> > If the oom killer kills a thing like init by mistake
> That only happens in the "random" OOM killer 2.2 has ...

[OOM killer war]

Hi there,

before you argue endlessly about the "Right OOM Killer (TM)", I
did a small patch to allow replacing the OOM killer at runtime.

You can even use modules, if you are careful (see khttpd on how
to do this without refcounting).

So now you can stop arguing about the one and only OOM killer,
implement it, provide it as module and get back to the important
stuff ;-)

PS: Patch is against test9 with Rik's latest vmpatch applied.

Thanks for listening

Ingo Oeser

diff -Naur linux-2.4.0-test9-vmpatch/include/linux/swap.h linux-2.4.0-test9-vmpatch-ioe/include/linux/swap.h
--- linux-2.4.0-test9-vmpatch/include/linux/swap.h	Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/include/linux/swap.h	Tue Oct 10 00:50:17 2000
@@ -129,6 +129,9 @@
 /* linux/mm/oom_kill.c */
 extern int out_of_memory(void);
 extern void oom_kill(void);
+void install_oom_killer(void (*new_oom_kill)(void));
+void reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
diff -Naur linux-2.4.0-test9-vmpatch/mm/Makefile linux-2.4.0-test9-vmpatch-ioe/mm/Makefile
--- linux-2.4.0-test9-vmpatch/mm/Makefile	Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/mm/Makefile	Tue Oct 10 00:10:07 2000
@@ -10,7 +10,8 @@
 O_TARGET := mm.o
 O_OBJS	 := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
 	    vmalloc.o slab.o bootmem.o swap.o vmscan.o page_io.o \
-	    page_alloc.o swap_state.o swapfile.o numa.o oom_kill.o
+	    page_alloc.o swap_state.o swapfile.o numa.o
+OX_OBJS := oom_kill.o
 
 ifeq ($(CONFIG_HIGHMEM),y)
 O_OBJS += highmem.o
diff -Naur linux-2.4.0-test9-vmpatch/mm/oom_kill.c linux-2.4.0-test9-vmpatch-ioe/mm/oom_kill.c
--- linux-2.4.0-test9-vmpatch/mm/oom_kill.c	Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/mm/oom_kill.c	Tue Oct 10 00:35:32 2000
@@ -13,6 +13,8 @@
  *  machine) this file will double as a 'coding guide' and a signpost
  *  for newbie kernel hackers. It features several pointers to major
  *  kernel subsystems and hints as to where to find out what things do.
+ *
+ *  Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 
 	struct task_struct *p = select_bad_process();
@@ -207,4 +211,26 @@
 
 	/* Else... */
 	return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock;
+
+static void (*oom_killer)(void)=oom_kill_rik;
+
+void oom_kill(void) {
+	read_lock(&oom_kill_lock);
+	oom_killer();
+	read_unlock(&oom_kill_lock);
+}
+
+void install_oom_killer(void (*new_oom_kill)(void)) {
+	if (!new_oom_kill) return;
+	write_lock(&oom_kill_lock);
+	oom_killer=new_oom_kill;
+	write_unlock(&oom_kill_lock);
+}
+
+void reset_default_oom_killer(void) {
+	install_oom_killer(&oom_kill_rik);
 }

--
Feel the power of the penguin - run [EMAIL PROTECTED] :x
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Albert D. Cahalan wrote:
> Jim Gettys writes:
> >> From: Linus Torvalds <[EMAIL PROTECTED]>
>
> >> One of the biggest bitmaps is the background bitmap. So you have a
> >> client that uploads it to X and then goes away. There's nobody to
> >> un-count to by the time X decides to switch to another background.
> >
> > Actually, the big offenders are things other than the background
> > bitmap: things like E do absolutely insane things, you would not
> > believe (or maybe you would).  The background pixmap is generally
> > in the worst case typically no worse than 4 megabytes (for those
> > people who are crazy enough to put images up as their root window
> > on 32 bit deep displays, at 1kX1k resolution).
>
> Still, it would be nice to recover that 4 MB when the system
> doesn't have any memory left.
>
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.
>
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Which is all good and well to DELAY the task of the OOM
killer for a few more minutes.

But in the end, there will be a point where you REALLY
run out of memory and you have no other choice than the
OOM killer...

(not that I'm against alternative measures, I just think
they're orthogonal to this whole discussion)

regards,

Rik
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Jim Gettys writes:
>> From: Linus Torvalds <[EMAIL PROTECTED]>

>> One of the biggest bitmaps is the background bitmap. So you have a
>> client that uploads it to X and then goes away. There's nobody to
>> un-count to by the time X decides to switch to another background.
>
> Actually, the big offenders are things other than the background
> bitmap: things like E do absolutely insane things, you would not
> believe (or maybe you would).  The background pixmap is generally
> in the worst case typically no worse than 4 megabytes (for those
> people who are crazy enough to put images up as their root window
> on 32 bit deep displays, at 1kX1k resolution).

Still, it would be nice to recover that 4 MB when the system
doesn't have any memory left.

X, and any other big friendly processes, could participate in
memory balancing operations. X could be made to clean out a
font cache when the kernel signals that memory is low. When
the situation becomes serious, X could just mmap /dev/zero over
top of the background image.

Netscape could even be hacked to dump old junk... or if it is
just too leaky, it could exec itself to fix the problem.
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, bert hubert wrote:
> On Mon, Oct 09, 2000 at 02:38:10PM -0700, Linus Torvalds wrote:
>
> > So the process that gave X the bitmap dies. What now? Are we going to
> > depend on X un-counting the resources?
> >
> > I'd prefer just X having a higher "mm nice level" or something.
>
> I wonder how many megabytes we can fill with all messages about
> an OOM killer. I remember threads about this from '94 onwards.
> Perhaps we can finally have a sane one now :-)

In reality, the OOM killer I mailed a few days ago behaves
quite well in the real world.

I hope Linus will be as sensitive to theoretical arguments
with no foundation in reality as I am (ie. not), so we'll
have SOMETHING in the kernel soon.

If we later find out there are some problems with the OOM
killer, we can always change it then. No need to hold up
a reasonable solution when the current kernel has NO solution
to the problem at all ...

regards,

Rik
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Byron Stanoszek wrote:
> On Mon, 9 Oct 2000 [EMAIL PROTECTED] wrote:
>
> > Anyway, there is/was an API in PTX to say (either from in-kernel or through
> > some user machinations) "I Am a System Process".  Turns on a bit in the
> > proc struct (task struct) that made it exempt from death from a variety
> > of sources, e.g. OOM, generic user signals, portions of system shutdown,
> > etc.
>
> The current OOM killer does this, except for init. Checking to
> see if the process has a page table is equivalent to checking
> for the kernel threads that are integral to the system (PIDs
> 2-5). These will never be killed by the OOM. Init, however,
> still can be killed, and there should be an additional statement
> that doesn't kill if PID == 1.

Only if you can demonstrate any real-world scenario where
init will be chosen with the current algorithm.

The "3 MB init on 4MB machine" kind of theoretical argument
just isn't convincing if nobody can show that there is a
problem in reality.

> I think we need to sit down and write a better OOM proposal,
> something that doesn't use CPU time and the NICE flag.

The nice flag has been removed from my current kernel tree.

The CPU time used, however, is a different matter. You really
don't want to have the OOM killer kill your 6-week-old running
simulation because a newly started netscape explodes ...

> How about we start by everyone in this discussion give their
> opinion on what the OOM selection process should do,

Quoting from mm/oom_kill.c:

/**
 * oom_badness - calculate a numeric value for how bad this task has been
 * @p: task struct of which task we should calculate
 *
 * The formula used is relatively simple and documented inline in the
 * function. The main rationale is that we want to select a good task
 * to kill when we run out of memory.
 *
 * Good in this context means that:
 * 1) we lose the minimum amount of work done
 * 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
 * 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the principle
 *    of least surprise ... (be careful when you change it)
 */

Do you have any additional requirements?

regards,

Rik
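Criteria 1 and 2 from the quoted comment pull in opposite directions, so a score has to trade memory size against accumulated work. The sketch below shows one way to do that; it is an illustration only and not the formula in mm/oom_kill.c. The struct, its fields, the square-root damping and the divide-by-four discount for root are all assumptions chosen for the example.

```c
/* Hypothetical user-space stand-in for the fields a heuristic might read. */
struct task_info {
    unsigned long total_vm_pages;   /* memory we would recover */
    unsigned long cpu_seconds;      /* CPU time consumed so far */
    unsigned long run_seconds;      /* wall-clock age of the task */
    int is_root;                    /* 1 if the task runs as root */
};

/* Integer square root, rounded down (linear search, fine for a sketch). */
static unsigned long isqrt(unsigned long x)
{
    unsigned long r = 0;

    while (r + 1 <= x / (r + 1))
        r++;
    return r;
}

/*
 * Illustrative badness score: start from memory size, then damp it for
 * tasks that have done a lot of work or have been running for a long
 * time, so that a fresh memory hog scores higher than an old simulation.
 */
unsigned long badness(const struct task_info *t)
{
    unsigned long points = t->total_vm_pages;
    unsigned long s;

    s = isqrt(t->cpu_seconds);
    if (s)
        points /= s;

    s = isqrt(isqrt(t->run_seconds));
    if (s)
        points /= s;

    if (t->is_root)
        points /= 4;    /* assumed discount for root-owned tasks */

    return points;
}

/* Sample tasks of equal size: a new browser, an old simulation, a root task. */
static const struct task_info fresh_browser  = { 10000, 0, 0, 0 };
static const struct task_info old_simulation = { 10000, 10000, 10000, 0 };
static const struct task_info fresh_root     = { 10000, 0, 0, 1 };
```

With these sample values the freshly started browser keeps its full 10000 points while the long-running simulation of the same size drops to 10, which matches the "6-week-old running simulation" concern raised in the message.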
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 02:38:10PM -0700, Linus Torvalds wrote:

> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?
>
> I'd prefer just X having a higher "mm nice level" or something.

I wonder how many megabytes we can fill with all messages about an OOM
killer. I remember threads about this from '94 onwards. Perhaps we can
finally have a sane one now :-)

Regards,

bert hubert

--
PowerDNS                     Versatile DNS Services
Trilab                       The Technology People
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000 [EMAIL PROTECTED] wrote:

> Anyway, there is/was an API in PTX to say (either from in-kernel or through
> some user machinations) "I Am a System Process".  Turns on a bit in the
> proc struct (task struct) that made it exempt from death from a variety
> of sources, e.g. OOM, generic user signals, portions of system shutdown,
> etc.

The current OOM killer does this, except for init. Checking to see if the
process has a page table is equivalent to checking for the kernel threads
that are integral to the system (PIDs 2-5). These will never be killed by
the OOM. Init, however, still can be killed, and there should be an
additional statement that doesn't kill if PID == 1.

I think we need to sit down and write a better OOM proposal, something
that doesn't use CPU time and the NICE flag. Let's concentrate our efforts
on what constitutes a good selection method instead of bickering with
each other.

How about we start by everyone in this discussion giving their opinion on
what the OOM selection process should do, listing them in both order of
importance and severity, giving a rational reason for each choice. Maybe
then we can get somewhere.

 -Byron

--
Byron Stanoszek                         Ph: (330) 644-3059
Systems Programmer                      Fax: (330) 644-8110
Commercial Timesharing Inc.             Email: [EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
At Sequent, we found that there are a small set of processes which are
"critical" to the system's operation in that they should not be killed
on swap shortage, memory shortage, etc.  This included things like init,
potentially inetd, the swapper, page daemon, clusters heartbeat daemon,
and generally any core system service which had a user process component.
If there wasn't enough memory for those processes, or if those processes
weren't already responsible in their use of memory/resources, you were
already toast.

Anyway, there is/was an API in PTX to say (either from in-kernel or through
some user machinations) "I Am a System Process".  Turns on a bit in the
proc struct (task struct) that made it exempt from death from a variety
of sources, e.g. OOM, generic user signals, portions of system shutdown,
etc.

Then, the code looking for things to kill simply skips those that are
intelligently marked, taking most of the decision making/policy making
out of the scheduler/memory manager.

gerrit

> On Mon, 9 Oct 2000, Linus Torvalds wrote:
> > On Mon, 9 Oct 2000, Andi Kleen wrote:
> > >
> > > netscape usually has child processes: the dns helper.
> >
> > Yeah.
> >
> > One thing we _can_ (and probably should do) is to do a per-user
> > memory pressure thing - we have easy access to the "struct
> > user_struct" (every process has a direct pointer to it), and it
> > should not be too bad to maintain a per-user "VM pressure"
> > counter.
> >
> > Then, instead of trying to use heuristics like "does this
> > process have children" etc, you'd have things like "is this user
> > a nasty user", which is a much more valid thing to do and can be
> > used to find people who fork tons of processes that are
> > mid-sized but use a lot of memory due to just being many..
>
> Sure we could do all of this, but does OOM really happen that
> often that we want to make the algorithm this complex ?
>
> The current algorithm seems to work quite well and is already
> at the limit of how complex I'd like to see it. Having a less
> complex OOM killer turned out to not work very well, but having
> a more complex one is - IMHO - probably overkill ...
>
> regards,
>
> Rik
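The PTX approach Gerrit describes, where marked processes are simply skipped by the code looking for a victim, can be sketched as a filter in front of the selection loop. Everything here is hypothetical: struct candidate, the system_process flag and the precomputed badness values are illustration only, not kernel or PTX structures.

```c
#include <stddef.h>

/* Hypothetical per-task record for this sketch. */
struct candidate {
    int pid;
    unsigned long badness;   /* score from whatever heuristic is in use */
    int system_process;      /* the "I Am a System Process" bit */
};

/*
 * Pick the task with the highest badness, skipping every task that has
 * marked itself as a system process.  Returns NULL if nothing is
 * eligible, which the caller would have to treat as a fatal condition.
 */
const struct candidate *select_victim(const struct candidate *tasks, size_t n)
{
    const struct candidate *worst = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].system_process)
            continue;               /* exempt: never considered */
        if (!worst || tasks[i].badness > worst->badness)
            worst = &tasks[i];
    }
    return worst;
}

/* Sample table: init is the largest task but carries the exemption bit. */
static const struct candidate demo_tasks[] = {
    { 1,   9000, 1 },
    { 200, 5000, 0 },
    { 300, 1200, 0 },
};
```

In the sample table init has the highest score but is never returned; pid 200 is chosen instead. The policy (who gets the bit) stays entirely outside the selection code, which is the point made in the message.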
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> From: Linus Torvalds <[EMAIL PROTECTED]>
> Date: Mon, 9 Oct 2000 14:50:51 -0700 (PDT)
> To: Jim Gettys <[EMAIL PROTECTED]>
> Cc: Alan Cox <[EMAIL PROTECTED]>, Andi Kleen <[EMAIL PROTECTED]>,
>         Ingo Molnar <[EMAIL PROTECTED]>, Andrea Arcangeli <[EMAIL PROTECTED]>,
>         Rik van Riel <[EMAIL PROTECTED]>,
>         Byron Stanoszek <[EMAIL PROTECTED]>,
>         MM mailing list <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
> Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> -
> On Mon, 9 Oct 2000, Jim Gettys wrote:
> >
> > > On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds
> > > <[EMAIL PROTECTED]> said:
> > >
> > > > The problem is that there is no way to keep track of them afterwards.
> > >
> > > So the process that gave X the bitmap dies. What now? Are we going to
> > > depend on X un-counting the resources?
> > >
> >
> > X has to uncount the resources already, to free the memory in the X server
> > allocated on behalf of that client.  X has to get this right, to be a long
> > lived server (properly debugged X servers last many months without problems:
> > unfortunately, a fair number of DDX's are buggy).
>
> No, but my point is that it doesn't really work.
>
> One of the biggest bitmaps is the background bitmap. So you have a client
> that uploads it to X and then goes away. There's nobody to un-count to by
> the time X decides to switch to another background.

Actually, the big offenders are things other than the background bitmap:
things like E do absolutely insane things, you would not believe (or maybe
you would).  The background pixmap is generally in the worst case
typically no worse than 4 megabytes (for those people who are crazy enough
to put images up as their root window on 32 bit deep displays, at 1kX1k
resolution).

>
> Does that memory just disappear as far as the resource handling is
> concerned when the client that originated it dies?

No, X recovers the memory when a connection dies, unless the client has
gone out of its way to arrange to preserve things across connection
termination.  Few, if any, clients do this: it is possible mostly for
debugging purposes, so that (fortunately or unfortunately, depending on
your opinion) what happened does not just vanish before you can see it.

So the X server does extensive bookkeeping of its memory usage, and
retrieves all memory used by clients when they terminate (with the above
rare exception).

>
> What happens with TCP connections? They might be local. Or they might not.
> In either case X doesn't know whom to blame.

At least on BSD kernels, it was reasonably straightforward to determine if
a TCP connection was local: in that case, the code actually did an upcall
and delivered data directly to the appropriate socket.  Dunno about the
insides of Linux.  I suspect it should not be hard to find the right
process for local connections.

Distant connections are, indeed, a challenge.

>
> Basically, the only thing _I_ think X can do is to really say "oh, please
> don't count my memory, because everything I do I do for my clients, not
> for myself".
>
> THAT is my argument. Basically there is nothing we can reliably account.

Your argument has a lot of validity, though the X server does a better job
of accounting than you might think.

BUT, I'm actually more interested in dealing with scheduling preferences,
to get really first rate interactive feel.

>
> So we might as well fall back on just saying "X is more important than
> some random client", and have a mm niceness level. Which right now is
> obviously approximated by the IO capabilities tests etc.
>

As I say above, the principle here may be more useful not for the memory
example but for controlling scheduling, so we can get great interactive
feel.  THAT is what is really worth discussing.
				- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Aaron Sethman wrote:

> I think the run time should probably be accounted into to this
> as well. Basically start knocking off recent processes first,
> which are likely to be childless, and start working your way up
> in age.

I'm almost getting USENET flashbacks ... ;)

Please look at the code before suggesting something that
is already there (and has been in the code for some 2 years).

regards,

Rik
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> > across AF_UNIX sockets so the mechanism is notionally there to provide the
> > credentials to X, just not to use them
>
> The problem is that there is no way to keep track of them afterwards.

If you use mmap for your allocator then beancounter will get it right. Every
resource knows which beancounter it was charged to. It adds an overhead the
average desktop user won't like but which is pretty much essential to do
real mainframe world operation.

So it would become

	seteuid(Client->passed_euid);
	mmap(buffer in pages)
	seteuid(getuid());

With lightweight counting semantics it's hard to make any tracking system
work well in the corner cases like resources that survive process death.

Alan
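Alan's point that "every resource knows which beancounter it was charged to" can be shown with a small accounting model. This is a user-space sketch only, not the beancounter patch: the structs, bc_charge(), bc_uncharge() and the byte limit are hypothetical names and values picked for the example.

```c
#include <stddef.h>

/* Hypothetical per-user account. */
struct beancounter {
    unsigned int uid;
    size_t charged;     /* bytes currently charged to this user */
    size_t limit;       /* maximum bytes this user may hold */
};

/* Hypothetical resource that remembers who paid for it. */
struct resource {
    struct beancounter *owner;
    size_t bytes;
};

/*
 * Charge a resource to an account.  Returns 0 on success, or -1 if the
 * charge would push the account over its limit (nothing is recorded).
 */
int bc_charge(struct beancounter *bc, struct resource *res, size_t bytes)
{
    if (bytes > bc->limit - bc->charged)
        return -1;
    bc->charged += bytes;
    res->owner = bc;
    res->bytes = bytes;
    return 0;
}

/*
 * Release a resource.  The account to credit is found through the
 * resource itself, so this works even after the charging process died.
 */
void bc_uncharge(struct resource *res)
{
    if (!res->owner)
        return;
    res->owner->charged -= res->bytes;
    res->owner = NULL;
    res->bytes = 0;
}

/* Sample objects: a client account and two buffers a server holds for it. */
static struct beancounter client_bc = { 1000, 0, 8192 };
static struct resource pixmap;
static struct resource big_buffer;
```

Because the back pointer lives in the resource, a server such as X could free a pixmap long after the client that uploaded it has gone and the right account would still be credited, which is the corner case the thread is arguing about.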
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, James Sutherland wrote:

> On Mon, 9 Oct 2000, Ingo Molnar wrote:
>
> > On Mon, 9 Oct 2000, Rik van Riel wrote:
> >
> > > > so dns helper is killed first, then netscape. (my idea might not
> > > > make sense though.)
> > >
> > > It makes some sense, but I don't think OOM is something that
> > > occurs often enough to care about it /that/ much...
> >
> > i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
> > with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
> > think it's a legitimate concern - i cannot know in advance whether a
> > freshly started process would trigger an OOM or not.
>
> Shouldn't the runtime factor handle this, making sure the new process is
> killed? (Maybe not if you're almost OOM right from the word go, and run
> this process straight off... Hrm.)

I think the run time should probably be factored into this as well.
Basically start knocking off recent processes first, which are likely to
be childless, and start working your way up in age. The reasoning here is
that you're less likely to kill an important, long-running service. Of
course you could probably account for whether the process is childless or
not as well.

Just my $0.02 on it..

Aaron
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Jim Gettys wrote:

> On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds
> <[EMAIL PROTECTED]> said:
>
> > > The problem is that there is no way to keep track of them afterwards.
> >
> > So the process that gave X the bitmap dies. What now? Are we going to
> > depend on X un-counting the resources?
>
> X has to uncount the resources already, to free the memory in the X server
> allocated on behalf of that client. X has to get this right, to be a long
> lived server (properly debugged X servers last many months without problems:
> unfortunately, a fair number of DDX's are buggy).

No, but my point is that it doesn't really work.

One of the biggest bitmaps is the background bitmap. So you have a client
that uploads it to X and then goes away. There's nobody to un-count to by
the time X decides to switch to another background.

Does that memory just disappear as far as the resource handling is
concerned when the client that originated it dies?

What happens with TCP connections? They might be local. Or they might
not. In either case X doesn't know whom to blame.

Basically, the only thing _I_ think X can do is to really say "oh, please
don't count my memory, because everything I do I do for my clients, not
for myself".

THAT is my argument. Basically there is nothing we can reliably account.

So we might as well fall back on just saying "X is more important than
some random client", and have a mm niceness level. Which right now is
obviously approximated by the IO capabilities tests etc.

Linus
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds
<[EMAIL PROTECTED]> said:

> > The problem is that there is no way to keep track of them afterwards.
>
> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?

X has to uncount the resources already, to free the memory in the X
server allocated on behalf of that client. X has to get this right, to be
a long lived server (properly debugged X servers last many months without
problems: unfortunately, a fair number of DDX's are buggy).

- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Rik van Riel wrote:

> > I'd prefer just X having a higher "mm nice level" or something.
>
> Which it has, because:
>
> 1) CAP_RAW_IO
> 2) p->euid == 0

Oh, I agree, but we might want to generalize this a bit so that root
could say "this process is important" and then drop root privileges and
still get "credited" for the fact that it's important.

It's not a big deal. It works for X right now.

Linus
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> > Sounds like one needs in addition some mechanism for servers to "charge"
> > clients for consumption. X certainly knows on behalf of which connection
> > resources are created; the OS could then transfer this back to the
> > appropriate client (at least when on machine).
>
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the
> credentials to X, just not to use them

Stephen Tweedie, Dave Rosenthal, Keith Packard and myself had an
extensive discussion on similar ideas around process quantum scheduling
(the X server would like to be able to forward quantum to clients) as
well at Usenix.

This is closely related, and needed to finally fully control interactive
feel in the face of "greedy" clients. My memory is that it sounded like
things could become very interesting with such a facility, and might be
ripe for 2.5.

Keith, Stephen, Dave, do you remember the details of our discussion?

- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Linus Torvalds wrote:

> On Mon, 9 Oct 2000, Alan Cox wrote:
> > > consumption. X certainly knows on behalf of which connection resources
> > > are created; the OS could then transfer this back to the appropriate client
> > > (at least when on machine).
> >
> > Definitely - and this is present in some non Unix OS's. We do pass credentials
> > across AF_UNIX sockets so the mechanism is notionally there to provide the
> > credentials to X, just not to use them
>
> The problem is that there is no way to keep track of them afterwards.
>
> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?
>
> I'd prefer just X having a higher "mm nice level" or something.

Which it has, because:

1) CAP_RAW_IO
2) p->euid == 0

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Alan Cox wrote:

> > consumption. X certainly knows on behalf of which connection resources
> > are created; the OS could then transfer this back to the appropriate client
> > (at least when on machine).
>
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the
> credentials to X, just not to use them

The problem is that there is no way to keep track of them afterwards.

So the process that gave X the bitmap dies. What now? Are we going to
depend on X un-counting the resources?

I'd prefer just X having a higher "mm nice level" or something.

Linus
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Ingo Molnar wrote:

> On Mon, 9 Oct 2000, Rik van Riel wrote:
>
> > Would this complexity /really/ be worth it for the twice-yearly OOM
> > situation?
>
> the only reason i suggested this was the init=/bin/bash, 4MB
> RAM, no swap emergency-bootup case. We must not kill init in
> that case - if the current code doesn't then great and none of
> this is needed.

I guess this requires some testing. If anybody can reproduce the bad
effects without going /too/ much out of the way of a realistic scenario,
the code needs to be fixed.

If it turns out to be a non-issue in all scenarios, there's no need to
make the code any more complex.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 10:28:38PM +0100, Alan Cox wrote:

> > Sounds like one needs in addition some mechanism for servers to "charge"
> > clients for consumption. X certainly knows on behalf of which connection
> > resources are created; the OS could then transfer this back to the
> > appropriate client (at least when on machine).
>
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the
> credentials to X, just not to use them

X can get the pid using SO_PEERCRED for unix connections. If the OOM
killer maintained some kind of badness value in the task_struct, it
would be possible to add a charge() system call that manipulates it:

    int charge(pid_t pid, int memorytobecharged)

-Andi
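For reference, the SO_PEERCRED lookup mentioned above might look like the sketch below. The function name peer_pid() is made up for illustration; getsockopt() with SO_PEERCRED and struct ucred are the existing Linux interface. The proposed charge() call does not exist, so the sketch stops at identifying the peer. Note that the credentials are those recorded when the connection was set up, so the pid may no longer be running by the time a server looks it up.

```c
#define _GNU_SOURCE           /* struct ucred is exposed by <sys/socket.h> only with this */
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the pid of the process at the other end of a connected
 * AF_UNIX socket, or -1 if the credentials cannot be read.
 */
pid_t peer_pid(int sock_fd)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);

    assert(sock_fd >= 0);

    if (getsockopt(sock_fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0)
        return -1;
    if (len != sizeof(cred))
        return -1;            /* unexpected size: do not trust the contents */

    return cred.pid;
}
```

The same structure also carries the peer's uid and gid, which is what a per-user accounting scheme would need.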
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, David Ford wrote:

> Not if "init" is a particular program running on a router floppy for
> example. The system may be designed to be a router and the userland
> monitor/control program is the only thing that runs and consumes 90% of the
> memory. If a forked or spawned process starts up with high CPU that just
> tips it over the OOM edge, we don't really want to kill init even if it's
> taking "all" the memory and or "all" the cpu.

this is such a special case it is not worth considering - rather leave it
up to the designer of the router floppy to get his stuff right.

the one thing that is clear from the many OOM flamewars is that no OOM
reaper algorithm will satisfy 100% of conditions 100% of the time. So all
Rik can do is optimise for the common case.

(roll on beancounting and proper resource limiting - the true but
heavyweight solution)

regards,
--
Paul Jakma [EMAIL PROTECTED]
PGP5 key: http://www.clubi.ie/jakma/publickey.txt
---
Fortune:
Individualists unite!
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Rik van Riel wrote:

> Would this complexity /really/ be worth it for the twice-yearly OOM
> situation?

the only reason i suggested this was the init=/bin/bash, 4MB RAM, no swap
emergency-bootup case. We must not kill init in that case - if the
current code doesn't then great and none of this is needed.

Ingo
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> Sounds like one needs in addition some mechanism for servers to "charge"
> clients for consumption. X certainly knows on behalf of which connection
> resources are created; the OS could then transfer this back to the
> appropriate client (at least when on machine).

Definitely - and this is present in some non Unix OS's. We do pass
credentials across AF_UNIX sockets so the mechanism is notionally there
to provide the credentials to X, just not to use them
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Ingo Molnar wrote:

> On Mon, 9 Oct 2000, Alan Cox wrote:
>
> > Let's kill a 6 week long typical background compute job because
> > netscape exploded (and yes netscape has a child process)
>
> in the paragraph you didn't quote i pointed this out and
> suggested adding all parent's badness value to children as well
> - so we'd end up killing netscape.

Would this complexity /really/ be worth it for the twice-yearly OOM
situation?

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> Sender: [EMAIL PROTECTED]
> From: "Andi Kleen" <[EMAIL PROTECTED]>
> Date: Mon, 9 Oct 2000 22:58:22 +0200
> To: Linus Torvalds <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>,
>     Andrea Arcangeli <[EMAIL PROTECTED]>,
>     Rik van Riel <[EMAIL PROTECTED]>,
>     Byron Stanoszek <[EMAIL PROTECTED]>,
>     MM mailing list <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
> Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> -
> On Mon, Oct 09, 2000 at 01:52:21PM -0700, Linus Torvalds wrote:
> > One thing we _can_ (and probably should do) is to do a per-user memory
> > pressure thing - we have easy access to the "struct user_struct" (every
> > process has a direct pointer to it), and it should not be too bad to
> > maintain a per-user "VM pressure" counter.
> >
> > Then, instead of trying to use heuristics like "does this process have
> > children" etc, you'd have things like "is this user a nasty user", which
> > is a much more valid thing to do and can be used to find people who fork
> > tons of processes that are mid-sized but use a lot of memory due to just
> > being many..
>
> Would not help much when "they" eat your memory by loading big bitmaps
> into the X server which runs as root (it seems there are many programs
> which are very good at this particular DOS ;)

This is generic to any server program, not unique to X.

Sounds like one needs in addition some mechanism for servers to "charge"
clients for consumption. X certainly knows on behalf of which connection
resources are created; the OS could then transfer this back to the
appropriate client (at least when on machine).

- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Alan Cox wrote:

> Let's kill a 6 week long typical background compute job because
> netscape exploded (and yes netscape has a child process)

in the paragraph you didn't quote i pointed this out and suggested adding
all parent's badness value to children as well - so we'd end up killing
netscape.

Ingo
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> i think the OOM algorithm should not kill processes that have
> child-processes, it should first kill child-less 'leaves'. Killing a
> process that has child processes likely results in unexpected behavior of
> those child-processes. (and equals to effective killing of those
> child-processes as well.)

Let's kill a 6 week long typical background compute job because netscape
exploded (and yes netscape has a child process)

Rik's current OOM killer works very well but it's a heuristic, so like
all heuristics you can always find a problem case

Alan
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> Then spam the console loudly with printk, but don't destroy the whole machine.
> Init should only get killed if it REALLY is taking a lot of memory. On a 4 or 8meg

If init dies the kernel hangs solid anyway
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Linus Torvalds wrote:

> On Mon, 9 Oct 2000, Andi Kleen wrote:
> >
> > netscape usually has child processes: the dns helper.
>
> Yeah.
>
> One thing we _can_ (and probably should do) is to do a per-user
> memory pressure thing - we have easy access to the "struct
> user_struct" (every process has a direct pointer to it), and it
> should not be too bad to maintain a per-user "VM pressure"
> counter.
>
> Then, instead of trying to use heuristics like "does this
> process have children" etc, you'd have things like "is this user
> a nasty user", which is a much more valid thing to do and can be
> used to find people who fork tons of processes that are
> mid-sized but use a lot of memory due to just being many..

Sure we could do all of this, but does OOM really happen that often that
we want to make the algorithm this complex ?

The current algorithm seems to work quite well and is already at the
limit of how complex I'd like to see it. Having a less complex OOM killer
turned out to not work very well, but having a more complex one is -
IMHO - probably overkill ...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 01:52:21PM -0700, Linus Torvalds wrote:

> One thing we _can_ (and probably should do) is to do a per-user memory
> pressure thing - we have easy access to the "struct user_struct" (every
> process has a direct pointer to it), and it should not be too bad to
> maintain a per-user "VM pressure" counter.
>
> Then, instead of trying to use heuristics like "does this process have
> children" etc, you'd have things like "is this user a nasty user", which
> is a much more valid thing to do and can be used to find people who fork
> tons of processes that are mid-sized but use a lot of memory due to just
> being many..

Would not help much when "they" eat your memory by loading big bitmaps
into the X server which runs as root (it seems there are many programs
which are very good at this particular DOS ;)

Also I think most OOM situations are accidents anyway, not malicious
users. When you're the only user of the machine, sophisticated per-user
accounting won't be very useful.

-Andi
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Andi Kleen wrote:

> netscape usually has child processes: the dns helper.

Yeah.

One thing we _can_ (and probably should do) is to do a per-user memory
pressure thing - we have easy access to the "struct user_struct" (every
process has a direct pointer to it), and it should not be too bad to
maintain a per-user "VM pressure" counter.

Then, instead of trying to use heuristics like "does this process have
children" etc, you'd have things like "is this user a nasty user", which
is a much more valid thing to do and can be used to find people who fork
tons of processes that are mid-sized but use a lot of memory due to just
being many..

Linus
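A toy user-space model of the per-user counter idea, to make the "many mid-sized processes" case concrete. Everything here is hypothetical: struct user_acct merely stands in for the kernel's user_struct, and the charge_pages(), uncharge_pages() and heaviest_user() names are invented for the sketch; no kernel code is being quoted.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-user record; stands in for the kernel's user_struct. */
struct user_acct {
    uid_t uid;
    unsigned long vm_pages;   /* pages currently charged to this user */
};

/* Hypothetical stand-in for a task: all we need is the owner pointer. */
struct task {
    struct user_acct *user;
};

/* Add pages to the counter of the user who owns the task. */
void charge_pages(struct task *t, unsigned long pages)
{
    assert(t && t->user);
    t->user->vm_pages += pages;
}

/* Remove pages again, e.g. when the task unmaps memory or exits. */
void uncharge_pages(struct task *t, unsigned long pages)
{
    assert(t && t->user && t->user->vm_pages >= pages);
    t->user->vm_pages -= pages;
}

/* Return the user with the most pages charged, or NULL if n == 0. */
struct user_acct *heaviest_user(struct user_acct *users, size_t n)
{
    struct user_acct *worst = NULL;
    size_t i;

    for (i = 0; i < n; i++)
        if (!worst || users[i].vm_pages > worst->vm_pages)
            worst = &users[i];
    return worst;
}
```

With one user running a single 3000-page process and another running twenty 200-page processes, heaviest_user() picks the second user even though none of that user's processes is individually the largest.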
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Linus Torvalds wrote:

> I disagree - if we start adding these kinds of heuristics to it, it
> will just be a way for people to try to confuse the OOM code. Imagine
> some bad guy that does 15 fork()'s and then tries to OOM...

yep.

Ingo
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Linus Torvalds wrote:

> On Mon, 9 Oct 2000, Ingo Molnar wrote:
> >
> > i think the OOM algorithm should not kill processes that have
> > child-processes, it should first kill child-less 'leaves'. Killing a
> > process that has child processes likely results in unexpected behavior of
> > those child-processes. (and equals to effective killing of those
> > child-processes as well.)
>
> I disagree - if we start adding these kinds of heuristics to it,
> it will just be a way for people to try to confuse the OOM code.
> Imagine some bad guy that does 15 fork()'s and then tries to
> OOM...

Also, the only way to prevent bad things like this is userbeans, the
per-user resource quotas; until we have that there will ALWAYS be ways to
fool the OOM killer. It is just a stop-gap measure to recover from a very
bad situation...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Rik van Riel wrote:

> > How about SIGTERM a bit before SIGKILL then re-evaluate the OOM
> > N usecs later?
>
> And run the risk of having to kill /another/ process as well ?
>
> I really don't know if that would be a wise thing to do
> (but feel free to do some tests to see if your idea would
> work ... I'd love to hear some test results with your idea).

I was thinking (dangerous) about an urgent vs. critical OOM. Urgent could
trigger a SIGTERM, which would give advance notice to the offending
process. I don't think we have a signal method of notifying processes
when resources are critically low; feel free to correct me.

Is there a signal that -might- be used for this?

-d

--
"There is a natural aristocracy among men. The grounds of this are
virtue and talents", Thomas Jefferson [1742-1826], 3rd US President
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 09:38:08PM +0100, James Sutherland wrote:

> Shouldn't the runtime factor handle this, making sure the new process is

The runtime factor in the algorithm will make the first difference only
after lots and lots of time (and the run_time can as well be wrong
because of jiffies wrap around).

But even if it would make a difference after 1 second, there would be a
1 second window where init can be killed.

Andrea
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, James Sutherland wrote:

> On Mon, 9 Oct 2000, Ingo Molnar wrote:
> > On Mon, 9 Oct 2000, Rik van Riel wrote:
> >
> > > > so dns helper is killed first, then netscape. (my idea might not
> > > > make sense though.)
> > >
> > > It makes some sense, but I don't think OOM is something that
> > > occurs often enough to care about it /that/ much...
> >
> > i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
> > with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
> > think it's a legitimate concern - i cannot know in advance whether a
> > freshly started process would trigger an OOM or not.
>
> Shouldn't the runtime factor handle this, making sure the new
> process is killed? (Maybe not if you're almost OOM right from
> the word go, and run this process straight off... Hrm.)

It should. Also, the example is a tad unrealistic since init seems to be
around 70 kB in size on my systems ;)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Ingo Molnar wrote:

> i think the OOM algorithm should not kill processes that have
> child-processes, it should first kill child-less 'leaves'. Killing a
> process that has child processes likely results in unexpected behavior of
> those child-processes. (and equals to effective killing of those
> child-processes as well.)

I disagree - if we start adding these kinds of heuristics to it, it will
just be a way for people to try to confuse the OOM code. Imagine some bad
guy that does 15 fork()'s and then tries to OOM...

Linus
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Ingo Molnar wrote:

> On Mon, 9 Oct 2000, Rik van Riel wrote:
>
> > > so dns helper is killed first, then netscape. (my idea might not
> > > make sense though.)
> >
> > It makes some sense, but I don't think OOM is something that
> > occurs often enough to care about it /that/ much...
>
> i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
> with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
> think it's a legitimate concern - i cannot know in advance whether a
> freshly started process would trigger an OOM or not.

Shouldn't the runtime factor handle this, making sure the new process is
killed? (Maybe not if you're almost OOM right from the word go, and run
this process straight off... Hrm.)

James.
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, David Ford wrote:

> Ingo Molnar wrote:
>
> > > a good idea to have SIGKILL delivery speeded up for every SIGKILL ...
> >
> > yep.
>
> How about SIGTERM a bit before SIGKILL then re-evaluate the OOM
> N usecs later?

And run the risk of having to kill /another/ process as well ?

I really don't know if that would be a wise thing to do (but feel free to
do some tests to see if your idea would work ... I'd love to hear some
test results with your idea).

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Ingo Molnar wrote:

> > a good idea to have SIGKILL delivery speeded up for every SIGKILL ...
>
> yep.

How about SIGTERM a bit before SIGKILL then re-evaluate the OOM N usecs
later?

-d

--
"There is a natural aristocracy among men. The grounds of this are
virtue and talents", Thomas Jefferson [1742-1826], 3rd US President
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Rik van Riel wrote:

> On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
> > On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> > > No. It's only needed if your OOM algorithm is so crappy that
> > > it might end up killing init by mistake.
> >
> > The algorithm you posted on the list in this thread will kill
> > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > and you execute a task that grows over 1M.
>
> This sounds suspiciously like the description of a DEAD system ;)
>
> (in which case you simply don't care if init is being killed or not)

Not if "init" is a particular program running on a router floppy for
example. The system may be designed to be a router and the userland
monitor/control program is the only thing that runs and consumes 90% of
the memory. If a forked or spawned process starts up with high CPU that
just tips it over the OOM edge, we don't really want to kill init even if
it's taking "all" the memory and or "all" the cpu.

> --
> "There is a natural aristocracy among men. The grounds of this are
> virtue and talents", Thomas Jefferson [1742-1826], 3rd US President
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 05:06:48PM -0300, Rik van Riel wrote:

> On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
> > On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> > > No. It's only needed if your OOM algorithm is so crappy that
> > > it might end up killing init by mistake.
> >
> > The algorithm you posted on the list in this thread will kill
> > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > and you execute a task that grows over 1M.
>
> This sounds suspiciously like the description of a DEAD system ;)

The system will be DEAD only when your current algorithm will kill init.

Andrea
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Ingo Molnar wrote:

> On Mon, 9 Oct 2000, Rik van Riel wrote:
>
> > > so dns helper is killed first, then netscape. (my idea might not
> > > make sense though.)
> >
> > It makes some sense, but I don't think OOM is something that
> > occurs often enough to care about it /that/ much...
>
> i'm trying to handle Andrea's case, the init=/bin/bash
> manual-bootup case, with 4MB RAM and no swap, where the admin
> tries to exec a 2MB process. I think it's a legitimate concern -
> i cannot know in advance whether a freshly started process would
> trigger an OOM or not.

In that case the time running and the cpu time used factors should give
the new process a heavy penalty compared to init.

(but I'd be curious if somebody actually manages to trick the OOM killer
into killing init ... please test a bit more to see if this really
happens ;))

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
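To make the interplay of size, CPU time, run time and privilege easier to follow, here is a user-space sketch of a scoring function in the spirit of what the thread describes. It is not the code from the patch: the struct, the 8-second and 1024-second buckets, and the divide-by-4 bonuses are all assumptions picked for illustration.

```c
#include <assert.h>

/* Hypothetical snapshot of the per-task facts the heuristic looks at. */
struct task_facts {
    unsigned long total_vm;   /* address-space size in pages */
    unsigned long cpu_secs;   /* CPU time consumed so far */
    unsigned long run_secs;   /* wall-clock time since the task started */
    int is_root;              /* euid == 0 */
    int has_raw_io;           /* holds the raw I/O capability */
};

/* Integer square root that never returns less than 1, so it is safe to divide by. */
static unsigned long isqrt_min1(unsigned long x)
{
    unsigned long r = 1;

    while ((r + 1) * (r + 1) <= x)
        r++;
    return r;
}

/* Higher score means a better candidate for killing. */
unsigned long badness(const struct task_facts *t)
{
    unsigned long points;

    assert(t);

    points = t->total_vm;                                  /* start from memory size */
    points /= isqrt_min1(t->cpu_secs / 8);                 /* reward CPU time already invested */
    points /= isqrt_min1(isqrt_min1(t->run_secs / 1024));  /* reward age, more weakly */

    if (t->is_root)
        points /= 4;                                       /* assumed bonus for root tasks */
    if (t->has_raw_io)
        points /= 4;                                       /* assumed bonus for raw I/O (e.g. X) */

    return points;
}
```

With these assumed constants and 4 kB pages, a 3 MB root-owned process that has been running for 30 days scores 27, while a 2 MB root-owned process started five seconds ago scores 128. If both have been running for only a minute or less, the run-time divisor is 1 for each and the larger process scores higher, which is the window Andrea describes elsewhere in the thread.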
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Andrea Arcangeli wrote:

> On Mon, Oct 09, 2000 at 10:06:02PM +0200, Ingo Molnar wrote:
> > i think the OOM algorithm should not kill processes that have
> > process that has child processes likely results in unexpected behavior of
>
> You just know what I think about those heuristics. I think all
> we need is a per-task pagefault/allocation rate avoiding any
> other complication that tries to do the right thing but that it
> will end doing the wrong thing eventually, but obviously nobody
> agreed with me and before I implement that myself it will still
> take some time.

Furthermore, keeping track of these allocations will mean that you
/ALWAYS/ rack up the overhead of keeping track of this, even though most
machines probably won't run out of memory ever, or no more than twice a
year or so ;)

> Even the total_vm information will be wrong for example if the
> task was a netscape iconized and completely swapped out that
> wasn't running since two days. Killing it is going to only delay
> the killing of the real offender that is generating a flood of
> page faults at high frequency.

However true this may be, I wonder if we really care /that/ much. OOM is
a very rare situation and as long as you don't do something that's really
a bad surprise to the user, everything should be ok.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Rik van Riel wrote:

> > so dns helper is killed first, then netscape. (my idea might not
> > make sense though.)
>
> It makes some sense, but I don't think OOM is something that
> occurs often enough to care about it /that/ much...

i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
think it's a legitimate concern - i cannot know in advance whether a
freshly started process would trigger an OOM or not.

	Ingo
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Andrea Arcangeli wrote:

> On Mon, Oct 09, 2000 at 12:30:20PM -0700, David Ford wrote:
> > Init should only get killed if it REALLY is taking a lot of memory. On a 4 or 8meg
>
> Init should never get killed. Killing init can be compared to destroying the TCP
> stack. Some apps can keep running fine for a few minutes until they call socket(),
> and then they will hang. Same with init: some tasks may still run fine for
> some time, but the machine will die eventually. We simply must not pass the
> point of no return, or we're buggy, and once the bug has triggered the only way
> to recover is to force the user to reboot the machine.

After 1/2 a second of deep reflection, I concur. Pretty much all interactive
processes will die immediately. That just doesn't make for happy penguins.

-d

--
"There is a natural aristocracy among men. The grounds of this are virtue and
talents", Thomas Jefferson [1743-1826], 3rd US President
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Rik van Riel wrote:

> Note that the OOM killer already has this code built-in, but it may be

oops, i didn't notice (really!). One comment: 5*HZ in your code is way too
much for counter, and it might break the scheduler in the future. (right
now those counter values are unused, RT priorities start at 1000, so it
cannot cause harm, but one never knows.) Please use MAX_COUNTER instead.
The SCHED_YIELD thing is a nice trick, it should be added to my signal.c
change as well, without the schedule().

> a good idea to have SIGKILL delivery speeded up for every SIGKILL ...

yep.

	Ingo
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Ingo Molnar wrote:
> On Mon, 9 Oct 2000, Andi Kleen wrote:
>
> > netscape usually has child processes: the dns helper.
>
> so dns helper is killed first, then netscape. (my idea might not
> make sense though.)

It makes some sense, but I don't think OOM is something that
occurs often enough to care about it /that/ much...

My algorithm is already complex enough for my tastes (but
seems to work quite well in the sense that it usually picks
the "right" process in one shot and kills the process the
user expects to be killed).

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/    http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 10:06:02PM +0200, Ingo Molnar wrote:
> i think the OOM algorithm should not kill processes that have
> process that has child processes likely results in unexpected behavior of

You just know what I think about those heuristics. I think all we need is a
per-task pagefault/allocation rate avoiding any other complication that tries
to do the right thing but that it will end doing the wrong thing eventually,
but obviously nobody agreed with me and before I implement that myself it will
still take some time.

Even the total_vm information will be wrong for example if the task was a
netscape iconized and completely swapped out that wasn't running since two days.
Killing it is going to only delay the killing of the real offender that is
generating a flood of page faults at high frequency.

Andrea
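[Editor's sketch] The difference between ranking by total_vm and ranking by a per-task fault rate can be shown with a toy selector. Everything here is hypothetical: the struct, the recent_faults counter (assumed to be sampled over one fixed window for all tasks) and both function names are illustrative, and no such per-task counter is claimed to exist in the kernel under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task accounting; field names are illustrative. */
struct task_stats {
    int pid;
    unsigned long total_vm;      /* pages mapped, possibly all swapped out */
    unsigned long recent_faults; /* page faults in the last sampling window */
};

/* Return the pid with the most recent faults, or -1 if there are no tasks. */
static int pick_by_fault_rate(const struct task_stats *t, size_t n)
{
    int victim = -1;
    unsigned long worst = 0;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (victim < 0 || t[i].recent_faults > worst) {
            victim = t[i].pid;
            worst = t[i].recent_faults;
        }
    }
    return victim;
}

/* For contrast: return the pid with the largest total_vm, or -1. */
static int pick_by_total_vm(const struct task_stats *t, size_t n)
{
    int victim = -1;
    unsigned long worst = 0;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (victim < 0 || t[i].total_vm > worst) {
            victim = t[i].pid;
            worst = t[i].total_vm;
        }
    }
    return victim;
}
```

Given an idle, swapped-out browser (large total_vm, zero recent faults) and a smaller task faulting heavily, the size-based selector picks the browser and the rate-based one picks the task that is actually generating the pressure. The cost is the always-on accounting behind recent_faults.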
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Ingo Molnar wrote:

> what do you think about the attached patch? It increases the effective
> priority of a (kernel-) killed process, and initiates a reschedule, so
> that it gets selected ASAP. (except if there are RT processes around.)
> This should make OOM decisions 'visible' much more quickly.

Note that the OOM killer already has this code built-in, but it may be
a good idea to have SIGKILL delivery speeded up for every SIGKILL ...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/    http://www.surriel.com/

--- linux/kernel/signal.c.orig	Mon Oct  9 12:56:45 2000
+++ linux/kernel/signal.c	Mon Oct  9 13:00:20 2000
@@ -569,6 +569,14 @@
 		spin_unlock_irqrestore(&t->sigmask_lock, flags);
 		return -ESRCH;
 	}
+	/*
+	 * Special case, kernel is forcing SIGKILL.
+	 * Decrease signal delivery latency.
+	 */
+	if (sig == SIGKILL && (t->policy == SCHED_OTHER)) {
+		t->counter = MAX_COUNTER;
+		current->need_resched = 1;
+	}
 
 	if (t->sig->action[sig-1].sa.sa_handler == SIG_IGN)
 		t->sig->action[sig-1].sa.sa_handler = SIG_DFL;
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Andi Kleen wrote:

> netscape usually has child processes: the dns helper.

so dns helper is killed first, then netscape. (my idea might not make
sense though.)

	Ingo
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
> On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> > No. It's only needed if your OOM algorithm is so crappy that
> > it might end up killing init by mistake.
>
> The algorithm you posted on the list in this thread will kill
> init if on 4Mbyte machine without swap init is large 3 Mbytes
> and you execute a task that grows over 1M.

This sounds suspiciously like the description of a DEAD system ;)

(in which case you simply don't care if init is being killed or not)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/    http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, David Ford wrote:

> Then spam the console loudly with printk, but don't destroy the
> whole machine. Init should only get killed if it REALLY is
> taking a lot of memory. On a 4 or 8meg machine tho, the
> probability of init getting killed is simply too high for
> comfort. I have never ever seen init start consuming memory
> like this so I'd rather get spammed on the console a LOT than
> have my entire machine instantly go dead.

Please TEST THIS before spreading Wild Rumours(tm)

On 2.2 a /random/ process gets killed when the system gets
tight, so you'll see init killed on (pre-kludge) 2.2 kernels,
but I don't believe you'll see this with 2.4...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/    http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 10:06:02PM +0200, Ingo Molnar wrote:
>
> On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
>
> > > No. It's only needed if your OOM algorithm is so crappy that
> > > it might end up killing init by mistake.
> >
> > The algorithm you posted on the list in this thread will kill init if
> > on 4Mbyte machine without swap init is large 3 Mbytes and you execute
> > a task that grows over 1M.
>
> i think the OOM algorithm should not kill processes that have
> child-processes, it should first kill child-less 'leaves'. Killing a
> process that has child processes likely results in unexpected behavior of
> those child-processes. (and equals to effective killing of those
> child-processes as well.)

netscape usually has child processes: the dns helper.

-Andi
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Rik,

what do you think about the attached patch? It increases the effective
priority of a (kernel-) killed process, and initiates a reschedule, so
that it gets selected ASAP. (except if there are RT processes around.)
This should make OOM decisions 'visible' much more quickly.

	Ingo

--- linux/kernel/signal.c.orig	Mon Oct  9 12:56:45 2000
+++ linux/kernel/signal.c	Mon Oct  9 13:00:20 2000
@@ -569,6 +569,14 @@
 		spin_unlock_irqrestore(&t->sigmask_lock, flags);
 		return -ESRCH;
 	}
+	/*
+	 * Special case, kernel is forcing SIGKILL.
+	 * Decrease signal delivery latency.
+	 */
+	if (sig == SIGKILL && (t->policy == SCHED_OTHER)) {
+		t->counter = MAX_COUNTER;
+		current->need_resched = 1;
+	}
 
 	if (t->sig->action[sig-1].sa.sa_handler == SIG_IGN)
 		t->sig->action[sig-1].sa.sa_handler = SIG_DFL;
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 12:30:20PM -0700, David Ford wrote:
> Init should only get killed if it REALLY is taking a lot of memory. On a 4 or 8meg

Init should never get killed. Killing init can be compared to destroying the TCP
stack. Some apps can keep running fine for a few minutes until they call socket(),
and then they will hang. Same with init: some tasks may still run fine for
some time, but the machine will die eventually. We simply must not pass the
point of no return, or we're buggy, and once the bug has triggered the only way
to recover is to force the user to reboot the machine.

Andrea
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Andrea Arcangeli wrote:

> > No. It's only needed if your OOM algorithm is so crappy that
> > it might end up killing init by mistake.
>
> The algorithm you posted on the list in this thread will kill init if
> on 4Mbyte machine without swap init is large 3 Mbytes and you execute
> a task that grows over 1M.

i think the OOM algorithm should not kill processes that have
child-processes, it should first kill child-less 'leaves'. Killing a
process that has child processes likely results in unexpected behavior of
those child-processes. (and equals to effective killing of those
child-processes as well.)

This mechanism can be abused (a malicious memory hog can create a
child-process just to avoid the OOM-killer) - but there are ways to avoid
this, eg. to add all the 'MM badness' points to children? Ie. a child
which has MM-abuser parent(s) will definitely be killed first.

	Ingo
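[Editor's sketch] The "leaves first, with badness inherited from the parents" idea can be modelled in a few lines. This is a userland illustration only: the struct, the array-index parent links and the function names are hypothetical, and the process table is assumed to form a tree (no parent cycles).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a task; not the kernel's task_struct. */
struct task {
    int pid;
    int parent;            /* index of the parent in the array, -1 for none */
    unsigned long badness; /* the task's own 'MM badness' points */
};

/* Own badness plus the badness of every ancestor. */
static unsigned long inherited_badness(const struct task *tasks, size_t n,
                                       size_t i)
{
    unsigned long total = 0;
    int cur = (int)i;

    while (cur >= 0) {
        assert((size_t)cur < n);
        total += tasks[cur].badness;
        cur = tasks[cur].parent;
    }
    return total;
}

static int has_children(const struct task *tasks, size_t n, size_t i)
{
    for (size_t j = 0; j < n; j++)
        if (tasks[j].parent == (int)i)
            return 1;
    return 0;
}

/* Index of the child-less task with the highest inherited badness, or -1. */
static int pick_victim(const struct task *tasks, size_t n)
{
    int victim = -1;
    unsigned long worst = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long b;

        if (has_children(tasks, n, i))
            continue; /* only leaves are candidates */
        b = inherited_badness(tasks, n, i);
        if (victim < 0 || b > worst) {
            victim = (int)i;
            worst = b;
        }
    }
    return victim;
}
```

In this model a memory hog that forks a tiny child to hide behind does not gain anything: the child inherits the hog's points and becomes the first leaf to go, after which the hog itself is a leaf.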
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> No. It's only needed if your OOM algorithm is so crappy that
> it might end up killing init by mistake.

The algorithm you posted on the list in this thread will kill init if on 4Mbyte
machine without swap init is large 3 Mbytes and you execute a task that grows
over 1M.

So I repeat again: for correctness you should either fix the oom algorithm and
demonstrate with math that it can't kill init or fix the bug using a magic
check.

Since it's not going to be possible to prove that an oom algorithm won't kill
init (also considering init isn't always /sbin/init) the magic check is going
to be the only bugfix possible.

Andrea
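[Editor's sketch] The "magic check" amounts to skipping pid 1 during victim selection. The selector below ranks purely by size, which is a deliberate simplification made for this example and not the algorithm posted in the thread; the struct and function names are hypothetical. The guard tests the pid rather than the program name, since init may be /bin/bash or anything else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process record for the example. */
struct proc {
    int pid;
    unsigned long size_kb;
};

/*
 * Return the pid of the largest process, or -1 if nothing is eligible.
 * With protect_init nonzero, pid 1 is never returned.
 */
static int select_victim(const struct proc *p, size_t n, int protect_init)
{
    int victim = -1;
    unsigned long worst = 0;

    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (protect_init && p[i].pid == 1)
            continue; /* the magic check */
        if (victim < 0 || p[i].size_kb > worst) {
            victim = p[i].pid;
            worst = p[i].size_kb;
        }
    }
    return victim;
}
```

With a 3072 KB init and a new 1100 KB task, the unguarded size-only selector returns pid 1; with the guard it returns the new task. If init is the only process left, the guarded selector returns -1 and the caller has to handle that case some other way.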
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Rik van Riel wrote:

> On Mon, 9 Oct 2000, Marco Colombo wrote:
> > On Fri, 6 Oct 2000, Rik van Riel wrote:
> >
> > [...]
> > > They are niced because the user thinks them a bit less
> > > important.
> >
> > Please don't, this assumption is quite wrong. I use nice just to
> > be 'nice' to other users. I can run my *important* CPU hog
> > simulation nice +10 in order to let other people get more CPU
> > when they need it.
>
> In that case the time the process has been running and the
> CPU time used will save the process if it's been running for
> a long time.
>
> Please read the /entire/ algorithm before making rash
> conclusions like this.

What "conclusions"? YOU stated that "They are niced because the user
thinks them a bit less important", and I was only commenting on that.
I've never said your /entire/ algorithm is failing, the point was
on the purpose of the 'nice' level. Users don't use nice to mark less
important processes. It's completely orthogonal. And if you really want
to correlate nice level and importance, I'd rather say that niced processes
tend to be more important than "normal" processes, on average.

> If nice is used for important, long-running tasks, the fact
> that they are long-running will save them (and be honest,
> would you really care if a simulation would be killed after
> 5 minutes? it's only inconvenient if it gets killed after
> a few hours...)

Ok. Now tell me what's the purpose to run your 'ls' at nice +5 at all.
You nice processes that are going to take a while, otherwise nicing them
has hardly a measurable effect, if any.

> > But if you put the logic "niced == not important" somewhere into
> > the kernel, nobody will use nice anymore. I'd rather give a
> > bonus to niced processes.
>
> This doesn't make ANY sense at all. The objective is to destroy
> the least amount of work, which means giving a bonus to processes
> which have used a lot of CPU time already ... regardless of nice
> value.

'regardless of nice value' is the part I like.

> > all. But my point here is that you do, and you take it as an hint for
> > process importance as perceived by the user that run it, and I believe
> > it's just wrong guessing).
>
> If you have a better algorithm, feel free to send patches.

No need. Either reverse the weight you give to nice level or just ignore
it, which probably is easier. I agree that giving a bonus to niced
processes is nearly useless.

As I've written in my previous message, I don't think it's a big issue.
OOM should not happen, full stop. OOM killer is a last resort measure,
so it needs not to be *too* careful.

>
> regards,
>
> Rik
> --
> "What you're running that piece of shit Gnome?!?!"
>        -- Miguel de Icaza, UKUUG 2000
>
> http://www.conectiva.com/    http://www.surriel.com/
>

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
[EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Then spam the console loudly with printk, but don't destroy the whole machine.
Init should only get killed if it REALLY is taking a lot of memory. On a 4 or
8meg machine tho, the probability of init getting killed is simply too high for
comfort. I have never ever seen init start consuming memory like this so I'd
rather get spammed on the console a LOT than have my entire machine instantly
go dead.

We get enough reports about innocuous messages on the console, I'm sure these
would get reported to LKML as well...and in short order as is usual.

-d

Ingo Molnar wrote:

> On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
>
> > On Fri, Oct 06, 2000 at 04:19:55PM -0400, Byron Stanoszek wrote:
> > > In the OOM killer, shouldn't there be a check for PID 1 just to enforce that
> >
> > Init can't be killed in 2.2.x latest, the same bugfix should be forward
> > ported to 2.4.x.
>
> I believe we should not special-case init in this case. If the OOM would
> kill init then we *want* to know about it ASAP, because it's either a bug
> in the OOM code or a memory leak in init. Both things are very bad, and
> ignoring the kill would just preserve those bugs artificially.

--
"There is a natural aristocracy among men. The grounds of this are virtue and
talents", Thomas Jefferson [1743-1826], 3rd US President
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Here's an idea, farfetched as it may be.

Page the entire process out to disk into a user defined area, SIGSTOP it and
use printk or a kthread/userproc to notify the user that something was kicked
out of the sandbox for playing bad. The user can add more swap if desired, then
use a userland tool w/ the kthread/userproc to either pop it back into memory
and SIGCONT it, or destroy it.

This allows the user to make almost all decisions about what gets killed and
what remains running. Necessarily if you're out of disk space as well we'll
have to resort to simply killing.

Hmm, sounds like 2.5.

-d

Jamie Lokier wrote:

> Kurt Garloff wrote:
> > I could not agree more. Normally, you'd better kill a foreground task
> > (running nice 0) than selecting one of those background jobs for some
> > reasons:
> > * The foreground job can be restarted by the interactive user
> >   (Most likely, it will be only netscape anyway)
> > * The background job probably is the more useful one which has been running
> >   since a longer time (computations, ...)
>
> Ick. A background job that's been running for a long time will be saved
> by that, as Rik pointed out.
>
> If I've got a background process running for 30 minutes, and a Netscape
> with 5 windows open that I'm using (for long or not, doesn't matter),
> guess which one I'd rather died? Not Netscape -- I'm using that and
> I'll never remember how to find those 5 windows again if it just dies.
>
> -- Jamie

--
"There is a natural aristocracy among men. The grounds of this are virtue and
talents", Thomas Jefferson [1743-1826], 3rd US President
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Rik van Riel wrote:

> On Mon, 9 Oct 2000, Marco Colombo wrote:
> > On Fri, 6 Oct 2000, Rik van Riel wrote:
> >
> > [...]
> > > They are niced because the user thinks them a bit less
> > > important.
> >
> > Please don't, this assumption is quite wrong. I use nice just to
> > be 'nice' to other users. I can run my *important* CPU hog
> > simulation nice +10 in order to let other people get more CPU
> > when they need it.
>
> In that case the time the process has been running and the
> CPU time used will save the process if it's been running for
> a long time.

Please base this more on real time, not CPU time. Netscrape consumes an
ungodly amount of CPU time and memory and I'd much rather have it killed
before anything else on the system. If it wasn't blatantly obvious, I'd check
the argv[0] to see if it was "netscape" and kill it. :]

> This doesn't make ANY sense at all. The objective is to destroy
> the least amount of work, which means giving a bonus to processes
> which have used a lot of CPU time already ... regardless of nice
> value.

But that favors ill-written programs if it's based on CPU time. I.e. netscape
and all the gnome/kde "tasklets" that take 8 megs w/ 6megs RSS and .9% of a
pIII-450 to manage the keyboard and mouse properties.

> If you have a better algorithm, feel free to send patches.

Step A) A heuristic that tags the program or process group consuming the most
        RAM in the last N real minutes. A run-away process is the highest
        target here and is most correctly the guilty party.
Step B) A sliding scale of the youngest real program or process group
        consuming the most RAM.
Step C) ...

-d

--
"There is a natural aristocracy among men. The grounds of this are virtue and
talents", Thomas Jefferson [1743-1826], 3rd US President
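[Editor's sketch] One possible reading of Step A, with Step B used only as a tie-break: pick the task whose resident memory grew the most over the last N real minutes, and prefer the younger task when two grew equally. The struct, the two RSS samples and the function name are assumptions made for this illustration; the message above does not specify how the measurement would be taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of one task, taken N real minutes apart. */
struct sample {
    int pid;
    unsigned long rss_then_kb; /* resident size N minutes ago */
    unsigned long rss_now_kb;  /* resident size now */
    unsigned long age_secs;    /* wall-clock age of the task */
};

/* Growth over the window; shrinking tasks count as zero growth. */
static unsigned long growth_kb(const struct sample *s)
{
    return s->rss_now_kb > s->rss_then_kb
        ? s->rss_now_kb - s->rss_then_kb
        : 0;
}

/* Return the pid of the fastest grower (youngest wins ties), or -1. */
static int pick_fastest_grower(const struct sample *s, size_t n)
{
    int victim = -1;
    unsigned long best = 0, best_age = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned long g = growth_kb(&s[i]);

        if (victim < 0 || g > best ||
            (g == best && s[i].age_secs < best_age)) {
            victim = s[i].pid;
            best = g;
            best_age = s[i].age_secs;
        }
    }
    return victim;
}
```

Because the score is based on wall-clock windows rather than CPU time, a long-lived program that burns CPU but holds a steady footprint is not favoured, while a two-minute-old runaway allocation is selected ahead of it.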