Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On 2000-10-11 19:53:50 -0700, [EMAIL PROTECTED] wrote:
> On other machines I'd set RLIMIT_DATA and my OOM problems went away,
> but on linux this didn't work

RLIMIT_DATA appears to be checked only for aout format executables. Looking at the 2.4.0-test10pre1 sources for fs/binfmt_aout.c and fs/binfmt_elf.c, you'll note the difference between load_aout_binary() and load_elf_binary(), both just above the comment "OK, This is the point of no return".

Does adding a check similar to the aout one make sense for ELF? I'm just trying to avoid Rik having to pull his hair out implementing a system that conceptually already exists in the kernel (nasty processes being terminated before they do some damage), especially when that existing system is far more configurable.

Cheers,
--
Matt

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On 2000-10-11 11:45:06 -0400, Bruce A. Locke wrote:
> This manpage shows me functions and structs.

What were you expecting from the system call section of the Linux Programmer's Manual? Dancing girls? (h...)

> I'm assuming you want these used by the offending program or the shell
> under which the program is being called.

That's usually what happens.

> In the first case, a person might not have source to the program and
> if thats the case, it doesn't help much.

Closed-source software is *so* 20th century... ;-) Anyway, when run from the shell it'll inherit its parent's limits (which leads to your next question...)

> And in the second case, if the shell sets it, does it affect children
> of a process (aka fork()'d)?

Certainly. Maybe if more distributions took Debian's stance and set the default limits so anal that you frequently can't even read email, let alone recompile the kernel, without getting the process terminated for tripping one limit or another, then more people would know this functionality exists and set the limits more appropriately.

--
Matt
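The inheritance claim above can be checked with a few lines of C. This is a minimal sketch, assuming a POSIX system; the function name child_inherits_core_limit is made up for illustration, and RLIMIT_CORE is used only because lowering it to zero is harmless and always permitted.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a fork()ed child observes the soft
 * limit set by its parent, 0 if it does not, -1 on error. */
int child_inherits_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = 0;                 /* lowering a soft limit needs no privilege */
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: read back the limit it started with. */
        struct rlimit seen;
        int ok = getrlimit(RLIMIT_CORE, &seen) == 0 && seen.rlim_cur == 0;
        _exit(ok ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A shell that calls ulimit before launching a program relies on the same mechanism: the limit is set in the shell process and carried across fork() and exec().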
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On 2000-10-11 12:48:54 -0400, Andrew Pimlott wrote:
> No way should a desktop user be responsible for micro-managing the
> resource usage of his applications.

That's right. The systems administrator should, and will set appropriate limits for users on his/her system that apply from login. This is how the systems I first used were configured (lucky me had a damn fine sysadmin), and so this is how I configure mine.

> The only thing that knows what's right for Netscape is Netscape.

I would disagree with this; I believe this is exactly the root of people's problems with Netscape (and the same theory should apply to other apps). The application doesn't know what's _right_ - it knows what it _wants_. Big difference.

--
Matt
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Thu, Oct 12, 2000 at 01:58:49AM +1100, Matthew Hawkins wrote:
> On 2000-10-11 10:33:39 -0400, Bruce A. Locke wrote:
> >
> > Your making the deadly assumption that all applications behave themselves
> > exactly the same all the time.  Oops... netscape decided to freak out and
> > take up all your memory... guess its the admins fault.
>
> Yep, for not setting appropriate resource limits.

No way should a desktop user be responsible for micro-managing the resource usage of his applications. How can he decide what's reasonable for Netscape to consume? Shouldn't Netscape be allowed to take up most of memory, if it's the only major application and the memory will improve its performance? The only thing that knows what's right for Netscape is Netscape.

If Netscape were clever and kind, perhaps it would estimate what's reasonable and set limits on itself, adjusting them from time to time based on user behavior and environmental factors. But Netscape's a pretty mature program, and it doesn't do this; it can hardly be expected of the zillions of immature (and probably leaky) applications a user might run.

So, we inevitably need an automated low-memory or out-of-memory algorithm. I tend to think it may need to be more adjustable than Rik's--people will be much more comfortable if they can say "spare this simulation at all cost!" or "kill off one of these processes in an emergency" or "this system has no business coming within 90% of RAM+swap capacity, so start killing things at that point--oh, and mail me". Some of this has no place in the kernel, obviously. But Rik has a good start, and perhaps his work will be part of a more complete solution.

Andrew
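The "90% of RAM+swap" rule mentioned above boils down to one comparison. The following is only a sketch of that arithmetic, not anything from Rik's patch; the function name over_threshold and its parameters are hypothetical, and where the usage figures come from is left open.

```c
#include <stdint.h>

/* Hypothetical policy check: returns 1 once used memory reaches pct percent
 * of RAM + swap, 0 otherwise. All sizes are in the same unit (e.g. KB). */
int over_threshold(uint64_t used, uint64_t ram, uint64_t swap, unsigned pct)
{
    uint64_t capacity = ram + swap;

    /* used / capacity >= pct / 100, rearranged to avoid floating point. */
    return used * 100 >= capacity * (uint64_t)pct;
}
```

With 512 units of RAM and 512 of swap, a 90% threshold trips at 922 units in use, so over_threshold(950, 512, 512, 90) is true and over_threshold(900, 512, 512, 90) is not.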
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
- Received message begins Here -

>
> On 2000-10-11 09:45:30 -0500, Jesse Pollard wrote:
> > Until user memory resource quotas are included in the kernel, there will be
> > nothing else that can be done. Even with resource quotas, if the total of
> > active users exceeds the resource then the same/equivalent situation occurs.
>
> So setrlimit() with RLIMIT_DATA, RLIMIT_STACK, RLIMIT_RSS,
> RLIMIT_MEMLOCK, RLIMIT_AS et al is a null op?
>
> If so, I wish to register a complaint ;-)

Not exactly. As I have seen it, each process gets a copy of these limits. A single process cannot exceed the limit, but the sum of all processes can.

One of the problems is caused by COW. Given trivially small limits (1 MB): the first process allocates and initializes up to one MB, then forks. The second process begins updating data - .5MB. Neither process exceeds the limits, but the sum is now 1.5MB. If this is repeated enough, then the system can go OOM, with none of the processes at or over the limits set.

Another problem occurs on multi-user servers. Each user logs in and gets "reasonable" rlimit values - each user uses one medium sized process. If the #users * rlimits exceeds the system capacity then OOM could occur, and still none may have exceeded the rlimit.

I've always treated rlimit values as "suggestions" to the user process to aid in debugging (this is more applicable to the ulimits though). The user's process will not exceed the value, and when it runs into the limit, that is a strong suggestion that a bug may be present. (I first saw this with a leaky X server.)

There have been some patches (the beancounter stuff) that do relate to resource control, but a more integrated resource accounting will make it work better.

I do believe it should be available as an option, especially for multi-user servers, clusters, and other large systems. It isn't that useful on single user workstations.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
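The COW arithmetic above can be written out as a small check. This is a sketch only: the function name oom_without_violation is hypothetical, and the 1280 KB system capacity in the usage note is an assumed figure chosen to make the point, not one from the thread.

```c
#include <stddef.h>

/* Hypothetical check: returns 1 when every process stays within its own
 * limit and yet the combined usage exceeds what the system can back;
 * returns 0 otherwise. All sizes are in KB. */
int oom_without_violation(const size_t *usage, size_t n,
                          size_t per_proc_limit, size_t capacity)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (usage[i] > per_proc_limit)
            return 0;            /* a limit was tripped, different failure */
        total += usage[i];
    }
    return total > capacity;
}
```

For the example in the message, the parent holds 1024 KB and the forked child has dirtied 512 KB of private copies. With a 1024 KB per-process limit and an assumed 1280 KB of capacity, oom_without_violation returns 1: the total is 1536 KB although neither process is over its limit.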
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Thu, 12 Oct 2000, Matthew Hawkins wrote:
>
> Seriously, am I missing something obvious or is it far simpler just to
> keel over and die if the system goes OOM?  I mean, seriously, if the
> administrator lets it get to that state then he/she/it deserves a dead
> system.  It's akin to having your car run out of petrol - you don't
> start shooting passengers because their extra load made the engine chew
> more.  You pack up your kitty and go to the nearest petrol station and
> buy more, plug it into the car then learn from the experience so this
> fringe case of it happening doesn't happen again.  I don't really see
> much difference between a car going "OOP" and a computer going OOM.
> Should we start deleting files according to some randomly-chosen
> heuristic if a filesystem goes "OOS" ?

Excellent point. However, the idea is to kill an attacker if your 'car' is being hijacked. Whatever is being designed should ideally have zero impact on the usual performance and only come into play if something runs away, deliberately or by accident.

If Linux doesn't track down and kill deliberate attempts to kill the system, there will always be those who say: "Linux is no good because a user can readily kill it". Of course we could track down and kill those who say this, but it'd get messy.

FYI, a fork() bomb on my Sun Workstation does not kill it. Also, malloc()ing and writing all over the place doesn't kill it either.
Script started on Wed Oct 11 10:41:38 2000
# cat xxx.c
main()
{
    for(;;) fork();
}
# gcc -o xxx xxx.c
# ./xxx
^C
#
# ^C
# ps
   PID TTY      TIME CMD
 24800 pts/1    0:00 xxx
 24335 pts/1    0:00 sh
 24688 pts/1    0:00 xxx
 24690 pts/1    0:00 xxx
 24692 pts/1    0:00 xxx
 24694 pts/1    0:00 xxx
 24696 pts/1    0:00 xxx
 24697 pts/1    0:00 xxx
 24699 pts/1    0:00 xxx
 24701 pts/1    0:00 xxx
 24703 pts/1    0:00 xxx
 24704 pts/1    0:00 xxx
 24706 pts/1    0:00 xxx
 24708 pts/1    0:00 xxx
 24710 pts/1    0:00 xxx
 24712 pts/1    0:00 xxx
 24714 pts/1    0:00 xxx
 24716 pts/1    0:00 xxx
 24717 pts/1    0:00 xxx
 24719 pts/1    0:00 xxx
 24720 pts/1    0:00 xxx
 24721 pts/1    0:00 xxx
 24722 pts/1    0:00 xxx
 24723 pts/1    0:00 xxx
 24724 pts/1    0:00 xxx
 24725 pts/1    0:00 xxx
 24726 pts/1    0:00 xxx
 24727 pts/1    0:00 xxx
 24728 pts/1    0:00 xxx
 24729 pts/1    0:00 xxx
 24730 pts/1    0:00 xxx
 24731 pts/1    0:00 xxx
 24732 pts/1    0:00 xxx
 24733 pts/1    0:00 xxx
 24734 pts/1    0:00 xxx
 24735 pts/1    0:00 xxx
 24736 pts/1    0:00 xxx
 24737 pts/1    0:00 xxx
 24738 pts/1    0:00 xxx
 24739 pts/1    0:00 xxx
 24740 pts/1    0:00 xxx
 24741 pts/1    0:00 xxx
 24742 pts/1    0:00 xxx
 24743 pts/1    0:00 xxx
 24744 pts/1    0:00 xxx
 24801 pts/1    0:00 ps
 24687 pts/1    0:00 xxx
 24689 pts/1    0:00 xxx
 24691 pts/1    0:00 xxx
 24693 pts/1    0:00 xxx
 24695 pts/1    0:00 xxx
 24698 pts/1    0:00 xxx
 24700 pts/1    0:00 xxx
 24702 pts/1    0:00 xxx
 24705 pts/1    0:00 xxx
 24707 pts/1    0:00 xxx
 24709 pts/1    0:00 xxx
 24711 pts/1    0:00 xxx
 24713 pts/1    0:00 xxx
 24715 pts/1    0:00 xxx
 24718 pts/1    0:00 xxx
 24653 pts/1    0:00 xxx
 24610 pts/1    0:00 xxx
 24614 pts/1    0:00 xxx
 24615 pts/1    0:00 xxx
 24616 pts/1    0:00 xxx
 24617 pts/1    0:00 xxx
 24618 pts/1    0:00 xxx
 24619 pts/1    0:00 xxx
 24620 pts/1    0:00 xxx
 24621 pts/1    0:00 xxx
 24622 pts/1    0:00 xxx
 24623 pts/1    0:00 xxx
 24624 pts/1    0:00 xxx
 24625 pts/1    0:00 xxx
 24626 pts/1    0:00 xxx
 24627 pts/1    0:00 xxx
 24628 pts/1    0:00 xxx
 24629 pts/1    0:00 xxx
 24630 pts/1    0:00 xxx
 24631 pts/1    0:00 xxx
 24632 pts/1    0:00 xxx
 24686 pts/1    0:00 xxx
 24685 pts/1    0:00 xxx
 24684 pts/1    0:00 xxx
 24683 pts/1    0:00 xxx
 24682 pts/1    0:00 xxx
 24681 pts/1    0:00 xxx
 24680 pts/1    0:00 xxx
 24679 pts/1    0:00 xxx
 24678 pts/1    0:00 xxx
 24677 pts/1    0:00 xxx
 24676 pts/1    0:00 xxx
 24675 pts/1    0:00 xxx
 24674 pts/1    0:00 xxx
 24673 pts/1    0:00 xxx
 24672 pts/1    0:00 xxx
 24671 pts/1    0:00 xxx
 24670 pts/1    0:00 xxx
 24669 pts/1    0:00 xxx
 24668 pts/1    0:00 xxx
 24667 pts/1    0:00 xxx
 24666 pts/1    0:00 xxx
 24665 pts/1    0:00 xxx
 24664 pts/1    0:00 xxx
 24663 pts/1    0:00 xxx
 24662 pts/1    0:00 xxx
 24661 pts/1    0:00 xxx
 24660 pts/1    0:00 xxx
 24659 pts/1    0:00 xxx
 24658 pts/1    0:00 xxx
 24657 pts/1    0:00 xxx
 24656 pts/1    0:00 xxx
 24655 pts/1    0:00 xxx
 24654 pts/1    0:00 xxx
 24652 pts/1    0:00 xxx
 24651 pts/1    0:00 xxx
 24650 pts/1    0:00 xxx
 24649 pts/1    0:00 xxx
 24648 pts/1    0:00 xxx
 24647 pts/1    0:00 xxx
 24646 pts/1    0:00 xxx
 24645 pts/1    0:00 xxx
 24644 pts/1    0:00 xxx
 24643 pts/1    0:00 xxx
 24642 pts/1    0:00 xxx
 24634 pts/1    0:00 xxx
 24633 pts/1    0:00 xxx
 24641 pts/1    0:00 xxx
 24640 pts/1    0:00 xxx
 24639 pts/1    0:00 xxx
 24638 pts/1    0:00 xxx
 24637 pts/1    0:00 xxx
 24636 pts/1    0:00 xxx
 24635 pts/1    0:00 xxx
 24613 pts/1    0:00 xxx
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On 2000-10-11 10:33:39 -0400, Bruce A. Locke wrote:
>
> Your making the deadly assumption that all applications behave themselves
> exactly the same all the time.  Oops... netscape decided to freak out and
> take up all your memory... guess its the admins fault.

Yep, for not setting appropriate resource limits.

man 2 setrlimit

Of course, if it's a kernel bug that causes it I think you're SOL ;)

--
Matt
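As a rough illustration of what setrlimit(2) buys you, here is a minimal sketch assuming a Linux system where RLIMIT_AS is enforced. The function name malloc_fails_under_as_limit and the 64 MB / 256 MB sizes are arbitrary choices for the example. The limit is applied in a forked child so the calling process is left untouched.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a 256 MB malloc() fails in a child whose
 * address space is capped at 64 MB, 0 if it succeeds, -1 on error. */
int malloc_fails_under_as_limit(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit rl;
        rlim_t cap = (rlim_t)64 * 1024 * 1024;

        if (getrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);
        /* The soft limit may not exceed the hard limit. */
        if (rl.rlim_max != RLIM_INFINITY && cap > rl.rlim_max)
            cap = rl.rlim_max;
        rl.rlim_cur = cap;
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);

        /* volatile keeps the compiler from optimizing the call away. */
        void *volatile p = malloc((size_t)256 * 1024 * 1024);
        _exit(p == NULL ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

The runaway allocation gets a NULL back instead of dragging the whole machine into swap; what the application does with that NULL is, of course, its own business.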
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On 2000-10-11 09:45:30 -0500, Jesse Pollard wrote:
> Until user memory resource quotas are included in the kernel, there will be
> nothing else that can be done. Even with resource quotas, if the total of
> active users exceeds the resource then the same/equivalent situation occurs.

So setrlimit() with RLIMIT_DATA, RLIMIT_STACK, RLIMIT_RSS, RLIMIT_MEMLOCK, RLIMIT_AS et al is a null op?

If so, I wish to register a complaint ;-)

--
* Matthew Hawkins <[EMAIL PROTECTED]>                :(){ :|:&};:
** Information Specialist, tSA Group Pty. Ltd.   Ph: +61 2 6257 7111
*** 1 Hall Street, Lyneham ACT 2602 Australia.   Fx: +61 2 6257 7311
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
- Received message begins Here -

>
> Heh.. now all we need is some smart-arse to make something similar to
> apply to the _entire_ VM subsystem, and both Rik and Andrea can be happy
> ;)
>
> Seriously, am I missing something obvious or is it far simpler just to
> keel over and die if the system goes OOM?  I mean, seriously, if the
> administrator lets it get to that state then he/she/it deserves a dead
> system.  It's akin to having your car run out of petrol - you don't
> start shooting passengers because their extra load made the engine chew
> more.  You pack up your kitty and go to the nearest petrol station and
> buy more, plug it into the car then learn from the experience so this
> fringe case of it happening doesn't happen again.  I don't really see
> much difference between a car going "OOP" and a computer going OOM.
> Should we start deleting files according to some randomly-chosen
> heuristic if a filesystem goes "OOS" ?

Not deleting files, but your system may crash :)

The problem with memory is that the tools are not available (i.e. already included in the kernel) to do anything else. In the example of running out of file space, there are quota limits. You can still run out of space, but only when the sum of all users' quota allocations exceeds the disk capacity.

Until user memory resource quotas are included in the kernel, there will be nothing else that can be done. Even with resource quotas, if the total of active users exceeds the resource then the same/equivalent situation occurs.

What is being done is still necessary, but in the long term it will end up addressing the case where a single user runs out, rather than the system as a whole.

User memory resource quota control is needed in large clusters, and in large systems with multiple users. In a single user environment, resource quotas are less important than providing a consistent (and hopefully intuitive) process abort. That keeps the system going, and it is then up to the user to choose what else may need to be aborted.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
Heh.. now all we need is some smart-arse to make something similar to apply to the _entire_ VM subsystem, and both Rik and Andrea can be happy ;)

Seriously, am I missing something obvious or is it far simpler just to keel over and die if the system goes OOM? I mean, seriously, if the administrator lets it get to that state then he/she/it deserves a dead system. It's akin to having your car run out of petrol - you don't start shooting passengers because their extra load made the engine chew more. You pack up your kitty and go to the nearest petrol station and buy more, plug it into the car then learn from the experience so this fringe case of it happening doesn't happen again. I don't really see much difference between a car going "OOP" and a computer going OOM. Should we start deleting files according to some randomly-chosen heuristic if a filesystem goes "OOS" ?

--
Matt
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Wed, Oct 11, 2000 at 11:08:41AM +0200, Helge Hafting wrote:
> Nothing wrong with a big init - the problem is a memory-leaking init.
> That one will die anyway, wether it dies early from an OOM-killer
> or later when all other processes are gone don't really matter.

Indeed.

Andrea
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Andrea Arcangeli wrote:
>
> On Tue, Oct 10, 2000 at 09:06:49AM +0200, Helge Hafting wrote:
> > If you want init to live - prove that it don't eat too much memory.
>
> I don't see why the machine should be stable only if init is small.
> My kernel won't be stable only if init is small since it doesn't cost
> anything to handle correctly the big init case.
>
Nothing wrong with a big init - the problem is a memory-leaking init. That one will die anyway; whether it dies early from an OOM-killer or later when all other processes are gone doesn't really matter.

Helge Hafting
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
You're making the deadly assumption that all applications behave themselves exactly the same all the time. Oops... netscape decided to freak out and take up all your memory... guess it's the admin's fault. Oops... some mod_perl script decided to freak out and an apache process decides to suck all of your CPU and MEM.

Crap like this does happen. An example of this is a webboard package called "Blackboard" consisting of various mod_perl scripts, apache, and mysql. It is an educational online conferencing system being used in conjunction with many college classes and thus is quite vital to the campus. Unfortunately it's buggy as hell, and the memory-sucking bug didn't pop up until we were a couple of weeks into classes and locked into the system. A mod_perl script freaks out, the copy of apache goes nuts, and we get a bunch of lovely out-of-memory related messages on the console.

It's times like these that an OOM killer like Rik's would be very useful. I feel Rik's OOM killer backported to 2.2.x would do wonders for this situation. After playing with Rik's OOM system, I know it would do the right thing on this system, but unfortunately 2.4.x isn't trustworthy yet.

Yes, the software is buggy and should be fixed. Do I have the power to fix a broken commercial package that I'm locked into? No.

The point of an OOM killer is that if all hell breaks loose, you have a choice between a locked-up system, a system that's slow as hell because it's spending all its time swapping, or a system that kills the offender and gets back to business. I choose the third option.

I can't think of any situation (either on desktop or server) where a system lockup or panic due to OOM would be acceptable w/ 2.4.x.

On Thu, 12 Oct 2000, Matthew Hawkins wrote:
>
> Heh.. now all we need is some smart-arse to make something similar to
> apply to the _entire_ VM subsystem, and both Rik and Andrea can be happy
> ;)
>
> Seriously, am I missing something obvious or is it far simpler just to
> keel over and die if the system goes OOM?  I mean, seriously, if the
> administrator lets it get to that state then he/she/it deserves a dead
> system.  It's akin to having your car run out of petrol - you don't
> start shooting passengers because their extra load made the engine chew
> more.  You pack up your kitty and go to the nearest petrol station and
> buy more, plug it into the car then learn from the experience so this
> fringe case of it happening doesn't happen again.  I don't really see
> much difference between a car going "OOP" and a computer going OOM.
> Should we start deleting files according to some randomly-chosen
> heuristic if a filesystem goes "OOS" ?
>
> --
> Matt

--
Bruce A. Locke
[EMAIL PROTECTED]

"The Internet views censorship as damage and routes around it"
www.eff.org    www.peacefire.org
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Wed, 11 Oct 2000, Bruce A. Locke wrote:
> Your making the deadly assumption that all applications behave themselves
> exactly the same all the time.  Oops... netscape decided to freak out and
> take up all your memory... guess its the admins fault.  Oops... some
> mod_perl script decided to freak out and an apache process decides to suck
> all of your CPU and MEM.

that's why you have per process limits set. Eg, PAM makes this exceedingly easy with pam_limits.so - edit /etc/security/limits.conf.

this prevents at least 90% of OOM situations (ie individual leaky processes). eg netscape will then pop up "can not allocate memory" messages and stop rendering pages instead of crashing your system.

--paulj
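For reference, a limits.conf fragment along these lines might look as follows. The group name @users and the numbers are placeholders picked for illustration, not recommendations; "as" is the address space limit in KB and "nproc" the maximum number of processes.

```
# /etc/security/limits.conf
# <domain>   <type>   <item>   <value>
@users       hard     as       524288
@users       hard     nproc    100
```

The limits only take effect for services whose PAM configuration loads the module, typically with a line such as "session required pam_limits.so".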
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 OOM handler)
On 2000-10-11 10:33:39 -0400, Bruce A. Locke wrote: Your making the deadly assumption that all applications behave themselves exactly the same all the time. Oops... netscape decided to freak out and take up all your memory... guess its the admins fault. Yep, for not setting appropriate resource limits. man 2 setrlimit Of course, if its a kernel bug that causes it I think you're SOL ;) -- Matt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 OOM handler)
On Thu, 12 Oct 2000, Matthew Hawkins wrote: Seriously, am I missing something obvious or is it far simpler just to keel over and die if the system goes OOM? I mean, seriously, if the administrator lets it get to that state then he/she/it deserves a dead system. It's akin to having your car run out of petrol - you don't start shooting passengers because their extra load made the engine chew more. You pack up your kitty and go to the nearest petrol station and buy more, plug it into the car then learn from the experience so this fringe case of it happening doesn't happen again. I don't really see much difference between a car going "OOP" and a computer going OOM. Should we start deleting files according to some randomly-chosen heueristic if a filesystem goes "OOS" ? Excellent point. However, the idea is to kill an attacker if your 'car' is being hijacked. Whatever is being designed should ideally have zero impact on the usual performance and only come into play if something runs away, deliberately or by accident. If Linux doesn't track down and kill deliberate attempts to kill the system, there will always be those who say; "Linux is no good because a user can readily kill it". Of course we could track down and kill those who say this, but it'd get messy. FYI, a fork() bomb on my Sun Workstation does not kill it. Also malloc()ing and writing all over the place doesn't kill it either. 
Script started on Wed Oct 11 10:41:38 2000 # cat xxx.c main() { for(;;) fork(); } # gcc -o xxx xxx.c # ./xxx ^C # # ^C # ps PID TTY TIME CMD 24800 pts/10:00 xxx 24335 pts/10:00 sh 24688 pts/10:00 xxx 24690 pts/10:00 xxx 24692 pts/10:00 xxx 24694 pts/10:00 xxx 24696 pts/10:00 xxx 24697 pts/10:00 xxx 24699 pts/10:00 xxx 24701 pts/10:00 xxx 24703 pts/10:00 xxx 24704 pts/10:00 xxx 24706 pts/10:00 xxx 24708 pts/10:00 xxx 24710 pts/10:00 xxx 24712 pts/10:00 xxx 24714 pts/10:00 xxx 24716 pts/10:00 xxx 24717 pts/10:00 xxx 24719 pts/10:00 xxx 24720 pts/10:00 xxx 24721 pts/10:00 xxx 24722 pts/10:00 xxx 24723 pts/10:00 xxx 24724 pts/10:00 xxx 24725 pts/10:00 xxx 24726 pts/10:00 xxx 24727 pts/10:00 xxx 24728 pts/10:00 xxx 24729 pts/10:00 xxx 24730 pts/10:00 xxx 24731 pts/10:00 xxx 24732 pts/10:00 xxx 24733 pts/10:00 xxx 24734 pts/10:00 xxx 24735 pts/10:00 xxx 24736 pts/10:00 xxx 24737 pts/10:00 xxx 24738 pts/10:00 xxx 24739 pts/10:00 xxx 24740 pts/10:00 xxx 24741 pts/10:00 xxx 24742 pts/10:00 xxx 24743 pts/10:00 xxx 24744 pts/10:00 xxx 24801 pts/10:00 ps 24687 pts/10:00 xxx 24689 pts/10:00 xxx 24691 pts/10:00 xxx 24693 pts/10:00 xxx 24695 pts/10:00 xxx 24698 pts/10:00 xxx 24700 pts/10:00 xxx 24702 pts/10:00 xxx 24705 pts/10:00 xxx 24707 pts/10:00 xxx 24709 pts/10:00 xxx 24711 pts/10:00 xxx 24713 pts/10:00 xxx 24715 pts/10:00 xxx 24718 pts/10:00 xxx 24653 pts/10:00 xxx 24610 pts/10:00 xxx 24614 pts/10:00 xxx 24615 pts/10:00 xxx 24616 pts/10:00 xxx 24617 pts/10:00 xxx 24618 pts/10:00 xxx 24619 pts/10:00 xxx 24620 pts/10:00 xxx 24621 pts/10:00 xxx 24622 pts/10:00 xxx 24623 pts/10:00 xxx 24624 pts/10:00 xxx 24625 pts/10:00 xxx 24626 pts/10:00 xxx 24627 pts/10:00 xxx 24628 pts/10:00 xxx 24629 pts/10:00 xxx 24630 pts/10:00 xxx 24631 pts/10:00 xxx 24632 pts/10:00 xxx 24686 pts/10:00 xxx 24685 pts/10:00 xxx 24684 pts/10:00 xxx 24683 pts/10:00 xxx 24682 pts/10:00 xxx 24681 pts/10:00 xxx 24680 pts/10:00 xxx 24679 pts/10:00 xxx 24678 pts/10:00 xxx 24677 pts/10:00 xxx 24676 pts/10:00 xxx 
24675 pts/10:00 xxx 24674 pts/10:00 xxx 24673 pts/10:00 xxx 24672 pts/10:00 xxx 24671 pts/10:00 xxx 24670 pts/10:00 xxx 24669 pts/10:00 xxx 24668 pts/10:00 xxx 24667 pts/10:00 xxx 24666 pts/10:00 xxx 24665 pts/10:00 xxx 24664 pts/10:00 xxx 24663 pts/10:00 xxx 24662 pts/10:00 xxx 24661 pts/10:00 xxx 24660 pts/10:00 xxx 24659 pts/10:00 xxx 24658 pts/10:00 xxx 24657 pts/10:00 xxx 24656 pts/10:00 xxx 24655 pts/10:00 xxx 24654 pts/10:00 xxx 24652 pts/10:00 xxx 24651 pts/10:00 xxx 24650 pts/10:00 xxx 24649 pts/10:00 xxx 24648 pts/10:00 xxx 24647 pts/10:00 xxx 24646 pts/10:00 xxx 24645 pts/10:00 xxx 24644 pts/10:00 xxx 24643 pts/10:00 xxx 24642 pts/10:00 xxx 24634 pts/10:00 xxx 24633 pts/10:00 xxx 24641 pts/10:00 xxx 24640 pts/10:00 xxx 24639 pts/10:00 xxx 24638 pts/10:00 xxx 24637 pts/10:00 xxx 24636 pts/10:00 xxx 24635 pts/10:00 xxx 24613 pts/10:00 xxx
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
> On 2000-10-11 09:45:30 -0500, Jesse Pollard wrote:
> > Until user memory resource quotas are included in the kernel, there
> > will be nothing else that can be done. Even with resource quotas, if
> > the total of active users exceeds the resource then the
> > same/equivalent situation occurs.
>
> So setrlimit() with RLIMIT_DATA, RLIMIT_STACK, RLIMIT_RSS,
> RLIMIT_MEMLOCK, RLIMIT_AS et al is a null op? If so, I wish to
> register a complaint ;-)

Not exactly. As I have seen it, each process gets a copy of these
limits. A single process cannot exceed the limit, but the sum of all
processes can.

One of the problems is caused by COW. Given trivially small limits
(1 MB): the first process allocates and initializes up to 1 MB, then
forks. The second process begins updating data - 0.5 MB. Neither
process exceeds the limits, but the sum is now 1.5 MB. If this is
repeated enough, the system can go OOM with none of the processes at
or over the limits set.

Another problem occurs on multi-user servers. Each user logs in and
gets "reasonable" rlimit values - each user uses one medium sized
process. If #users * rlimits exceeds the system capacity then OOM
could occur, and still none may have exceeded the rlimit.

I've always treated rlimit values as "suggestions" to the user process
to aid in debugging (this is more applicable to the ulimits though).
The user's process will not exceed the value, and when it tries to, it
is a strong suggestion that a bug may be present. (I first saw this
with a leaky X server.)

There have been some patches (the beancounter stuff) that do relate to
resource control, but a more integrated resource accounting will make
it work better. I do believe it should be available as an option,
especially for multi-user servers, clusters, and other large systems.
It isn't that useful on single user workstations.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
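A minimal sketch of the per-process point made above, assuming Linux with glibc. The helper name alloc_succeeds_under_limit() and the sizes used with it are illustrative and not from the thread. Each child gets its own copy of the limit and is checked on its own, so N such processes together may still use up to N times the limit:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, cap the child's address space at
 * limit_bytes (soft limit only) and try to malloc request_bytes there.
 * Returns 1 if malloc succeeded, 0 if it failed, -1 on any other error.
 */
int alloc_succeeds_under_limit(rlim_t limit_bytes, size_t request_bytes)
{
	assert(request_bytes > 0);

	pid_t pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		struct rlimit rl;

		if (getrlimit(RLIMIT_AS, &rl) != 0)
			_exit(2);
		rl.rlim_cur = limit_bytes;	/* hard limit is left alone */
		if (setrlimit(RLIMIT_AS, &rl) != 0)
			_exit(2);

		/* volatile keeps the compiler from optimizing malloc away */
		void *volatile p = malloc(request_bytes);
		_exit(p != NULL ? 0 : 1);
	}

	int status;
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	switch (WEXITSTATUS(status)) {
	case 0:  return 1;
	case 1:  return 0;
	default: return -1;
	}
}
```

Calling it twice with the same limit and a small request succeeds both times, because nothing in the rlimit mechanism adds the two children together. That is exactly the gap described above.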
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Thu, 12 Oct 2000, Matthew Hawkins wrote:
> Yep, for not setting appropriate resource limits.
>
> man 2 setrlimit
>
> Of course, if it's a kernel bug that causes it I think you're SOL ;)

This manpage shows me functions and structs. I'm assuming you want
these used by the offending program or the shell under which the
program is being called. In the first case, a person might not have
source to the program, and if that's the case, it doesn't help much.
And in the second case, if the shell sets it, does it affect children
of a process (aka fork()'d)?

Thanks for your time...

> --
> Matt

--
Bruce A. Locke
[EMAIL PROTECTED]

"The Internet views censorship as damage and routes around it"
www.eff.org www.peacefire.org
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Wed, 11 Oct 2000, Paul Jakma wrote:
> that's why you have per process limits set. Eg, PAM makes this
> exceedingly easy with pam_limits.so - edit /etc/security/limits.conf.
>
> this prevents at least 90% of OOM situations (ie individual leaky
> processes). eg netscape will then pop-up "can not allocate memory"
> messages and stop rendering pages instead of crashing your system.

I wasn't aware PAM settings affected daemons started up during boot
time but I will check into it, thank you.

BTW, you said it works only 90%, what are the other 10% of times it
doesn't work?

> --paulj

--
Bruce A. Locke
[EMAIL PROTECTED]

"The Internet views censorship as damage and routes around it"
www.eff.org www.peacefire.org
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Wed, 11 Oct 2000, Bruce A. Locke wrote:
> I wasn't aware PAM settings affected daemons started up during
> boottime but I will check into it, thank you.

daemons generally don't need to be PAM aware (unless they deal with
authorising things). The script that launches it however (if started
by a PAM aware app such as su) can set limits - which the daemon should
inherit.

> BTW, you said it works only 90%, what are the other 10% of times it
> doesn't work?

malicious processes, or a collection of processes.

--paulj
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Thu, Oct 12, 2000 at 01:58:49AM +1100, Matthew Hawkins wrote:
> On 2000-10-11 10:33:39 -0400, Bruce A. Locke wrote:
> > You're making the deadly assumption that all applications behave
> > themselves exactly the same all the time. Oops... netscape decided
> > to freak out and take up all your memory... guess it's the admin's
> > fault.
>
> Yep, for not setting appropriate resource limits.

No way should a desktop user be responsible for micro-managing the
resource usage of his applications. How can he decide what's
reasonable for Netscape to consume? Shouldn't Netscape be allowed to
take up most of memory, if it's the only major application and the
memory will improve its performance?

The only thing that knows what's right for Netscape is Netscape. If
Netscape were clever and kind, perhaps it would estimate what's
reasonable and set limits on itself, adjusting them from time to time
based on user behavior and environmental factors. But Netscape's a
pretty mature program, and it doesn't do this; it can hardly be
expected of the zillions of immature (and probably leaky) applications
a user might run.

So, we inevitably need an automated low-memory or out-of-memory
algorithm. I tend to think it may need to be more adjustable than
Rik's--people will be much more comfortable if they can say "spare
this simulation at all cost!" or "kill off one of these processes in
an emergency" or "this system has no business coming within 90% of
RAM+swap capacity, so start killing things at that point--oh, and mail
me". Some of this has no place in the kernel, obviously. But Rik has
a good start, and perhaps his work will be part of a more complete
solution.

Andrew
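The kind of adjustable policy asked for above can be sketched in a few lines of user-space C. Everything here (struct oom_policy, enum oom_pref, over_threshold(), choose_victim()) is a hypothetical illustration, not an interface from Rik's code or any kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process hint an administrator could set. */
enum oom_pref { OOM_DEFAULT, OOM_SPARE, OOM_KILL_FIRST };

struct proc_info {
	const char   *name;
	uint64_t      size_kb;
	enum oom_pref pref;
};

/* Hypothetical system-wide knob: act at this share of RAM+swap. */
struct oom_policy {
	unsigned threshold_percent;
};

/* Returns 1 once used memory reaches the configured share of capacity. */
int over_threshold(const struct oom_policy *pol, uint64_t ram_kb,
		   uint64_t swap_kb, uint64_t used_kb)
{
	assert(pol->threshold_percent <= 100);
	/* used/capacity >= pct/100, written without floating point */
	return used_kb * 100 >= (ram_kb + swap_kb) * pol->threshold_percent;
}

/*
 * Pick a victim: never an OOM_SPARE process; an OOM_KILL_FIRST process
 * beats an OOM_DEFAULT one; within the same class the largest wins.
 * Returns the index of the victim, or -1 if every process is spared.
 */
int choose_victim(const struct proc_info *procs, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (procs[i].pref == OOM_SPARE)
			continue;
		if (best < 0) {
			best = (int)i;
			continue;
		}
		int cand_first = procs[i].pref == OOM_KILL_FIRST;
		int best_first = procs[best].pref == OOM_KILL_FIRST;
		if (cand_first > best_first ||
		    (cand_first == best_first &&
		     procs[i].size_kb > procs[best].size_kb))
			best = (int)i;
	}
	return best;
}
```

With a 90% policy on 1000 KB of RAM plus 1000 KB of swap, over_threshold() fires at 1800 KB used. A spared simulation is never chosen, however large it is.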
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Thu, 12 Oct 2000, Matthew Hawkins wrote:
> On 2000-10-11 09:45:30 -0500, Jesse Pollard wrote:
> > Until user memory resource quotas are included in the kernel, there
> > will be nothing else that can be done. Even with resource quotas, if
> > the total of active users exceeds the resource then the
> > same/equivalent situation occurs.
>
> So setrlimit() with RLIMIT_DATA, RLIMIT_STACK, RLIMIT_RSS,
> RLIMIT_MEMLOCK, RLIMIT_AS et al is a null op? If so, I wish to
> register a complaint ;-)

Don't send a complaint, send patches ...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On 2000-10-11 12:48:54 -0400, Andrew Pimlott wrote:
> No way should a desktop user be responsible for micro-managing the
> resource usage of his applications.

That's right. The systems administrator should, and will set
appropriate limits for users on his/her system that apply from login.
This is how the systems I first used were configured (lucky me had a
damn fine sysadmin), and so this is how I configure mine.

> The only thing that knows what's right for Netscape is Netscape.

I would disagree with this, I believe this is exactly the root of
people's problems with Netscape (and the same theory should apply to
other apps). The application doesn't know what's _right_ - it knows
what it _wants_. Big difference.

--
Matt
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
I've had to support an app running as a back-end to a webserver that
would malloc() different amounts of memory depending on user input, up
to multiple gigabytes of memory which vastly exceeded the 512k the
machine had as main memory. The app was a program that would scan
genetic sequence looking for 'repeats' in the sequence, and one
sequence would malloc a hundred megs while a similar sequence of the
same size would cause the algorithm to try to malloc over a gig. Part
of the algorithm was actually to simply try to malloc all the memory
it could and, if it ran out, it would bump down the resolution that it
was scanning with and try again.

And it would regularly push the machine into OOM and take it down
because daemons got killed long before the OOM killer got around to
taking out the process that was malloc()ing all the memory.

On other machines I'd set RLIMIT_DATA and my OOM problems went away,
but on linux this didn't work (and I wasn't comfortable enough with
kernel sources back then to manage to find RLIMIT_AS).

On Thu, 12 Oct 2000, Matthew Hawkins wrote:
> Heh.. now all we need is some smart-arse to make something similar to
> apply to the _entire_ VM subsystem, and both Rik and Andrea can be
> happy ;)
>
> Seriously, am I missing something obvious or is it far simpler just to
> keel over and die if the system goes OOM? I mean, seriously, if the
> administrator lets it get to that state then he/she/it deserves a dead
> system. It's akin to having your car run out of petrol - you don't
> start shooting passengers because their extra load made the engine
> chew more. You pack up your kitty and go to the nearest petrol
> station and buy more, plug it into the car then learn from the
> experience so this fringe case of it happening doesn't happen again.
>
> I don't really see much difference between a car going "OOP" and a
> computer going OOM. Should we start deleting files according to some
> randomly-chosen heuristic if a filesystem goes "OOS" ?
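The "malloc all you can, then bump down the resolution" strategy described above can be sketched as a back-off loop. The function name alloc_with_backoff() and the halving step are assumptions made for illustration; the original program's actual step size is not given in the message:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Try to allocate `want` bytes; on failure halve the request and retry,
 * never going below `min` bytes. On success the size actually obtained
 * is stored in *got. Returns NULL (and *got == 0) if even `min` fails.
 */
void *alloc_with_backoff(size_t want, size_t min, size_t *got)
{
	assert(min > 0 && want >= min);

	for (size_t n = want; n >= min; n /= 2) {
		void *p = malloc(n);
		if (p != NULL) {
			*got = n;
			return p;
		}
	}
	*got = 0;
	return NULL;
}
```

This only behaves well when malloc() fails early. With a per-process address space limit in place (for example RLIMIT_AS), the failure arrives at the limit and the loop backs off. Without such a limit the first oversized request may appear to succeed, and the trouble only starts once the memory is touched, which matches the behaviour described in the message.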
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Tue, Oct 10, 2000 at 05:58:46PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Tom Rini wrote:
> > On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > > On Tue, 10 Oct 2000, Ingo Oeser wrote:
> > >
> > > > before you argue endlessly about the "Right OOM Killer (TM)", I
> > > > did a small patch to allow replacing the OOM killer at runtime.
> > > >
> > > > So now you can stop arguing about the one and only OOM killer,
> > > > implement it, provide it as module and get back to the important
> > > > stuff ;-)
> > >
> > > This is definitely a cool toy for people who have doubts
> > > that my OOM killer will do the wrong thing in their
> > > workloads.
> >
> > I think this can be useful for more than just a cool toy. I
> > think that the main thing that this discussion has shown is no
> > OOM killer will please 100% of the people 100% of the time. I
> > think we should try and have a good generic OOM killer that
> > kills the right process most of the time. People can implement
> > (and submit) different-style OOM killers as needed.
>
> Indeed, though I suspect most of the people trying this would
> fall into the trap of over-engineering their OOM killer, after
> which it mostly becomes less predictable ;)

I was thinking more along the lines of ones w/ "safety" features that
not everyone might like/need (ie /usr/local/bin/foo is always good,
those suggestions). It seems like useful functionality at little/no
cost. And a neat toy for now. :)

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Ingo Oeser wrote:
>
> > before you argue endlessly about the "Right OOM Killer (TM)", I
> > did a small patch to allow replacing the OOM killer at runtime.
> >
> > So now you can stop arguing about the one and only OOM killer,
> > implement it, provide it as module and get back to the important
> > stuff ;-)
>
> This is definitely a cool toy for people who have doubts
> that my OOM killer will do the wrong thing in their
> workloads.

I think this can be useful for more than just a cool toy. I think that
the main thing that this discussion has shown is no OOM killer will
please 100% of the people 100% of the time. I think we should try and
have a good generic OOM killer that kills the right process most of the
time. People can implement (and submit) different-style OOM killers as
needed. Or at least get 'em on freshmeat. :)

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Olaf Titz wrote:
>
> > > Still, it would be nice to recover that 4 MB when the system
> > > doesn't have any memory left.
> > Yup. The X server could give back the memory for some cases like the
> > background without too much hackery.
>
> Then Linux only needs to implement SIGDANGER, which has been talked
> about for years...
>
> X would be a good candidate to implement a handler for it. Others are
> Emacs, Mozilla or JVMs - basically everything which has a GC of some
> sort. It could even be used to implement a configurable user mode OOM
> killer.

It would be good to talk to the KDE and Gnome folks about this as well.
I am pretty sure they have large blocks of memory that could be flushed
or freed in a low-memory or OOM condition.

Miles
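A rough sketch of what an application-side handler for such a low-memory signal could look like. Linux has no SIGDANGER, so SIGUSR1 stands in for it here; LOWMEM_SIGNAL, lowmem_init() and lowmem_poll() are hypothetical names, and the single malloc'd buffer stands in for whatever cache (backgrounds, decoded images, GC-able heap) the program could give back:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

#define LOWMEM_SIGNAL SIGUSR1	/* stand-in: Linux has no SIGDANGER */

static volatile sig_atomic_t lowmem_flag;
static void *cache;		/* droppable data the program can rebuild */
static size_t cache_size;

/* Handler only sets a flag; free() is not async-signal-safe. */
static void on_lowmem(int sig)
{
	(void)sig;
	lowmem_flag = 1;
}

/* Install the handler and fill the cache. Returns 0 on success. */
int lowmem_init(size_t cache_bytes)
{
	struct sigaction sa;

	assert(cache == NULL);	/* initialise only once */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_lowmem;
	sigemptyset(&sa.sa_mask);
	if (sigaction(LOWMEM_SIGNAL, &sa, NULL) != 0)
		return -1;

	cache = malloc(cache_bytes);
	cache_size = cache ? cache_bytes : 0;
	return cache ? 0 : -1;
}

/* Call from the main loop. Returns the number of bytes given back. */
size_t lowmem_poll(void)
{
	size_t freed = 0;

	if (lowmem_flag) {
		lowmem_flag = 0;
		free(cache);
		cache = NULL;
		freed = cache_size;
		cache_size = 0;
	}
	return freed;
}
```

Doing the actual freeing from the main loop rather than inside the handler keeps the handler trivially safe. The cost is that memory only comes back at the next poll, which may be too late if the process is blocked.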
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, Rogier Wolff wrote:
>
> So if Netscape can "pump" 40 extra megabytes of memory out of X, this
> can be exploited.
>
> Now we're back to the point that a heuristic can never be right all
> the time..

I agree. In fact, we never left that.

Nothing is perfect. In fact, a lot of engineering is _recognizing_ that
you can never achieve "perfect", and you're much better off not even
trying - and having a simple system that is "good enough".

This is the old adage of "perfect is the enemy of good" - trying too
hard is actually _detrimental_ in 99% of all cases. We should have
simple heuristics that work most of the time, instead of trying to
cajole a complex system like X to help us do some complicated resource
management system.

Complexity will just result in the OOM killer failing in surprising
ways.

A simple heuristic will mean that the OOM killer will still fail, but
at least it won't be in subtle and surprising ways.

		Linus
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > So now you can stop arguing about the one and only OOM killer,
> > implement it, provide it as module and get back to the important
> > stuff ;-)
>
> This is definitely a cool toy for people who have doubts
> that my OOM killer will do the wrong thing in their
> workloads.

Thanks ;-)

But I forgot to include my changes to the mm/Makefile (to export the
API for modules).

Here is a _working_ one:

--- linux-2.4.0-test10-pre1/mm/oom_kill.c	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/oom_kill.c	Tue Oct 10 16:59:27 2000
@@ -13,6 +13,8 @@
  * machine) this file will double as a 'coding guide' and a signpost
  * for newbie kernel hackers. It features several pointers to major
  * kernel subsystems and hints as to where to find out what things do.
+ *
+ * Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include
@@ -136,7 +138,7 @@
 }
 
 /**
- * oom_kill - kill the "best" process when we run out of memory
+ * oom_kill_rik - kill the "best" process when we run out of memory
  *
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 	struct task_struct *p = select_bad_process();
 
@@ -207,4 +211,63 @@
 
 	/* Else... */
 	return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock = RW_LOCK_UNLOCKED;
+
+static oom_killer_t oom_killer = oom_kill_rik;
+
+/**
+ * oom_kill - the oom_kill wrapper for installable OOM killers
+ *
+ * Wraper around the OOM killers, that can be installed via
+ * install_oom_killer and reset_default_oom_killer.
+ *
+ * This gets called from kswapd() in linux/mm/vmscan.c when we
+ * really run out of memory.
+ */
+void oom_kill(void) {
+	read_lock(&oom_kill_lock);
+	oom_killer();
+	read_unlock(&oom_kill_lock);
+}
+
+/**
+ * install_oom_killer - install alternate OOM killer
+ * @new_oom_kill: the alternate OOM killer provided by the caller
+ *
+ * Since the default OOM killer (oom_kill_rik) is not suitable
+ * for everyone, we provide an interface to install custom OOM killers.
+ *
+ * You can take the most appropriate action for your application if the
+ * kernel goes OOM.
+ *
+ * Providing an NULL argument just returns the current OOM killer.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ *
+ * NOTE: We don't do refcounting on OOM killers, so be careful with
+ * modules
+ */
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill) {
+	oom_killer_t tmp;
+	write_lock(&oom_kill_lock);
+	tmp=oom_killer;
+	if (new_oom_kill)
+		oom_killer=new_oom_kill;
+	write_unlock(&oom_kill_lock);
+	return tmp;
+}
+
+/**
+ * reset_default_oom_killer - reset back to default OOM killer
+ *
+ * If you are going to unload the module which provided
+ * your OOM killer, you can install the default one by this.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ */
+oom_killer_t reset_default_oom_killer(void) {
+	return install_oom_killer(&oom_kill_rik);
 }
--- linux-2.4.0-test10-pre1/include/linux/swap.h	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/include/linux/swap.h	Tue Oct 10 16:44:22 2000
@@ -127,8 +127,14 @@
 #define read_swap_cache(entry) read_swap_cache_async(entry, 1);
 
 /* linux/mm/oom_kill.c */
+typedef void (*oom_killer_t)(void);
+
 extern int out_of_memory(void);
 extern void oom_kill(void);
+
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill);
+oom_killer_t reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
--- linux-2.4.0-test10-pre1/mm/Makefile	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/Makefile	Tue Oct 10 16:34:06 2000
@@ -10,7 +10,8 @@
 O_TARGET := mm.o
 O_OBJS	 := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
 	    vmalloc.o slab.o bootmem.o swap.o vmscan.o page_io.o \
-	    page_alloc.o swap_state.o swapfile.o numa.o oom_kill.o
+	    page_alloc.o swap_state.o swapfile.o numa.o
+OX_OBJS  := oom_kill.o
 
 ifeq ($(CONFIG_HIGHMEM),y)
 O_OBJS += highmem.o

Regards

Ingo Oeser
--
Feel the power of the penguin - run [EMAIL PROTECTED]
:x
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, Oct 10, 2000 at 12:30:51PM -0300, Rik van Riel wrote:
> Not killing init when we "should" definitely prevents
> embedded systems from auto-rebooting when they should
> do so.
>
> (OTOH, I don't think embedded systems will run into
> this OOM issue too much)

but when they do, they're hard to fix. Think about an elevator control
system with a single process that happens to implement a somewhat
broken version of the elevator algorithm ;)

> > that's what I said. we need to be sure to _get_ a panic() though.
>
> I believe the kernel automatically panic()s when init
> dies ... from kernel/exit.c::do_exit()
>
>         if (tsk->pid == 1)
>                 panic("Attempted to kill init!");

guess who added that code. We still kill init with SIGTERM which
doesn't seem to work though.
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote:
> > On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> > > > > The algorithm you posted on the list in this thread will kill
> > > > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > > > and you execute a task that grows over 1M.
> > > >
> > > > This sounds suspiciously like the description of a DEAD system ;)
> > >
> > > But wouldn't a watchdog daemon which doesn't allocate any memory
> > > still get run ?
> >
> > Indeed, it would. It would also /prevent/ the system
> > from automatically rebooting itself into a usable state ;)
>
> So it's not dead in the "oh, it'll be back in 30 seconds" sense.
> So our behaviour is broken (more so than random process
> killing).

*nod*

Not killing init when we "should" definitely prevents
embedded systems from auto-rebooting when they should
do so.

(OTOH, I don't think embedded systems will run into
this OOM issue too much)

> > > You care about getting an automatic reboot. So you need to be sure the
> > > watchdog daemon gets killed first or you panic() after some time.
> >
> > echo 30 > /proc/sys/kernel/panic
>
> that's what I said. we need to be sure to _get_ a panic() though.

I believe the kernel automatically panic()s when init
dies ... from kernel/exit.c::do_exit()

        if (tsk->pid == 1)
                panic("Attempted to kill init!");

[which will make our system auto-reboot and be back on its
feet in a healthy state again soon]

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
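The "echo 30 > /proc/sys/kernel/panic" step can also be done from a watchdog or init-style program. A small sketch follows; write_panic_timeout() and read_panic_timeout() are hypothetical helper names, and the path is passed in so the code can be tried on a scratch file without root:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a reboot-after-panic delay (in seconds) to a sysctl-style file.
 * On a real system the path would be /proc/sys/kernel/panic and the
 * caller needs root. Returns 0 on success, -1 on failure.
 */
int write_panic_timeout(const char *path, int seconds)
{
	assert(path != NULL);

	FILE *f = fopen(path, "w");
	if (f == NULL)
		return -1;

	int ok = fprintf(f, "%d\n", seconds) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the delay back. Returns 0 on success, -1 on failure. */
int read_panic_timeout(const char *path, int *seconds)
{
	assert(path != NULL && seconds != NULL);

	FILE *f = fopen(path, "r");
	if (f == NULL)
		return -1;

	int ok = fscanf(f, "%d", seconds) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

As the thread points out, the timeout only helps if a panic() actually happens. Setting it does nothing for a system that is out of memory but still limping along.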
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> > > > The algorithm you posted on the list in this thread will kill
> > > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > > and you execute a task that grows over 1M.
> > >
> > > This sounds suspiciously like the description of a DEAD system ;)
> >
> > But wouldn't a watchdog daemon which doesn't allocate any memory
> > still get run ?
>
> Indeed, it would. It would also /prevent/ the system
> from automatically rebooting itself into a usable state ;)

So it's not dead in the "oh, it'll be back in 30 seconds" sense.
So our behaviour is broken (more so than random process killing).

> > You care about getting an automatic reboot. So you need to be sure the
> > watchdog daemon gets killed first or you panic() after some time.
>
> echo 30 > /proc/sys/kernel/panic

that's what I said. we need to be sure to _get_ a panic() though.
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, Philipp Rumpf wrote:

> > > The algorithm you posted on the list in this thread will kill
> > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > and you execute a task that grows over 1M.
> >
> > This sounds suspiciously like the description of a DEAD system ;)
>
> But wouldn't a watchdog daemon which doesn't allocate any memory
> still get run ?

Indeed, it would. It would also /prevent/ the system
from automatically rebooting itself into a usable state ;)

> > (in which case you simply don't care if init is being killed or not)
>
> You care about getting an automatic reboot. So you need to be sure the
> watchdog daemon gets killed first or you panic() after some time.

echo 30 > /proc/sys/kernel/panic

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
[PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
[OOM killer war]

Hi there,

before you argue endlessly about the "Right OOM Killer (TM)", I
did a small patch to allow replacing the OOM killer at runtime.

You can even use modules, if you are careful (see khttpd on how
to do this without refcounting).

So now you can stop arguing about the one and only OOM killer,
implement it, provide it as module and get back to the important
stuff ;-)

PS: Patch is against test10-pre1.

Thanks for listening

Ingo Oeser

--- linux-2.4.0-test10-pre1/mm/oom_kill.c	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/oom_kill.c	Tue Oct 10 16:59:27 2000
@@ -13,6 +13,8 @@
  * machine) this file will double as a 'coding guide' and a signpost
  * for newbie kernel hackers. It features several pointers to major
  * kernel subsystems and hints as to where to find out what things do.
+ *
+ * Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include
@@ -136,7 +138,7 @@
 }
 
 /**
- * oom_kill - kill the "best" process when we run out of memory
+ * oom_kill_rik - kill the "best" process when we run out of memory
  *
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 	struct task_struct *p = select_bad_process();
 
@@ -207,4 +211,63 @@
 
 	/* Else... */
 	return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock = RW_LOCK_UNLOCKED;
+
+static oom_killer_t oom_killer = oom_kill_rik;
+
+/**
+ * oom_kill - the oom_kill wrapper for installable OOM killers
+ *
+ * Wraper around the OOM killers, that can be installed via
+ * install_oom_killer and reset_default_oom_killer.
+ *
+ * This gets called from kswapd() in linux/mm/vmscan.c when we
+ * really run out of memory.
+ */
+void oom_kill(void) {
+	read_lock(&oom_kill_lock);
+	oom_killer();
+	read_unlock(&oom_kill_lock);
+}
+
+/**
+ * install_oom_killer - install alternate OOM killer
+ * @new_oom_kill: the alternate OOM killer provided by the caller
+ *
+ * Since the default OOM killer (oom_kill_rik) is not suitable
+ * for everyone, we provide an interface to install custom OOM killers.
+ *
+ * You can take the most appropriate action for your application if the
+ * kernel goes OOM.
+ *
+ * Providing an NULL argument just returns the current OOM killer.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ *
+ * NOTE: We don't do refcounting on OOM killers, so be careful with
+ * modules
+ */
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill) {
+	oom_killer_t tmp;
+	write_lock(&oom_kill_lock);
+	tmp=oom_killer;
+	if (new_oom_kill)
+		oom_killer=new_oom_kill;
+	write_unlock(&oom_kill_lock);
+	return tmp;
+}
+
+/**
+ * reset_default_oom_killer - reset back to default OOM killer
+ *
+ * If you are going to unload the module which provided
+ * your OOM killer, you can install the default one by this.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ */
+oom_killer_t reset_default_oom_killer(void) {
+	return install_oom_killer(&oom_kill_rik);
 }
--- linux-2.4.0-test10-pre1/include/linux/swap.h	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/include/linux/swap.h	Tue Oct 10 16:44:22 2000
@@ -127,8 +127,14 @@
 #define read_swap_cache(entry) read_swap_cache_async(entry, 1);
 
 /* linux/mm/oom_kill.c */
+typedef void (*oom_killer_t)(void);
+
 extern int out_of_memory(void);
 extern void oom_kill(void);
+
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill);
+oom_killer_t reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.

--
Feel the power of the penguin - run [EMAIL PROTECTED]
:x
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Linus Torvalds wrote:
> Basically, the only thing _I_ think X can do is to really say "oh, please
> don't count my memory, because everything I do I do for my clients, not
> for myself".
>
> THAT is my argument. Basically there is nothing we can reliably account.
>
> So we might as well fall back on just saying "X is more important than
> some random client", and have a mm niceness level. Which right now is
> obviously approximated by the IO capabilities tests etc.

FYI: I ran my machine out of memory (without crashing by the way) this
weekend by loading a whole bunch of large images into netscape.

I noticed not being able to open more windows when I saw my swapspace
exhausted. I noticed the large netscape, and killed it. At that moment
my X was still taking 80Mb of RAM. I manually killed it and restarted
it to get rid of that memory.

So if Netscape can "pump" 40 extra megabytes of memory out of X, this
can be exploited.

Now we're back to the point that a heuristic can never be right all
the time..

Roger.

--
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* Common sense is the collection of
** prejudices acquired by age eighteen. -- Albert Einstein
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, Oct 10, 2000 at 09:06:49AM +0200, Helge Hafting wrote:
> If you want init to live - prove that it don't eat too much memory.

I don't see why the machine should be stable only if init is small.

My kernel won't be stable only if init is small since it doesn't cost
anything to handle correctly the big init case.

Andrea
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, Oct 10, 2000 at 04:38:02AM +0100, Philipp Rumpf wrote:
> Init should never die. If we get to do_exit in init we'll panic which is
> the right thing to do (reboot on critical systems).

If the page fault can fail with OOM on init, init will get a SIGSEGV while
running a signal handler (copy-user will return -EFAULT regardless it was an
oom or a real segfault) and it _won't_ panic and the system is unusable.

Andrea
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Linus Torvalds wrote:
> On Mon, 9 Oct 2000, Rik van Riel wrote:
> >
> > > I'd prefer just X having a higher "mm nice level" or something.
> >
> > Which it has, because:
> >
> > 1) CAP_RAW_IO
> > 2) p->euid == 0
>
> Oh, I agree, but we might want to generalize this a bit so that root could
> say "this process is important" and then drop root privileges and still
> get "credited" for the fact that it's important.
>
> It's not a big deal. It works for X right now.

How about using p->rlim[RLIMIT_AS].rlim_cur to weight the badness point
for a process? On my system, a 128MB RAM + 256MB swap, it defaults to some
(insane?) value:

bash$ ulimit -vH -vS
virtual memory (kbytes) 4194302
virtual memory (kbytes) 2105343

for every process, which just means it is unused.

The idea is:
1) set default for rlim[RLIMIT_AS].rlim_max to a saner value;
2) processes with higher rlim[RLIMIT_AS].rlim_cur get lower badness.

This way, the badness of a process is not proportional to its absolute
size, but to the fraction of allowed AS it is using.
Processes that are capable(CAP_SYS_RESOURCE) can set RLIMIT_AS to a very
high value, so they get less badness point. X is a perfect candidate.
User's runaway processes (netscape) will have lower
rlim[RLIMIT_AS].rlim_cur, thus will get higher badness.

Something like:

-	points = p->mm->total_vm;
+	points = p->mm->total_vm / (p->rlim[RLIMIT_AS].rlim_cur << AS_FACTOR);

with

#define AS_FACTOR 30

maybe? (this is Rik's call, he knows better than me how to balance it...)

It's simple, it's configurable. 1) may be enforced by the kernel, or
completely left to user space. On my system, in its default configuration
(no use of RLIMIT_AS), it has no impact at all (all processes have the same
limit).

Sounds good or am I missing something?

>
> 		Linus

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
[EMAIL PROTECTED]
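The weighting idea above can be tried outside the kernel. What follows is only a userspace sketch under stated assumptions: struct proc_info, weighted_badness() and the 4 KiB page size are hypothetical stand-ins invented for illustration, not kernel code, and it uses floating point for readability where kernel code would stay with integer arithmetic as in the one-line change above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two task_struct fields the proposal reads. */
struct proc_info {
	uint64_t total_vm_pages;	/* like p->mm->total_vm, in pages */
	uint64_t as_limit_bytes;	/* like p->rlim[RLIMIT_AS].rlim_cur */
};

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Badness as the fraction of the allowed address space in use,
 * rather than the absolute size of the process.
 */
double weighted_badness(const struct proc_info *p)
{
	double used = (double)p->total_vm_pages * SKETCH_PAGE_SIZE;

	if (p->as_limit_bytes == 0)
		return 1.0;	/* assumption: a zero limit counts as fully used */
	return used / (double)p->as_limit_bytes;
}
```

With illustrative numbers, an 80 MB server allowed 2 GB of address space scores about 0.04, while a 40 MB client limited to 64 MB scores 0.625, so the smaller but tightly limited client would be selected first.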
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
--On 09 October 2000, 17:40 -0300 Rik van Riel <[EMAIL PROTECTED]> wrote:

> On Mon, 9 Oct 2000, James Sutherland wrote:
>> On Mon, 9 Oct 2000, Ingo Molnar wrote:
>> > On Mon, 9 Oct 2000, Rik van Riel wrote:
>> >
>> > > > so dns helper is killed first, then netscape. (my idea might not
>> > > > make sense though.)
>> > >
>> > > It makes some sense, but I don't think OOM is something that
>> > > occurs often enough to care about it /that/ much...
>> >
>> > i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup
>> > case, with 4MB RAM and no swap, where the admin tries to exec a 2MB
>> > process. I think it's a legitimate concern - i cannot know in advance
>> > whether a freshly started process would trigger an OOM or not.
>>
>> Shouldn't the runtime factor handle this, making sure the new
>> process is killed? (Maybe not if you're almost OOM right from
>> the word go, and run this process straight off... Hrm.)
>
> It should.
>
> Also, the example is a tad unrealistic since init seems to be
> around 70 kB in size on my systems ;)

In extreme cases, though, you could arrange things so the machine only has
100K of RAM when it loads init, at which point init tries running, say,
rc.sysinit - and everything goes bang. Of course, a machine like that won't
be very much use anyway...

More realistically, though, I could be running with something like
init=/bin/sash - does your statically linked sash binary fit in 70K? :-)

James.
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Andreas Dilger wrote:
> Having a SIGDANGER handler is good for 2 reasons:
> 1) Lets processes know when memory is short so they can free needless cache.
> 2) Mark process with a SIGDANGER handler as "more important" than those
>    without. Most people won't care about this, but init, and X, and
>    long-running simulations might.

For point 1, it would be much nicer to have user processes participate
in memory balancing _before_ getting anywhere near an OOM state.

A nice way is to send SIGDANGER with siginfo saying how much memory the
kernel wants back (or how fast). Applications that don't know to use
that info, but do have a SIGDANGER handler, will still react just rather
more severely.

-- Jamie
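The siginfo idea can be sketched in userspace with standard POSIX calls. This is only an illustration under assumptions: Linux defines no SIGDANGER, so SIGUSR1 stands in for it, and the "amount wanted" is carried in the sigqueue() value rather than in whatever field a kernel-generated signal would use. The names SKETCH_SIGDANGER, install_danger_handler(), cache_fill(), service_danger_request() and request_memory_back() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux defines no SIGDANGER; SIGUSR1 stands in for it in this sketch. */
#define SKETCH_SIGDANGER SIGUSR1

static volatile sig_atomic_t wanted_kib;	/* written by the handler */
static void *cache;				/* discardable data */
static size_t cache_kib;

/* Handler: only record how much memory the sender asked for. */
static void on_danger(int signo, siginfo_t *info, void *context)
{
	(void)signo;
	(void)context;
	wanted_kib = info->si_value.sival_int;
}

int install_danger_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_danger;
	sa.sa_flags = SA_SIGINFO;	/* deliver siginfo_t to the handler */
	sigemptyset(&sa.sa_mask);
	return sigaction(SKETCH_SIGDANGER, &sa, NULL);
}

/* Allocate a discardable cache of the given size. */
int cache_fill(size_t kib)
{
	free(cache);
	cache = malloc(kib * 1024);
	cache_kib = cache ? kib : 0;
	return cache ? 0 : -1;
}

/*
 * Called from the main loop, outside signal context: if a request is
 * pending, drop the whole cache. Returns the number of KiB released.
 */
size_t service_danger_request(void)
{
	size_t freed = 0;

	if (wanted_kib > 0) {
		freed = cache_kib;
		free(cache);
		cache = NULL;
		cache_kib = 0;
		wanted_kib = 0;
	}
	return freed;
}

/* Sender side: queue the signal with the requested amount attached. */
int request_memory_back(pid_t pid, int kib)
{
	union sigval value;

	value.sival_int = kib;
	return sigqueue(pid, SKETCH_SIGDANGER, value);
}
```

The handler only records the request and the main loop does the freeing, since free() is not async-signal-safe. A program that ignores the requested amount, as this one does, simply drops everything it can, which matches the "react rather more severely" case.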
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Albert D. Cahalan wrote:
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.

Haven't we already had this discussion? Quite a lot of programs have
cached data (X fonts, Netscape (lots!)), GC-able data (Emacs, Java
etc.), data that can simply be discarded (X window backing stores), or
data that can be written to disk on demand (Netscape again).

-- Jamie
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 06:34:29PM -0300, Rik van Riel wrote:
> On Mon, 9 Oct 2000, Ingo Molnar wrote:
> > On Mon, 9 Oct 2000, Rik van Riel wrote:
> >
> > > Would this complexity /really/ be worth it for the twice-yearly OOM
> > > situation?
> >
> > the only reason i suggested this was the init=/bin/bash, 4MB
> > RAM, no swap emergency-bootup case. We must not kill init in
> > that case - if the current code doesnt then great and none of
> > this is needed.

perhaps a boot time option oom=0 ? since oom is such a rare case, this
wouldn't impact normal usage...

--
john slee <[EMAIL PROTECTED]>
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> > Still, it would be nice to recover that 4 MB when the system
> > doesn't have any memory left.
> Yup. The X server could give back the memory for some cases like the
> background without too much hackery.

Then Linux only needs to implement SIGDANGER, which has been talked
about for years... X would be a good candidate to implement a handler
for it. Others are Emacs, Mozilla or JVMs - basically everything which
has a GC of some sort. It could even be used to implement a configurable
user mode OOM killer.

Olaf
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Andrea Arcangeli wrote:
>
> On Mon, Oct 09, 2000 at 08:42:26PM +0200, Ingo Molnar wrote:
> > ignoring the kill would just preserve those bugs artificially.
>
> If the oom killer kills a thing like init by mistake or init has a memleak
> you'll notice both problems regardless of having a magic for init in a _very_
> slow path so I don't buy your point.
>
> For correctness init must not be killed ever, period.
>
> So you have two choices:
>
> o	math proof that the current algorithm without the magic can't end
>	killing init (and I should be able to proof the other way around
>	instead)
>
> o	have a magic check for init
>
> So the magic is _strictly_ necessary at the moment.

A well-written init will be saved by being the oldest process around.
A memory-leaking init _will_ be killed even with your magic test,
when the kernel eventually gets stuck OOM and init is the only process
left (all the others have been OOM-killed before.) A deadlocked kernel
doesn't schedule any processes, so they are all dead.

If you want init to live - prove that it doesn't eat too much memory.

Helge Hafting
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, Philipp Rumpf wrote:

> > > The algorithm you posted on the list in this thread will kill init if
> > > on 4Mbyte machine without swap init is large 3 Mbytes and you execute a
> > > task that grows over 1M.
> >
> > This sounds suspiciously like the description of a DEAD system ;)
>
> But wouldn't a watchdog daemon which doesn't allocate any memory still
> get run ?

Indeed, it would. It would also /prevent/ the system from
automatically rebooting itself into a usable state ;)

> > (in which case you simply don't care if init is being killed or not)
>
> You care about getting an automatic reboot. So you need to be sure the
> watchdog daemon gets killed first or you panic() after some time.

echo 30 > /proc/sys/kernel/panic

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
[PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
[OOM killer war]

Hi there,

before you argue endlessly about the "Right OOM Killer (TM)", I
did a small patch to allow replacing the OOM killer at runtime.

You can even use modules, if you are careful (see khttpd on how
to do this without refcounting).

So now you can stop arguing about the one and only OOM killer,
implement it, provide it as module and get back to the important
stuff ;-)

PS: Patch is against test10-pre1.

Thanks for listening

Ingo Oeser

--- linux-2.4.0-test10-pre1/mm/oom_kill.c	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/oom_kill.c	Tue Oct 10 16:59:27 2000
@@ -13,6 +13,8 @@
  * machine) this file will double as a 'coding guide' and a signpost
  * for newbie kernel hackers. It features several pointers to major
  * kernel subsystems and hints as to where to find out what things do.
+ *
+ * Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include <linux/mm.h>
@@ -136,7 +138,7 @@
 }
 
 /**
- * oom_kill - kill the "best" process when we run out of memory
+ * oom_kill_rik - kill the "best" process when we run out of memory
  *
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 
 	struct task_struct *p = select_bad_process();
@@ -207,4 +211,63 @@
 
 	/* Else... */
 	return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock = RW_LOCK_UNLOCKED;
+
+static oom_killer_t oom_killer = oom_kill_rik;
+
+/**
+ * oom_kill - the oom_kill wrapper for installable OOM killers
+ *
+ * Wrapper around the OOM killers, that can be installed via
+ * install_oom_killer and reset_default_oom_killer.
+ *
+ * This gets called from kswapd() in linux/mm/vmscan.c when we
+ * really run out of memory.
+ */
+void oom_kill(void) {
+	read_lock(&oom_kill_lock);
+	oom_killer();
+	read_unlock(&oom_kill_lock);
+}
+
+/**
+ * install_oom_killer - install alternate OOM killer
+ * @new_oom_kill: the alternate OOM killer provided by the caller
+ *
+ * Since the default OOM killer (oom_kill_rik) is not suitable
+ * for everyone, we provide an interface to install custom OOM killers.
+ *
+ * You can take the most appropriate action for your application if the
+ * kernel goes OOM.
+ *
+ * Providing an NULL argument just returns the current OOM killer.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ *
+ * NOTE: We don't do refcounting on OOM killers, so be careful with
+ * modules
+ */
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill) {
+	oom_killer_t tmp;
+	write_lock(&oom_kill_lock);
+	tmp=oom_killer;
+	if (new_oom_kill)
+		oom_killer=new_oom_kill;
+	write_unlock(&oom_kill_lock);
+	return tmp;
+}
+
+/**
+ * reset_default_oom_killer - reset back to default OOM killer
+ *
+ * If you are going to unload the module which provided
+ * your OOM killer, you can install the default one by this.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ */
+oom_killer_t reset_default_oom_killer(void) {
+	return install_oom_killer(&oom_kill_rik);
 }
--- linux-2.4.0-test10-pre1/include/linux/swap.h	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/include/linux/swap.h	Tue Oct 10 16:44:22 2000
@@ -127,8 +127,14 @@
 #define read_swap_cache(entry) read_swap_cache_async(entry, 1);
 
 /* linux/mm/oom_kill.c */
+typedef void (*oom_killer_t)(void);
+
 extern int out_of_memory(void);
 extern void oom_kill(void);
+
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill);
+oom_killer_t reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.

--
Feel the power of the penguin - run [EMAIL PROTECTED]
<esc>:x
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote:
> > On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> > > > > The algorithm you posted on the list in this thread will kill init if
> > > > > on 4Mbyte machine without swap init is large 3 Mbytes and you execute a
> > > > > task that grows over 1M.
> > > >
> > > > This sounds suspiciously like the description of a DEAD system ;)
> > >
> > > But wouldn't a watchdog daemon which doesn't allocate any memory still
> > > get run ?
> >
> > Indeed, it would. It would also /prevent/ the system from
> > automatically rebooting itself into a usable state ;)
>
> So it's not dead in the "oh, it'll be back in 30 seconds" sense. So our
> behaviour is broken (more so than random process killing).

*nod*

Not killing init when we "should" definitely prevents
embedded systems from auto-rebooting when they should do so.

(OTOH, I don't think embedded systems will run into this
OOM issue too much)

> > > You care about getting an automatic reboot. So you need to be sure the
> > > watchdog daemon gets killed first or you panic() after some time.
> >
> > echo 30 > /proc/sys/kernel/panic
>
> that's what I said. we need to be sure to _get_ a panic() though.

I believe the kernel automatically panic()s when init
dies ... from kernel/exit.c::do_exit()

	if (tsk->pid == 1)
		panic("Attempted to kill init!");

[which will make our system auto-reboot and be back on
its feet in a healthy state again soon]

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Tue, 10 Oct 2000, Ingo Oeser wrote:

> before you argue endlessly about the "Right OOM Killer (TM)", I
> did a small patch to allow replacing the OOM killer at runtime.
>
> So now you can stop arguing about the one and only OOM killer,
> implement it, provide it as module and get back to the important
> stuff ;-)

This is definitely a cool toy for people who have doubts that
my OOM killer will do the wrong thing in their workloads.

If anyone can demonstrate that the current OOM killer is
doing the wrong thing and has a replacement algorithm
available, please let us know ... ;)

[lets move the discussion back to a less theoretical and more
practical point of view]

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, Oct 10, 2000 at 12:30:51PM -0300, Rik van Riel wrote:
> Not killing init when we "should" definitely prevents
> embedded systems from auto-rebooting when they should do so.
>
> (OTOH, I don't think embedded systems will run into this
> OOM issue too much)

but when they do, they're hard to fix. Think about an elevator control
system with a single process that happens to implement a somewhat broken
version of the elevator algorithm ;)

> > that's what I said. we need to be sure to _get_ a panic() though.
>
> I believe the kernel automatically panic()s when init
> dies ... from kernel/exit.c::do_exit()
>
>	if (tsk->pid == 1)
>		panic("Attempted to kill init!");

guess who added that code. We still kill init with SIGTERM which doesn't
seem to work though.
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > So now you can stop arguing about the one and only OOM killer,
> > implement it, provide it as module and get back to the important
> > stuff ;-)
>
> This is definitely a cool toy for people who have doubts that
> my OOM killer will do the wrong thing in their workloads.

Thanks ;-)

But I forgot to include my changes to the mm/Makefile (to export
the API for modules).

Here is a _working_ one:

--- linux-2.4.0-test10-pre1/mm/oom_kill.c	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/oom_kill.c	Tue Oct 10 16:59:27 2000
@@ -13,6 +13,8 @@
  * machine) this file will double as a 'coding guide' and a signpost
  * for newbie kernel hackers. It features several pointers to major
  * kernel subsystems and hints as to where to find out what things do.
+ *
+ * Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include <linux/mm.h>
@@ -136,7 +138,7 @@
 }
 
 /**
- * oom_kill - kill the "best" process when we run out of memory
+ * oom_kill_rik - kill the "best" process when we run out of memory
  *
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 
 	struct task_struct *p = select_bad_process();
@@ -207,4 +211,63 @@
 
 	/* Else... */
 	return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock = RW_LOCK_UNLOCKED;
+
+static oom_killer_t oom_killer = oom_kill_rik;
+
+/**
+ * oom_kill - the oom_kill wrapper for installable OOM killers
+ *
+ * Wrapper around the OOM killers, that can be installed via
+ * install_oom_killer and reset_default_oom_killer.
+ *
+ * This gets called from kswapd() in linux/mm/vmscan.c when we
+ * really run out of memory.
+ */
+void oom_kill(void) {
+	read_lock(&oom_kill_lock);
+	oom_killer();
+	read_unlock(&oom_kill_lock);
+}
+
+/**
+ * install_oom_killer - install alternate OOM killer
+ * @new_oom_kill: the alternate OOM killer provided by the caller
+ *
+ * Since the default OOM killer (oom_kill_rik) is not suitable
+ * for everyone, we provide an interface to install custom OOM killers.
+ *
+ * You can take the most appropriate action for your application if the
+ * kernel goes OOM.
+ *
+ * Providing an NULL argument just returns the current OOM killer.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ *
+ * NOTE: We don't do refcounting on OOM killers, so be careful with
+ * modules
+ */
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill) {
+	oom_killer_t tmp;
+	write_lock(&oom_kill_lock);
+	tmp=oom_killer;
+	if (new_oom_kill)
+		oom_killer=new_oom_kill;
+	write_unlock(&oom_kill_lock);
+	return tmp;
+}
+
+/**
+ * reset_default_oom_killer - reset back to default OOM killer
+ *
+ * If you are going to unload the module which provided
+ * your OOM killer, you can install the default one by this.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ */
+oom_killer_t reset_default_oom_killer(void) {
+	return install_oom_killer(&oom_kill_rik);
 }
--- linux-2.4.0-test10-pre1/include/linux/swap.h	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/include/linux/swap.h	Tue Oct 10 16:44:22 2000
@@ -127,8 +127,14 @@
 #define read_swap_cache(entry) read_swap_cache_async(entry, 1);
 
 /* linux/mm/oom_kill.c */
+typedef void (*oom_killer_t)(void);
+
 extern int out_of_memory(void);
 extern void oom_kill(void);
+
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill);
+oom_killer_t reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
--- linux-2.4.0-test10-pre1/mm/Makefile	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/Makefile	Tue Oct 10 16:34:06 2000
@@ -10,7 +10,8 @@
 O_TARGET := mm.o
 O_OBJS	 := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
 	    vmalloc.o slab.o bootmem.o swap.o vmscan.o page_io.o \
-	    page_alloc.o swap_state.o swapfile.o numa.o oom_kill.o
+	    page_alloc.o swap_state.o swapfile.o numa.o
+OX_OBJS  := oom_kill.o
 
 ifeq ($(CONFIG_HIGHMEM),y)
 O_OBJS += highmem.o

Regards

Ingo Oeser
--
Feel the power of the penguin - run [EMAIL PROTECTED]
<esc>:x
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, Rogier Wolff wrote:
> So if Netscape can "pump" 40 extra megabytes of memory out of X, this
> can be exploited.
>
> Now we're back to the point that a heuristic can never be right all
> the time..

I agree.

In fact, we never left that. Nothing is perfect.

In fact, a lot of engineering is _recognizing_ that you can never achieve
"perfect", and you're much better off not even trying - and having a
simple system that is "good enough".

This is the old adage of "perfect is the enemy of good" - trying too hard
is actually _detrimental_ in 99% of all cases. We should have simple
heuristics that work most of the time, instead of trying to cajole a
complex system like X to help us do some complicated resource management
system.

Complexity will just result in the OOM killer failing in surprising ways.
A simple heuristic will mean that the OOM killer will still fail, but at
least it won't be in subtle and surprising ways.

		Linus
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Olaf Titz wrote:
> > > Still, it would be nice to recover that 4 MB when the system
> > > doesn't have any memory left.
> >
> > Yup. The X server could give back the memory for some cases like the
> > background without too much hackery.
>
> Then Linux only needs to implement SIGDANGER, which has been talked
> about for years... X would be a good candidate to implement a handler
> for it. Others are Emacs, Mozilla or JVMs - basically everything which
> has a GC of some sort. It could even be used to implement a
> configurable user mode OOM killer.

It would be good to talk to the KDE and Gnome folks about this as well.
I am pretty sure they have large blocks of memory that could be flushed
or freed in a low-memory or OOM condition.

Miles
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Ingo Oeser wrote:
> > before you argue endlessly about the "Right OOM Killer (TM)", I
> > did a small patch to allow replacing the OOM killer at runtime.
> >
> > So now you can stop arguing about the one and only OOM killer,
> > implement it, provide it as module and get back to the important
> > stuff ;-)
>
> This is definitely a cool toy for people who have doubts that
> my OOM killer will do the wrong thing in their workloads.

I think this can be useful for more than just a cool toy. I think that
the main thing that this discussion has shown is no OOM killer will
please 100% of the people 100% of the time. I think we should try and
have a good generic OOM killer that kills the right process most of the
time. People can implement (and submit) different-style OOM killers as
needed. Or at least get 'em on freshmeat. :)

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Tue, 10 Oct 2000, Tom Rini wrote:
> On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > On Tue, 10 Oct 2000, Ingo Oeser wrote:
> > > before you argue endlessly about the "Right OOM Killer (TM)", I
> > > did a small patch to allow replacing the OOM killer at runtime.
> > >
> > > So now you can stop arguing about the one and only OOM killer,
> > > implement it, provide it as module and get back to the important
> > > stuff ;-)
> >
> > This is definitely a cool toy for people who have doubts that
> > my OOM killer will do the wrong thing in their workloads.
>
> I think this can be useful for more than just a cool toy. I think that
> the main thing that this discussion has shown is no OOM killer will
> please 100% of the people 100% of the time. I think we should try and
> have a good generic OOM killer that kills the right process most of
> the time. People can implement (and submit) different-style OOM
> killers as needed.

Indeed, though I suspect most of the people trying this would
fall into the trap of over-engineering their OOM killer, after
which it mostly becomes less predictable ;)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/
Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
On Tue, Oct 10, 2000 at 05:58:46PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Tom Rini wrote:
> > On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > > On Tue, 10 Oct 2000, Ingo Oeser wrote:
> > > > before you argue endlessly about the "Right OOM Killer (TM)", I
> > > > did a small patch to allow replacing the OOM killer at runtime.
> > > >
> > > > So now you can stop arguing about the one and only OOM killer,
> > > > implement it, provide it as module and get back to the important
> > > > stuff ;-)
> > >
> > > This is definitely a cool toy for people who have doubts that
> > > my OOM killer will do the wrong thing in their workloads.
> >
> > I think this can be useful for more than just a cool toy. I think
> > that the main thing that this discussion has shown is no OOM killer
> > will please 100% of the people 100% of the time. I think we should
> > try and have a good generic OOM killer that kills the right process
> > most of the time. People can implement (and submit) different-style
> > OOM killers as needed.
>
> Indeed, though I suspect most of the people trying this would
> fall into the trap of over-engineering their OOM killer, after
> which it mostly becomes less predictable ;)

I was thinking more along the lines of ones w/ "safety" features that
not everyone might like/need (ie /usr/local/bin/foo is always good,
those suggestions). It seems like useful functionality at little/no
cost. And a neat toy for now. :)

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Andreas Dilger wrote:
> Albert D. Cahalan wrote:
> > X, and any other big friendly processes, could participate in
> > memory balancing operations. X could be made to clean out a
>
> Gerrit Huizenga wrote:
> > Anyway, there is/was an API in PTX to say (either from in-kernel or through
> > some user machinations) "I Am a System Process". Turns on a bit in the
>
> On AIX there is a signal called SIGDANGER, which is basically what you
> are looking for. By default it is ignored, but for processes that care
> (e.g. init, X, whatever) they can register a SIGDANGER handler. At an
> "urgent" (as opposed to "critical") OOM situation, all processes get a
> SIGDANGER sent to them. Most will ignore it, but ones with handlers
> can free caches, try to do a clean shutdown, whatever. Any process with
> a SIGDANGER handler gets a reduction of "badness" (as the OOM killer calls
> it) when looking for processes to kill.
>
> Having a SIGDANGER handler is good for 2 reasons:
> 1) Lets processes know when memory is short so they can free needless cache.
> 2) Mark process with a SIGDANGER handler as "more important" than those
>    without. Most people won't care about this, but init, and X, and
>    long-running simulations might.

Is there any reason why we can't do something like this for 2.5?

-d

--
"There is a natural aristocracy among men. The grounds of this are
virtue and talents", Thomas Jefferson [1742-1826], 3rd US President
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> Rik van Riel wrote:
> > > How about SIGTERM a bit before SIGKILL then re-evaluate the OOM
> > > N usecs later?
> >
> > And run the risk of having to kill /another/ process as well ?
> >
> > I really don't know if that would be a wise thing to do
> > (but feel free to do some tests to see if your idea would
> > work ... I'd love to hear some test results with your idea).

David Ford writes:
> I was thinking (dangerous) about an urgent v.s. critical OOM. urgent could
> trigger a SIGTERM which would give advance notice to the offending process.
> I don't think we have a signal method of notifying processes when resources
> are critically low, feel free to correct me.
>
> Is there a signal that -might- be used for this?

Albert D. Cahalan wrote:
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.
>
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Gerrit Huizenga wrote:
> Anyway, there is/was an API in PTX to say (either from in-kernel or through
> some user machinations) "I Am a System Process". Turns on a bit in the
> proc struct (task struct) that made it exempt from death from a variety
> of sources, e.g. OOM, generic user signals, portions of system shutdown,
> etc.
>
> Then, the code looking for things to kill simply skips those that are
> intelligently marked, taking most of the decision making/policy making
> out of the scheduler/memory manager.

On AIX there is a signal called SIGDANGER, which is basically what you
are looking for. By default it is ignored, but for processes that care
(e.g. init, X, whatever) they can register a SIGDANGER handler. At an
"urgent" (as opposed to "critical") OOM situation, all processes get a
SIGDANGER sent to them. Most will ignore it, but ones with handlers
can free caches, try to do a clean shutdown, whatever. Any process with
a SIGDANGER handler gets a reduction of "badness" (as the OOM killer calls
it) when looking for processes to kill.

Having a SIGDANGER handler is good for 2 reasons:
1) Lets processes know when memory is short so they can free needless cache.
2) Mark process with a SIGDANGER handler as "more important" than those
   without. Most people won't care about this, but init, and X, and
   long-running simulations might.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
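To make the SIGDANGER idea concrete from the application side, here is a small sketch of a process that registers a handler and drops a cache when told memory is short. Linux defines no SIGDANGER, so the sketch assumes SIGUSR1 as a stand-in; fill_cache() and release_cache_if_low() are hypothetical helper names, not an existing API. The handler only sets a flag, because freeing memory from inside a signal handler is not async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

#ifndef SIGDANGER
#define SIGDANGER SIGUSR1	/* assumption: stand-in, Linux has no SIGDANGER */
#endif

static volatile sig_atomic_t memory_is_low;	/* set by the handler */
static char *cache;				/* droppable data */
static size_t cache_size;

/* Handler: just record the event; the real work happens in the main loop. */
static void on_danger(int signo)
{
	(void)signo;
	memory_is_low = 1;
}

/* Register the handler; returns 0 on success, -1 on error. */
int install_danger_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_danger;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGDANGER, &sa, NULL);
}

/* (Re)build a cache of the given size; returns 0 on success. */
int fill_cache(size_t bytes)
{
	char *p = malloc(bytes);

	if (!p)
		return -1;
	memset(p, 0xAA, bytes);
	free(cache);
	cache = p;
	cache_size = bytes;
	return 0;
}

/* Call from the main loop; returns the number of bytes given back. */
size_t release_cache_if_low(void)
{
	size_t freed = 0;

	if (memory_is_low) {
		memory_is_low = 0;
		free(cache);
		cache = NULL;
		freed = cache_size;
		cache_size = 0;
	}
	return freed;
}
```

Whether a process with such a handler should also get its "badness" reduced, as described for AIX above, is a kernel-side policy decision that this sketch does not cover.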
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> If init dies the kernel hangs solid anyway

Init should never die. If we get to do_exit in init we'll panic, which
is the right thing to do (reboot on critical systems).
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> (but I'd be curious if somebody actually manages to
> trick the OOM killer into killing init ... please
> test a bit more to see if this really happens ;))

In a non-real-world situation, yes. (mem=3500k, many drivers,
init=/bin/bash, tried to enter a command). Since the process in question
(bash) ignores SIGTERM, I actually got a hard hang.

We really should turn this into a panic() (panic means your elevator
control system reboots and maybe misses the right floor. hard hang means
you need to reboot manually).
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
"Albert D. Cahalan" <[EMAIL PROTECTED]> writes:
> Date: Mon, 9 Oct 2000 19:13:25 -0400 (EDT)
>
> >> From: Linus Torvalds <[EMAIL PROTECTED]>
>
> >> One of the biggest bitmaps is the background bitmap. So you have a
> >> client that uploads it to X and then goes away. There's nobody to
> >> un-count to by the time X decides to switch to another background.
> >
> > Actually, the big offenders are things other than the background
> > bitmap: things like E do absolutely insane things, you would not
> > believe (or maybe you would). The background pixmap is generally
> > in the worst case typically no worse than 4 megabytes (for those
> > people who are crazy enough to put images up as their root window
> > on 32 bit deep displays, at 1kX1k resolution).
>
> Still, it would be nice to recover that 4 MB when the system
> doesn't have any memory left.
>

Yup. The X server could give back the memory for some cases like the
background without too much hackery.

> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.

I agree in principle, though the problem is difficult, as the memory pool
may get fragmented... Most memory usage is less monolithic than the
background pixmap. And maintaining separate memory pools often wastes
more memory than it saves.

>
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Netscape 4.x is hopeless; it is leakier than the Titanic. There is hope
for Mozilla.

				- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> > If the oom killer kills a thing like init by mistake
> That only happens in the "random" OOM killer 2.2 has ...
[OOM killer war]

Hi there,

before you argue endlessly about the "Right OOM Killer (TM)", I
did a small patch to allow replacing the OOM killer at runtime.

You can even use modules, if you are careful (see khttpd on how
to do this without refcounting).

So now you can stop arguing about the one and only OOM killer,
implement it, provide it as module and get back to the important
stuff ;-)

PS: Patch is against test9 with Rik's latest vmpatch applied.

Thanks for listening

Ingo Oeser

diff -Naur linux-2.4.0-test9-vmpatch/include/linux/swap.h linux-2.4.0-test9-vmpatch-ioe/include/linux/swap.h
--- linux-2.4.0-test9-vmpatch/include/linux/swap.h	Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/include/linux/swap.h	Tue Oct 10 00:50:17 2000
@@ -129,6 +129,9 @@
 /* linux/mm/oom_kill.c */
 extern int out_of_memory(void);
 extern void oom_kill(void);
+void install_oom_killer(void (*new_oom_kill)(void));
+void reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
diff -Naur linux-2.4.0-test9-vmpatch/mm/Makefile linux-2.4.0-test9-vmpatch-ioe/mm/Makefile
--- linux-2.4.0-test9-vmpatch/mm/Makefile	Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/mm/Makefile	Tue Oct 10 00:10:07 2000
@@ -10,7 +10,8 @@
 O_TARGET := mm.o
 O_OBJS	 := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
 	    vmalloc.o slab.o bootmem.o swap.o vmscan.o page_io.o \
-	    page_alloc.o swap_state.o swapfile.o numa.o oom_kill.o
+	    page_alloc.o swap_state.o swapfile.o numa.o
+OX_OBJS := oom_kill.o
 
 ifeq ($(CONFIG_HIGHMEM),y)
 O_OBJS += highmem.o
diff -Naur linux-2.4.0-test9-vmpatch/mm/oom_kill.c linux-2.4.0-test9-vmpatch-ioe/mm/oom_kill.c
--- linux-2.4.0-test9-vmpatch/mm/oom_kill.c	Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/mm/oom_kill.c	Tue Oct 10 00:35:32 2000
@@ -13,6 +13,8 @@
  *  machine) this file will double as a 'coding guide' and a signpost
  *  for newbie kernel hackers. It features several pointers to major
  *  kernel subsystems and hints as to where to find out what things do.
+ *
+ *  Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include <linux/mm.h>
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 	struct task_struct *p = select_bad_process();
 
@@ -207,4 +211,26 @@
 
 	/* Else... */
 	return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock;
+
+static void (*oom_killer)(void)=oom_kill_rik;
+
+void oom_kill(void) {
+	read_lock(&oom_kill_lock);
+	oom_killer();
+	read_unlock(&oom_kill_lock);
+}
+
+void install_oom_killer(void (*new_oom_kill)(void)) {
+	if (!new_oom_kill) return;
+	write_lock(&oom_kill_lock);
+	oom_killer=new_oom_kill;
+	write_unlock(&oom_kill_lock);
+}
+
+void reset_default_oom_killer(void) {
+	install_oom_killer(&oom_kill_rik);
 }

--
Feel the power of the penguin - run [EMAIL PROTECTED] :x
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Albert D. Cahalan wrote:
> Jim Gettys writes:
> >> From: Linus Torvalds <[EMAIL PROTECTED]>
>
> >> One of the biggest bitmaps is the background bitmap. So you have a
> >> client that uploads it to X and then goes away. There's nobody to
> >> un-count to by the time X decides to switch to another background.
> >
> > Actually, the big offenders are things other than the background
> > bitmap: things like E do absolutely insane things, you would not
> > believe (or maybe you would). The background pixmap is generally
> > in the worst case typically no worse than 4 megabytes (for those
> > people who are crazy enough to put images up as their root window
> > on 32 bit deep displays, at 1kX1k resolution).
>
> Still, it would be nice to recover that 4 MB when the system
> doesn't have any memory left.
>
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.
>
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Which is all good and well to DELAY the task of the OOM killer
for a few more minutes. But in the end, there will be a point
where you REALLY run out of memory and you have no other choice
than the OOM killer...

(not that I'm against alternative measures, I just think
they're orthogonal to this whole discussion)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
Jim Gettys writes:
>> From: Linus Torvalds <[EMAIL PROTECTED]>

>> One of the biggest bitmaps is the background bitmap. So you have a
>> client that uploads it to X and then goes away. There's nobody to
>> un-count to by the time X decides to switch to another background.
>
> Actually, the big offenders are things other than the background
> bitmap: things like E do absolutely insane things, you would not
> believe (or maybe you would). The background pixmap is generally
> in the worst case typically no worse than 4 megabytes (for those
> people who are crazy enough to put images up as their root window
> on 32 bit deep displays, at 1kX1k resolution).

Still, it would be nice to recover that 4 MB when the system
doesn't have any memory left.

X, and any other big friendly processes, could participate in
memory balancing operations. X could be made to clean out a
font cache when the kernel signals that memory is low. When
the situation becomes serious, X could just mmap /dev/zero over
top of the background image.

Netscape could even be hacked to dump old junk... or if it is
just too leaky, it could exec itself to fix the problem.
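The "mmap /dev/zero over top of the background image" trick can be tried in isolation. The sketch below assumes the image lives in its own page-aligned mapping; alloc_image() and discard_image() are hypothetical helper names, and nothing here is taken from the X server sources. Mapping /dev/zero with MAP_FIXED replaces the old pages with fresh zero-fill pages, so the memory goes back to the kernel while the address range stays valid.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of /dev/zero privately, optionally at a fixed address. */
static void *map_zero(void *addr, size_t len, int extra_flags)
{
	int fd = open("/dev/zero", O_RDWR);
	void *p;

	if (fd < 0)
		return MAP_FAILED;
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | extra_flags, fd, 0);
	close(fd);	/* the mapping stays valid after close */
	return p;
}

/* Allocate a page-aligned, zero-filled buffer for the image. */
void *alloc_image(size_t len)
{
	void *p = map_zero(NULL, len, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Throw the image contents away: the old pages are released and the
 * range reads as zeroes again. image must come from alloc_image().
 * Returns 0 on success, -1 on failure.
 */
int discard_image(void *image, size_t len)
{
	return map_zero(image, len, MAP_FIXED) == image ? 0 : -1;
}
```

After discard_image() the caller has to redraw or reload the image before using it again; the pointer stays usable, but the pixels are gone.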
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Tue, 10 Oct 2000, bert hubert wrote:
> On Mon, Oct 09, 2000 at 02:38:10PM -0700, Linus Torvalds wrote:
>
> > So the process that gave X the bitmap dies. What now? Are we going to
> > depend on X un-counting the resources?
> >
> > I'd prefer just X having a higher "mm nice level" or something.
>
> I wonder how many megabytes we can fill with all messages about
> an OOM killer. I remember threads about this from '94 onwards.
> Perhaps we can finally have a sane one now :-)

In reality, the OOM killer I mailed a few days ago behaves
quite well in the real world.

I hope Linus will be as sensitive to theoretical arguments
with no foundation in reality as I am (ie. not), so we'll
have SOMETHING in the kernel soon.

If we later find out there are some problems with the OOM
killer, we can always change it then. No need to hold up
a reasonable solution when the current kernel has NO solution
to the problem at all ...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Byron Stanoszek wrote:
> On Mon, 9 Oct 2000 [EMAIL PROTECTED] wrote:
>
> > Anyway, there is/was an API in PTX to say (either from in-kernel or through
> > some user machinations) "I Am a System Process". Turns on a bit in the
> > proc struct (task struct) that made it exempt from death from a variety
> > of sources, e.g. OOM, generic user signals, portions of system shutdown,
> > etc.
>
> The current OOM killer does this, except for init. Checking to
> see if the process has a page table is equivalent to checking
> for the kernel threads that are integral to the system (PIDs
> 2-5). These will never be killed by the OOM. Init, however,
> still can be killed, and there should be an additional statement
> that doesn't kill if PID == 1.

Only if you can demonstrate any real-world scenario where
init will be chosen with the current algorithm.

The "3 MB init on 4MB machine" kind of theoretical argument
just isn't convincing if nobody can show that there is a
problem in reality.

> I think we need to sit down and write a better OOM proposal,
> something that doesn't use CPU time and the NICE flag.

The nice flag has been removed from my current kernel tree.

The CPU time used, however, is a different matter. You really
don't want to have the OOM killer kill your 6-week-old running
simulation because a newly started netscape explodes ...

> How about we start by everyone in this discussion give their
> opinion on what the OOM selection process should do,

Quoting from mm/oom_kill.c:

/**
 * oom_badness - calculate a numeric value for how bad this task has been
 * @p: task struct of which task we should calculate
 *
 * The formula used is relatively simple and documented inline in the
 * function. The main rationale is that we want to select a good task
 * to kill when we run out of memory.
 *
 * Good in this context means that:
 * 1) we lose the minimum amount of work done
 * 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
 * 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the principle
 *    of least surprise ... (be careful when you change it)
 */

Do you have any additional requirements?

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/
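As a toy illustration of goals 1) and 2) from the comment quoted above, the sketch below scores tasks by memory size and discounts them by CPU time already invested. This is NOT the formula from mm/oom_kill.c: struct task, its fields and the exact divisor are assumptions made for the example. The PID 1 exemption is Byron's proposal from this thread, included here only to show where such a check would sit.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a task; not the kernel's task_struct. */
struct task {
	int pid;
	int has_mm;			/* 0 for kernel threads */
	unsigned long total_vm;		/* size in pages */
	unsigned long cpu_seconds;	/* CPU time consumed so far */
};

/* Integer square root by simple search; fine for a toy example. */
static unsigned long int_sqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* Higher score means a better candidate to kill. */
unsigned long badness(const struct task *t)
{
	unsigned long points, div;

	if (!t->has_mm || t->pid == 1)	/* kernel threads and init are exempt */
		return 0;

	points = t->total_vm;			/* goal 2: recover much memory */
	div = int_sqrt(t->cpu_seconds / 60);	/* goal 1: spare long-running work */
	if (div > 1)
		points /= div;
	return points;
}

/* Return the task with the highest score, or NULL if none qualifies. */
const struct task *select_bad_process(const struct task *tasks, size_t n)
{
	const struct task *worst = NULL;
	unsigned long max = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long pts = badness(&tasks[i]);

		if (pts > max) {
			max = pts;
			worst = &tasks[i];
		}
	}
	return worst;
}
```

With a 6-week-old simulation of 50000 pages and a 30-second-old browser of 30000 pages, the browser scores far higher, which matches the behaviour Rik argues for above.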
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 02:38:10PM -0700, Linus Torvalds wrote:

> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?
>
> I'd prefer just X having a higher "mm nice level" or something.

I wonder how many megabytes we can fill with all messages about an OOM
killer. I remember threads about this from '94 onwards. Perhaps we can
finally have a sane one now :-)

Regards,

bert hubert

--
PowerDNS                     Versatile DNS Services
Trilab                       The Technology People
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000 [EMAIL PROTECTED] wrote:

> Anyway, there is/was an API in PTX to say (either from in-kernel or through
> some user machinations) "I Am a System Process". Turns on a bit in the
> proc struct (task struct) that made it exempt from death from a variety
> of sources, e.g. OOM, generic user signals, portions of system shutdown,
> etc.

The current OOM killer does this, except for init. Checking to see if
the process has a page table is equivalent to checking for the kernel
threads that are integral to the system (PIDs 2-5). These will never be
killed by the OOM. Init, however, still can be killed, and there should
be an additional statement that doesn't kill if PID == 1.

I think we need to sit down and write a better OOM proposal, something
that doesn't use CPU time and the NICE flag. Let's concentrate our
efforts on what constitutes a good selection method instead of bickering
with each other.

How about we start by everyone in this discussion give their opinion on
what the OOM selection process should do, listing them in both order of
importance and severity, giving a rational reason for each choice. Maybe
then we can get somewhere.

 -Byron

--
Byron Stanoszek                         Ph: (330) 644-3059
Systems Programmer                      Fax: (330) 644-8110
Commercial Timesharing Inc.             Email: [EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
At Sequent, we found that there is a small set of processes which are
"critical" to the system's operation in that they should not be killed
on swap shortage, memory shortage, etc. This included things like init,
potentially inetd, the swapper, page daemon, clusters heartbeat daemon,
and generally any core system service which had a user process component.
If there wasn't enough memory for those processes, or if those processes
weren't already responsible in their use of memory/resources, you were
already toast.

Anyway, there is/was an API in PTX to say (either from in-kernel or through
some user machinations) "I Am a System Process". Turns on a bit in the
proc struct (task struct) that made it exempt from death from a variety
of sources, e.g. OOM, generic user signals, portions of system shutdown,
etc.

Then, the code looking for things to kill simply skips those that are
intelligently marked, taking most of the decision making/policy making
out of the scheduler/memory manager.

gerrit

> On Mon, 9 Oct 2000, Linus Torvalds wrote:
> > On Mon, 9 Oct 2000, Andi Kleen wrote:
> > >
> > > netscape usually has child processes: the dns helper.
> >
> > Yeah.
> >
> > One thing we _can_ (and probably should do) is to do a per-user
> > memory pressure thing - we have easy access to the "struct
> > user_struct" (every process has a direct pointer to it), and it
> > should not be too bad to maintain a per-user "VM pressure"
> > counter.
> >
> > Then, instead of trying to use heuristics like "does this
> > process have children" etc, you'd have things like "is this user
> > a nasty user", which is a much more valid thing to do and can be
> > used to find people who fork tons of processes that are
> > mid-sized but use a lot of memory due to just being many..
>
> Sure we could do all of this, but does OOM really happen that
> often that we want to make the algorithm this complex ?
>
> The current algorithm seems to work quite well and is already
> at the limit of how complex I'd like to see it. Having a less
> complex OOM killer turned out to not work very well, but having
> a more complex one is - IMHO - probably overkill ...
>
> regards,
>
> Rik
> --
> "What you're running that piece of shit Gnome?!?!"
>        -- Miguel de Icaza, UKUUG 2000
>
> http://www.conectiva.com/		http://www.surriel.com/
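The "I Am a System Process" bit described above amounts to a flag test in the victim-selection loop. A small sketch follows, with the caveat that PF_SYSTEM_PROCESS, struct proc and pick_victim() are hypothetical names made up for this example; the actual PTX interface is not shown in this thread.

```c
#include <stddef.h>

#define PF_SYSTEM_PROCESS 0x1u	/* hypothetical flag name */

/* Hypothetical, minimal process record. */
struct proc {
	int pid;
	unsigned int flags;
	unsigned long rss_pages;	/* resident memory in pages */
};

/* Mark a process as critical, e.g. init or a cluster heartbeat daemon. */
void mark_system_process(struct proc *p)
{
	p->flags |= PF_SYSTEM_PROCESS;
}

/*
 * Pick the largest process that is not marked as a system process.
 * Returns NULL if every process is exempt.
 */
const struct proc *pick_victim(const struct proc *procs, size_t n)
{
	const struct proc *victim = NULL;

	for (size_t i = 0; i < n; i++) {
		if (procs[i].flags & PF_SYSTEM_PROCESS)
			continue;	/* exempt: policy was decided elsewhere */
		if (!victim || procs[i].rss_pages > victim->rss_pages)
			victim = &procs[i];
	}
	return victim;
}
```

The point of the design is that the selection code stays trivial: the decision about what counts as critical is made once, by whoever sets the flag, rather than by heuristics inside the memory manager.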
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> From: Linus Torvalds <[EMAIL PROTECTED]>
> Date: Mon, 9 Oct 2000 14:50:51 -0700 (PDT)
> To: Jim Gettys <[EMAIL PROTECTED]>
> Cc: Alan Cox <[EMAIL PROTECTED]>, Andi Kleen <[EMAIL PROTECTED]>,
>     Ingo Molnar <[EMAIL PROTECTED]>, Andrea Arcangeli <[EMAIL PROTECTED]>,
>     Rik van Riel <[EMAIL PROTECTED]>,
>     Byron Stanoszek <[EMAIL PROTECTED]>,
>     MM mailing list <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
> Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> -
> On Mon, 9 Oct 2000, Jim Gettys wrote:
> >
> > On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds
> > <[EMAIL PROTECTED]> said:
> >
> > > > The problem is that there is no way to keep track of them afterwards.
> > >
> > > So the process that gave X the bitmap dies. What now? Are we going to
> > > depend on X un-counting the resources?
> >
> > X has to uncount the resources already, to free the memory in the X server
> > allocated on behalf of that client. X has to get this right, to be a long
> > lived server (properly debugged X servers last many months without problems:
> > unfortunately, a fair number of DDX's are buggy).
>
> No, but my point is that it doesn't really work.
>
> One of the biggest bitmaps is the background bitmap. So you have a client
> that uploads it to X and then goes away. There's nobody to un-count to by
> the time X decides to switch to another background.

Actually, the big offenders are things other than the background bitmap:
things like E do absolutely insane things, you would not believe (or maybe
you would). The background pixmap is typically no worse than 4 megabytes
even in the worst case (for those people who are crazy enough to put images
up as their root window on 32-bit-deep displays at 1k x 1k resolution).

> Does that memory just disappear as far as the resource handling is
> concerned when the client that originated it dies?

No, X recovers the memory when a connection dies, unless the client has gone
out of its way to arrange to preserve things across connection termination.
Few clients, if any, do this: it exists mostly for debugging purposes, so
that (fortunately or unfortunately, depending on your opinion) the evidence
does not just vanish before you can see what happened.

So the X server does extensive bookkeeping of its memory usage, and retrieves
all memory used by clients when they terminate (with the above rare
exception).

> What happens with TCP connections? They might be local. Or they might not.
> In either case X doesn't know whom to blame.

At least on BSD kernels, it was reasonably straightforward to determine if a
TCP connection was local: in that case, the code actually did an upcall and
delivered data directly to the appropriate socket. Dunno about the insides of
Linux. I suspect it should not be hard to find the right process for local
connections.

Distant connections are, indeed, a challenge.

> Basically, the only thing _I_ think X can do is to really say "oh, please
> don't count my memory, because everything I do I do for my clients, not
> for myself".
>
> THAT is my argument. Basically there is nothing we can reliably account.

Your argument has a lot of validity, though the X server does a better job of
accounting than you might think.

BUT, I'm actually more interested in dealing with scheduling preferences, to
get really first rate interactive feel.

> So we might as well fall back on just saying "X is more important than
> some random client", and have a mm niceness level. Which right now is
> obviously approximated by the IO capabilities tests etc.

As I say above, the principle here may be more useful not for the memory
example but for controlling scheduling, so we can get great interactive feel.
THAT is what is really worth discussing.

                                - Jim
--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Aaron Sethman wrote:

> I think the run time should probably be accounted for in this
> as well. Basically start knocking off recent processes first,
> which are likely to be childless, and start working your way up
> in age.

I'm almost getting USENET flashbacks ... ;)

Please look at the code before suggesting something that is
already there (and has been in the code for some 2 years).

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/       http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> > across AF_UNIX sockets so the mechanism is notionally there to provide the
> > credentials to X, just not to use them
>
> The problem is that there is no way to keep track of them afterwards.

If you use mmap for your allocator then beancounter will get it right. Every
resource knows which beancounter it was charged to. It adds an overhead the
average desktop user won't like but which is pretty much essential to do real
mainframe world operation.

So it would become

        seteuid(Client->passed_euid);
        mmap(buffer in pages)
        seteuid(getuid());

With lightweight counting semantics it's hard to make any tracking system
work well in the corner cases like resources that survive process death.

Alan
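For readers who want to see Alan's three-step sequence spelled out, here is a minimal user-space sketch in C. The helper name alloc_charged_to() is hypothetical and does not come from the thread. It only demonstrates the call order (switch effective uid, map, switch back); whether the pages are then actually charged to the client depends on a beancounter-style kernel, which this sketch does not model.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): map `len` bytes while the
 * effective uid is temporarily set to the client's, so that a kernel with
 * per-user accounting could charge the pages to that client.
 * Returns NULL on any failure.
 */
void *alloc_charged_to(uid_t client_euid, size_t len)
{
    void *buf;

    assert(len > 0);

    /* Become the client for the duration of the allocation. */
    if (seteuid(client_euid) != 0)
        return NULL;

    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* Switch back to the server's own identity (Alan's third step). */
    if (seteuid(getuid()) != 0) {
        if (buf != MAP_FAILED)
            munmap(buf, len);
        return NULL;
    }

    return buf == MAP_FAILED ? NULL : buf;
}
```

A server running as root could call alloc_charged_to(client_uid, size) with the uid it received over the client's AF_UNIX connection. An unprivileged process can only pass its own effective uid, which makes the helper easy to try out but pointless for real accounting.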
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, James Sutherland wrote:

> On Mon, 9 Oct 2000, Ingo Molnar wrote:
>
> > On Mon, 9 Oct 2000, Rik van Riel wrote:
> >
> > > > so dns helper is killed first, then netscape. (my idea might not
> > > > make sense though.)
> > >
> > > It makes some sense, but I don't think OOM is something that
> > > occurs often enough to care about it /that/ much...
> >
> > i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
> > with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
> > think it's a legitimate concern - i cannot know in advance whether a
> > freshly started process would trigger an OOM or not.
>
> Shouldn't the runtime factor handle this, making sure the new process is
> killed? (Maybe not if you're almost OOM right from the word go, and run
> this process straight off... Hrm.)

I think the run time should probably be accounted for in this as well.
Basically start knocking off recent processes first, which are likely to be
childless, and start working your way up in age. The reasoning here is that
a recent process is less likely to be an important, long-running service. Of
course you could probably account for whether the process is childless or
not as well.

Just my $0.02 on it..

Aaron
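The "runtime factor" idea discussed above can be illustrated with a small, self-contained C sketch. Everything in it is an assumption made for illustration: the struct, the function names, and the choice to divide memory size by one plus the integer square root of the run time in minutes. It is not the kernel's actual scoring code, only a way to see why a freshly started process ends up ahead of a six-week-old job of the same size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process data; field names are illustrative only. */
struct candidate {
    unsigned long vm_pages;     /* memory footprint in pages */
    unsigned long run_seconds;  /* time since the process started */
};

/* Integer square root, written to avoid overflow in the comparison. */
static unsigned long isqrt_ul(unsigned long x)
{
    unsigned long r = 0;

    while (r + 1 <= x / (r + 1))
        r++;
    return r;
}

/* Larger score means a more likely victim; older processes score lower. */
unsigned long age_weighted_score(const struct candidate *c)
{
    unsigned long run_minutes;

    assert(c != NULL);
    run_minutes = c->run_seconds / 60;
    return c->vm_pages / (1 + isqrt_ul(run_minutes));
}

/* Returns the index of the highest-scoring candidate, or -1 if n == 0. */
int pick_victim(const struct candidate *c, size_t n)
{
    size_t i, best = 0;

    if (n == 0)
        return -1;
    for (i = 1; i < n; i++) {
        if (age_weighted_score(&c[i]) > age_weighted_score(&c[best]))
            best = i;
    }
    return (int)best;
}
```

With two processes of 1000 pages each, one running for six weeks and one started ten seconds ago, pick_victim() selects the new one. Ingo's 4MB init=/bin/bash case is exactly the situation where such a weighting matters, because almost everything on the machine is young.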
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Jim Gettys wrote:
>
> On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds
> <[EMAIL PROTECTED]> said:
>
> > > The problem is that there is no way to keep track of them afterwards.
> >
> > So the process that gave X the bitmap dies. What now? Are we going to
> > depend on X un-counting the resources?
>
> X has to uncount the resources already, to free the memory in the X server
> allocated on behalf of that client. X has to get this right, to be a long
> lived server (properly debugged X servers last many months without problems:
> unfortunately, a fair number of DDX's are buggy).

No, but my point is that it doesn't really work.

One of the biggest bitmaps is the background bitmap. So you have a client
that uploads it to X and then goes away. There's nobody to un-count to by
the time X decides to switch to another background.

Does that memory just disappear as far as the resource handling is
concerned when the client that originated it dies?

What happens with TCP connections? They might be local. Or they might not.
In either case X doesn't know whom to blame.

Basically, the only thing _I_ think X can do is to really say "oh, please
don't count my memory, because everything I do I do for my clients, not
for myself".

THAT is my argument. Basically there is nothing we can reliably account.

So we might as well fall back on just saying "X is more important than
some random client", and have a mm niceness level. Which right now is
obviously approximated by the IO capabilities tests etc.

                Linus
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds
<[EMAIL PROTECTED]> said:

> > The problem is that there is no way to keep track of them afterwards.
>
> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?

X has to uncount the resources already, to free the memory in the X server
allocated on behalf of that client. X has to get this right, to be a long
lived server (properly debugged X servers last many months without problems:
unfortunately, a fair number of DDX's are buggy).

                                - Jim
--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Rik van Riel wrote:
> >
> > I'd prefer just X having a higher "mm nice level" or something.
>
> Which it has, because:
>
> 1) CAP_RAW_IO
> 2) p->euid == 0

Oh, I agree, but we might want to generalize this a bit so that root could
say "this process is important" and then drop root privileges and still get
"credited" for the fact that it's important.

It's not a big deal. It works for X right now.

                Linus
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> > Sounds like one needs in addition some mechanism for servers to "charge"
> > clients for consumption. X certainly knows on behalf of which connection
> > resources are created; the OS could then transfer this back to the
> > appropriate client (at least when on machine).
>
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the
> credentials to X, just not to use them

Stephen Tweedie, Dave Rosenthal, Keith Packard and I also had an extensive
discussion at Usenix on similar ideas around process quantum scheduling (the
X server would like to be able to forward quantum to clients).

This is closely related, and needed to finally fully control interactive
feel in the face of "greedy" clients. My memory is that it sounded like
things could become very interesting with such a facility, and might be ripe
for 2.5.

Keith, Stephen, Dave, do you remember the details of our discussion?

                                - Jim
--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Linus Torvalds wrote:
> On Mon, 9 Oct 2000, Alan Cox wrote:
> > > consumption. X certainly knows on behalf of which connection resources
> > > are created; the OS could then transfer this back to the appropriate client
> > > (at least when on machine).
> >
> > Definitely - and this is present in some non Unix OS's. We do pass credentials
> > across AF_UNIX sockets so the mechanism is notionally there to provide the
> > credentials to X, just not to use them
>
> The problem is that there is no way to keep track of them afterwards.
>
> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?
>
> I'd prefer just X having a higher "mm nice level" or something.

Which it has, because:

1) CAP_RAW_IO
2) p->euid == 0

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/       http://www.surriel.com/
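To make Rik's two conditions concrete, the following C sketch shows one way a badness score could be discounted for privileged processes such as the X server. It is a user-space illustration under stated assumptions: the struct, the function name, and the divisor of 4 are chosen for the example and are not quoted from the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the task fields the two tests look at. */
struct task_info {
    unsigned long total_vm_pages;  /* base score: memory footprint */
    unsigned int euid;             /* effective user id */
    bool cap_rawio;                /* task holds the raw I/O capability */
};

/*
 * Illustrative scoring only. The divisor of 4 per condition is an
 * assumption for this sketch, not a value taken from the thread.
 */
unsigned long badness_sketch(const struct task_info *t)
{
    unsigned long points;

    assert(t != 0);
    points = t->total_vm_pages;

    if (t->euid == 0)      /* root-owned: probably more important */
        points /= 4;
    if (t->cap_rawio)      /* direct hardware access: killing it is risky */
        points /= 4;

    return points;
}
```

Under these assumptions a root-owned process with raw I/O access and 1600 pages scores 100, while an ordinary 1600-page process scores 1600, which is the kind of implicit "mm nice level" the messages above refer to.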
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Alan Cox wrote:
> > consumption. X certainly knows on behalf of which connection resources
> > are created; the OS could then transfer this back to the appropriate client
> > (at least when on machine).
>
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the
> credentials to X, just not to use them

The problem is that there is no way to keep track of them afterwards.

So the process that gave X the bitmap dies. What now? Are we going to
depend on X un-counting the resources?

I'd prefer just X having a higher "mm nice level" or something.

                Linus
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Ingo Molnar wrote:
> On Mon, 9 Oct 2000, Rik van Riel wrote:
>
> > Would this complexity /really/ be worth it for the twice-yearly OOM
> > situation?
>
> the only reason i suggested this was the init=/bin/bash, 4MB
> RAM, no swap emergency-bootup case. We must not kill init in
> that case - if the current code doesn't then great and none of
> this is needed.

I guess this requires some testing. If anybody can reproduce
the bad effects without going /too/ much out of the way of a
realistic scenario, the code needs to be fixed.

If it turns out to be a non-issue in all scenarios, there's
no need to make the code any more complex.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/       http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, Oct 09, 2000 at 10:28:38PM +0100, Alan Cox wrote:
> > Sounds like one needs in addition some mechanism for servers to "charge"
> > clients for consumption. X certainly knows on behalf of which connection
> > resources are created; the OS could then transfer this back to the
> > appropriate client (at least when on machine).
>
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the
> credentials to X, just not to use them

X can get the pid using SO_PEERCRED for unix connections. When the oom killer
maintains some kind of badness value in the task_struct it would be possible
to add a charge() system call that manipulates it.

        int charge(pid_t pid, int memorytobecharged)

-Andi
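The SO_PEERCRED lookup Andi mentions can be sketched in a few lines of C. The wrapper name peer_pid() is hypothetical; getsockopt() with SOL_SOCKET and SO_PEERCRED, and struct ucred, are the Linux interfaces for reading the credentials of the peer on an AF_UNIX socket. The charge() call itself is only a proposal in this thread and is not an existing system call, so the sketch stops at obtaining the pid that such a call would need.

```c
#define _GNU_SOURCE            /* for struct ucred on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: look up the pid of the process on the other end
 * of a connected AF_UNIX socket. Returns 0 on success and stores the pid
 * in *pid, or returns -1 on failure.
 */
int peer_pid(int fd, pid_t *pid)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);

    assert(pid != NULL);

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0)
        return -1;
    if (len != sizeof(cred))
        return -1;

    *pid = cred.pid;
    return 0;
}
```

A server would call peer_pid() once per accepted connection and remember the result next to the client's resources. The same structure also carries the peer's uid and gid. This only works for local AF_UNIX connections, which matches the "at least when on machine" caveat quoted above.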
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, David Ford wrote:

> Not if "init" is a particular program running on a router floppy for
> example. The system may be designed to be a router and the userland
> monitor/control program is the only thing that runs and consumes 90% of the
> memory. If a forked or spawned process starts up with high CPU that just
> tips it over the OOM edge, we don't really want to kill init even if it's
> taking "all" the memory and or "all" the cpu.

this is such a special case it is not worth considering - rather leave
it up to the designer of the router floppy to get his stuff right.

the one thing that is clear from the many OOM flamewars is that no OOM
reaper algorithm will satisfy 100% of conditions 100% of the time. So
all Rik can do is optimise for the common case.

(roll on beancounting and proper resource limiting - the true but
heavyweight solution)

regards,
--
Paul Jakma      [EMAIL PROTECTED]
PGP5 key: http://www.clubi.ie/jakma/publickey.txt
---
Fortune:
Individualists unite!
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Rik van Riel wrote:

> Would this complexity /really/ be worth it for the twice-yearly OOM
> situation?

the only reason i suggested this was the init=/bin/bash, 4MB RAM, no swap
emergency-bootup case. We must not kill init in that case - if the current
code doesn't then great and none of this is needed.

        Ingo
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> Sounds like one needs in addition some mechanism for servers to "charge"
> clients for consumption. X certainly knows on behalf of which connection
> resources are created; the OS could then transfer this back to the
> appropriate client (at least when on machine).

Definitely - and this is present in some non Unix OS's. We do pass credentials
across AF_UNIX sockets so the mechanism is notionally there to provide the
credentials to X, just not to use them
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
On Mon, 9 Oct 2000, Ingo Molnar wrote:
> On Mon, 9 Oct 2000, Alan Cox wrote:
>
> > Lets kill a 6 week long typical background compute job because
> > netscape exploded (and yes netscape has a child process)
>
> in the paragraph you didn't quote i pointed this out and
> suggested adding all parent's badness value to children as well
> - so we'd end up killing netscape.

Would this complexity /really/ be worth it for the twice-yearly OOM
situation?

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/       http://www.surriel.com/
Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> Sender: [EMAIL PROTECTED]
> From: "Andi Kleen" <[EMAIL PROTECTED]>
> Date: Mon, 9 Oct 2000 22:58:22 +0200
> To: Linus Torvalds <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>,
>     Andrea Arcangeli <[EMAIL PROTECTED]>,
>     Rik van Riel <[EMAIL PROTECTED]>,
>     Byron Stanoszek <[EMAIL PROTECTED]>,
>     MM mailing list <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
> Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> -
> On Mon, Oct 09, 2000 at 01:52:21PM -0700, Linus Torvalds wrote:
> > One thing we _can_ (and probably should do) is to do a per-user memory
> > pressure thing - we have easy access to the "struct user_struct" (every
> > process has a direct pointer to it), and it should not be too bad to
> > maintain a per-user "VM pressure" counter.
> >
> > Then, instead of trying to use heuristics like "does this process have
> > children" etc, you'd have things like "is this user a nasty user", which
> > is a much more valid thing to do and can be used to find people who fork
> > tons of processes that are mid-sized but use a lot of memory due to just
> > being many..
>
> Would not help much when "they" eat your memory by loading big bitmaps
> into the X server which runs as root (it seems there are many programs
> which are very good at this particular DOS ;)

This is generic to any server program, not unique to X.

Sounds like one needs in addition some mechanism for servers to "charge"
clients for consumption. X certainly knows on behalf of which connection
resources are created; the OS could then transfer this back to the
appropriate client (at least when on machine).

                                - Jim
--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]
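The per-user "VM pressure" counter quoted above could look roughly like the following C sketch. It is a user-space model with hypothetical names (pressure_table, charge_user, nastiest_user) and a fixed-size table. The kernel-side version Linus describes would hang the counter off struct user_struct instead, and nothing here is taken from an actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_USERS 64  /* arbitrary table size for the sketch */

/* One counter per user, standing in for a field in a per-user struct. */
struct user_pressure {
    uid_t uid;
    unsigned long pages;  /* pages currently charged to this user */
    int in_use;
};

struct pressure_table {
    struct user_pressure slot[MAX_USERS];
};

void pressure_init(struct pressure_table *t)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < MAX_USERS; i++) {
        t->slot[i].uid = 0;
        t->slot[i].pages = 0;
        t->slot[i].in_use = 0;
    }
}

/* Add `pages` to the user's counter. Returns 0, or -1 if the table is full. */
int charge_user(struct pressure_table *t, uid_t uid, unsigned long pages)
{
    struct user_pressure *free_slot = NULL;
    size_t i;

    for (i = 0; i < MAX_USERS; i++) {
        if (t->slot[i].in_use && t->slot[i].uid == uid) {
            t->slot[i].pages += pages;
            return 0;
        }
        if (!t->slot[i].in_use && free_slot == NULL)
            free_slot = &t->slot[i];
    }
    if (free_slot == NULL)
        return -1;
    free_slot->uid = uid;
    free_slot->pages = pages;
    free_slot->in_use = 1;
    return 0;
}

/* Find the user with the most charged pages. Returns -1 if none exist. */
int nastiest_user(const struct pressure_table *t, uid_t *uid_out)
{
    const struct user_pressure *worst = NULL;
    size_t i;

    for (i = 0; i < MAX_USERS; i++) {
        if (t->slot[i].in_use &&
            (worst == NULL || t->slot[i].pages > worst->pages))
            worst = &t->slot[i];
    }
    if (worst == NULL)
        return -1;
    *uid_out = worst->uid;
    return 0;
}
```

Three mid-sized processes of 100 pages each charged to one uid outweigh a single 250-page process owned by another, which is the "many mid-sized processes" case from the quote. As Andi's objection points out, memory that a root-owned server holds on a client's behalf would still land on the server's uid unless some charge-back mechanism moves it.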