Re: OOM Test Case - Failed!
On Sat, 21 Oct 2000, Rik van Riel wrote:

> > The oom killer avoided killing your busy, large, root-owned
> > process. Don't run gcc compiles as root. Protecting root
> > processes is an explicit design goal here.
>
> Also:
>
> 1) his system pretty much continued to run
> 2) since only httpd children got killed, no work
>    was lost

The system ran, but nothing moved. No process was able to do any activity,
because they were all waiting on swapped out space or waiting to use more
as-of-yet unallocated virtual memory.

I could verify this because one of my daemons writes one line to disk every
5 minutes. That stopped completely during this event.

> (only the fact that he ran genattrtab as root screwed
> up things a bit and kept the system from killing the
> task -- but probably only just)

If I would have known, I would have done otherwise.

 -Byron

--
Byron Stanoszek                         Ph: (330) 644-3059
Systems Programmer                      Fax: (330) 644-8110
Commercial Timesharing Inc.             Email: [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: OOM Test Case - Failed!
On Wed, 18 Oct 2000, Stephen Tweedie wrote:
> On Tue, Oct 17, 2000 at 10:02:52AM -0400, Byron Stanoszek wrote:
>
> > I am very unimpressed with the current OOM killer. After 10 days of online
> > time, I decided to try compiling gcc again, the very culprit that killed my
> > last system using 2.4.0-test8 Friday night (to which I was unable to reset
> > the system until Monday morning).
> >
> > root 1099 63.6 61.5 71424 18740 pts/0 R 09:39 1:22 ./genattrtab
>
> The oom killer avoided killing your busy, large, root-owned
> process. Don't run gcc compiles as root. Protecting root
> processes is an explicit design goal here.

Also:

1) his system pretty much continued to run
2) since only httpd children got killed, no work
   was lost

(only the fact that he ran genattrtab as root screwed
up things a bit and kept the system from killing the
task -- but probably only just)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/               http://www.surriel.com/
Re: OOM Test Case - Failed!
Hi,

On Tue, Oct 17, 2000 at 10:02:52AM -0400, Byron Stanoszek wrote:

> I am very unimpressed with the current OOM killer. After 10 days of online
> time, I decided to try compiling gcc again, the very culprit that killed my
> last system using 2.4.0-test8 Friday night (to which I was unable to reset
> the system until Monday morning).
>
> root 1099 63.6 61.5 71424 18740 pts/0 R 09:39 1:22 ./genattrtab

The oom killer avoided killing your busy, large, root-owned
process. Don't run gcc compiles as root. Protecting root
processes is an explicit design goal here.

--Stephen
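How a discount for root-owned processes can steer the choice of victim is easy to see in a toy model. The sketch below is not the 2.4 kernel heuristic and does not claim to reproduce what happened on Byron's machine: `score`, `pick_victim`, the `ROOT_DISCOUNT` value, and the httpd size are all assumptions made for illustration. Only genattrtab's virtual size (71424 KB) comes from the ps line quoted above.

```python
from typing import List, Tuple

# Hypothetical divisor applied to root-owned processes. The value is an
# assumption for this sketch, not a figure taken from the thread.
ROOT_DISCOUNT = 4


def score(vsz_kb: int, uid: int, root_discount: int = ROOT_DISCOUNT) -> int:
    """Toy 'badness' score: virtual size, divided down for root-owned tasks."""
    return vsz_kb // root_discount if uid == 0 else vsz_kb


def pick_victim(procs: List[Tuple[str, int, int]],
                root_discount: int = ROOT_DISCOUNT) -> str:
    """procs holds (name, uid, vsz_kb) tuples; return the highest-scoring name."""
    return max(procs, key=lambda p: score(p[2], p[1], root_discount))[0]


# genattrtab's size is from the report; the httpd row is a made-up example.
procs = [
    ("genattrtab", 0, 71424),   # root-owned, large
    ("httpd", 99, 20000),       # unprivileged, hypothetical size
]

print(pick_victim(procs))                   # root discount applied
print(pick_victim(procs, root_discount=1))  # no special treatment for root
```

With the discount in place, the much larger root-owned process scores below the unprivileged one and is passed over; with `root_discount=1` it is selected. The same compile run as an ordinary user would get no discount in this model.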
Re: OOM Test Case - Failed!
On Tue, Oct 17, 2000 at 10:02:52AM -0400, Byron Stanoszek wrote:
> I am very unimpressed with the current OOM killer.
[...]
> We need to decide on a better algorithm,
> albeit simple, that will alleviate this problem before 2.4.0 final comes out.

We don't need to decide on one, you can provide and install your
own, if you apply my oom-killer-api-patch.

It's at:
http://www.tu-chemnitz.de/~ioe/oom_kill_api.patch

PS: Removed Linus from CC, because every change of MM has to be
   approved by Rik first. Added linux-mm, because it's an MM
   issue.

PPS: We had a controversial discussion at linux-mm about this
   last week. So look into the archives.

Regards

Ingo Oeser
--
Feel the power of the penguin - run [EMAIL PROTECTED] :x
OOM Test Case - Failed!
I am very unimpressed with the current OOM killer. After 10 days of online
time, I decided to try compiling gcc again, the very culprit that killed my
last system using 2.4.0-test8 Friday night (to which I was unable to reset
the system until Monday morning).

GCC started compiling normally, until it reached the command:

./genattrtab ../../gcc/config/i386/i386.md > tmp-attrtab.c

At this time, genattrtab started to accumulate 70+ Megabytes of memory. For
comparison, I only have 32MB of RAM and 64MB of swap space. Also during this
time, several daemon and user-level programs were running, using at most 4MB
of RAM each and running peacefully in the background.

The system slowed down to a crawl. 5 minutes later, the OOM killer finally
kicked in and killed 5 processes: httpd.

I figure, okay, httpd doesn't need to run, I'd rather give the GCC-compilation
the extra RAM it needs to finish its 'genattrtab' program.

10 minutes pass and the system does not get better. Then all of a sudden, the
console flashes with more httpd processes killed. "What is going on here," I
thought to myself. There were only 6 httpd processes running when I first
started the compilation.

It appears that the OOM killer destroyed only the children of the Apache web
daemon, and not the daemon itself! The web daemon just spawned more httpd
processes to fill in the children that it lost earlier. Meanwhile, genattrtab
continued to consume RAM in the background.

After 10 more minutes of waiting on the OOM killer, I come back to a console
that is filled with 'Killing process httpd' messages. It never had the bright
idea to kill the parent or any process OTHER than httpd.

The expected process to kill here would be ./genattrtab, which at the time was
consuming more RAM than available and had only started 25 minutes prior...

root 1099 63.6 61.5 71424 18740 pts/0 R 09:39 1:22 ./genattrtab

This was my first OOM killer test, run on 2.4.0-test9-final with Rik's VM
patches that went into test10-pre1.

My prognosis is that the VM runs almost 2x as fast when there is memory
available and swapping occurs, compared to the old VM. However, when memory
runs out, it takes up to 5 minutes for the OOM killer to start killing
processes, and does a bad job at that.

Granted, the random OOM killer in 2.2 was better at its job than this because
it brought back a usable system. Even something that killed the process that's
using the most RAM or the process that allocates the most space in a set
period of time would be good in this case. We need to decide on a better
algorithm, albeit simple, that will alleviate this problem before 2.4.0 final
comes out.

Regards,
 Byron

--
Byron Stanoszek                         Ph: (330) 644-3059
Systems Programmer                      Fax: (330) 644-8110
Commercial Timesharing Inc.             Email: [EMAIL PROTECTED]
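The two simple policies suggested in the closing paragraph (kill the process using the most RAM, or the one that allocated the most in a set period) can be sketched in a few lines. This is a user-space illustration only, not kernel code. The `Proc` record, `pick_largest`, `pick_fastest_grower`, the httpd rows, and the "before" snapshot are hypothetical; only the genattrtab figures (PID 1099, 71424 KB virtual, 18740 KB resident) come from the ps line in the report.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Proc:
    """Hypothetical snapshot of one process (sizes in KB)."""
    pid: int
    name: str
    vsz_kb: int   # virtual size
    rss_kb: int   # resident set size


def pick_largest(procs: Iterable[Proc]) -> Optional[Proc]:
    """Policy 1: the process with the largest resident set (None if empty).

    Ties are broken by virtual size so the choice is deterministic.
    """
    return max(procs, key=lambda p: (p.rss_kb, p.vsz_kb), default=None)


def pick_fastest_grower(before: Dict[int, int],
                        after: Dict[int, int]) -> Optional[int]:
    """Policy 2: the PID whose size grew most between two snapshots.

    Both arguments map pid -> size in KB; a PID missing from `before`
    is treated as having started at size 0.
    """
    growth = {pid: size - before.get(pid, 0) for pid, size in after.items()}
    return max(growth, key=lambda pid: growth[pid], default=None)


# Only the genattrtab row is from the report; the httpd rows are made-up
# values within the "at most 4MB" bound mentioned in the post.
table = [
    Proc(pid=1099, name="genattrtab", vsz_kb=71424, rss_kb=18740),
    Proc(pid=201, name="httpd", vsz_kb=6000, rss_kb=4096),
    Proc(pid=202, name="httpd", vsz_kb=6000, rss_kb=4096),
]

victim = pick_largest(table)
print(victim.name if victim else "nothing to kill")

# Hypothetical earlier snapshot: genattrtab small, httpd children unchanged.
before = {1099: 2000, 201: 4096, 202: 4096}
after = {p.pid: p.rss_kb for p in table}
print(pick_fastest_grower(before, after))
```

Both policies select PID 1099 on this sample table. Neither one looks at who owns the process or how long it has run, which is the trade-off the replies in this thread argue about.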