Re: OOM Test Case - Failed!

2000-10-21 Thread Byron Stanoszek

On Sat, 21 Oct 2000, Rik van Riel wrote:

> > The oom killer avoided killing your busy, large, root-owned
> > process. Don't run gcc compiles as root.  Protecting root
> > processes is an explicit design goal here.
> 
> Also:
> 
> 1) his system pretty much continued to run
> 2) since only httpd children got killed, no work
>    was lost

The system ran, but nothing moved. No process was able to do any activity,
because they were all waiting on swapped out space or waiting to use more
as-of-yet unallocated virtual memory. I could verify this because one of
my daemons writes one line to disk every 5 minutes. That stopped completely
during this event.

> (only the fact that he ran genattrtab as root screwed
> up things a bit and kept the system from killing the
> task -- but probably only just)

If I had known, I would have done otherwise.

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: OOM Test Case - Failed!

2000-10-21 Thread Rik van Riel

On Wed, 18 Oct 2000, Stephen Tweedie wrote:
> On Tue, Oct 17, 2000 at 10:02:52AM -0400, Byron Stanoszek wrote:
> 
> > I am very unimpressed with the current OOM killer. After 10 days of online
> > time, I decided to try compiling gcc again, the very culprit that killed my
> > last system using 2.4.0-test8 Friday night (to which I was unable to reset
> > the system until Monday morning).
> > 
> > root  1099 63.6 61.5 71424 18740 pts/0   R    09:39   1:22 ./genattrtab
> 
> The oom killer avoided killing your busy, large, root-owned
> process. Don't run gcc compiles as root.  Protecting root
> processes is an explicit design goal here.

Also:

1) his system pretty much continued to run
2) since only httpd children got killed, no work
   was lost

(only the fact that he ran genattrtab as root screwed
up things a bit and kept the system from killing the
task -- but probably only just)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: OOM Test Case - Failed!

2000-10-18 Thread Stephen Tweedie

Hi,

On Tue, Oct 17, 2000 at 10:02:52AM -0400, Byron Stanoszek wrote:

> I am very unimpressed with the current OOM killer. After 10 days of online
> time, I decided to try compiling gcc again, the very culprit that killed my
> last system using 2.4.0-test8 Friday night (to which I was unable to reset
> the system until Monday morning).
> 
> root  1099 63.6 61.5 71424 18740 pts/0   R    09:39   1:22 ./genattrtab

The oom killer avoided killing your busy, large, root-owned process.
Don't run gcc compiles as root.  Protecting root processes is an
explicit design goal here.
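
A toy illustration of how such a policy can spare a large process: the sketch below is not the kernel's scoring code. The function name, the square-root discount for CPU time, and the divide-by-4 discount for root are all assumptions chosen for illustration. The genattrtab numbers are read from the ps line above, assuming the usual `ps aux` column layout (VSZ 71424 KB, TIME 1:22); the httpd child's 4096 KB and zero CPU time are hypothetical.

```python
import math

def toy_score(vm_kb, cpu_seconds, is_root):
    """Hypothetical 'badness' score: bigger means more likely to be killed."""
    # Busy processes are discounted by the square root of their CPU time.
    score = vm_kb / max(1.0, math.sqrt(cpu_seconds))
    # Root-owned processes get a further discount (factor is an assumption).
    if is_root:
        score /= 4
    return score

# genattrtab: 71424 KB virtual size, 1:22 (82 s) of CPU time, owned by root.
genattrtab = toy_score(71424, 82, True)
# An idle, non-root httpd child of about 4 MB (hypothetical values).
httpd_child = toy_score(4096, 0, False)

print(f"genattrtab: {genattrtab:.0f}, httpd child: {httpd_child:.0f}")
```

Under these made-up weights the small httpd child outscores the much larger genattrtab, which is the kind of outcome described in the report.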

--Stephen



Re: OOM Test Case - Failed!

2000-10-18 Thread Ingo Oeser

On Tue, Oct 17, 2000 at 10:02:52AM -0400, Byron Stanoszek wrote:
> I am very unimpressed with the current OOM killer. 
[...]
> We need to decide on a better algorithm,
> albeit simple, that will alleviate this problem before 2.4.0 final comes out.

We don't need to decide on one, you can provide and install your
own, if you apply my oom-killer-api-patch.

It's at: http://www.tu-chemnitz.de/~ioe/oom_kill_api.patch

PS: Removed Linus from CC, because every change of MM has to be
   approved by Rik first. Added linux-mm, because it's an MM issue.

PPS: We had a controversial discussion at linux-mm about this
   last week. So look into the archives.

Regards

Ingo Oeser
-- 
Feel the power of the penguin - run [EMAIL PROTECTED]
<esc>:x



OOM Test Case - Failed!

2000-10-17 Thread Byron Stanoszek

I am very unimpressed with the current OOM killer. After 10 days of online
time, I decided to try compiling gcc again, the very culprit that killed my
last system using 2.4.0-test8 Friday night (to which I was unable to reset
the system until Monday morning).

GCC started compiling normally, until it reached the command:
  ./genattrtab ../../gcc/config/i386/i386.md > tmp-attrtab.c

At this time, genattrtab started to accumulate 70+ Megabytes of memory. For
comparison, I only have 32MB of RAM and 64MB of swap space. Several daemon and
user-level programs were also running during this time, each using at most 4MB
of RAM and running peacefully in the background.
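
The arithmetic behind this can be sketched as follows. It is only a rough budget under simplifying assumptions: it treats RAM plus swap as one pool, takes 70 MB as a lower bound for genattrtab, and ignores the kernel's own memory use and any page sharing. The variable names are illustrative.

```python
RAM_MB = 32
SWAP_MB = 64
GENATTRTAB_MB = 70       # "70+ Megabytes" in the report, so a lower bound
BACKGROUND_MAX_MB = 4    # upper bound per background program

total_mb = RAM_MB + SWAP_MB               # all memory the system can back
headroom_mb = total_mb - GENATTRTAB_MB    # what is left for everything else
# How many worst-case 4 MB background programs still fit in the remainder.
max_background = headroom_mb // BACKGROUND_MAX_MB

print(total_mb, headroom_mb, max_background)
```

With only 32MB of that pool being real RAM, most of genattrtab's pages have to live in swap, which is consistent with the slowdown described next.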

The system slowed down to a crawl. 5 minutes later, the OOM killer finally
kicked in and killed 5 processes: httpd. I figure, okay, httpd doesn't need
to run, I'd rather give the GCC-compilation the extra RAM it needs to finish
its 'genattrtab' program.

10 minutes pass and the system does not get better. Then all of a sudden, the
console flashes with more httpd processes killed. "What is going on here," I
thought to myself. There were only 6 httpd processes running when I first
started the compilation. It appears that the OOM killer destroyed only the
children of the Apache web daemon, and not the daemon itself! The web daemon
just spawned more httpd processes to fill in the children that it lost earlier.
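
Why killing only the children frees nothing for long can be shown with a small simulation. This is a sketch of the behaviour described above, not of Apache or the kernel: the function name and the pool of 5 workers (6 httpd processes minus one parent) are assumptions.

```python
def simulate_respawn(rounds, workers=5, kills_per_round=5):
    """Kill workers each round, then let the parent refill its pool."""
    alive = workers
    total_killed = 0
    for _ in range(rounds):
        killed = min(kills_per_round, alive)
        alive -= killed
        total_killed += killed
        # The surviving parent notices the missing workers and respawns them.
        alive = workers
    return alive, total_killed

alive, total_killed = simulate_respawn(rounds=3)
print(alive, total_killed)
```

After three rounds, 15 processes have been killed and the pool is back at its original size, so the memory pressure is unchanged.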

Meanwhile, genattrtab continued to consume RAM in the background. After 10 more
minutes of waiting on the OOM killer, I come back to a console that is filled
with 'Killing process httpd' messages. It never had the bright idea to kill
the parent or any process OTHER than httpd.

The expected process to kill here would be ./genattrtab, which at the time was
consuming more RAM than available and had only started 25 minutes prior...

root  1099 63.6 61.5 71424 18740 pts/0   R    09:39   1:22 ./genattrtab

This was my first OOM killer test, run on 2.4.0-test9-final with Rik's VM
patches that went into test10-pre1. My prognosis is that the VM runs almost 2x
as fast when there is memory available and swapping occurs, compared to the
old VM. However, when memory runs out, it takes up to 5 minutes for the OOM
killer to start killing processes, and does a bad job at that.

Granted, the random OOM killer in 2.2 was better at its job than this because
it brought back a usable system. Even something that killed the process that's
using the most RAM or the process that allocates the most space in a set period
of time would be good in this case. We need to decide on a better algorithm,
albeit simple, that will alleviate this problem before 2.4.0 final comes out.
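
The first of those simple policies, killing whichever process uses the most memory, could look roughly like this. It is a minimal sketch, not a proposed kernel patch: the `Proc` type and `pick_largest` are hypothetical names, the genattrtab size is taken from the ps line above (assuming the usual `ps aux` layout, VSZ 71424 KB), and the httpd entries with their PIDs and 4096 KB sizes are made up to match the "at most 4MB" description.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    pid: int
    name: str
    vsz_kb: int   # virtual size in KB

def pick_largest(procs):
    """Return the process with the largest virtual size."""
    return max(procs, key=lambda p: p.vsz_kb)

procs = [
    Proc(1099, "genattrtab", 71424),   # from the ps line in the report
    Proc(2001, "httpd", 4096),         # hypothetical entries
    Proc(2002, "httpd", 4096),
    Proc(2003, "httpd", 4096),
]

victim = pick_largest(procs)
print(f"Killing process {victim.name} (pid {victim.pid})")
```

A size-only rule like this picks genattrtab on the first pass, at the cost of ignoring ownership and how much work the victim has already done.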

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]



