Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Matthew Hawkins

On 2000-10-11 19:53:50 -0700, [EMAIL PROTECTED] wrote:
> On other machines I'd set RLIMIT_DATA and my OOM problems went away,
> but on linux this didn't work

RLIMIT_DATA appears to only be checked for aout format executables.
Looking at the 2.4.0-test10pre1 sources for fs/binfmt_aout.c and
fs/binfmt_elf.c you'll note the difference in load_aout_binary() and
load_elf_binary(), both just above the comment of "OK, This is the point
of no return"

Does putting a similar check to the aout one make sense for ELF?

I'm just trying to avoid Rik having to pull his hair out implementing a
system that conceptually already exists in the kernel (nasty processes
being terminated before they do some damage).  Especially when that
existing system is far more configurable.

Cheers,

-- 
Matt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Matthew Hawkins

On 2000-10-11 11:45:06 -0400, Bruce A. Locke wrote:
> This manpage shows me functions and structs.

What were you expecting from the system call section of the Linux
Programmer's Manual?  Dancing girls?

(h...)

> I'm assuming you want these used by the offending program or the shell
> under which the program is being called.

That's usually what happens.

> In the first case, a person might not have source to the program and
> if that's the case, it doesn't help much.

Closed-source software is *so* 20th century... ;-)  Anyway, when run
from the shell it'll inherit its parent's limits (which leads to your
next question...)

> And in the second case, if the shell sets it, does it affect children
> of a process (aka fork()'d)?  

Certainly.
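
A minimal sketch of that inheritance, assuming a POSIX system. A throwaway
process stands in for the shell: it lowers its soft RLIMIT_DATA, then forks
a child that reads the limit back. The function name
child_inherits_data_limit is made up for illustration and is not taken from
any library.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a fork()ed child sees the soft RLIMIT_DATA set by its
 * parent, 0 if it does not, and -1 on error. The limit is changed only
 * inside a throwaway process, so the caller's own limits are untouched. */
int child_inherits_data_limit(rlim_t soft)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                     /* stands in for the shell */
        struct rlimit rl;
        if (getrlimit(RLIMIT_DATA, &rl) != 0)
            _exit(2);
        if (rl.rlim_max != RLIM_INFINITY && soft > rl.rlim_max)
            soft = rl.rlim_max;         /* soft may not exceed hard */
        rl.rlim_cur = soft;
        if (setrlimit(RLIMIT_DATA, &rl) != 0)
            _exit(2);

        pid_t kid = fork();             /* stands in for the program */
        if (kid < 0)
            _exit(2);
        if (kid == 0) {
            struct rlimit seen;
            if (getrlimit(RLIMIT_DATA, &seen) != 0)
                _exit(2);
            _exit(seen.rlim_cur == soft ? 0 : 1);
        }

        int st;
        if (waitpid(kid, &st, 0) < 0 || !WIFEXITED(st))
            _exit(2);
        _exit(WEXITSTATUS(st));         /* pass the verdict upwards */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```

Calling child_inherits_data_limit((rlim_t)64 * 1024 * 1024) exercises the
same path a shell takes when it runs "ulimit -d" and then starts a program.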

Maybe if more distributions took Debian's stance and set the default
limits so anal that you frequently can't even read email let alone
recompile the kernel without getting the process terminated for tripping
one limit or another, then more people would know this functionality
exists and set the limits more appropriately.

-- 
Matt



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Matthew Hawkins

On 2000-10-11 12:48:54 -0400, Andrew Pimlott wrote:
> No way should a desktop user be responsible for micro-managing the
> resource usage of his applications.

That's right.  The systems administrator should, and will set
appropriate limits for users on his/her system that apply from login.

This is how the systems I first used were configured (lucky me had a
damn fine sysadmin), and so this is how I configure mine.

> The only thing that knows what's right for Netscape is Netscape.

I would disagree with this; I believe this is exactly the root of
people's problems with Netscape (and the same theory should apply to
other apps).  The application doesn't know what's _right_ - it knows
what it _wants_.  Big difference.

-- 
Matt



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Andrew Pimlott

On Thu, Oct 12, 2000 at 01:58:49AM +1100, Matthew Hawkins wrote:
> On 2000-10-11 10:33:39 -0400, Bruce A. Locke wrote:
> > 
> > You're making the deadly assumption that all applications behave themselves
> > exactly the same all the time.  Oops... netscape decided to freak out and
> > take up all your memory... guess it's the admin's fault.
> 
> Yep, for not setting appropriate resource limits.

No way should a desktop user be responsible for micro-managing the
resource usage of his applications.  How can he decide what's
reasonable for Netscape to consume?  Shouldn't Netscape be allowed
to take up most of memory, if it's the only major application and
the memory will improve its performance?

The only thing that knows what's right for Netscape is Netscape.  If
Netscape were clever and kind, perhaps it would estimate what's
reasonable and set limits on itself, adjusting them from time to
time based on user behavior and environmental factors.  But
Netscape's a pretty mature program, and it doesn't do this; it can
hardly be expected of the zillions of immature (and probably leaky)
applications a user might run.

So, we inevitably need an automated low-memory or out-of-memory
algorithm.  I tend to think it may need to be more adjustable than
Rik's--people will be much more comfortable if they can say "spare
this simulation at all cost!" or "kill off one of these processes in
an emergency" or "this system has no business coming within 90% of
RAM+swap capacity, so start killing things at that point--oh, and
mail me".  Some of this has no place in the kernel, obviously.  But
Rik has a good start, and perhaps his work will be part of a more
complete solution.
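
The "90% of RAM+swap" policy could live in user space. The following is a
rough sketch, assuming Linux and its sysinfo(2) call; the function names
over_threshold and system_over_threshold are hypothetical. Note that
sysinfo() counts page cache as used memory, so this overstates real
pressure and a serious monitor would need a better measure.

```c
#include <sys/sysinfo.h>    /* Linux-specific */

/* Pure policy check: 1 if used/total has reached pct percent. */
int over_threshold(unsigned long long used, unsigned long long total,
                   unsigned int pct)
{
    if (total == 0)
        return 0;
    return used * 100 >= total * (unsigned long long)pct;
}

/* Applies the check to RAM+swap as reported by sysinfo(2).
 * Returns 1 or 0, or -1 if sysinfo() fails. */
int system_over_threshold(unsigned int pct)
{
    struct sysinfo si;
    if (sysinfo(&si) != 0)
        return -1;

    /* Sizes are reported in multiples of mem_unit bytes. */
    unsigned long long unit  = si.mem_unit ? si.mem_unit : 1;
    unsigned long long total =
        ((unsigned long long)si.totalram + si.totalswap) * unit;
    unsigned long long avail =
        ((unsigned long long)si.freeram + si.freeswap) * unit;

    return over_threshold(total - avail, total, pct);
}
```

A watchdog would poll system_over_threshold(90) and then act on whatever
per-process policy the administrator configured (spare this one, kill that
one, send mail); that policy part is deliberately left out here.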

Andrew



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Jesse Pollard

> On 2000-10-11 09:45:30 -0500, Jesse Pollard wrote:
> > Until user memory resource quotas are included in the kernel, there will be
> > nothing else that can be done. Even with resource quotas, if the total of
> > active users exceeds the resource then the same/equivalent situation occurs.
> 
> So setrlimit() with RLIMIT_DATA, RLIMIT_STACK, RLIMIT_RSS,
> RLIMIT_MEMLOCK, RLIMIT_AS et al is a null op?
> 
> If so, I wish to register a complaint ;-)

Not exactly. As I have seen it, each process gets a copy of these limits.
A single process cannot exceed the limit, but the sum of all processes
can.

One of the problems is caused by COW (copy-on-write).

Given trivially small limits (1 MB):

  The first process allocates and initializes up to one MB, then forks.
  The second process begins updating data - .5MB. Neither process exceeds
  the limits, but the sum is now 1.5MB. If this is repeated enough, then
  the system can go OOM, with none of the processes at or over the limits
  set.

Another problem occurs on multi-user servers. Each user logs in and
gets "reasonable" rlimit values - each user uses one medium sized
process. If the #users * rlimits exceeds the system capacity then OOM
could occur, and still none may have exceeded the rlimit.
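
The arithmetic behind both cases can be written down as a small model. This
is only a sketch: the function names are invented for illustration, it
counts bytes rather than pages, and it assumes each child dirties no more
than the parent originally allocated.

```c
/* Memory in use after a parent holding parent_bytes forks n_children
 * copies and each child dirties child_dirty_bytes of the shared pages.
 * Untouched pages stay shared, so only the dirtied part costs extra. */
unsigned long long cow_total_bytes(unsigned long long parent_bytes,
                                   unsigned long long child_dirty_bytes,
                                   unsigned long long n_children)
{
    return parent_bytes + n_children * child_dirty_bytes;
}

/* 1 if every process stays within its per-process limit yet the sum
 * exceeds what the machine has, i.e. the case rlimits cannot catch. */
int oom_without_limit_violation(unsigned long long per_process_bytes,
                                unsigned long long limit_bytes,
                                unsigned long long n_processes,
                                unsigned long long capacity_bytes)
{
    if (per_process_bytes > limit_bytes)
        return 0;   /* the rlimit would have fired first */
    return n_processes * per_process_bytes > capacity_bytes;
}
```

With the numbers from the COW example, cow_total_bytes(1 MB, 0.5 MB, 1)
gives 1.5 MB even though neither process is over its 1 MB limit. The second
function covers the multi-user case: 100 users at 1 MB each on a machine
with 64 MB overcommit the system without anyone breaking a limit.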

I've always treated rlimit values as "suggestions" to the user process
to aid in debugging. (This is more applicable to the ulimits though.)
The user's process cannot exceed the value, and when it runs into the
limit it is a strong suggestion that a bug may be present. (I first saw
this with a leaky X server.)

There have been some patches (the beancounter stuff) that do relate
to resource control, but more integrated resource accounting would make
it work better. I do believe it should be available as an option, especially
for multi-user servers, clusters, and other large systems.

It isn't that useful on single-user workstations.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Richard B. Johnson

On Thu, 12 Oct 2000, Matthew Hawkins wrote:

> 
> Seriously, am I missing something obvious or is it far simpler just to
> keel over and die if the system goes OOM?  I mean, seriously, if the
> administrator lets it get to that state then he/she/it deserves a dead
> system.  It's akin to having your car run out of petrol - you don't
> start shooting passengers because their extra load made the engine chew
> more.  You pack up your kitty and go to the nearest petrol station and
> buy more, plug it into the car then learn from the experience so this
> fringe case of it happening doesn't happen again.  I don't really see
> much difference between a car going "OOP" and a computer going OOM.
> Should we start deleting files according to some randomly-chosen
> heuristic if a filesystem goes "OOS" ?

Excellent point. However, the idea is to kill an attacker if your 'car'
is being hijacked.
 
Whatever is being designed should ideally have zero impact on the usual
performance and only come into play if something runs away, deliberately
or by accident.

If Linux doesn't track down and kill deliberate attempts to kill the
system, there will always be those who say, "Linux is no good because
a user can readily kill it". Of course we could track down and
kill those who say this, but it'd get messy.

FYI, a fork() bomb on my Sun Workstation does not kill it. Also
malloc()ing and writing all over the place doesn't kill it either.


Script started on Wed Oct 11 10:41:38 2000
# cat xxx.c

main()
{
    for (;;)
        fork();
}

# gcc -o xxx xxx.c
# ./xxx
^C
# # ^C
# ps
   PID TTY  TIME CMD
 24800 pts/10:00 xxx
 24335 pts/10:00 sh
 24688 pts/10:00 xxx
 24690 pts/10:00 xxx
 24692 pts/10:00 xxx
 24694 pts/10:00 xxx
 24696 pts/10:00 xxx
 24697 pts/10:00 xxx
 24699 pts/10:00 xxx
 24701 pts/10:00 xxx
 24703 pts/10:00 xxx
 24704 pts/10:00 xxx
 24706 pts/10:00 xxx
 24708 pts/10:00 xxx
 24710 pts/10:00 xxx
 24712 pts/10:00 xxx
 24714 pts/10:00 xxx
 24716 pts/10:00 xxx
 24717 pts/10:00 xxx
 24719 pts/10:00 xxx
 24720 pts/10:00 xxx
 24721 pts/10:00 xxx
 24722 pts/10:00 xxx
 24723 pts/10:00 xxx
 24724 pts/10:00 xxx
 24725 pts/10:00 xxx
 24726 pts/10:00 xxx
 24727 pts/10:00 xxx
 24728 pts/10:00 xxx
 24729 pts/10:00 xxx
 24730 pts/10:00 xxx
 24731 pts/10:00 xxx
 24732 pts/10:00 xxx
 24733 pts/10:00 xxx
 24734 pts/10:00 xxx
 24735 pts/10:00 xxx
 24736 pts/10:00 xxx
 24737 pts/10:00 xxx
 24738 pts/10:00 xxx
 24739 pts/10:00 xxx
 24740 pts/10:00 xxx
 24741 pts/10:00 xxx
 24742 pts/10:00 xxx
 24743 pts/10:00 xxx
 24744 pts/10:00 xxx
 24801 pts/10:00 ps
 24687 pts/10:00 xxx
 24689 pts/10:00 xxx
 24691 pts/10:00 xxx
 24693 pts/10:00 xxx
 24695 pts/10:00 xxx
 24698 pts/10:00 xxx
 24700 pts/10:00 xxx
 24702 pts/10:00 xxx
 24705 pts/10:00 xxx
 24707 pts/10:00 xxx
 24709 pts/10:00 xxx
 24711 pts/10:00 xxx
 24713 pts/10:00 xxx
 24715 pts/10:00 xxx
 24718 pts/10:00 xxx
 24653 pts/10:00 xxx
 24610 pts/10:00 xxx
 24614 pts/10:00 xxx
 24615 pts/10:00 xxx
 24616 pts/10:00 xxx
 24617 pts/10:00 xxx
 24618 pts/10:00 xxx
 24619 pts/10:00 xxx
 24620 pts/10:00 xxx
 24621 pts/10:00 xxx
 24622 pts/10:00 xxx
 24623 pts/10:00 xxx
 24624 pts/10:00 xxx
 24625 pts/10:00 xxx
 24626 pts/10:00 xxx
 24627 pts/10:00 xxx
 24628 pts/10:00 xxx
 24629 pts/10:00 xxx
 24630 pts/10:00 xxx
 24631 pts/10:00 xxx
 24632 pts/10:00 xxx
 24686 pts/10:00 xxx
 24685 pts/10:00 xxx
 24684 pts/10:00 xxx
 24683 pts/10:00 xxx
 24682 pts/10:00 xxx
 24681 pts/10:00 xxx
 24680 pts/10:00 xxx
 24679 pts/10:00 xxx
 24678 pts/10:00 xxx
 24677 pts/10:00 xxx
 24676 pts/10:00 xxx
 24675 pts/10:00 xxx
 24674 pts/10:00 xxx
 24673 pts/10:00 xxx
 24672 pts/10:00 xxx
 24671 pts/10:00 xxx
 24670 pts/10:00 xxx
 24669 pts/10:00 xxx
 24668 pts/10:00 xxx
 24667 pts/10:00 xxx
 24666 pts/10:00 xxx
 24665 pts/10:00 xxx
 24664 pts/10:00 xxx
 24663 pts/10:00 xxx
 24662 pts/10:00 xxx
 24661 pts/10:00 xxx
 24660 pts/10:00 xxx
 24659 pts/10:00 xxx
 24658 pts/10:00 xxx
 24657 pts/10:00 xxx
 24656 pts/10:00 xxx
 24655 pts/10:00 xxx
 24654 pts/10:00 xxx
 24652 pts/10:00 xxx
 24651 pts/10:00 xxx
 24650 pts/10:00 xxx
 24649 pts/10:00 xxx
 24648 pts/10:00 xxx
 24647 pts/10:00 xxx
 24646 pts/10:00 xxx
 24645 pts/10:00 xxx
 24644 pts/10:00 xxx
 24643 pts/10:00 xxx
 24642 pts/10:00 xxx
 24634 pts/10:00 xxx
 24633 pts/10:00 xxx
 24641 pts/10:00 xxx
 24640 pts/10:00 xxx
 24639 pts/10:00 xxx
 24638 pts/10:00 xxx
 24637 pts/10:00 xxx
 24636 pts/10:00 xxx
 24635 pts/10:00 xxx
 24613 pts/1

Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Matthew Hawkins

On 2000-10-11 10:33:39 -0400, Bruce A. Locke wrote:
> 
> You're making the deadly assumption that all applications behave themselves
> exactly the same all the time.  Oops... netscape decided to freak out and
> take up all your memory... guess it's the admin's fault.

Yep, for not setting appropriate resource limits.

man 2 setrlimit

Of course, if it's a kernel bug that causes it I think you're SOL ;)

-- 
Matt



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Matthew Hawkins

On 2000-10-11 09:45:30 -0500, Jesse Pollard wrote:
> Until user memory resource quotas are included in the kernel, there will be
> nothing else that can be done. Even with resource quotas, if the total of
> active users exceeds the resource then the same/equivalent situation occurs.

So setrlimit() with RLIMIT_DATA, RLIMIT_STACK, RLIMIT_RSS,
RLIMIT_MEMLOCK, RLIMIT_AS et al is a null op?

If so, I wish to register a complaint ;-)

-- 
* Matthew Hawkins <[EMAIL PROTECTED]> :(){ :|:&};:
** Information Specialist, tSA Group Pty. Ltd.   Ph: +61 2 6257 7111
*** 1 Hall Street, Lyneham ACT 2602 Australia.   Fx: +61 2 6257 7311



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Jesse Pollard

> Heh.. now all we need is some smart-arse to make something similar to
> apply to the _entire_ VM subsystem, and both Rik and Andrea can be happy
> ;)
> 
> Seriously, am I missing something obvious or is it far simpler just to
> keel over and die if the system goes OOM?  I mean, seriously, if the
> administrator lets it get to that state then he/she/it deserves a dead
> system.  It's akin to having your car run out of petrol - you don't
> start shooting passengers because their extra load made the engine chew
> more.  You pack up your kitty and go to the nearest petrol station and
> buy more, plug it into the car then learn from the experience so this
> fringe case of it happening doesn't happen again.  I don't really see
> much difference between a car going "OOP" and a computer going OOM.
> Should we start deleting files according to some randomly-chosen
> heuristic if a filesystem goes "OOS" ?

Not deleting files, but your system may crash :)

The problem with memory is that the tools to do anything else are not
available (i.e. not already included in the kernel). In the example of running
out of file space, there are quota limits. You can still run out of space,
but only when the sum of all users' quota allocations exceeds the disk
capacity.

Until user memory resource quotas are included in the kernel, there will be
nothing else that can be done. Even with resource quotas, if the total of
active users exceeds the resource then the same/equivalent situation occurs.

What is being done is still necessary, but in the long term it will end
up addressing the case where a single user runs out, rather than the system
as a whole.

User memory resource quota control is needed in large clusters, and in large
systems with multiple users. In a single-user environment, resource quotas
are less important than providing a consistent (and hopefully intuitive)
process abort. That keeps the system going, and leaves it up to the user to
choose what else may need to be aborted.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Matthew Hawkins


Heh.. now all we need is some smart-arse to make something similar to
apply to the _entire_ VM subsystem, and both Rik and Andrea can be happy
;)

Seriously, am I missing something obvious or is it far simpler just to
keel over and die if the system goes OOM?  I mean, seriously, if the
administrator lets it get to that state then he/she/it deserves a dead
system.  It's akin to having your car run out of petrol - you don't
start shooting passengers because their extra load made the engine chew
more.  You pack up your kitty and go to the nearest petrol station and
buy more, plug it into the car then learn from the experience so this
fringe case of it happening doesn't happen again.  I don't really see
much difference between a car going "OOP" and a computer going OOM.
Should we start deleting files according to some randomly-chosen
heuristic if a filesystem goes "OOS" ?

-- 
Matt



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-11 Thread Andrea Arcangeli

On Wed, Oct 11, 2000 at 11:08:41AM +0200, Helge Hafting wrote:
> Nothing wrong with a big init - the problem is a memory-leaking init.
> That one will die anyway, whether it dies early from an OOM-killer
> or later when all other processes are gone doesn't really matter.

Indeed.

Andrea



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-11 Thread Helge Hafting

Andrea Arcangeli wrote:
> 
> On Tue, Oct 10, 2000 at 09:06:49AM +0200, Helge Hafting wrote:
> > If you want init to live - prove that it doesn't eat too much memory.
> 
> I don't see why the machine should be stable only if init is small.
> My kernel won't be stable only if init is small since it doesn't cost
> anything to handle correctly the big init case.
>
Nothing wrong with a big init - the problem is a memory-leaking init.
That one will die anyway, whether it dies early from an OOM-killer
or later when all other processes are gone doesn't really matter.

Helge Hafting



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Bruce A. Locke


You're making the deadly assumption that all applications behave themselves
exactly the same all the time.  Oops... netscape decided to freak out and
take up all your memory... guess it's the admin's fault.  Oops... some
mod_perl script decided to freak out and an apache process decides to suck
all of your CPU and MEM.

Crap like this does happen.  An example of this is a webboard package
called "Blackboard" consisting of various mod_perl scripts, apache, and
mysql. It is an educational online conferencing system being used in
conjunction with many college classes and thus is quite vital to the
campus.  

Unfortunately it's buggy as hell and the memory sucking bug didn't pop up
until we were a couple weeks into classes and locked into the system.  A
mod_perl script freaks out, the copy of apache goes nuts, and we get a
bunch of lovely out of memory related messages to the console.  It's times
like these that an OOM killer like Rik's would be very useful.  I feel
Rik's OOM backported to 2.2.x would do wonders for this situation.  After
playing with Rik's OOM system, I know it would do the right thing on this
system but unfortunately 2.4.x isn't trustworthy yet.

Yes, the software is buggy and should be fixed.  Do I have the power to
fix a broken commercial package that I'm locked into?  No.

The point of an OOM killer is that if all hell breaks loose, you have a
choice between a locked up system, a system that's slow as hell because it's
spending all its time swapping, or a system that kills the offender and
gets back to business.  I choose the third option.  I can't think of any
situation (either on desktop or server) where a system lockup or panic due
to OOM would be acceptable w/ 2.4.x.


On Thu, 12 Oct 2000, Matthew Hawkins wrote:

> 
> Heh.. now all we need is some smart-arse to make something similar to
> apply to the _entire_ VM subsystem, and both Rik and Andrea can be happy
> ;)
> 
> Seriously, am I missing something obvious or is it far simpler just to
> keel over and die if the system goes OOM?  I mean, seriously, if the
> administrator lets it get to that state then he/she/it deserves a dead
> system.  It's akin to having your car run out of petrol - you don't
> start shooting passengers because their extra load made the engine chew
> more.  You pack up your kitty and go to the nearest petrol station and
> buy more, plug it into the car then learn from the experience so this
> fringe case of it happening doesn't happen again.  I don't really see
> much difference between a car going "OOP" and a computer going OOM.
> Should we start deleting files according to some randomly-chosen
> heuristic if a filesystem goes "OOS" ?
> 
> -- 
> Matt

--
Bruce A. Locke
[EMAIL PROTECTED]

"The Internet views censorship as damage and routes around it"
www.eff.org  www.peacefire.org




Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Paul Jakma

On Wed, 11 Oct 2000, Bruce A. Locke wrote:

> 
> You're making the deadly assumption that all applications behave themselves
> exactly the same all the time.  Oops... netscape decided to freak out and
> take up all your memory... guess it's the admin's fault.  Oops... some
> mod_perl script decided to freak out and an apache process decides to suck
> all of your CPU and MEM.
> 

that's why you have per-process limits set. Eg, PAM makes this
exceedingly easy with pam_limits.so - edit /etc/security/limits.conf.

this prevents at least 90% of OOM situations (ie individual leaky
processes). eg netscape will then pop up "cannot allocate memory"
messages and stop rendering pages instead of crashing your system.
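
A small sketch of the effect described above, assuming a POSIX system with
RLIMIT_AS support: once an address-space limit is in place, an oversized
malloc() returns NULL and the program can report it, rather than the whole
machine running out of memory. The function name alloc_fails_under_limit is
made up for illustration, and the limit is applied in a child process so
the caller is not affected.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if malloc(request) fails cleanly in a process whose soft
 * RLIMIT_AS is as_limit, 0 if the allocation succeeds, -1 on error. */
int alloc_fails_under_limit(rlim_t as_limit, size_t request)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);
        if (rl.rlim_max != RLIM_INFINITY && as_limit > rl.rlim_max)
            as_limit = rl.rlim_max;     /* soft may not exceed hard */
        rl.rlim_cur = as_limit;
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);

        /* volatile keeps the compiler from optimising the call away */
        volatile char *p = malloc(request);
        if (p != NULL) {
            p[0] = 1;                   /* allocation really succeeded */
            _exit(1);
        }
        _exit(0);                       /* clean "cannot allocate memory" */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```

For example, a 1 GB request under a 256 MB address-space limit is refused;
what the application does with the NULL (pop up a message, stop rendering)
is up to the application.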

--paulj




 24639 pts/10:00 xxx
 24638 pts/10:00 xxx
 24637 pts/10:00 xxx
 24636 pts/10:00 xxx
 24635 pts/10:00 xxx
 24613 pts/10:00 xxx
 

Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Jesse Pollard

-  Received message begins Here  -

 
> On 2000-10-11 09:45:30 -0500, Jesse Pollard wrote:
> > Until user memory resource quotas are included in the kernel, there will be
> > nothing else that can be done. Even with resource quotas, if the total of
> > active users exceeds the resource then the same/equivalent situation occurs.
>
> So setrlimit() with RLIMIT_DATA, RLIMIT_STACK, RLIMIT_RSS,
> RLIMIT_MEMLOCK, RLIMIT_AS et al is a null op?
>
> If so, I wish to register a complaint ;-)

Not exactly. As I have seen it, each process gets a copy of these limits.
A single process cannot exceed the limit, but the sum of all processes
can.

One of the problems is caused by COW:

given trivially small limits (1 MB)

  The first process allocates and initializes up to one MB, then forks.
  The second process begins updating data - .5MB. Neither process exceeds
  the limits, but the sum is now 1.5MB. If this is repeated enough, then
  the system can go OOM, with none of the processes at or over the limits
  set.

Another problem occurs on multi-user servers. Each user logs in and
gets "reasonable" rlimit values - each user uses one medium sized
process. If the #users * rlimits exceeds the system capacity then OOM
could occur, and still none may have exceeded the rlimit.
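The arithmetic behind both scenarios can be sketched in a few lines of C.
This is only an illustration under simplifying assumptions: the function
names are invented for this sketch, page granularity and kernel overhead
are ignored, and every child is assumed to rewrite the same amount of
copy-on-write memory.

```c
/* Hypothetical helper: physical memory in use after one parent has
 * touched `parent_bytes` and each of `children` forked copies has
 * rewritten `dirtied_per_child` bytes of the shared (COW) pages.
 * No single process grows, yet every rewritten page needs a new frame. */
unsigned long long cow_total_bytes(unsigned long long parent_bytes,
                                   unsigned long long dirtied_per_child,
                                   unsigned int children)
{
    return parent_bytes + dirtied_per_child * children;
}

/* Hypothetical helper: worst-case demand when `users` logins each run
 * one process right at a per-process limit of `limit_bytes`. */
unsigned long long multiuser_demand_bytes(unsigned long long limit_bytes,
                                          unsigned int users)
{
    return limit_bytes * users;
}

/* Returns 1 when the aggregate demand no longer fits in RAM plus swap. */
int exceeds_capacity(unsigned long long demand_bytes,
                     unsigned long long capacity_bytes)
{
    return demand_bytes > capacity_bytes;
}
```

With the numbers from the example, cow_total_bytes() for a 1 MB parent and
one child rewriting 0.5 MB gives 1.5 MB, although neither process is above
its own 1 MB limit.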

I've always treated rlimit values as "suggestions" to the user process
to aid in debugging. (This is more applicable to the ulimits though.)
The user's process will not exceed the value, and when it tries to, it
is a strong suggestion that a bug may be present. (I first saw this with
a leaky X server.)

There have been some patches (the beancounter stuff) that do relate
to resource control, but a more integrated resource accounting will make
it work better. I do believe it should be available as an option, especially
for multi-user servers, clusters, and other large systems.

It isn't that useful on single-user workstations.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Bruce A. Locke

On Thu, 12 Oct 2000, Matthew Hawkins wrote:

> Yep, for not setting appropriate resource limits.
>
> man 2 setrlimit
>
> Of course, if it's a kernel bug that causes it I think you're SOL ;)

This manpage shows me functions and structs.  I'm assuming you want these
used by the offending program or the shell under which the program is
being called.  In the first case, a person might not have source to the
program and if that's the case, it doesn't help much.  And in the second
case, if the shell sets it, does it affect children of a process (aka
fork()'d)?

Thanks for your time...

>
> --
> Matt
>

--
Bruce A. Locke
[EMAIL PROTECTED]

"The Internet views censorship as damage and routes around it"
www.eff.org  www.peacefire.org




Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Bruce A. Locke

On Wed, 11 Oct 2000, Paul Jakma wrote:

> that's why you have per process limits set. Eg, PAM makes this
> exceedingly easy with pam_limits.so - edit /etc/security/limits.conf.
>
> this prevents at least 90% of OOM situations (ie individual leaky
> processes). eg netscape will then pop-up "can not allocate memory"
> messages and stop rendering pages instead of crashing your system.

I wasn't aware PAM settings affected daemons started up during boot time
but I will check into it, thank you.

BTW, you said it works only 90% of the time; what are the other 10% of
cases where it doesn't work?

>
> --paulj
>

--
Bruce A. Locke
[EMAIL PROTECTED]

"The Internet views censorship as damage and routes around it"
www.eff.org  www.peacefire.org




Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Paul Jakma

On Wed, 11 Oct 2000, Bruce A. Locke wrote:

> I wasn't aware PAM settings affected daemons started up during boottime
> but I will check into it, thank you.
>

daemons generally don't need to be PAM aware (unless they deal with
authorising things). The script that launches it however (if started
by a PAM aware app such as su) can set limits - which the daemon
should inherit.
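The inheritance can be checked directly. The sketch below is a minimal
test under stated assumptions: child_inherits_limit() is a name made up
for this illustration, and it only touches the soft limit so that an
unprivileged parent can restore its old value afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lower the soft limit of `resource` to `soft`, fork, and let the child
 * report whether it sees the same soft limit.
 * Returns 1 if the child inherited it, 0 if not, -1 on error.
 * The parent's original limit is restored before returning. */
int child_inherits_limit(int resource, rlim_t soft)
{
    struct rlimit saved, lowered;
    pid_t pid;
    int status;

    if (getrlimit(resource, &saved) != 0)
        return -1;

    lowered = saved;
    lowered.rlim_cur = soft;              /* hard limit is left untouched */
    if (setrlimit(resource, &lowered) != 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        struct rlimit seen;
        /* Child: exit status 0 means "same soft limit as the parent set". */
        if (getrlimit(resource, &seen) != 0)
            _exit(2);
        _exit(seen.rlim_cur == soft ? 0 : 1);
    }

    /* Parent: put the old soft limit back, then collect the child. */
    setrlimit(resource, &saved);
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

A launcher script or wrapper would do the same thing with RLIMIT_AS or
RLIMIT_DATA and then exec the daemon; resource limits are preserved across
both fork() and execve().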

> BTW, you said it works only 90%, what are the other 10% of times it
> doesn't work?
>

malicious processes, or a collection of processes.

--paulj




Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Andrew Pimlott

On Thu, Oct 12, 2000 at 01:58:49AM +1100, Matthew Hawkins wrote:
> On 2000-10-11 10:33:39 -0400, Bruce A. Locke wrote:
> >
> > You're making the deadly assumption that all applications behave themselves
> > exactly the same all the time.  Oops... netscape decided to freak out and
> > take up all your memory... guess it's the admin's fault.
>
> Yep, for not setting appropriate resource limits.

No way should a desktop user be responsible for micro-managing the
resource usage of his applications.  How can he decide what's
reasonable for Netscape to consume?  Shouldn't Netscape be allowed
to take up most of memory, if it's the only major application and
the memory will improve its performance?

The only thing that knows what's right for Netscape is Netscape.  If
Netscape were clever and kind, perhaps it would estimate what's
reasonable and set limits on itself, adjusting them from time to
time based on user behavior and environmental factors.  But
Netscape's a pretty mature program, and it doesn't do this; it can
hardly be expected of the zillions of immature (and probably leaky)
applications a user might run.

So, we inevitably need an automated low-memory or out-of-memory
algorithm.  I tend to think it may need to be more adjustable than
Rik's--people will be much more comfortable if they can say "spare
this simulation at all cost!" or "kill off one of these processes in
an emergency" or "this system has no business coming within 90% of
RAM+swap capacity, so start killing things at that point--oh, and
mail me".  Some of this has no place in the kernel, obviously.  But
Rik has a good start, and perhaps his work will be part of a more
complete solution.
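The "90% of RAM+swap" trigger is the kind of policy that could live in
user space. A rough sketch, assuming Linux's sysinfo(2) as the data
source; the function names are invented for this illustration, and what
to do once the threshold is crossed (kill, mail the admin) is left out.

```c
#include <sys/sysinfo.h>

/* Percentage (0..100) of RAM+swap in use according to `si`.
 * freeram does not count reclaimable page cache, so this overstates
 * real pressure; a production policy would need a better estimate. */
int ram_swap_used_percent(const struct sysinfo *si)
{
    unsigned long long total = (unsigned long long)si->totalram + si->totalswap;
    unsigned long long avail = (unsigned long long)si->freeram + si->freeswap;

    if (total == 0 || avail > total)
        return 0;
    return (int)((total - avail) * 100 / total);
}

/* Returns 1 when usage has reached `threshold_percent`, e.g. 90. */
int should_start_killing(const struct sysinfo *si, int threshold_percent)
{
    return ram_swap_used_percent(si) >= threshold_percent;
}

/* Samples the running system; returns -1 if sysinfo() fails. */
int system_over_threshold(int threshold_percent)
{
    struct sysinfo si;

    if (sysinfo(&si) != 0)
        return -1;
    return should_start_killing(&si, threshold_percent);
}
```

A monitoring daemon could poll system_over_threshold(90) every few seconds
and act on a list of expendable processes supplied by the administrator.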

Andrew



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Rik van Riel

On Thu, 12 Oct 2000, Matthew Hawkins wrote:
> On 2000-10-11 09:45:30 -0500, Jesse Pollard wrote:
> > Until user memory resource quotas are included in the kernel, there will be
> > nothing else that can be done. Even with resource quotas, if the total of
> > active users exceeds the resource then the same/equivalent situation occurs.
>
> So setrlimit() with RLIMIT_DATA, RLIMIT_STACK, RLIMIT_RSS,
> RLIMIT_MEMLOCK, RLIMIT_AS et al is a null op?
>
> If so, I wish to register a complaint ;-)

Don't send a complaint, send patches ...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread Matthew Hawkins

On 2000-10-11 12:48:54 -0400, Andrew Pimlott wrote:
> No way should a desktop user be responsible for micro-managing the
> resource usage of his applications.

That's right.  The systems administrator should, and will set
appropriate limits for users on his/her system that apply from login.

This is how the systems I first used were configured (lucky me had a
damn fine sysadmin), and so this is how I configure mine.

> The only thing that knows what's right for Netscape is Netscape.

I would disagree with this; I believe this is exactly the root of
people's problems with Netscape (and the same theory should apply to
other apps).  The application doesn't know what's _right_ - it knows
what it _wants_.  Big difference.

-- 
Matt



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-11 Thread lamont


I've had to support an app running as a back-end to a webserver that would
malloc() different amounts of memory depending on user input, up to
multiple gigabytes of memory which vastly exceeded the 512k the machine
had as main memory.  The app was a program that would scan genetic
sequence looking for 'repeats' in the sequence, and one sequence would
malloc a hundred megs while a similar sequence of the same size would
cause the algorithm to try to malloc over a gig.  Part of the algorithm
was actually to simply try to malloc all the memory it could and if it ran
out, it would bump down the resolution that it was scanning with and try
again.  And it would regularly push the machine into OOM and take it down
because daemons got killed long before the OOM killer got around to taking
out the process that was malloc()ing all the memory.  On other machines
I'd set RLIMIT_DATA and my OOM problems went away, but on Linux this
didn't work (and I wasn't comfortable enough with kernel sources back then
to manage to find RLIMIT_AS).
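The "grab what you can, then back off" strategy only behaves well if
malloc() actually fails before the machine does. A minimal sketch of that
combination, assuming an RLIMIT_AS cap set by the process itself; both
function names are hypothetical, and halving the request stands in for
"bump down the resolution".

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Cap this process's address space so that malloc() fails early
 * instead of the whole machine running out of memory.
 * Returns 0 on success, -1 on error (see errno). */
int cap_address_space(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    rl.rlim_cur = bytes;                  /* soft limit only */
    return setrlimit(RLIMIT_AS, &rl);
}

/* Try to allocate `want` bytes; on failure halve the request until it
 * would drop below `min`. Stores the size obtained in *got.
 * Returns NULL when nothing of at least `min` bytes could be had. */
void *alloc_with_backoff(size_t want, size_t min, size_t *got)
{
    size_t size;

    for (size = want; size > 0 && size >= min; size /= 2) {
        void *p = malloc(size);
        if (p != NULL) {
            *got = size;
            return p;
        }
    }
    *got = 0;
    return NULL;
}
```

Without the cap, an overcommitting kernel may let a huge malloc() succeed
and only run out of memory later, when the pages are touched.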

On Thu, 12 Oct 2000, Matthew Hawkins wrote:
> Heh.. now all we need is some smart-arse to make something similar to
> apply to the _entire_ VM subsystem, and both Rik and Andrea can be happy
> ;)
>
> Seriously, am I missing something obvious or is it far simpler just to
> keel over and die if the system goes OOM?  I mean, seriously, if the
> administrator lets it get to that state then he/she/it deserves a dead
> system.  It's akin to having your car run out of petrol - you don't
> start shooting passengers because their extra load made the engine chew
> more.  You pack up your kitty and go to the nearest petrol station and
> buy more, plug it into the car then learn from the experience so this
> fringe case of it happening doesn't happen again.  I don't really see
> much difference between a car going "OOP" and a computer going OOM.
> Should we start deleting files according to some randomly-chosen
> heuristic if a filesystem goes "OOS" ?
>
>




Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-10 Thread Tom Rini

On Tue, Oct 10, 2000 at 05:58:46PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Tom Rini wrote:
> > On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > > On Tue, 10 Oct 2000, Ingo Oeser wrote:
> > > 
> > > > before you argue endlessly about the "Right OOM Killer (TM)", I
> > > > did a small patch to allow replacing the OOM killer at runtime.
> > > > 
> > > > So now you can stop arguing about the one and only OOM killer,
> > > > implement it, provide it as module and get back to the important
> > > > stuff ;-)
> > > 
> > > This is definitely a cool toy for people who have doubts
> > > that my OOM killer will do the wrong thing in their
> > > workloads.
> > 
> > I think this can be useful for more than just a cool toy.  I
> > think that the main thing that this discussion has shown is no
> > OOM killer will please 100% of the people 100% of the time.  I
> > think we should try and have a good generic OOM killer that
> > kills the right process most of the time.  People can implement
> > (and submit) different-style OOM killers as needed.
> 
> Indeed, though I suspect most of the people trying this would
> fall into the trap of over-engineering their OOM killer, after
> which it mostly becomes less predictable ;)

I was thinking more along the lines of ones w/ "safety" features that not
everyone might like/need (i.e. "/usr/local/bin/foo is always good", those
kinds of suggestions).  It seems like useful functionality at little/no cost.
And a neat toy for now. :)

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-10 Thread Tom Rini

On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Ingo Oeser wrote:
> 
> > before you argue endlessly about the "Right OOM Killer (TM)", I
> > did a small patch to allow replacing the OOM killer at runtime.
> > 
> > So now you can stop arguing about the one and only OOM killer,
> > implement it, provide it as module and get back to the important
> > stuff ;-)
> 
> This is definitely a cool toy for people who have doubts
> that my OOM killer will do the wrong thing in their
> workloads.

I think this can be useful for more than just a cool toy.  I think that the
main thing that this discussion has shown is no OOM killer will please 100% of
the people 100% of the time.  I think we should try and have a good generic
OOM killer that kills the right process most of the time.  People can implement
(and submit) different-style OOM killers as needed.  Or at least get 'em on
freshmeat. :)

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Miles Lane

Olaf Titz wrote:
> 
> > > Still, it would be nice to recover that 4 MB when the system
> > > doesn't have any memory left.
> > Yup. The X server could give back the memory for some cases like the
> > background without too much hackery.
> 
> Then Linux only needs to implement SIGDANGER, which has been talked
> about for years...
> 
> X would be a good candidate to implement a handler for it. Others are
> Emacs, Mozilla or JVMs - basically everything which has a GC of some
> sort. It could even be used to implement a configurable user mode OOM
> killer.

It would be good to talk to the KDE and Gnome folks about this as well.
I am pretty sure they have large blocks of memory that could be flushed
or freed in a low-memory or OOM condition.
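What a SIGDANGER-style handler might look like in an application can be
sketched as follows. Linux defines no SIGDANGER, so the signal number is a
parameter here and SIGUSR1 can serve as a stand-in; the cache and all
function names are hypothetical. The handler only sets a flag, and the
memory is released later from the program's main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

static volatile sig_atomic_t low_memory_seen = 0;
static void *cache = NULL;        /* stand-in for pixmaps, GC heaps, ... */
static size_t cache_bytes = 0;

/* Handler: only sets a flag, because free() is not async-signal-safe. */
static void on_low_memory(int signo)
{
    (void)signo;
    low_memory_seen = 1;
}

/* Register the handler for `signo`; returns 0 on success. */
int install_low_memory_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_low_memory;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Replace the cache with a fresh buffer of `bytes`; returns 0 on success. */
int cache_fill(size_t bytes)
{
    void *p = malloc(bytes);

    if (p == NULL)
        return -1;
    memset(p, 0, bytes);
    free(cache);
    cache = p;
    cache_bytes = bytes;
    return 0;
}

/* Called from the main loop: drop the cache if the signal arrived.
 * Returns the number of bytes given back, 0 if nothing was requested. */
size_t release_cache_if_asked(void)
{
    size_t freed;

    if (!low_memory_seen)
        return 0;
    low_memory_seen = 0;
    freed = cache_bytes;
    free(cache);
    cache = NULL;
    cache_bytes = 0;
    return freed;
}
```

An X server or a toolkit would call release_cache_if_asked() once per
event-loop iteration, so the cost in the normal case is a single flag test.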

Miles



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Linus Torvalds



On Tue, 10 Oct 2000, Rogier Wolff wrote:
> 
> So if Netscape can "pump" 40 extra megabytes of memory out of X, this
> can be exploited. 
> 
> Now we're back to the point that a heuristic can never be right all
> the time..

I agree. In fact, we never left that.

Nothing is perfect.

In fact, a lot of engineering is _recognizing_ that you can never achieve
"perfect", and you're much better off not even trying - and having a
simple system that is "good enough".

This is the old adage of "perfect is the enemy of good" - trying too hard
is actually _detrimental_ in 99% of all cases. We should have simple
heuristics that work most of the time, instead of trying to cajole a
complex system like X to help us do some complicated resource management
system.

Complexity will just result in the OOM killer failing in surprising ways.

A simple heuristic will mean that the OOM killer will still fail, but at
least it won't be in subtle and surprising ways.

Linus




Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-10 Thread Ingo Oeser

On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > So now you can stop arguing about the one and only OOM killer,
> > implement it, provide it as module and get back to the important
> > stuff ;-)
> 
> This is definitely a cool toy for people who have doubts
> that my OOM killer will do the wrong thing in their
> workloads.

Thanks ;-)

But I forgot to include my changes to the mm/Makefile (to export
the API for modules).

Here is a _working_ one:

--- linux-2.4.0-test10-pre1/mm/oom_kill.c   Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/oom_kill.c   Tue Oct 10 16:59:27 2000
@@ -13,6 +13,8 @@
  *  machine) this file will double as a 'coding guide' and a signpost
  *  for newbie kernel hackers. It features several pointers to major
  *  kernel subsystems and hints as to where to find out what things do.
+ *
+ *  Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include 
@@ -136,7 +138,7 @@
 }
 
 /**
- * oom_kill - kill the "best" process when we run out of memory
+ * oom_kill_rik - kill the "best" process when we run out of memory
  *
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 
struct task_struct *p = select_bad_process();
@@ -207,4 +211,63 @@
 
/* Else... */
return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock = RW_LOCK_UNLOCKED;
+
+static oom_killer_t oom_killer = oom_kill_rik;
+
+/** 
+ * oom_kill - the oom_kill wrapper for installable OOM killers
+ *
+ * Wraper around the OOM killers, that can be installed via
+ * install_oom_killer and reset_default_oom_killer.
+ *
+ * This gets called from kswapd() in linux/mm/vmscan.c when we 
+ * really run out of memory.
+ */
+void oom_kill(void) {
+   read_lock(&oom_kill_lock);
+   oom_killer();
+   read_unlock(&oom_kill_lock);
+}
+
+/**
+ * install_oom_killer - install alternate OOM killer
+ * @new_oom_kill: the alternate OOM killer provided by the caller
+ *
+ * Since the default OOM killer (oom_kill_rik) is not suitable 
+ * for everyone, we provide an interface to install custom OOM killers.
+ * 
+ * You can take the most appropriate action for your application if the
+ * kernel goes OOM.
+ *
+ * Providing an NULL argument just returns the current OOM killer.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ * 
+ * NOTE: We don't do refcounting on OOM killers, so be careful with 
+ * modules
+ */
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill) {
+   oom_killer_t tmp;
+   write_lock(&oom_kill_lock);
+   tmp=oom_killer;
+   if (new_oom_kill) 
+   oom_killer=new_oom_kill;
+   write_unlock(&oom_kill_lock);
+   return tmp;
+}
+
+/**
+ * reset_default_oom_killer - reset back to default OOM killer
+ *
+ * If you are going to unload the module which provided 
+ * your OOM killer, you can install the default one by this.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ */
+oom_killer_t reset_default_oom_killer(void) {
+   return install_oom_killer(&oom_kill_rik);
 }
--- linux-2.4.0-test10-pre1/include/linux/swap.h   Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/include/linux/swap.h   Tue Oct 10 16:44:22 2000
@@ -127,8 +127,14 @@
 #define read_swap_cache(entry) read_swap_cache_async(entry, 1);
 
 /* linux/mm/oom_kill.c */
+typedef void (*oom_killer_t)(void);
+
 extern int out_of_memory(void);
 extern void oom_kill(void);
+
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill);
+oom_killer_t reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
--- linux-2.4.0-test10-pre1/mm/Makefile Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/Makefile Tue Oct 10 16:34:06 2000
@@ -10,7 +10,8 @@
 O_TARGET := mm.o
 O_OBJS  := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
vmalloc.o slab.o bootmem.o swap.o vmscan.o page_io.o \
-   page_alloc.o swap_state.o swapfile.o numa.o oom_kill.o
+   page_alloc.o swap_state.o swapfile.o numa.o
+OX_OBJS  := oom_kill.o
 
 ifeq ($(CONFIG_HIGHMEM),y)
 O_OBJS += highmem.o

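The install/reset pattern in the patch can be tried out in user space. The
following is a simplified model, not kernel code: the two killer functions
and their counters are invented for the illustration, and the rwlock that
the patch takes around the call and the swap is left out, so this version
is only safe in a single thread.

```c
#include <stddef.h>

typedef void (*oom_killer_t)(void);

/* Call counters so the effect of swapping killers is observable. */
int default_calls = 0;
int custom_calls = 0;

/* Stand-in for the built-in policy (oom_kill_rik in the patch). */
void default_killer(void) { default_calls++; }

/* Hypothetical replacement policy a module might provide. */
void custom_killer(void) { custom_calls++; }

static oom_killer_t current_killer = default_killer;

/* Entry point used when memory runs out: dispatch to the installed policy. */
void oom_kill(void)
{
    current_killer();
}

/* Install a new policy and return the previous one.
 * NULL leaves the policy unchanged and just reports the current one. */
oom_killer_t install_oom_killer(oom_killer_t new_killer)
{
    oom_killer_t old = current_killer;

    if (new_killer != NULL)
        current_killer = new_killer;
    return old;
}

/* Go back to the built-in policy, e.g. before a module is unloaded. */
oom_killer_t reset_default_oom_killer(void)
{
    return install_oom_killer(default_killer);
}
```

Returning the previous pointer lets a caller restore whatever was installed
before it, rather than assuming that was the default.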
Regards

Ingo Oeser
-- 
Feel the power of the penguin - run [EMAIL PROTECTED]
:x



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Philipp Rumpf

On Tue, Oct 10, 2000 at 12:30:51PM -0300, Rik van Riel wrote:
> Not killing init when we "should" definitely prevents
> embedded systems from auto-rebooting when they should
> do so.
> 
> (OTOH, I don't think embedded systems will run into
> this OOM issue too much)

but when they do, they're hard to fix.  Think about an elevator control
system with a single process that happens to implement a somewhat broken
version of the elevator algorithm ;)

> > that's what I said.  we need to be sure to _get_ a panic() though.
> 
> I believe the kernel automatically panic()s when init
> dies ... from kernel/exit.c::do_exit()
> 
> if (tsk->pid == 1)
> panic("Attempted to kill init!");

guess who added that code.  We still kill init with SIGTERM which doesn't
seem to work though.




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Rik van Riel

On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote:
> > On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> > > > > The algorithm you posted on the list in this thread will kill
> > > > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > > > and you execute a task that grows over 1M.
> > > > 
> > > > This sounds suspiciously like the description of a DEAD system ;)
> > > 
> > > But wouldn't a watchdog daemon which doesn't allocate any memory
> > > still get run ?
> > 
> > Indeed, it would. It would also /prevent/ the system
> > from automatically rebooting itself into a usable state ;)
> 
> So it's not dead in the "oh, it'll be back in 30 seconds" sense.  
> So our behaviour is broken (more so than random process
> killing).

*nod*

Not killing init when we "should" definitely prevents
embedded systems from auto-rebooting when they should
do so.

(OTOH, I don't think embedded systems will run into
this OOM issue too much)

> > > You care about getting an automatic reboot.  So you need to be sure the
> > > watchdog daemon gets killed first or you panic() after some time.
> > 
> > echo 30 > /proc/sys/kernel/panic
> 
> that's what I said.  we need to be sure to _get_ a panic() though.

I believe the kernel automatically panic()s when init
dies ... from kernel/exit.c::do_exit()

if (tsk->pid == 1)
panic("Attempted to kill init!");

[which will make our system auto-reboot and be back on its feet
in a healthy state again soon]
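Whether a panic actually leads to a reboot depends on the timeout written
with the echo above. A watchdog or init script could verify the setting
with something like the sketch below; read_panic_timeout() is a name
invented here, and the path is passed in so that the function can also be
pointed at a test file.

```c
#include <stdio.h>

/* Reads the panic timeout from `path` (normally /proc/sys/kernel/panic).
 * 0 means no automatic reboot after a panic, N > 0 means reboot after
 * N seconds.
 * Returns 0 and stores the value in *seconds, or -1 on error. */
int read_panic_timeout(const char *path, int *seconds)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", seconds) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

An embedded system that relies on panic-and-reboot could refuse to start
its main application if this returns 0 seconds.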

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Philipp Rumpf

On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> > > > The algorithm you posted on the list in this thread will kill
> > > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > > and you execute a task that grows over 1M.
> > > 
> > > This sounds suspiciously like the description of a DEAD system ;)
> > 
> > But wouldn't a watchdog daemon which doesn't allocate any memory
> > still get run ?
> 
> Indeed, it would. It would also /prevent/ the system
> from automatically rebooting itself into a usable state ;)

So it's not dead in the "oh, it'll be back in 30 seconds" sense.  So our
behaviour is broken (more so than random process killing).

> > You care about getting an automatic reboot.  So you need to be sure the
> > watchdog daemon gets killed first or you panic() after some time.
> 
> echo 30 > /proc/sys/kernel/panic

that's what I said.  we need to be sure to _get_ a panic() though.



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Rik van Riel

On Tue, 10 Oct 2000, Philipp Rumpf wrote:

> > > The algorithm you posted on the list in this thread will kill
> > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > and you execute a task that grows over 1M.
> > 
> > This sounds suspiciously like the description of a DEAD system ;)
> 
> But wouldn't a watchdog daemon which doesn't allocate any memory
> still get run ?

Indeed, it would. It would also /prevent/ the system
from automatically rebooting itself into a usable state ;)

> > (in which case you simply don't care if init is being killed or not)
> 
> You care about getting an automatic reboot.  So you need to be sure the
> watchdog daemon gets killed first or you panic() after some time.

echo 30 > /proc/sys/kernel/panic

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




[PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-10 Thread Ingo Oeser

[OOM killer war]

Hi there,

before you argue endlessly about the "Right OOM Killer (TM)", I
did a small patch to allow replacing the OOM killer at runtime.

You can even use modules, if you are careful (see khttpd on how
to do this without refcounting).

So now you can stop arguing about the one and only OOM killer,
implement it, provide it as module and get back to the important
stuff ;-)

PS: Patch is against test10-pre1.

Thanks for listening

Ingo Oeser

--- linux-2.4.0-test10-pre1/mm/oom_kill.c   Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/oom_kill.c   Tue Oct 10 16:59:27 2000
@@ -13,6 +13,8 @@
  *  machine) this file will double as a 'coding guide' and a signpost
  *  for newbie kernel hackers. It features several pointers to major
  *  kernel subsystems and hints as to where to find out what things do.
+ *
+ *  Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include <linux/mm.h>
@@ -136,7 +138,7 @@
 }
 
 /**
- * oom_kill - kill the "best" process when we run out of memory
+ * oom_kill_rik - kill the "best" process when we run out of memory
  *
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 
struct task_struct *p = select_bad_process();
@@ -207,4 +211,63 @@
 
/* Else... */
return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock = RW_LOCK_UNLOCKED;
+
+static oom_killer_t oom_killer = oom_kill_rik;
+
+/** 
+ * oom_kill - the oom_kill wrapper for installable OOM killers
+ *
+ * Wrapper around the OOM killers, that can be installed via
+ * install_oom_killer and reset_default_oom_killer.
+ *
+ * This gets called from kswapd() in linux/mm/vmscan.c when we 
+ * really run out of memory.
+ */
+void oom_kill(void) {
+   read_lock(&oom_kill_lock);
+   oom_killer();
+   read_unlock(&oom_kill_lock);
+}
+
+/**
+ * install_oom_killer - install alternate OOM killer
+ * @new_oom_kill: the alternate OOM killer provided by the caller
+ *
+ * Since the default OOM killer (oom_kill_rik) is not suitable 
+ * for everyone, we provide an interface to install custom OOM killers.
+ * 
+ * You can take the most appropriate action for your application if the
+ * kernel goes OOM.
+ *
+ * Providing a NULL argument just returns the current OOM killer.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ * 
+ * NOTE: We don't do refcounting on OOM killers, so be careful with 
+ * modules
+ */
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill) {
+   oom_killer_t tmp;
+   write_lock(&oom_kill_lock);
+   tmp=oom_killer;
+   if (new_oom_kill) 
+      oom_killer=new_oom_kill;
+   write_unlock(&oom_kill_lock);
+   return tmp;
+}
+
+/**
+ * reset_default_oom_killer - reset back to default OOM killer
+ *
+ * If you are going to unload the module which provided 
+ * your OOM killer, you can install the default one by this.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ */
+oom_killer_t reset_default_oom_killer(void) {
+   return install_oom_killer(&oom_kill_rik);
 }
--- linux-2.4.0-test10-pre1/include/linux/swap.h   Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/include/linux/swap.h   Tue Oct 10 16:44:22 2000
@@ -127,8 +127,14 @@
 #define read_swap_cache(entry) read_swap_cache_async(entry, 1);
 
 /* linux/mm/oom_kill.c */
+typedef void (*oom_killer_t)(void);
+
 extern int out_of_memory(void);
 extern void oom_kill(void);
+
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill);
+oom_killer_t reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
-- 
Feel the power of the penguin - run [EMAIL PROTECTED]
:x

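For readers who want to see the calling convention of the patch above in
isolation, here is a minimal userspace sketch. It is not the kernel code:
the policies oom_kill_default() and oom_kill_custom() and their counters are
hypothetical, and a C11 atomic pointer is swapped in for the patch's rwlock,
so it does not reproduce the guarantee that a killer cannot be replaced
while it is running.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*oom_killer_t)(void);

/* Hypothetical policies; the counters only make the calls observable. */
static int default_calls, custom_calls;
static void oom_kill_default(void) { default_calls++; }
static void oom_kill_custom(void)  { custom_calls++; }

/* Currently installed policy. An atomic pointer stands in for the rwlock. */
static _Atomic(oom_killer_t) oom_killer = oom_kill_default;

/* Dispatch to whichever policy is installed right now. */
void oom_kill(void)
{
    oom_killer_t k = atomic_load(&oom_killer);
    assert(k != NULL);  /* a policy is always installed */
    k();
}

/* Install a new policy and return the previous one.
 * A NULL argument only queries the current policy. */
oom_killer_t install_oom_killer(oom_killer_t new_oom_kill)
{
    if (new_oom_kill == NULL)
        return atomic_load(&oom_killer);
    return atomic_exchange(&oom_killer, new_oom_kill);
}

/* Go back to the default policy, e.g. before unloading a module. */
oom_killer_t reset_default_oom_killer(void)
{
    return install_oom_killer(oom_kill_default);
}
```

A module following this pattern would call install_oom_killer() from its
init routine and reset_default_oom_killer() from its exit routine; since
there is no refcounting, it would also have to make sure its killer is not
still executing when the module text goes away.
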


Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Rogier Wolff

Linus Torvalds wrote:
> Basically, the only thing _I_ think X can do is to really say "oh, please
> don't count my memory, because everything I do I do for my clients, not
> for myself". 
> 
> THAT is my argument. Basically there is nothing we can reliably account.
> 
> So we might as well fall back on just saying "X is more important than
> some random client", and have a mm niceness level. Which right now is
> obviously approximated by the IO capabilities tests etc.

FYI:

I ran my machine out of memory (without crashing by the way) this
weekend by loading a whole bunch of large images into netscape. I
noticed not being able to open more windows when I saw my swapspace
exhausted. I noticed the large netscape, and killed it. 

At that moment my X was still taking 80Mb of RAM. I manually killed it
and restarted it to get rid of that memory. 

So if Netscape can "pump" 40 extra megabytes of memory out of X, this
can be exploited. 

Now we're back to the point that a heuristic can never be right all
the time..

Roger. 

-- 
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
*   Common sense is the collection of*
**  prejudices acquired by age eighteen.   -- Albert Einstein 



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Andrea Arcangeli

On Tue, Oct 10, 2000 at 09:06:49AM +0200, Helge Hafting wrote:
> If you want init to live - prove that it doesn't eat too much memory.

I don't see why the machine should be stable only if init is small.
My kernel will be stable whether or not init is small, since it costs
nothing to handle the big-init case correctly.

Andrea



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Andrea Arcangeli

On Tue, Oct 10, 2000 at 04:38:02AM +0100, Philipp Rumpf wrote:
> Init should never die.  If we get to do_exit in init we'll panic which is
> the right thing to do (reboot on critical systems).

If the page fault can fail with OOM on init, init will get a SIGSEGV while
running a signal handler (copy-user will return -EFAULT regardless of whether
it was an OOM or a real segfault) and it _won't_ panic and the system is unusable.

Andrea



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Marco Colombo

On Mon, 9 Oct 2000, Linus Torvalds wrote:

> On Mon, 9 Oct 2000, Rik van Riel wrote:
> >
> > > I'd prefer just X having a higher "mm nice level" or something.
> > 
> > Which it has, because:
> > 
> > 1) CAP_RAW_IO
> > 2) p->euid == 0
> 
> Oh, I agree, but we might want to generalize this a bit so that root could
> say "this process is important" and then drop root privileges and still
> get "credited" for the fact that it's important.
> 
> It's not a big deal. It works for X right now.

How about using

p->rlim[RLIMIT_AS].rlim_cur

to weight the badness points of a process?
On my system, a 128MB RAM + 256MB swap, it defaults to some (insane?) value:

bash$ ulimit -vH -vS
virtual memory (kbytes)  4194302
virtual memory (kbytes)  2105343

for every process, which just means it is unused.

The idea is:
1) set default for rlim[RLIMIT_AS].rlim_max to a saner value;
2) processes with higher rlim[RLIMIT_AS].rlim_cur get lower badness.

This way, the badness of a process is not proportional to its absolute
size, but to the fraction of allowed AS it is using. Processes
that are capable(CAP_SYS_RESOURCE) can set RLIMIT_AS to a very high value,
so they get fewer badness points. X is a perfect candidate.

Users' runaway processes (netscape) will have lower rlim[RLIMIT_AS].rlim_cur,
and thus will get higher badness.

Something like:

-   points = p->mm->total_vm;
+   points = p->mm->total_vm / (p->rlim[RLIMIT_AS].rlim_cur << AS_FACTOR);

with

#define AS_FACTOR 30

maybe? (this is Rik's call, he knows better than me how to balance it...)

It's simple, it's configurable. 1) may be enforced by the kernel, or
completely left to user space.
On my system, in its default configuration (no use of RLIMIT_AS),
it has no impact at all (all processes have the same limit).

Sounds good or am I missing something?
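
The arithmetic behind this proposal can be tried out in userspace. Read
literally, the expression posted above shifts the divisor upward, so integer
division would almost always truncate to zero; the sketch below assumes the
intent was to scale the ratio up instead. AS_SHIFT, the function name and
the byte units are assumptions made for illustration, not part of any
kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point scale: a result of 1024 means "using the whole limit". */
#define AS_SHIFT 10

/* Badness as the fraction of the allowed address space in use,
 * scaled by 2^AS_SHIFT so that integer division keeps some resolution. */
uint64_t as_weighted_badness(uint64_t total_vm_bytes, uint64_t rlim_as_bytes)
{
    assert(rlim_as_bytes > 0);  /* a zero limit would divide by zero */
    return (total_vm_bytes << AS_SHIFT) / rlim_as_bytes;
}
```

With these assumptions, an 80 MB X server allowed 2 GB scores 40, while a
120 MB netscape allowed 256 MB scores 480, so the process closer to its own
limit is the preferred victim even though the absolute sizes are similar.
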

> 
>   Linus

.TM.
-- 
  /  /   /
 /  /   /   Marco Colombo
___/  ___  /   /  Technical Manager
   /  /   /  ESI s.r.l.
 _/ _/  _/ [EMAIL PROTECTED]




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread J.A. Sutherland

--On 09 October 2000, 17:40 -0300 Rik van Riel <[EMAIL PROTECTED]>
wrote:
> On Mon, 9 Oct 2000, James Sutherland wrote:
>> On Mon, 9 Oct 2000, Ingo Molnar wrote:
>> > On Mon, 9 Oct 2000, Rik van Riel wrote:
>> > 
>> > > > so dns helper is killed first, then netscape. (my idea might not
>> > > > make sense though.)
>> > > 
>> > > It makes some sense, but I don't think OOM is something that
>> > > occurs often enough to care about it /that/ much...
>> > 
>> > i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup
>> > case, with 4MB RAM and no swap, where the admin tries to exec a 2MB
>> > process. I think it's a legitimate concern - i cannot know in advance
>> > whether a freshly started process would trigger an OOM or not.
>> 
>> Shouldn't the runtime factor handle this, making sure the new
>> process is killed? (Maybe not if you're almost OOM right from
>> the word go, and run this process straight off... Hrm.)
> 
> It should.
> 
> Also, the example is a tad unrealistic since init seems to be
> around 70 kB in size on my systems ;)

In extreme cases, though, you could arrange things so the
machine only has 100K of RAM when it loads init, at which
point init tries running, say, rc.sysinit - and everything goes 
bang. Of course, a machine like that won't be very much use
anyway...

More realistically, though, I could be running with something
like init=/bin/sash - does your statically linked sash binary
fit in 70K? :-)


James.



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Jamie Lokier

Andreas Dilger wrote:
> Having a SIGDANGER handler is good for 2 reasons:
> 1) Lets processes know when memory is short so they can free needless cache.
> 2) Mark process with a SIGDANGER handler as "more important" than those
>without.  Most people won't care about this, but init, and X, and
>long-running simulations might.

For point 1, it would be much nicer to have user processes participate
in memory balancing _before_ getting anywhere near an OOM state.

A nice way is to send SIGDANGER with siginfo saying how much memory the
kernel wants back (or how fast).  Applications that don't know how to use
that info, but do have a SIGDANGER handler, will still react, just rather
more severely.

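A rough userspace sketch of what such a handler might look like, under
several assumptions: Linux has no SIGDANGER, so SIGUSR1 stands in for it;
the requested amount travels in si_value (as sigqueue() delivers it); and
the 1 kB block cache and every function name here are hypothetical. The
handler only records the request and the main loop does the freeing,
because free() is not async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SIGDANGER_STANDIN SIGUSR1   /* stand-in: Linux has no SIGDANGER */
#define CACHE_BLOCKS 64             /* toy cache of 1 kB blocks */

static void *cache[CACHE_BLOCKS];
static volatile sig_atomic_t kb_requested;  /* set by handler, read by main loop */

/* Handler: just note how many kB were asked for. */
static void on_danger(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    kb_requested = info->si_value.sival_int;
}

int install_danger_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_danger;
    sa.sa_flags = SA_SIGINFO;       /* ask for the siginfo_t argument */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGDANGER_STANDIN, &sa, NULL);
}

/* Number of blocks currently held in the cache. */
int cache_blocks(void)
{
    int n = 0;
    for (int i = 0; i < CACHE_BLOCKS; i++)
        if (cache[i])
            n++;
    return n;
}

/* Fill every empty slot; returns the number of blocks held afterwards. */
int cache_fill(void)
{
    for (int i = 0; i < CACHE_BLOCKS; i++)
        if (!cache[i])
            cache[i] = malloc(1024);
    return cache_blocks();
}

/* Main-loop side: free as many kB as the last signal requested. */
int shed_cache(void)
{
    int want = kb_requested, freed = 0;
    kb_requested = 0;
    for (int i = 0; i < CACHE_BLOCKS && freed < want; i++) {
        if (cache[i]) {
            free(cache[i]);
            cache[i] = NULL;
            freed++;
        }
    }
    return freed;
}

/* Plays the role of the kernel: ask this process to give back `kb` kB. */
int send_danger(int kb)
{
    union sigval v;
    assert(kb >= 0);
    v.sival_int = kb;
    return sigqueue(getpid(), SIGDANGER_STANDIN, v);
}
```

An application that ignores the siginfo payload would simply drop its whole
cache in shed_cache(), which is the "react rather more severely" case.
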
-- Jamie



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Jamie Lokier

Albert D. Cahalan wrote:
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.

Haven't we already had this discussion?  Quite a lot of programs have
cached data (X fonts, Netscape (lots!)), GC-able data (Emacs, Java
etc.), data that can simply be discarded (X window backing stores), or
data that can be written to disk on demand (Netscape again).

-- Jamie



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread john slee

On Mon, Oct 09, 2000 at 06:34:29PM -0300, Rik van Riel wrote:
> On Mon, 9 Oct 2000, Ingo Molnar wrote:
> > On Mon, 9 Oct 2000, Rik van Riel wrote:
> > 
> > > Would this complexity /really/ be worth it for the twice-yearly OOM
> > > situation?
> > 
> > the only reason i suggested this was the init=/bin/bash, 4MB
> > RAM, no swap emergency-bootup case. We must not kill init in
> > that case - if the current code doesnt then great and none of
> > this is needed.

perhaps a boot time option oom=0 ?  since oom is such a rare case, this
wouldn't impact normal usage...

-- 
john slee <[EMAIL PROTECTED]>



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Olaf Titz

> > Still, it would be nice to recover that 4 MB when the system
> > doesn't have any memory left.
> Yup. The X server could give back the memory for some cases like the
> background without too much hackery.

Then Linux only needs to implement SIGDANGER, which has been talked
about for years...

X would be a good candidate to implement a handler for it. Others are
Emacs, Mozilla or JVMs - basically everything which has a GC of some
sort. It could even be used to implement a configurable user mode OOM
killer.

Olaf




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Helge Hafting

Andrea Arcangeli wrote:
> 
> On Mon, Oct 09, 2000 at 08:42:26PM +0200, Ingo Molnar wrote:
> > ignoring the kill would just preserve those bugs artificially.
> 
> If the oom killer kills a thing like init by mistake or init has a memleak
> you'll notice both problems regardless of having a magic for init in a _very_
> slow path so I don't buy your point.
> .
> For correctness init must not be killed ever, period.
> 
> So you have two choices:
> 
> o   math proof that the current algorithm without the magic can't end
> up killing init (and I should be able to prove the other way around
> instead)
> 
> o   have a magic check for init
> 
> So the magic is _strictly_ necessary at the moment.

A well-written init will be saved by being the oldest process around.
A memory-leaking init _will_ be killed even with your magic test,
when the kernel eventually gets stuck OOM and init is the only
process left (all the others have been OOM-killed before).
A deadlocked kernel doesn't schedule any processes, so they are all dead.

If you want init to live - prove that it doesn't eat too much memory.

Helge Hafting



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Rik van Riel

On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote:
> > On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> > > > > The algorithm you posted on the list in this thread will kill
> > > > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > > > and you execute a task that grows over 1M.
> > > > 
> > > > This sounds suspiciously like the description of a DEAD system ;)
> > > 
> > > But wouldn't a watchdog daemon which doesn't allocate any memory
> > > still get run ?
> > 
> > Indeed, it would. It would also /prevent/ the system
> > from automatically rebooting itself into a usable state ;)
> 
> So it's not dead in the "oh, it'll be back in 30 seconds" sense.  
> So our behaviour is broken (more so than random process
> killing).

*nod*

Not killing init when we "should" definitely prevents
embedded systems from auto-rebooting when they should
do so.

(OTOH, I don't think embedded systems will run into
this OOM issue too much)

> > > You care about getting an automatic reboot.  So you need to be sure the
> > > watchdog daemon gets killed first or you panic() after some time.
> > 
> > echo 30 > /proc/sys/kernel/panic
> 
> that's what I said.  we need to be sure to _get_ a panic() though.

I believe the kernel automatically panic()s when init
dies ... from kernel/exit.c::do_exit()

	if (tsk->pid == 1)
		panic("Attempted to kill init!");

[which will make our system auto-reboot and be back on its feet
in a healthy state again soon]

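Whether that panic() actually leads to a reboot depends on the timeout
stored in /proc/sys/kernel/panic. A small sketch of checking it from a
userspace program follows; the function names are hypothetical, and the
interpretation (a positive value means "reboot after that many seconds",
zero means "stay halted") is the behaviour assumed for kernels of this era.

```c
#include <assert.h>
#include <stdio.h>

/* Read the panic timeout from `path` (normally /proc/sys/kernel/panic).
 * Returns 0 and stores the value in *secs on success, -1 on failure. */
int read_panic_timeout(const char *path, int *secs)
{
    FILE *f;
    int value, ok;

    assert(path != NULL && secs != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%d", &value) == 1);
    fclose(f);
    if (!ok)
        return -1;
    *secs = value;
    return 0;
}

/* Assumed semantics: only a positive timeout makes a panic end in a reboot. */
int panic_will_reboot(int secs)
{
    return secs > 0;
}
```

A watchdog-style daemon could call read_panic_timeout() at startup and
refuse to run, or log a warning, when panic_will_reboot() is false.
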
regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-10 Thread Rik van Riel

On Tue, 10 Oct 2000, Ingo Oeser wrote:

> before you argue endlessly about the "Right OOM Killer (TM)", I
> did a small patch to allow replacing the OOM killer at runtime.
>
> So now you can stop arguing about the one and only OOM killer,
> implement it, provide it as module and get back to the important
> stuff ;-)

This is definitely a cool toy for people who have doubts
that my OOM killer will do the wrong thing in their
workloads.

If anyone can demonstrate that the current OOM killer is
doing the wrong thing and has a replacement algorithm
available, please let us know ... ;)

[let's move the discussion back to a less theoretical and
more practical point of view]

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Philipp Rumpf

On Tue, Oct 10, 2000 at 12:30:51PM -0300, Rik van Riel wrote:
> Not killing init when we "should" definitely prevents
> embedded systems from auto-rebooting when they should
> do so.
>
> (OTOH, I don't think embedded systems will run into
> this OOM issue too much)

but when they do, they're hard to fix.  Think about an elevator control
system with a single process that happens to implement a somewhat broken
version of the elevator algorithm ;)

> > that's what I said.  we need to be sure to _get_ a panic() though.
>
> I believe the kernel automatically panic()s when init
> dies ... from kernel/exit.c::do_exit()
>
>     if (tsk->pid == 1)
>         panic("Attempted to kill init!");

guess who added that code.  We still kill init with SIGTERM which doesn't
seem to work though.




Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-10 Thread Ingo Oeser

On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > So now you can stop arguing about the one and only OOM killer,
> > implement it, provide it as module and get back to the important
> > stuff ;-)
>
> This is definitely a cool toy for people who have doubts
> that my OOM killer will do the wrong thing in their
> workloads.

Thanks ;-)

But I forgot to include my changes to the mm/Makefile (to export
the API for modules).

Here is a _working_ one:

--- linux-2.4.0-test10-pre1/mm/oom_kill.c   Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/oom_kill.c   Tue Oct 10 16:59:27 2000
@@ -13,6 +13,8 @@
  *  machine) this file will double as a 'coding guide' and a signpost
  *  for newbie kernel hackers. It features several pointers to major
  *  kernel subsystems and hints as to where to find out what things do.
+ *
+ *  Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include <linux/mm.h>
@@ -136,7 +138,7 @@
 }
 
 /**
- * oom_kill - kill the "best" process when we run out of memory
+ * oom_kill_rik - kill the "best" process when we run out of memory
  *
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 
    struct task_struct *p = select_bad_process();
@@ -207,4 +211,63 @@
 
    /* Else... */
    return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock = RW_LOCK_UNLOCKED;
+
+static oom_killer_t oom_killer = oom_kill_rik;
+
+/** 
+ * oom_kill - the oom_kill wrapper for installable OOM killers
+ *
+ * Wrapper around the OOM killers, that can be installed via
+ * install_oom_killer and reset_default_oom_killer.
+ *
+ * This gets called from kswapd() in linux/mm/vmscan.c when we 
+ * really run out of memory.
+ */
+void oom_kill(void) {
+   read_lock(&oom_kill_lock);
+   oom_killer();
+   read_unlock(&oom_kill_lock);
+}
+
+/**
+ * install_oom_killer - install alternate OOM killer
+ * @new_oom_kill: the alternate OOM killer provided by the caller
+ *
+ * Since the default OOM killer (oom_kill_rik) is not suitable 
+ * for everyone, we provide an interface to install custom OOM killers.
+ * 
+ * You can take the most appropriate action for your application if the
+ * kernel goes OOM.
+ *
+ * Providing a NULL argument just returns the current OOM killer.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ * 
+ * NOTE: We don't do refcounting on OOM killers, so be careful with 
+ * modules
+ */
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill) {
+   oom_killer_t tmp;
+   write_lock(&oom_kill_lock);
+   tmp=oom_killer;
+   if (new_oom_kill) 
+       oom_killer=new_oom_kill;
+   write_unlock(&oom_kill_lock);
+   return tmp;
+}
+
+/**
+ * reset_default_oom_killer - reset back to default OOM killer
+ *
+ * If you are going to unload the module which provided 
+ * your OOM killer, you can install the default one by this.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ */
+oom_killer_t reset_default_oom_killer(void) {
+   return install_oom_killer(oom_kill_rik);
 }
--- linux-2.4.0-test10-pre1/include/linux/swap.h    Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/include/linux/swap.h    Tue Oct 10 16:44:22 2000
@@ -127,8 +127,14 @@
 #define read_swap_cache(entry) read_swap_cache_async(entry, 1);
 
 /* linux/mm/oom_kill.c */
+typedef void (*oom_killer_t)(void);
+
 extern int out_of_memory(void);
 extern void oom_kill(void);
+
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill);
+oom_killer_t reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
--- linux-2.4.0-test10-pre1/mm/Makefile Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/Makefile Tue Oct 10 16:34:06 2000
@@ -10,7 +10,8 @@
 O_TARGET := mm.o
 O_OBJS  := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
    vmalloc.o slab.o bootmem.o swap.o vmscan.o page_io.o \
-   page_alloc.o swap_state.o swapfile.o numa.o oom_kill.o
+   page_alloc.o swap_state.o swapfile.o numa.o
+OX_OBJS  := oom_kill.o
 
 ifeq ($(CONFIG_HIGHMEM),y)
 O_OBJS += highmem.o

Regards

Ingo Oeser
-- 
Feel the power of the penguin - run [EMAIL PROTECTED]
esc:x



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Linus Torvalds



On Tue, 10 Oct 2000, Rogier Wolff wrote:
 
> So if Netscape can "pump" 40 extra megabytes of memory out of X, this
> can be exploited.
>
> Now we're back to the point that a heuristic can never be right all
> the time..

I agree. In fact, we never left that.

Nothing is perfect.

In fact, a lot of engineering is _recognizing_ that you can never achieve
"perfect", and you're much better off not even trying - and having a
simple system that is "good enough".

This is the old adage of "perfect is the enemy of good" - trying too hard
is actually _detrimental_ in 99% of all cases. We should have simple
heuristics that work most of the time, instead of trying to cajole a
complex system like X to help us do some complicated resource management
system.

Complexity will just result in the OOM killer failing in surprising ways.

A simple heuristic will mean that the OOM killer will still fail, but at
least it won't be in subtle and surprising ways.

Linus




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-10 Thread Miles Lane

Olaf Titz wrote:
 
> > > Still, it would be nice to recover that 4 MB when the system
> > > doesn't have any memory left.
> > Yup. The X server could give back the memory for some cases like the
> > background without too much hackery.
>
> Then Linux only needs to implement SIGDANGER, which has been talked
> about for years...
>
> X would be a good candidate to implement a handler for it. Others are
> Emacs, Mozilla or JVMs - basically everything which has a GC of some
> sort. It could even be used to implement a configurable user mode OOM
> killer.

It would be good to talk to the KDE and Gnome folks about this as well.
I am pretty sure they have large blocks of memory that could be flushed
or freed in a low-memory or OOM condition.

Miles



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-10 Thread Tom Rini

On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Ingo Oeser wrote:
>
> > before you argue endlessly about the "Right OOM Killer (TM)", I
> > did a small patch to allow replacing the OOM killer at runtime.
> >
> > So now you can stop arguing about the one and only OOM killer,
> > implement it, provide it as module and get back to the important
> > stuff ;-)
>
> This is definitely a cool toy for people who have doubts
> that my OOM killer will do the wrong thing in their
> workloads.

I think this can be useful for more than just a cool toy.  I think that the
main thing that this discussion has shown is no OOM killer will please 100% of
the people 100% of the time.  I think we should try and have a good generic
OOM killer that kills the right process most of the time.  People can implement
(and submit) different-style OOM killers as needed.  Or at least get 'em on
freshmeat. :)

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/



Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-10 Thread Rik van Riel

On Tue, 10 Oct 2000, Tom Rini wrote:
> On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > On Tue, 10 Oct 2000, Ingo Oeser wrote:
> >
> > > before you argue endlessly about the "Right OOM Killer (TM)", I
> > > did a small patch to allow replacing the OOM killer at runtime.
> > >
> > > So now you can stop arguing about the one and only OOM killer,
> > > implement it, provide it as module and get back to the important
> > > stuff ;-)
> >
> > This is definitely a cool toy for people who have doubts
> > that my OOM killer will do the wrong thing in their
> > workloads.
>
> I think this can be useful for more than just a cool toy.  I
> think that the main thing that this discussion has shown is no
> OOM killer will please 100% of the people 100% of the time.  I
> think we should try and have a good generic OOM killer that
> kills the right process most of the time.  People can implement
> (and submit) different-style OOM killers as needed.

Indeed, though I suspect most of the people trying this would
fall into the trap of over-engineering their OOM killer, after
which it mostly becomes less predictable ;)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)

2000-10-10 Thread Tom Rini

On Tue, Oct 10, 2000 at 05:58:46PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Tom Rini wrote:
> > On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > > On Tue, 10 Oct 2000, Ingo Oeser wrote:
> > >
> > > > before you argue endlessly about the "Right OOM Killer (TM)", I
> > > > did a small patch to allow replacing the OOM killer at runtime.
> > > >
> > > > So now you can stop arguing about the one and only OOM killer,
> > > > implement it, provide it as module and get back to the important
> > > > stuff ;-)
> > >
> > > This is definitely a cool toy for people who have doubts
> > > that my OOM killer will do the wrong thing in their
> > > workloads.
> >
> > I think this can be useful for more than just a cool toy.  I
> > think that the main thing that this discussion has shown is no
> > OOM killer will please 100% of the people 100% of the time.  I
> > think we should try and have a good generic OOM killer that
> > kills the right process most of the time.  People can implement
> > (and submit) different-style OOM killers as needed.
>
> Indeed, though I suspect most of the people trying this would
> fall into the trap of over-engineering their OOM killer, after
> which it mostly becomes less predictable ;)

I was thinking more along the lines of ones w/ "safety" features that not
everyone might like/need (i.e. /usr/local/bin/foo is always good, those
suggestions).  It seems like useful functionality at little/no cost.
And a neat toy for now. :)

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread David Ford

Andreas Dilger wrote:

> Albert D. Cahalan wrote:
> > X, and any other big friendly processes, could participate in
> > memory balancing operations. X could be made to clean out a
>
> Gerrit Huizenga wrote:
> > Anyway, there is/was an API in PTX to say (either from in-kernel or through
> > some user machinations) "I Am a System Process".  Turns on a bit in the
>
> On AIX there is a signal called SIGDANGER, which is basically what you
> are looking for.  By default it is ignored, but for processes that care
> (e.g. init, X, whatever) they can register a SIGDANGER handler.  At an
> "urgent" (as opposed to "critical") OOM situation, all processes get a
> SIGDANGER sent to them.  Most will ignore it, but ones with handlers
> can free caches, try to do a clean shutdown, whatever.  Any process with
> a SIGDANGER handler gets a reduction of "badness" (as the OOM killer calls
> it) when looking for processes to kill.
>
> Having a SIGDANGER handler is good for 2 reasons:
> 1) Lets processes know when memory is short so they can free needless cache.
> 2) Mark process with a SIGDANGER handler as "more important" than those
>    without.  Most people won't care about this, but init, and X, and
>    long-running simulations might.

Is there any reason why we can't do something like this for 2.5?

-d

--
  "There is a natural aristocracy among men. The grounds of this are
  virtue and talents", Thomas Jefferson [1742-1826], 3rd US President






Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Andreas Dilger

> Rik van Riel wrote:
> > > How about SIGTERM a bit before SIGKILL then re-evaluate the OOM
> > > N usecs later?
> >
> > And run the risk of having to kill /another/ process as well ?
> >
> > I really don't know if that would be a wise thing to do
> > (but feel free to do some tests to see if your idea would
> > work ... I'd love to hear some test results with your idea).

David Ford writes:
> I was thinking (dangerous) about an urgent v.s. critical OOM.  urgent could
> trigger a SIGTERM which would give advance notice to the offending process.
> I don't think we have a signal method of notifying processes when resources
> are critically low, feel free to correct me.
> 
> Is there a signal that -might- be used for this?

Albert D. Cahalan wrote:
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.
>
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Gerrit Huizenga wrote:
> Anyway, there is/was an API in PTX to say (either from in-kernel or through
> some user machinations) "I Am a System Process".  Turns on a bit in the
> proc struct (task struct) that made it exempt from death from a variety
> of sources, e.g. OOM, generic user signals, portions of system shutdown,
> etc.
> 
> Then, the code looking for things to kill simply skips those that are
> intelligently marked, taking most of the decision making/policy making
> out of the scheduler/memory manager.

On AIX there is a signal called SIGDANGER, which is basically what you
are looking for.  By default it is ignored, but for processes that care
(e.g. init, X, whatever) they can register a SIGDANGER handler.  At an
"urgent" (as opposed to "critical") OOM situation, all processes get a
SIGDANGER sent to them.  Most will ignore it, but ones with handlers
can free caches, try to do a clean shutdown, whatever.  Any process with
a SIGDANGER handler gets a reduction of "badness" (as the OOM killer calls
it) when looking for processes to kill.

Having a SIGDANGER handler is good for 2 reasons:
1) Lets processes know when memory is short so they can free needless cache.
2) Mark process with a SIGDANGER handler as "more important" than those
   without.  Most people won't care about this, but init, and X, and
   long-running simulations might.
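
To make reason 1 concrete, here is roughly what a cooperating process could look like. This is a sketch under assumptions: Linux has no SIGDANGER, so SIGUSR1 is used as a stand-in, and `cache_init`, `install_low_memory_handler` and `release_cache_if_requested` are hypothetical names. The handler only sets a flag, because free() is not safe to call from a signal handler; the main loop does the actual release.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

#define LOW_MEMORY_SIGNAL SIGUSR1   /* stand-in: Linux has no SIGDANGER */

static volatile sig_atomic_t low_memory_flag;
static void *cache;                 /* some discardable data */
static size_t cache_size;

static void on_low_memory(int signo)
{
    (void)signo;
    low_memory_flag = 1;            /* defer the real work to the main loop */
}

/* Allocate the discardable cache. Returns 0 on success, -1 on failure. */
int cache_init(size_t bytes)
{
    assert(bytes > 0);
    free(cache);
    cache = malloc(bytes);
    cache_size = cache ? bytes : 0;
    return cache ? 0 : -1;
}

int install_low_memory_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_low_memory;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(LOW_MEMORY_SIGNAL, &sa, NULL);
}

/* Call from the main loop. Returns the number of bytes given back. */
size_t release_cache_if_requested(void)
{
    size_t freed = 0;

    if (low_memory_flag) {
        low_memory_flag = 0;
        free(cache);
        cache = NULL;
        freed = cache_size;
        cache_size = 0;
    }
    return freed;
}
```

Reason 2, the badness reduction for processes that registered a handler, would live on the kernel side and is not shown.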

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Philipp Rumpf

> If init dies the kernel hangs solid anyway

Init should never die.  If we get to do_exit in init we'll panic which is
the right thing to do (reboot on critical systems).



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Philipp Rumpf

> (but I'd be curious if somebody actually manages to
> trick the OOM killer into killing init ... please
> test a bit more to see if this really happens ;))

In a non-real-world situation, yes.  (mem=3500k, many drivers, init=/bin/bash,
tried to enter a command).  Since the process in question (bash) ignores
SIGTERM, I actually got a hard hang. 

We really should turn this into a panic() (panic means your elevator control
system reboots and maybe misses the right floor.  hard hang means you need
to reboot manually).
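
The hang follows from how ignored signals work: a SIGTERM sent to a process whose disposition is SIG_IGN is discarded, so the victim never exits and no memory comes back. A small user-space demonstration of just that mechanism (it does not reproduce the init=/bin/bash setup, and `survives_ignored_sigterm` is a hypothetical name):

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Ignore SIGTERM, send it to ourselves, then restore the old
 * disposition. Returns 1 if we are still running afterwards,
 * -1 if one of the calls failed.
 */
int survives_ignored_sigterm(void)
{
    struct sigaction ignore, previous;

    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;
    if (sigaction(SIGTERM, &ignore, &previous) != 0)
        return -1;

    int rc = raise(SIGTERM);        /* discarded: disposition is SIG_IGN */

    if (sigaction(SIGTERM, &previous, NULL) != 0)
        return -1;
    return rc == 0 ? 1 : -1;
}
```

SIGKILL cannot be ignored this way, which is one argument for the OOM path either using it or falling back to panic().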





Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Jim Gettys


"Albert D. Cahalan" <[EMAIL PROTECTED]> writes: 
> Date: Mon, 9 Oct 2000 19:13:25 -0400 (EDT)
>
> >> From: Linus Torvalds <[EMAIL PROTECTED]>
> 
> >> One of the biggest bitmaps is the background bitmap. So you have a
> >> client that uploads it to X and then goes away. There's nobody to
> >> un-count to by the time X decides to switch to another background.
> >
> > Actually, the big offenders are things other than the background
> > bitmap: things like E do absolutely insane things, you would not
> > believe (or maybe you would).  The background pixmap is generally
> > in the worst case typically no worse than 4 megabytes (for those
> > people who are crazy enough to put images up as their root window
> > on 32 bit deep displays, at 1kX1k resolution).
> 
> Still, it would be nice to recover that 4 MB when the system
> doesn't have any memory left.
> 

Yup. The X server could give back the memory for some cases like the
background without too much hackery.

> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.

I agree in principle, though the problem is difficult, as the memory pool 
may get fragmented... Most memory usage is less monolithic than the 
background pixmap.

And maintaining separate memory pools often wastes more memory than it
saves.

> 
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Netscape 4.x is hopeless; it is leakier than the Titanic.  There is hope 
for Mozilla.
- Jim


--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Ingo Oeser

On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> > If the oom killer kills a thing like init by mistake
> That only happens in the "random" OOM killer 2.2 has ...

[OOM killer war]

Hi there,

before you argue endlessly about the "Right OOM Killer (TM)", I
did a small patch to allow replacing the OOM killer at runtime.

You can even use modules, if you are careful (see khttpd on how
to do this without refcounting).

So now you can stop arguing about the one and only OOM killer,
implement it, provide it as module and get back to the important
stuff ;-)

PS: Patch is against test9 with Rik's latest vmpatch applied.

Thanks for listening

Ingo Oeser

diff -Naur linux-2.4.0-test9-vmpatch/include/linux/swap.h linux-2.4.0-test9-vmpatch-ioe/include/linux/swap.h
--- linux-2.4.0-test9-vmpatch/include/linux/swap.h  Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/include/linux/swap.h  Tue Oct 10 00:50:17 2000
@@ -129,6 +129,9 @@
 /* linux/mm/oom_kill.c */
 extern int out_of_memory(void);
 extern void oom_kill(void);
+void install_oom_killer(void (*new_oom_kill)(void));
+void reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
diff -Naur linux-2.4.0-test9-vmpatch/mm/Makefile linux-2.4.0-test9-vmpatch-ioe/mm/Makefile
--- linux-2.4.0-test9-vmpatch/mm/Makefile   Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/mm/Makefile   Tue Oct 10 00:10:07 2000
@@ -10,7 +10,8 @@
 O_TARGET := mm.o
 O_OBJS  := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
    vmalloc.o slab.o bootmem.o swap.o vmscan.o page_io.o \
-   page_alloc.o swap_state.o swapfile.o numa.o oom_kill.o
+   page_alloc.o swap_state.o swapfile.o numa.o
+OX_OBJS  := oom_kill.o
 
 ifeq ($(CONFIG_HIGHMEM),y)
 O_OBJS += highmem.o
diff -Naur linux-2.4.0-test9-vmpatch/mm/oom_kill.c linux-2.4.0-test9-vmpatch-ioe/mm/oom_kill.c
--- linux-2.4.0-test9-vmpatch/mm/oom_kill.c Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/mm/oom_kill.c Tue Oct 10 00:35:32 2000
@@ -13,6 +13,8 @@
  *  machine) this file will double as a 'coding guide' and a signpost
  *  for newbie kernel hackers. It features several pointers to major
  *  kernel subsystems and hints as to where to find out what things do.
+ *
+ *  Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include <linux/mm.h>
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 
    struct task_struct *p = select_bad_process();
@@ -207,4 +211,26 @@
 
    /* Else... */
    return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock;
+
+static void (*oom_killer)(void)=oom_kill_rik;
+
+void oom_kill(void) {
+   read_lock(&oom_kill_lock);
+   oom_killer();
+   read_unlock(&oom_kill_lock);
+}
+
+void install_oom_killer(void (*new_oom_kill)(void)) {
+   if (!new_oom_kill) return;
+   write_lock(&oom_kill_lock);
+   oom_killer=new_oom_kill;
+   write_unlock(&oom_kill_lock);
+}
+
+void reset_default_oom_killer(void) {
+   install_oom_killer(&oom_kill_rik);
 }

-- 
Feel the power of the penguin - run [EMAIL PROTECTED]
:x



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Rik van Riel

On Mon, 9 Oct 2000, Albert D. Cahalan wrote:
> Jim Gettys writes:
> >> From: Linus Torvalds <[EMAIL PROTECTED]>
> 
> >> One of the biggest bitmaps is the background bitmap. So you have a
> >> client that uploads it to X and then goes away. There's nobody to
> >> un-count to by the time X decides to switch to another background.
> >
> > Actually, the big offenders are things other than the background
> > bitmap: things like E do absolutely insane things, you would not
> > believe (or maybe you would).  The background pixmap is generally
> > in the worst case typically no worse than 4 megabytes (for those
> > people who are crazy enough to put images up as their root window
> > on 32 bit deep displays, at 1kX1k resolution).
> 
> Still, it would be nice to recover that 4 MB when the system
> doesn't have any memory left.
> 
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.
> 
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Which is all good and well to DELAY the task of the OOM killer
for a few more minutes.

But in the end, there will be a point where you REALLY run out
of memory and you have no other choice than the OOM killer...

(not that I'm against alternative measures, I just think they're
orthogonal to this whole discussion)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Albert D. Cahalan

Jim Gettys writes:
>> From: Linus Torvalds <[EMAIL PROTECTED]>

>> One of the biggest bitmaps is the background bitmap. So you have a
>> client that uploads it to X and then goes away. There's nobody to
>> un-count to by the time X decides to switch to another background.
>
> Actually, the big offenders are things other than the background
> bitmap: things like E do absolutely insane things, you would not
> believe (or maybe you would).  The background pixmap is generally
> in the worst case typically no worse than 4 megabytes (for those
> people who are crazy enough to put images up as their root window
> on 32 bit deep displays, at 1kX1k resolution).

Still, it would be nice to recover that 4 MB when the system
doesn't have any memory left.

X, and any other big friendly processes, could participate in
memory balancing operations. X could be made to clean out a
font cache when the kernel signals that memory is low. When
the situation becomes serious, X could just mmap /dev/zero over
top of the background image.

Netscape could even be hacked to dump old junk... or if it is
just too leaky, it could exec itself to fix the problem.
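
The "mmap /dev/zero over top of" trick can be sketched as follows. This is a sketch under assumptions: the image lives in a page-aligned region whose length is a multiple of the page size, and `discard_region` is a hypothetical helper, not anything from the X server. Mapping /dev/zero with MAP_FIXED replaces the old pages with fresh zero-filled ones, so the old contents are gone and the memory behind them can be reclaimed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Replace [addr, addr + len) with a private zero-filled mapping of
 * /dev/zero. addr must be page-aligned. The previous contents are
 * lost. Returns 0 on success, -1 on failure.
 */
int discard_region(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);

    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return -1;

    /* MAP_FIXED atomically replaces whatever was mapped at addr. */
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? -1 : 0;
}
```

The region stays valid and writable afterwards, so the server could keep running with a blank background and reload the image once memory is available again.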





Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Rik van Riel

On Tue, 10 Oct 2000, bert hubert wrote:
> On Mon, Oct 09, 2000 at 02:38:10PM -0700, Linus Torvalds wrote:
> 
> > So the process that gave X the bitmap dies. What now? Are we going to
> > depend on X un-counting the resources?
> > 
> > I'd prefer just X having a higher "mm nice level" or something.
> 
> I wonder how many megabytes we can fill with all messages about
> an OOM killer. I remember threads about this from '94 onwards.
> Perhaps we can finally have a sane one now :-)

In reality, the OOM killer I mailed a few days ago behaves
quite well in the real world.

I hope Linus will be as sensitive to theoretical arguments
with no foundation in reality as I am (ie. not), so we'll
have SOMETHING in the kernel soon.

If we later find out there are some problems with the OOM
killer, we can always change it then. No need to hold up
a reasonable solution when the current kernel has NO solution
to the problem at all ...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Rik van Riel

On Mon, 9 Oct 2000, Byron Stanoszek wrote:
> On Mon, 9 Oct 2000 [EMAIL PROTECTED] wrote:
> 
> > Anyway, there is/was an API in PTX to say (either from in-kernel or through
> > some user machinations) "I Am a System Process".  Turns on a bit in the
> > proc struct (task struct) that made it exempt from death from a variety
> > of sources, e.g. OOM, generic user signals, portions of system shutdown,
> > etc.
> 
> The current OOM killer does this, except for init. Checking to
> see if the process has a page table is equivalent to checking
> for the kernel threads that are integral to the system (PIDs
> 2-5). These will never be killed by the OOM. Init, however,
> still can be killed, and there should be an additional statement
> that doesn't kill if PID == 1.

Only if you can demonstrate any real-world scenario where 
init will be chosen with the current algorithm.

The "3 MB init on 4MB machine" kind of theoretical argument
just isn't convincing if nobody can show that there is a
problem in reality.
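
For concreteness, the extra check Byron asks for amounts to one more skip condition in the selection loop. The following is a user-space sketch, not code from mm/oom_kill.c: `struct task`, `select_victim` and the `protect_init` switch are hypothetical, and the badness score is assumed to be computed elsewhere.

```c
#include <assert.h>
#include <stddef.h>

struct task {
    int pid;
    int has_mm;             /* 0 for kernel threads (no user address space) */
    unsigned long points;   /* badness score, computed elsewhere */
};

/*
 * Return the task with the highest score, skipping kernel threads
 * and, if protect_init is set, pid 1. Returns NULL if nothing is
 * eligible.
 */
const struct task *select_victim(const struct task *tasks, size_t n,
                                 int protect_init)
{
    const struct task *worst = NULL;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct task *t = &tasks[i];

        if (!t->has_mm)
            continue;               /* kernel thread: never a candidate */
        if (protect_init && t->pid == 1)
            continue;               /* the proposed extra statement */
        if (worst == NULL || t->points > worst->points)
            worst = t;
    }
    return worst;
}
```

Note the trade-off raised elsewhere in the thread: with init exempt, a machine where init really is the only sensible victim can no longer reach the panic() and reboot path.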

> I think we need to sit down and write a better OOM proposal,
> something that doesn't use CPU time and the NICE flag.

The nice flag has been removed from my current kernel tree.

The CPU time used, however, is a different matter. You really
don't want to have the OOM killer kill your 6-week-old running
simulation because a newly started netscape explodes ...

> How about we start by everyone in this discussion give their
> opinion on what the OOM selection process should do,

Quoting from mm/oom_kill.c:

/**
 * oom_badness - calculate a numeric value for how bad this task has been
 * @p: task struct of which task we should calculate
 *
 * The formula used is relatively simple and documented inline in the
 * function. The main rationale is that we want to select a good task
 * to kill when we run out of memory.
 *
 * Good in this context means that:
 * 1) we lose the minimum amount of work done
 * 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
 * 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the principle
 *    of least surprise ... (be careful when you change it)
 */

Do you have any additional requirements?
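
As a rough illustration of how goals 1 to 3 can be folded into a single
score, here is a user-space sketch in C. It is not the code in
mm/oom_kill.c: the struct, its field names and the exact divisors are
assumptions made for this example. The idea is only that memory use
raises the score, while accumulated CPU time, run time and privilege
lower it.

```c
/* Hypothetical, simplified stand-in for the task fields such a heuristic
 * would look at; none of these names come from the kernel. */
struct task_info {
	unsigned long total_vm;   /* pages mapped by the task */
	unsigned long cpu_secs;   /* CPU time consumed, in seconds */
	unsigned long run_secs;   /* wall-clock time since start, in seconds */
	int privileged;           /* nonzero for root or raw-I/O capable tasks */
	int has_mm;               /* zero for kernel threads */
};

/* Integer square root, rounded down, but never less than 1 so that it
 * can safely be used as a divisor. */
static unsigned long isqrt_min1(unsigned long x)
{
	unsigned long r = 0;

	while (r + 1 <= x / (r + 1))
		r++;
	return r ? r : 1;
}

/* Higher score means a better candidate to kill (assumed weighting). */
unsigned long badness_sketch(const struct task_info *t)
{
	unsigned long points;

	if (!t->has_mm)
		return 0;                     /* kernel threads are never chosen */

	points = t->total_vm;             /* goal 2: recover a lot of memory */
	points /= isqrt_min1(t->cpu_secs / 10);             /* goal 1 */
	points /= isqrt_min1(isqrt_min1(t->run_secs / 60)); /* goal 1 */
	if (t->privileged)
		points /= 4;                  /* e.g. the X server */
	return points;
}
```

With these weights a 20000-page process that has run for two minutes
keeps its full score, while a job of the same size with six weeks of CPU
time scores close to zero.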

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread bert hubert

On Mon, Oct 09, 2000 at 02:38:10PM -0700, Linus Torvalds wrote:

> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?
> 
> I'd prefer just X having a higher "mm nice level" or something.

I wonder how many megabytes we can fill with all messages about an OOM
killer. I remember threads about this from '94 onwards. Perhaps we can
finally have a sane one now :-)

Regards,

bert hubert

-- 
PowerDNS Versatile DNS Services  
Trilab   The Technology People   
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Byron Stanoszek

On Mon, 9 Oct 2000 [EMAIL PROTECTED] wrote:

> Anyway, there is/was an API in PTX to say (either from in-kernel or through
> some user machinations) "I Am a System Process".  Turns on a bit in the
> proc struct (task struct) that made it exempt from death from a variety
> of sources, e.g. OOM, generic user signals, portions of system shutdown,
> etc.

The current OOM killer does this, except for init. Checking to see if the
process has a page table is equivalent to checking for the kernel threads that
are integral to the system (PIDs 2-5). These will never be killed by the OOM.
Init, however, still can be killed, and there should be an additional statement
that doesn't kill if PID == 1.
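
The two exemptions discussed here (no page table, PID 1) can be sketched
as a filter in front of the usual pick-the-worst loop. This is a
stand-alone illustration: struct proc_entry and select_victim() are
made-up names, and the scores are assumed to have been computed
elsewhere.

```c
#include <stddef.h>

struct proc_entry {           /* hypothetical; not the kernel's task_struct */
	int pid;
	int has_mm;               /* 0 for kernel threads, which own no page table */
	unsigned long points;     /* precomputed badness score */
};

/* Return the eligible entry with the highest score, or NULL if every
 * entry is exempt. */
const struct proc_entry *select_victim(const struct proc_entry *tab, size_t n)
{
	const struct proc_entry *worst = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!tab[i].has_mm)
			continue;         /* kernel thread: never a candidate */
		if (tab[i].pid == 1)
			continue;         /* the proposed extra check for init */
		if (!worst || tab[i].points > worst->points)
			worst = &tab[i];
	}
	return worst;
}
```

Even if init has the highest score in the table, the loop falls through
to the next-worst ordinary process.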

I think we need to sit down and write a better OOM proposal, something that
doesn't use CPU time and the NICE flag. Let's concentrate our efforts on what
constitutes a good selection method instead of bickering with each other.

How about we start by having everyone in this discussion give their opinion on
what the OOM selection process should do, listing them in order of both importance
and severity, giving a rational reason for each choice. Maybe then we can get
somewhere.

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Gerrit . Huizenga


At Sequent, we found that there is a small set of processes which are
"critical" to the system's operation in that they should not be killed
on swap shortage, memory shortage, etc.  This included things like init,
potentially inetd, the swapper, page daemon, cluster heartbeat daemon,
and generally any core system service which had a user process component.
If there wasn't enough memory for those processes, or if those processes
weren't already responsible in their use of memory/resources, you were
already toast.

Anyway, there is/was an API in PTX to say (either from in-kernel or through
some user machinations) "I Am a System Process".  Turns on a bit in the
proc struct (task struct) that made it exempt from death from a variety
of sources, e.g. OOM, generic user signals, portions of system shutdown,
etc.

Then, the code looking for things to kill simply skips those that are
intelligently marked, taking most of the decision making/policy making
out of the scheduler/memory manager.
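
A minimal sketch of the marking scheme described above, assuming a single
flag bit per task. PF_SYSTEM_SKETCH, struct marked_task and the helper
names are invented for this example and are not the PTX or Linux
interfaces.

```c
#include <stddef.h>

#define PF_SYSTEM_SKETCH 0x1u   /* hypothetical "I am a system process" bit */

struct marked_task {            /* hypothetical; not a real kernel struct */
	int pid;
	unsigned int flags;
	unsigned long pages;        /* memory the task would free if killed */
};

/* Set once by a critical service, or by the kernel on its behalf. */
void mark_system_process(struct marked_task *t)
{
	t->flags |= PF_SYSTEM_SKETCH;
}

/* The kill path only asks whether the bit is set; the policy of who
 * deserves the bit lives elsewhere. Returns NULL if all are marked. */
struct marked_task *largest_unmarked(struct marked_task *tab, size_t n)
{
	struct marked_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tab[i].flags & PF_SYSTEM_SKETCH)
			continue;
		if (!best || tab[i].pages > best->pages)
			best = &tab[i];
	}
	return best;
}
```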

gerrit

> On Mon, 9 Oct 2000, Linus Torvalds wrote:
> > On Mon, 9 Oct 2000, Andi Kleen wrote:
> > > 
> > > netscape usually has child processes: the dns helper. 
> > 
> > Yeah.
> > 
> > One thing we _can_ (and probably should do) is to do a per-user
> > memory pressure thing - we have easy access to the "struct
> > user_struct" (every process has a direct pointer to it), and it
> > should not be too bad to maintain a per-user "VM pressure"
> > counter.
> > 
> > Then, instead of trying to use heuristics like "does this
> > process have children" etc, you'd have things like "is this user
> > a nasty user", which is a much more valid thing to do and can be
> > used to find people who fork tons of processes that are
> > mid-sized but use a lot of memory due to just being many..
> 
> Sure we could do all of this, but does OOM really happen that
> often that we want to make the algorithm this complex ?
> 
> The current algorithm seems to work quite well and is already
> at the limit of how complex I'd like to see it. Having a less
> complex OOM killer turned out to not work very well, but having
> a more complex one is - IMHO - probably overkill ...
> 
> regards,
> 
> Rik
> --
> "What you're running that piece of shit Gnome?!?!"
>    -- Miguel de Icaza, UKUUG 2000
> 
> http://www.conectiva.com/ http://www.surriel.com/
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [EMAIL PROTECTED]  For more info on Linux MM,
> see: http://www.linux.eu.org/Linux-MM/
> 



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Jim Gettys


> From: Linus Torvalds <[EMAIL PROTECTED]>
> Date: Mon, 9 Oct 2000 14:50:51 -0700 (PDT)
> To: Jim Gettys <[EMAIL PROTECTED]>
> Cc: Alan Cox <[EMAIL PROTECTED]>, Andi Kleen <[EMAIL PROTECTED]>,
> Ingo Molnar <[EMAIL PROTECTED]>, Andrea Arcangeli <[EMAIL PROTECTED]>,
> Rik van Riel <[EMAIL PROTECTED]>,
> Byron Stanoszek <[EMAIL PROTECTED]>,
> MM mailing list <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
> Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> -
> On Mon, 9 Oct 2000, Jim Gettys wrote:
> >
> >
> > On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds <[EMAIL PROTECTED]>
> > said:
> >
> > >
> > > The problem is that there is no way to keep track of them afterwards.
> > >
> > > So the process that gave X the bitmap dies. What now? Are we going to
> > > depend on X un-counting the resources?
> > >
> >
> > X has to uncount the resources already, to free the memory in the X server
> > allocated on behalf of that client.  X has to get this right, to be a long
> > lived server (properly debugged X servers last many months without problems:
> > unfortunately, a fair number of DDX's are buggy).
> 
> No, but my point is that it doesn't really work.
> 
> One of the biggest bitmaps is the background bitmap. So you have a client
> that uploads it to X and then goes away. There's nobody to un-count to by
> the time X decides to switch to another background.

Actually, the big offenders are things other than the background bitmap:
things like E do absolutely insane things you would not believe (or maybe
you would).  Even in the worst case, the background pixmap is typically
no larger than 4 megabytes (for those people who are crazy enough to put
images up as their root window on 32-bit-deep displays at 1k x 1k resolution).

> 
> Does that memory just disappear as far as the resource handling is
> concerned when the client that originated it dies?

No, X recovers the memory when a connection dies, unless the client has
gone out of its way to arrange to preserve things across connection
termination.  Few, if any, clients do this: it is possible mostly for
debugging purposes, so that (fortunately or unfortunately, depending on
your opinion) what happened does not just vanish before you can see it.

So the X server does extensive bookkeeping of its memory usage, and retrieves
all memory used by clients when they terminate (with the above rare
exception).

> 
> What happens with TCP connections? They might be local. Or they might not.
> In either case X doesn't know whom to blame.

At least on BSD kernels, it was reasonably straightforward to determine
if a TCP connection was local: in that case, the code actually did an upcall
and delivered data directly to the appropriate socket.  Dunno about the
insides of Linux.

I suspect it should not be hard to find the right process for local
connections.  Distant connections are, indeed, a challenge.

> 
> Basically, the only thing _I_ think X can do is to really say "oh, please
> don't count my memory, because everything I do I do for my clients, not
> for myself".
> 
> THAT is my argument. Basically there is nothing we can reliably account.

Your argument has a lot of validity, though the X server does a better job
of accounting than you might think.

BUT, I'm actually more interested in dealing with scheduling preferences, to
get really first rate interactive feel.

> 
> So we might as well fall back on just saying "X is more important than
> some random client", and have a mm niceness level. Which right now is
> obviously approximated by the IO capabilities tests etc.
> 

As I say above, the principle here may be more useful not for the memory
example, but for controlling scheduling so we can get great interactive
feel.  THAT is what is really worth discussing.
- Jim


--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Rik van Riel

On Mon, 9 Oct 2000, Aaron Sethman wrote:

> I think the run time should probably be factored into this
> as well. Basically start knocking off recent processes first,
> which are likely to be childless, and start working your way up
> in age.

I'm almost getting USENET flashbacks ...  ;)

Please look at the code before suggesting something that
is already there (and has been in the code for some 2 years).

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Alan Cox

> > across AF_UNIX sockets so the mechanism is notionally there to provide the 
> > credentials to X, just not to use them
> 
> The problem is that there is no way to keep track of them afterwards.

If you use mmap for your allocator then beancounter will get it right. Every
resource knows which beancounter it was charged to. It adds an overhead the
average desktop user won't like but which is pretty much essential to do real
mainframe-world operation. So it would become:

seteuid(Client->passed_euid);
mmap(buffer in pages)
seteuid(getuid());

With lightweight counting semantics it's hard to make any tracking system work
well in the corner cases like resources that survive process death.
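
Spelled out a little further, the sequence above might look like the
following user-space sketch. It assumes the server runs with enough
privilege to switch its effective UID, and that the kernel charges
anonymous mappings to whichever UID is effective at mmap() time, which is
the beancounter behaviour described here rather than something stock
kernels do. alloc_charged_to() is a hypothetical helper name.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE       /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map len bytes of anonymous memory while temporarily running with the
 * client's effective UID. Returns NULL on any failure. */
void *alloc_charged_to(uid_t client_euid, size_t len)
{
	uid_t saved = geteuid();
	void *buf;

	if (seteuid(client_euid) != 0)
		return NULL;              /* not allowed to become the client */

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
	           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (seteuid(saved) != 0) {
		/* Could not switch back: do not keep running as the client
		 * with the buffer in hand. */
		if (buf != MAP_FAILED)
			munmap(buf, len);
		return NULL;
	}
	return buf == MAP_FAILED ? NULL : buf;
}
```

The buffer is released with munmap() as usual. An unprivileged process
can only pass its own effective UID, so the switch matters only for a
server started as root.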

Alan




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Aaron Sethman

On Mon, 9 Oct 2000, James Sutherland wrote:

> On Mon, 9 Oct 2000, Ingo Molnar wrote:
> 
> > On Mon, 9 Oct 2000, Rik van Riel wrote:
> > 
> > > > so dns helper is killed first, then netscape. (my idea might not
> > > > make sense though.)
> > > 
> > > It makes some sense, but I don't think OOM is something that
> > > occurs often enough to care about it /that/ much...
> > 
> > i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
> > with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
> > think it's a legitimate concern - i cannot know in advance whether a
> > freshly started process would trigger an OOM or not.
> 
> Shouldn't the runtime factor handle this, making sure the new process is
> killed? (Maybe not if you're almost OOM right from the word go, and run
> this process straight off... Hrm.)

I think the run time should probably be factored into this as
well. Basically start knocking off recent processes first, which are
likely to be childless, and start working your way up in age. The
reasoning here is that you're less likely to hit an important, long-running
service.  Of course you could probably account for whether the process is
childless or not as well.

Just my $0.02 on it..


Aaron




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Linus Torvalds



On Mon, 9 Oct 2000, Jim Gettys wrote:
> 
> 
> On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds <[EMAIL PROTECTED]>
> said:
> 
> >
> > The problem is that there is no way to keep track of them afterwards.
> >
> > So the process that gave X the bitmap dies. What now? Are we going to
> > depend on X un-counting the resources?
> >
> 
> X has to uncount the resources already, to free the memory in the X server
> allocated on behalf of that client.  X has to get this right, to be a long
> lived server (properly debugged X servers last many months without problems:
> unfortunately, a fair number of DDX's are buggy).

No, but my point is that it doesn't really work.

One of the biggest bitmaps is the background bitmap. So you have a client
that uploads it to X and then goes away. There's nobody to un-count to by
the time X decides to switch to another background.

Does that memory just disappear as far as the resource handling is
concerned when the client that originated it dies?

What happens with TCP connections? They might be local. Or they might not.
In either case X doesn't know whom to blame.

Basically, the only thing _I_ think X can do is to really say "oh, please
don't count my memory, because everything I do I do for my clients, not
for myself". 

THAT is my argument. Basically there is nothing we can reliably account.

So we might as well fall back on just saying "X is more important than
some random client", and have a mm niceness level. Which right now is
obviously approximated by the IO capabilities tests etc.

Linus




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Jim Gettys



On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds <[EMAIL PROTECTED]>
said:

> 
> The problem is that there is no way to keep track of them afterwards.
> 
> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?
> 

X has to uncount the resources already, to free the memory in the X server
allocated on behalf of that client.  X has to get this right, to be a long
lived server (properly debugged X servers last many months without problems:
unfortunately, a fair number of DDX's are buggy).

- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Linus Torvalds



On Mon, 9 Oct 2000, Rik van Riel wrote:
>
> > I'd prefer just X having a higher "mm nice level" or something.
> 
> Which it has, because:
> 
> 1) CAP_RAW_IO
> 2) p->euid == 0

Oh, I agree, but we might want to generalize this a bit so that root could
say "this process is important" and then drop root privileges and still
get "credited" for the fact that it's important.

It's not a big deal. It works for X right now.

Linus




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Jim Gettys


> > Sounds like one needs in addition some mechanism for servers to "charge" clients for
> > consumption. X certainly knows on behalf of which connection resources
> > are created; the OS could then transfer this back to the appropriate client
> > (at least when on machine).
> 
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the
> credentials to X, just not to use them

Stephen Tweedie, Dave Rosenthal, Keith Packard and I had an extensive
discussion at Usenix on similar ideas around process quantum scheduling
(the X server would like to be able to forward quantum to clients).
This is closely related, and needed to finally fully control interactive
feel in the face of "greedy" clients.

My memory is that it sounded like things could become very interesting
with such a facility, and might be ripe for 2.5.

Keith, Stephen, Dave, do you remember the details of our discussion?
- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Rik van Riel

On Mon, 9 Oct 2000, Linus Torvalds wrote:
> On Mon, 9 Oct 2000, Alan Cox wrote:
> > > consumption. X certainly knows on behalf of which connection resources
> > > are created; the OS could then transfer this back to the appropriate client
> > > (at least when on machine).
> > 
> > Definitely - and this is present in some non Unix OS's. We do pass credentials
> > across AF_UNIX sockets so the mechanism is notionally there to provide the 
> > credentials to X, just not to use them
> 
> The problem is that there is no way to keep track of them afterwards.
> 
> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?
> 
> I'd prefer just X having a higher "mm nice level" or something.

Which it has, because:

1) CAP_RAW_IO
2) p->euid == 0

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Linus Torvalds



On Mon, 9 Oct 2000, Alan Cox wrote:
> > consumption. X certainly knows on behalf of which connection resources
> > are created; the OS could then transfer this back to the appropriate client
> > (at least when on machine).
> 
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the 
> credentials to X, just not to use them

The problem is that there is no way to keep track of them afterwards.

So the process that gave X the bitmap dies. What now? Are we going to
depend on X un-counting the resources?

I'd prefer just X having a higher "mm nice level" or something.

Linus




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Rik van Riel

On Mon, 9 Oct 2000, Ingo Molnar wrote:
> On Mon, 9 Oct 2000, Rik van Riel wrote:
> 
> > Would this complexity /really/ be worth it for the twice-yearly OOM
> > situation?
> 
> the only reason i suggested this was the init=/bin/bash, 4MB
> RAM, no swap emergency-bootup case. We must not kill init in
> that case - if the current code doesn't then great and none of
> this is needed.

I guess this requires some testing. If anybody can reproduce
the bad effects without going /too/ much out of the way of a
realistic scenario, the code needs to be fixed.

If it turns out to be a non-issue in all scenarios, there's
no need to make the code any more complex.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Andi Kleen

On Mon, Oct 09, 2000 at 10:28:38PM +0100, Alan Cox wrote:
> > Sounds like one needs in addition some mechanism for servers to "charge" clients for
> > consumption. X certainly knows on behalf of which connection resources
> > are created; the OS could then transfer this back to the appropriate client
> > (at least when on machine).
> 
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the 
> credentials to X, just not to use them

X can get the pid using SO_PEERCRED for unix connections. 

When the oom killer maintains some kind of badness value in the task_struct
it would be possible to add a charge() systemcall that manipulates it.

int charge(pid_t pid, int memorytobecharged) 
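
For reference, a sketch of both halves in user space: reading the peer's
credentials with SO_PEERCRED, and a table-based stand-in for charge().
Only getsockopt() with SO_PEERCRED is a real interface here; charge()
does not exist as a system call, so the version below merely keeps a
per-PID total inside the server, and charged_to() and MAX_CLIENTS are
names made up for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* struct ucred on glibc */
#endif
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel who is on the other end of a connected AF_UNIX socket.
 * Returns 0 and fills *out on success, -1 on error. */
int peer_credentials(int fd, struct ucred *out)
{
	socklen_t len = sizeof(*out);

	if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len) != 0)
		return -1;
	return len == sizeof(*out) ? 0 : -1;
}

#define MAX_CLIENTS 64        /* arbitrary limit for the sketch */

struct charge_entry { pid_t pid; long charged; };
static struct charge_entry charges[MAX_CLIENTS];
static int ncharges;

/* Hypothetical user-space stand-in for the proposed system call: add
 * memorytobecharged to the running total kept for pid. */
int charge(pid_t pid, int memorytobecharged)
{
	for (int i = 0; i < ncharges; i++) {
		if (charges[i].pid == pid) {
			charges[i].charged += memorytobecharged;
			return 0;
		}
	}
	if (ncharges == MAX_CLIENTS)
		return -1;
	charges[ncharges].pid = pid;
	charges[ncharges].charged = memorytobecharged;
	ncharges++;
	return 0;
}

/* Total charged to pid so far, or 0 if it was never charged. */
long charged_to(pid_t pid)
{
	for (int i = 0; i < ncharges; i++)
		if (charges[i].pid == pid)
			return charges[i].charged;
	return 0;
}
```

A server would call peer_credentials() once per accepted connection and
then charge(cred.pid, nbytes) whenever it allocates on that client's
behalf. This only works for local AF_UNIX connections; TCP clients have
no credentials to read.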


-Andi



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Paul Jakma

On Mon, 9 Oct 2000, David Ford wrote:

> Not if "init" is a particular program running on a router floppy for
> example.  The system may be designed to be a router and the userland
> monitor/control program is the only thing that runs and consumes 90% of the
> memory.  If a forked or spawned process starts up with high CPU that just
> tips it over the OOM edge, we don't really want to kill init even if it's
> taking "all" the memory and/or "all" the cpu.

this is such a special case it is not worth considering - rather
leave it up to the designer of the router floppy to get his stuff
right.

the one thing that is clear from the many OOM flamewars is that no
OOM reaper algorithm will satisfy 100% of conditions 100% of the
time. So all Rik can do is optimise for the common case.

(roll on beancounting and proper resource limiting - the true but
heavyweight solution)

regards,
-- 
Paul Jakma  [EMAIL PROTECTED]
PGP5 key: http://www.clubi.ie/jakma/publickey.txt
---
Fortune:
Individualists unite!





Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Ingo Molnar


On Mon, 9 Oct 2000, Rik van Riel wrote:

> Would this complexity /really/ be worth it for the twice-yearly OOM
> situation?

the only reason i suggested this was the init=/bin/bash, 4MB RAM, no swap
emergency-bootup case. We must not kill init in that case - if the current
code doesn't then great and none of this is needed.

Ingo




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Alan Cox

> Sounds like one needs in addition some mechanism for servers to "charge" clients for
> consumption. X certainly knows on behalf of which connection resources
> are created; the OS could then transfer this back to the appropriate client
> (at least when on machine).

Definitely - and this is present in some non Unix OS's. We do pass credentials
across AF_UNIX sockets so the mechanism is notionally there to provide the 
credentials to X, just not to use them



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Rik van Riel

On Mon, 9 Oct 2000, Ingo Molnar wrote:
> On Mon, 9 Oct 2000, Alan Cox wrote:
> 
> > Lets kill a 6 week long typical background compute job because
> > netscape exploded (and yes netscape has a child process)
> 
> in the paragraph you didn't quote i pointed this out and
> suggested adding all parent's badness value to children as well
> - so we'd end up killing netscape.

Would this complexity /really/ be worth it for the twice-yearly
OOM situation?

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Jim Gettys


> Sender: [EMAIL PROTECTED]
> From: "Andi Kleen" <[EMAIL PROTECTED]>
> Date: Mon, 9 Oct 2000 22:58:22 +0200
> To: Linus Torvalds <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>,
> Andrea Arcangeli <[EMAIL PROTECTED]>,
> Rik van Riel <[EMAIL PROTECTED]>,
> Byron Stanoszek <[EMAIL PROTECTED]>,
>     MM mailing list <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
> Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> -
> On Mon, Oct 09, 2000 at 01:52:21PM -0700, Linus Torvalds wrote:
> > One thing we _can_ (and probably should do) is to do a per-user memory
> > pressure thing - we have easy access to the "struct user_struct" (every
> > process has a direct pointer to it), and it should not be too bad to
> > maintain a per-user "VM pressure" counter.
> >
> > Then, instead of trying to use heuristics like "does this process have
> > children" etc, you'd have things like "is this user a nasty user", which
> > is a much more valid thing to do and can be used to find people who fork
> > tons of processes that are mid-sized but use a lot of memory due to just
> > being many..
> 
> Would not help much when "they" eat your memory by loading big bitmaps
> into the X server which runs as root (it seems there are many programs
> which are very good at this particular DOS ;)
> 

This is generic to any server program, not unique to X.

Sounds like one needs in addition some mechanism for servers to "charge" clients for
consumption. X certainly knows on behalf of which connection resources
are created; the OS could then transfer this back to the appropriate client
(at least when on machine).

- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
[EMAIL PROTECTED]



