Re: the new VM

2000-10-09 Thread Christoph Rohland

Rik van Riel <[EMAIL PROTECTED]> writes:

> Hmmm, could you help me with drawing up a selection algorithm
> on how to choose which SHM segment to destroy when we run OOM?
> 
> The criteria would be about the same as with normal programs:
> 
> 1) minimise the amount of work lost
> 2) try to protect 'innocent' stuff
> 3) try to kill only one thing
> 4) don't surprise the user, but choose something that
>    the user will expect to be killed/destroyed

First, we only kill segments with no attachees. There are circumstances
under normal load where you have these (SAP R/3 will do this all the
time on Linux 2.4).

So perhaps we could signal shm that we killed a process and let it try
to find a segment where this process was the last attachee. This would
be a good candidate.

If this does not help either, we could do one of two things:
1) kill the biggest non-attached segment
2) kill the segment which has been detached the longest
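A minimal C sketch of this selection order (skip anything attached; among detached segments prefer the biggest, breaking ties by longest-detached). The struct and field names here are invented for illustration, not the actual SysV shm structures:

```c
#include <stddef.h>

/* Illustrative segment descriptor -- not the real struct shmid_kernel. */
struct shm_seg {
    unsigned long size;        /* segment size in bytes */
    unsigned long detach_time; /* time when the last attachee went away */
    int nattch;                /* current number of attachees */
};

/* Pick a victim among detached segments: the biggest one, breaking
 * ties in favour of the longest-detached (smallest detach_time).
 * Returns -1 when every segment still has attachees. */
int pick_shm_victim(const struct shm_seg *segs, int n)
{
    int victim = -1;
    for (int i = 0; i < n; i++) {
        if (segs[i].nattch > 0)
            continue;          /* never destroy a segment in use */
        if (victim < 0 ||
            segs[i].size > segs[victim].size ||
            (segs[i].size == segs[victim].size &&
             segs[i].detach_time < segs[victim].detach_time))
            victim = i;
    }
    return victim;
}
```

Either single policy is the same loop with one comparison dropped.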

Greetings
Christoph

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-10-06 Thread Rik van Riel

[replying to a really old email now that I've started work
 on integrating the OOM handler]

On 25 Sep 2000, Christoph Rohland wrote:
> Rik van Riel <[EMAIL PROTECTED]> writes:
> 
> > > Because as you said the machine can lockup when you run out of memory.
> > 
> > The fix for this is to kill a user process when you're OOM
> > (you need to do this anyway).
> > 
> > The last few allocations of the "condemned" process can come
> > from the reserved pages, and the process we killed will exit just
> > fine.
> 
> It's slightly off-topic, but you should think about detached shm
> segments in your OOM killer. As many of the high-end
> applications like databases and e.g. SAP keep most of their memory
> in shm segments, you easily end up killing a lot of processes
> without freeing a lot of memory. I see this often in my shm
> tests.

Hmmm, could you help me with drawing up a selection algorithm
on how to choose which SHM segment to destroy when we run OOM?

The criteria would be about the same as with normal programs:

1) minimise the amount of work lost
2) try to protect 'innocent' stuff
3) try to kill only one thing
4) don't surprise the user, but choose something that
   the user will expect to be killed/destroyed
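One way the four criteria could collapse into a single per-task score; the fields and weights below are invented for this sketch and are not the heuristic that ended up in oom_kill.c:

```c
/* Illustrative only: score how attractive a task is as an OOM victim.
 * Higher score = better victim. */
struct task_info {
    unsigned long rss_pages; /* memory a kill would free (criteria 1, 3) */
    unsigned long cpu_secs;  /* accumulated work that would be lost (1) */
    int is_root;             /* 'innocent' system daemon? (2, 4) */
};

unsigned long badness(const struct task_info *t)
{
    unsigned long score = t->rss_pages; /* freeing more memory is better */

    if (t->cpu_secs > 60)   /* long-running: more work to lose, halve it */
        score /= 2;
    if (t->is_root)         /* protect root daemons: quarter the score */
        score /= 4;
    return score;
}
```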

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: the new VM

2000-09-27 Thread Andrea Arcangeli

On Wed, Sep 27, 2000 at 09:42:45AM +0200, Ingo Molnar wrote:
> such screwups by checking for NULL and trying to handle it. I suggest to
> rather fix those screwups.

How do you know the minimal amount of RAM that keeps you out of the
screwed-up state?

We certainly need some kind of counter for the special dynamic structures, but
I'm not sure whether it should account for the static stuff as well.

Andrea



Re: the new VM

2000-09-27 Thread yodaiken

On Wed, Sep 27, 2000 at 09:42:45AM +0200, Ingo Molnar wrote:
> 
> On Tue, 26 Sep 2000, Pavel Machek wrote:
> of the VM allocation issues. Returning NULL in kmalloc() is just a way to
> say: 'oops, we screwed up somewhere'. And i'd suggest to not work around

That is not at all how it is currently used in the kernel. 

> such screwups by checking for NULL and trying to handle it. I suggest to
> rather fix those screwups.

kmalloc() returns NULL when there is not enough memory to satisfy the request.
What's wrong with that?


-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: the new VM

2000-09-27 Thread Ingo Molnar


On Tue, 26 Sep 2000, Pavel Machek wrote:

> Okay, I'm user on small machine and I'm doing stupid thing: I've got
> 6MB ram, and I keep inserting modules. I insert module_1mb.o. Then I
> insert module_1mb.o. Repeat. How does it end? I think that
> kmalloc(GFP_KERNEL) *has* to return NULL at some point.

if a stupid root user keeps inserting bogus modules :-) then that's a
problem, no matter what. I can DoS your system if given the right to
insert arbitrary-size modules, even if kmalloc returns NULL. For such
things explicit high-level protection is needed - completely independently
of the VM allocation issues. Returning NULL in kmalloc() is just a way to
say: 'oops, we screwed up somewhere'. And i'd suggest not working around
such screwups by checking for NULL and trying to handle it; i suggest
rather fixing those screwups.

the __GFP_SOFT suggestion handles these things nicely.

Ingo




Re: the new VM

2000-09-26 Thread Andrea Arcangeli

On Tue, Sep 26, 2000 at 09:10:16PM +0200, Pavel Machek wrote:
> Hi!
> > > i talked about GFP_KERNEL, not GFP_USER. Even in the case of GFP_USER i
> > 
> > My bad, you're right I was talking about GFP_USER indeed.
> > 
> > But even for GFP_KERNEL allocations like the init of a module, or any other
> > thing that is static-sized during production, just checking the retval
> > looks to be ok.
> 
> Okay, I'm user on small machine and I'm doing stupid thing: I've got
> 6MB ram, and I keep inserting modules. I insert module_1mb.o. Then I
> insert module_1mb.o. Repeat. How does it end? I think that
> kmalloc(GFP_KERNEL) *has* to return NULL at some point. 

I agree, and that's what I've said from the first place. GFP_KERNEL must return
NULL when the system is truly out of memory, or the kernel will deadlock at
that point. In the sentence you quoted I meant that both GFP_USER and most
GFP_KERNEL callers can simply keep checking the retval, even in the long term,
and still be correct (checking for NULL, which in turn means GFP_KERNEL _will_
return NULL eventually).

There's no need of special resource accounting for most static-sized data
structures in the kernel (this accounting is necessary only for some of the
dynamic things that grow and shrink during production and that can't be
reclaimed synchronously when memory goes low by blocking in the allocator,
like pagetables, skbs on gigabit ethernet, and other things).

I'm not sure whether in the end we'll also need to account for the static parts
to get the dynamic part right.

Andrea



Re: the new VM

2000-09-26 Thread Pavel Machek

Hi!
> > i talked about GFP_KERNEL, not GFP_USER. Even in the case of GFP_USER i
> 
> My bad, you're right I was talking about GFP_USER indeed.
> 
> But even for GFP_KERNEL allocations like the init of a module, or any other
> thing that is static-sized during production, just checking the retval
> looks to be ok.

Okay, I'm a user on a small machine and I'm doing a stupid thing: I've got
6MB RAM, and I keep inserting modules. I insert module_1mb.o. Then I
insert module_1mb.o. Repeat. How does it end? I think that
kmalloc(GFP_KERNEL) *has* to return NULL at some point.

Killing apps is not a solution: if my insmod process is smaller than the
module I'm trying to insert, and it happens to be the only process, you just
will not be able to kmalloc(sizeof(module), GFP_KERNEL). Will you
panic in the end?

Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]



Re: the new VM

2000-09-26 Thread Christoph Rohland

Hi Rik,

Rik van Riel <[EMAIL PROTECTED]> writes:

> > Because as you said the machine can lockup when you run out of memory.
> 
> The fix for this is to kill a user process when you're OOM
> (you need to do this anyway).
> 
> The last few allocations of the "condemned" process can come
> from the reserved pages, and the process we killed will exit just
> fine.

It's slightly off-topic, but you should think about detached shm
segments in your OOM killer. As many of the high-end applications like
databases and e.g. SAP keep most of their memory in shm segments, you
easily end up killing a lot of processes without freeing a lot of
memory. I see this often in my shm tests.

Greetings
Christoph




Re: the new VM

2000-09-25 Thread Rik van Riel

On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
> On Mon, Sep 25, 2000 at 04:27:24PM +0200, Ingo Molnar wrote:
> > i think an application should not fail due to other applications
> > allocating too much RAM. OOM behavior should be a central thing and based
> 
> At least Linus's point is that doing perfect accounting (at
> least on the userspace allocation side) may cause you to waste
> resources, failing even if you could still run, and I tend to
> agree with him. We're lazy on that side and that's a global win
> in most cases.

OK, so do you guys want my OOM-killer selection code
in 2.4? ;)

(that will fix the OOM case in the rare situations
where it occurs and do the expected thing most of the
time)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 04:40:44PM +0100, Stephen C. Tweedie wrote:
> Allowing GFP_ATOMIC to eat PF_MEMALLOC's last-chance pages is the
> wrong thing to do if we want to guarantee swapper progress under
> extreme load.

You're definitely right. We at least need the guarantee of the memory to
allocate the bhs on top of the swap cache while we attempt to swap out one
page (that path can't fail at the moment).

Andrea



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 05:16:06PM +0200, Ingo Molnar wrote:
> situation is just 1% RAM away from the 'root cannot log in', situation.

The "root cannot log in" case is a little different. Just think that with "root
cannot log in" you only need to press SYSRQ+E (or at worst +I).

If all tasks in the system are hanging in the GFP loop, SYSRQ+I won't solve
the deadlock.

OK, you can add a signal check in the memory balancing code, but that looks
like an ugly hack that shows the difference between the two cases (the one Alan
pointed out is a real deadlock; the current one is a kind of livelock that can
go away any time, while the deadlock can reach the point where it can't be
recovered without a hack from an irq somewhere).

Andrea



Re: the new VM

2000-09-25 Thread yodaiken

On Mon, Sep 25, 2000 at 05:26:59PM +0200, Ingo Molnar wrote:
> 
> On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
> 
> > > i think the GFP_USER case should do the oom logic within __alloc_pages(),
> > 
> > What's the difference of implementing the logic outside alloc_pages?
> > Putting the logic inside looks not clean design to me.
> 
> it gives consistency and simplicity. The allocators themselves do not have
> to care about oom.


There are many cases where it is simple to do:

  if (alloc(r1) == FAIL) goto freeall;
  if (alloc(r2) == FAIL) goto freeall;
  if (alloc(r3) == FAIL) goto freeall;

And the alloc functions don't know how to "freeall".

Perhaps it would be good to do an alloc_vec allocation in these cases:

  alloc_vec[0].size = size0;   /* one entry per needed allocation */
  ...
  alloc_vec[n].size = 0;       /* zero size terminates the vector */
  if (kmalloc_all(alloc_vec) == FAIL)
      return -ENOMEM;
  /* on success, alloc_vec[i].ptr holds the i-th pointer */
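A self-contained userspace sketch of that kmalloc_all(), with malloc() standing in for kmalloc(); the struct and function are the ones proposed above, not an existing kernel API. The point is the all-or-nothing unwind, so callers never need their own "freeall":

```c
#include <stdlib.h>
#include <stddef.h>

/* One entry per requested allocation; a size of 0 terminates the vector. */
struct alloc_vec {
    size_t size;
    void  *ptr;
};

/* Returns 0 on success with every ptr filled in; on any failure it
 * frees everything already allocated and returns -1, so the caller
 * sees either complete success or no allocation at all. */
int kmalloc_all(struct alloc_vec *vec)
{
    int i;
    for (i = 0; vec[i].size != 0; i++) {
        vec[i].ptr = malloc(vec[i].size);   /* kmalloc() in the kernel */
        if (vec[i].ptr == NULL) {
            while (--i >= 0) {              /* unwind partial success */
                free(vec[i].ptr);
                vec[i].ptr = NULL;
            }
            return -1;
        }
    }
    return 0;
}
```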




-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Alan Cox wrote:

> Unless Im missing something here think about this case
> 
> 2 active processes, no swap
> 
> #1            #2
> kmalloc 32K   kmalloc 16K
> OK            OK
> kmalloc 16K   kmalloc 32K
> block         block
> 
> so GFP_KERNEL has to be able to fail - it can wait for I/O in some
> cases with care, but when we have no pages left something has to give

you are right, i agree that synchronous OOM for higher-order allocations
must be preserved (just like ATOMIC allocations). But the overwhelming
majority of allocations is done at page granularity.

with multi-page allocations and the need for physically contiguous
buffers, the problem cannot be solved.

Ingo




Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> > i think the GFP_USER case should do the oom logic within __alloc_pages(),
> 
> What's the difference of implementing the logic outside alloc_pages?
> Putting the logic inside looks not clean design to me.

it gives consistency and simplicity. The allocators themselves do not have
to care about oom.

Ingo




Re: Swap on RAID; was: Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000 [EMAIL PROTECTED] wrote:

> > this is fixed in 2.4. The 2.2 RAID code is frozen, and has known
> > limitations (ie. due to the above RAID1 cannot be used as a swap-device).

> as commonly patched in by RedHat?  Should I instead use a swap file
> for a machine that should be fault-tolerant against a drive failure?

the answer is yes. RAID5 will not deadlock due to VM problems, but RAID5
might have other problems if the device is being reconstructed *and* used
for swap.

Ingo




Swap on RAID; was: Re: the new VM

2000-09-25 Thread parsley

Ingo Molnar wrote:

> this is fixed in 2.4. The 2.2 RAID code is frozen, and has known
> limitations (ie. due to the above RAID1 cannot be used as a swap-device).

Eh, just to be clear about this: does this apply to the RAID 0.90 code
as commonly patched in by RedHat?  Should I instead use a swap file for
a machine that should be fault-tolerant against a drive failure?

regards,
David
-- 
David L. Parsley
Network Administrator
Roanoke College



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 04:43:44PM +0200, Ingo Molnar wrote:
> i talked about GFP_KERNEL, not GFP_USER. Even in the case of GFP_USER i

My bad, you're right I was talking about GFP_USER indeed.

But even for GFP_KERNEL allocations like the init of a module, or any other
thing that is static-sized during production, just checking the retval looks
to be ok.

> believe the right place to oom is via a signal, not in the gfp() case.

A signal can be trapped and ignored by a malicious task. We had that security
problem until 2.2.14, IIRC.

> (because oom situation in the gfp() case is a completely random and
> statistical event, which might have no connection at all to the behavior
> of that given process.)

I agree we should have more information about the behaviour of the system
and I think a per-task page fault rate should work in practice.

But my question isn't what you do when you're OOM, but _how_ you
notice that you're OOM.

In the GFP_USER case simply checking when GFP fails looks right to me.

Andrea



Re: the new VM

2000-09-25 Thread Alan Cox

> > Because as you said the machine can lockup when you run out of memory.
> 
> well, i think all kernel-space allocations have to be limited carefully,
> denying succeeding allocations is not a solution against over-allocation,
> especially in a multi-user environment.

GFP_KERNEL has to be able to fail for 2.4. Otherwise you can get everything
jammed in kernel space waiting on GFP_KERNEL and if the swapper cannot make
space you die.

The alternative approach, where it cannot fail, has to be at higher levels, so
you can release other resources that might need freeing for deadlock avoidance
before you retry.
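That higher-level pattern could be sketched like this, with toy stand-ins (none of these helpers are real kernel functions): the retry loop sits above the allocator and releases other resources between attempts, so the allocator itself stays free to fail.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy state: resources the caller could release, and how much
 * "memory" has been freed so far. Numbers are illustrative. */
static int spare_buffers = 3;
static int released = 0;

/* Stand-in for an allocator that fails until memory is released. */
static void *try_alloc(size_t size)
{
    if (released < 2)
        return NULL;                 /* "out of memory" */
    return malloc(size);
}

/* Stand-in for freeing some other resource the caller owns. */
static void release_unneeded_resources(void)
{
    if (spare_buffers > 0) {
        spare_buffers--;
        released++;
    }
}

/* The must-not-fail logic lives here, above the allocator: retry,
 * releasing resources between attempts for deadlock avoidance. */
void *alloc_high_level(size_t size)
{
    void *p;
    while ((p = try_alloc(size)) == NULL)
        release_unneeded_resources();
    return p;
}
```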


Alan





Re: the new VM

2000-09-25 Thread Rik van Riel

On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
> On Mon, Sep 25, 2000 at 03:02:58PM +0200, Ingo Molnar wrote:
> > On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
> > 
> > > Sorry I totally disagree. If GFP_KERNEL are garanteeded to succeed
> > > that is a showstopper bug. [...]
> > 
> > why?
> 
> Because as you said the machine can lockup when you run out of memory.

The fix for this is to kill a user process when you're OOM
(you need to do this anyway).

The last few allocations of the "condemned" process can come
from the reserved pages, and the process we killed will exit just
fine.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 11:26:48AM -0300, Marcelo Tosatti wrote:
> This thread keeps freeing pages from the inactive clean list when needed
> (when zone->free_pages < zone->pages_low), making them available for
> atomic allocations.

This is flawed. It's the irq that has to shrink the memory itself. It
certainly can't reschedule kreclaimd and wait for it to do the work.

Increasing the free_pages_min limit is the _only_ alternative to having
irqs that are able to shrink clean cache (and hopefully that "feature"
will be resurrected soon since it's the only way to go right now). 

Andrea



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> At least Linus's point is that doing perfect accounting (at least on
> the userspace allocation side) may cause you to waste resources,
> failing even if you could still run, and I tend to agree with him.
> We're lazy on that side and that's a global win in most cases.

well, as i said, i agree that being lazy on the user-space side (which is
by far the biggest RAM allocator in a typical system) makes sense - and we
can handle it cleanly.

Being lazy on the kernel-space side is the default behavior for us kernel
hackers :-) but i don't think it's the right thing in the long term.

> We are fine-grained with page granularity, not with the mmap
> granularity. The point is that not all the mmapped regions are going
> to be paged in. Think of a program that only after 1 hour does all the
> calculations that allocated all the memory it requested with malloc.
> Before the hour passes, the unused memory can still be used for other
> things, and that's what the user also expects when he runs `free`.

i think you've completely missed the fact that i made exactly this point
in my previous mail.

'user-space laziness': correct
'kernel-space laziness': dangerous

i talked about GFP_KERNEL, not GFP_USER. Even in the case of GFP_USER i
believe the right place to oom is via a signal, not in the gfp() case.
(because oom situation in the gfp() case is a completely random and
statistical event, which might have no connection at all to the behavior
of that given process.)

Ingo




Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 04:27:24PM +0200, Ingo Molnar wrote:
> i think an application should not fail due to other applications
> allocating too much RAM. OOM behavior should be a central thing and based

At least Linus's point is that doing perfect accounting (at least on the
userspace allocation side) may cause you to waste resources, failing even if
you could still run, and I tend to agree with him. We're lazy on that
side and that's a global win in most cases.

We are fine-grained with page granularity, not with the mmap granularity.

The point is that not all the mmapped regions are going to be paged in. Think
of a program that only after 1 hour does all the calculations that allocated
all the memory it requested with malloc. Before the hour passes, the unused
memory can still be used for other things, and that's what the user also
expects when he runs `free`.

Andrea



Re: the new VM

2000-09-25 Thread Marcelo Tosatti


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:



> I talked with Alexey about this and it seems the best way is to have a
> per-socket reservation of clean cache as a function of the receive window.
> So we don't need a huge atomic pool, but we can have a special LRU with an
> irq spinlock that is able to shrink cache from irq as well.

In the current 2.4 VM code, there is a kernel thread called
"kreclaimd".

This thread keeps freeing pages from the inactive clean list when needed
(when zone->free_pages < zone->pages_low), making them available for
atomic allocations.
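The wakeup condition described here amounts to a per-zone watermark check; the struct below is a simplified stand-in for the 2.4 zone structure, keeping only the two fields named above:

```c
/* Simplified stand-in for the 2.4 zone structure. */
struct zone_info {
    unsigned long free_pages; /* pages currently free in this zone */
    unsigned long pages_low;  /* low watermark */
};

/* kreclaimd should free clean inactive pages for a zone exactly when
 * that zone has dropped below its low watermark. */
int zone_needs_reclaim(const struct zone_info *z)
{
    return z->free_pages < z->pages_low;
}
```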

Do you consider pages_low pages as a "huge atomic pool" ? 




Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> I'm not sure if we should restrict the limiting only to the cases that
> need it. For example do_anonymous_page looks like a place that could
> rely on the GFP retval.

i think an application should not fail due to other applications
allocating too much RAM. OOM behavior should be a central thing and based
on allocation patterns, not pure luck or unluck. I always found it rude to
SIGBUS when some other application is abusing RAM but the oom detector has
not yet killed it off.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 04:04:14PM +0200, Ingo Molnar wrote:
> exactly, and this is why if a higher level lets through a GFP_KERNEL, then
> it *must* succeed. Otherwise either the higher level code is buggy, or the
> VM balance is buggy, but we want to have clear signs of it.

I'm not sure if we should restrict the limiting only to the cases that need
it. For example do_anonymous_page looks like a place that could rely on the
GFP retval.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 03:39:51PM +0200, Ingo Molnar wrote:
> Andrea, if you really mean this then you should not be let near the VM
> balancing code :-)

What I mean is that the VM balancing is in the lower layer, which shouldn't
know anything about the per-socket gigabit ethernet skb limits; the limit
should live at the higher layer. For most code just checking for NULL from GFP
is fine (for example do_anonymous_page). It's the caller (not the VM balancing
developer) who shouldn't be let near this code if he allows his code to fill
all the physical RAM with his stuff, causing the machine to run OOM.

> > Most dynamic big caches and kernel data can be shrunk dynamically
> > during memory pressure (perhaps except skbs, and I agree that for skbs
> > on gigabit ethernet the thing is a little different).
> 
> a big 'except'. You dont need gigabit for that, to the contrary, if the

I talked with Alexey about this and it seems the best way is to have a
per-socket reservation of clean cache as a function of the receive window.  So we
don't need a huge atomic pool but we can have a special LRU with an irq
spinlock that is able to shrink cache from irq as well.

> about how many D.O.S. attacks there are possible without implicit or
> explicit bean counting.

Again: the bean counting and all the limits happen at the higher layer.  I
shouldn't know anything about them when I play with the lower layer GFP memory
balancing code.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> Again: the bean counting and all the limits happen at the higher
> layer.  I shouldn't know anything about them when I play with the lower
> layer GFP memory balancing code.

exactly, and this is why if a higher level lets through a GFP_KERNEL, then
it *must* succeed. Otherwise either the higher level code is buggy, or the
VM balance is buggy, but we want to have clear signs of it.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> > yes. every RAID1-bh has a bound lifetime. (bound by worst-case IO
> > latencies)
> 
> Very good! Many thanks Ingo.

this was actually coded/fixed by Neil Brown - so the kudos go to him!

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> And if the careful limit avoids the deadlock in the layer above
> alloc_pages, then it will also avoid alloc_pages returning NULL, and
> you won't need an infinite loop in the first place (unless the memory
> balancing is buggy).

yes i like this property very much because it unearths VM balancing bugs,
which plagued us for so long and are so hard to detect. But statistically
it's also possible that try_to_free_pages() frees a page and alloc_pages()
done on another CPU (or in IRQ context) 'steals' the page. This can
happen, because the VM right now guarantees no straight path from
deallocator to allocator. (and it's not necessary to guarantee it, given
the varying nature of allocation requests.)

> GFP should return NULL only if the machine is out of memory. The
> kernel can be written in a way that never deadlocks when the machine
> is out of memory just checking the GFP retval. I don't think any
> in-kernel resource limit is necessary to have things reliable and
> fast. [...]

Andrea, if you really mean this then you should not be let near the VM
balancing code :-)

> Most dynamic big caches and kernel data can be shrinked dynamically
> during memory pressure (pheraps except skbs and I agree that for skbs
> on gigabit ethernet the thing is a little different).

a big 'except'. You dont need gigabit for that, to the contrary, if the
network is slow it's easier to overallocate within the kernel. Ask Alan
about how many D.O.S. attacks there are possible without implicit or
explicit bean counting.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 03:21:01PM +0200, Ingo Molnar wrote:
> yes. every RAID1-bh has a bound lifetime. (bound by worst-case IO
> latencies)

Very good! Many thanks Ingo.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 03:12:58PM +0200, Ingo Molnar wrote:
> well, i think all kernel-space allocations have to be limited carefully,

When a machine without a gigabit ethernet runs oom it's userspace that
allocated the memory via page faults not the kernel.

And if the careful limit avoids the deadlock in the layer above alloc_pages,
then it will also avoid alloc_pages returning NULL, and you won't need an
infinite loop in the first place (unless the memory balancing is buggy).

GFP should return NULL only if the machine is out of memory. The kernel can be
written in a way that never deadlocks when the machine is out of memory just
checking the GFP retval. I don't think any in-kernel resource limit is
necessary to have things reliable and fast. Most dynamic big caches and kernel
data can be shrunk dynamically during memory pressure (perhaps except skbs,
and I agree that for skbs on gigabit ethernet the thing is a little different).

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> Is it safe to sleep on the waitqueue in the kmalloc fail path in
> raid1?

yes. every RAID1-bh has a bound lifetime. (bound by worst-case IO
latencies)

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> > huh, what do you mean?
> 
> I mean this:
> 
>   while (!( /* FIXME: now we are rather fault tolerant than nice */

this is fixed in 2.4. The 2.2 RAID code is frozen, and has known
limitations (ie. due to the above RAID1 cannot be used as a swap-device).

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 03:04:10PM +0200, Ingo Molnar wrote:
> 
> On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
> 
> > Please fix raid1 instead of making things worse.
> 
> huh, what do you mean?

I mean this:

while (!( /* FIXME: now we are rather fault tolerant than nice */
mirror_bh[i] = kmalloc (sizeof (struct buffer_head), GFP_KERNEL)
) )

I've seen that in the 2.4.0-test9-pre6 raid1 code the above is gone (and this
looks very promising :) since it is at least proof that some care has been
taken about the deadlock) and you instead sleep on a waitqueue now. While it's
not obvious at all that sleeping on the waitqueue is not deadlock prone (for
example getblk sleeps on a waitqueue but it's deadlock prone too), at least
it's not an infinite loop anymore and that's still better.

Is it safe to sleep on the waitqueue in the kmalloc fail path in raid1?

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> > > Sorry I totally disagree. If GFP_KERNEL allocations are guaranteed to succeed
> > > that is a showstopper bug. [...]
> > 
> > why?
> 
> Because as you said the machine can lockup when you run out of memory.

well, i think all kernel-space allocations have to be limited carefully,
denying succeeding allocations is not a solution against over-allocation,
especially in a multi-user environment.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 03:02:58PM +0200, Ingo Molnar wrote:
> 
> On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
> 
> > Sorry I totally disagree. If GFP_KERNEL allocations are guaranteed to succeed
> > that is a showstopper bug. [...]
> 
> why?

Because as you said the machine can lockup when you run out of memory.

> FYI, i havent put it there.

Ok.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> Please fix raid1 instead of making things worse.

huh, what do you mean?

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> Sorry I totally disagree. If GFP_KERNEL allocations are guaranteed to succeed
> that is a showstopper bug. [...]

why?

> machine power for simulations runs out of memory all the time. If you
> put this kind of obvious deadlock into the main kernel allocator

FYI, i havent put it there.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 12:42:09PM +0200, Ingo Molnar wrote:
> believe could simplify unrelated kernel code significantly. Eg. no need to
> check for NULL pointers on most allocations, a GFP_KERNEL allocation
> always succeeds, end of story. This behavior also has the 'nice'

Sorry I totally disagree. If GFP_KERNEL allocations are guaranteed to succeed,
that is a showstopper bug. We also have another showstopper bug in getblk that
will be hard to fix because people were used to relying on it and wrote
deadlock-prone code.

You should know that people not running benchmarks but using the machine's
power for simulations run out of memory all the time. If you put this kind of
obvious deadlock into the main kernel allocator you'll screw up the hard work
that has been done so far to fix all the other deadlock problems during OOM.
Please fix raid1 instead of making things worse.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 11:26:48AM -0300, Marcelo Tosatti wrote:
> This thread keeps freeing pages from the inactive clean list when needed
> (when zone->free_pages < zone->pages_low), making them available for
> atomic allocations.

This is flawed. It's the irq that has to shrink the memory itself. It certainly
can't reschedule kreclaimd and wait for it to do the work.

Increasing the free_pages_min limit is the _only_ alternative to having
irqs that are able to shrink clean cache (and hopefully that "feature"
will be resurrected soon since it's the only way to go right now). 

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Rik van Riel

On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
> On Mon, Sep 25, 2000 at 03:02:58PM +0200, Ingo Molnar wrote:
> > On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
> >
> > > Sorry I totally disagree. If GFP_KERNEL allocations are guaranteed to succeed
> > > that is a showstopper bug. [...]
> >
> > why?
>
> Because as you said the machine can lockup when you run out of memory.

The fix for this is to kill a user process when you're OOM
(you need to do this anyway).

The last few allocations of the "condemned" process can come
from the reserved pages, and the process we killed will exit just
fine.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Alan Cox

> > Because as you said the machine can lockup when you run out of memory.
>
> well, i think all kernel-space allocations have to be limited carefully,
> denying succeeding allocations is not a solution against over-allocation,
> especially in a multi-user environment.

GFP_KERNEL has to be able to fail for 2.4. Otherwise you can get everything
jammed in kernel space waiting on GFP_KERNEL and if the swapper cannot make
space you die.

The alternative approach, where it cannot fail, has to be at higher levels, so
you can release other resources that might need freeing for deadlock avoidance
before you retry.


Alan


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 04:43:44PM +0200, Ingo Molnar wrote:
> i talked about GFP_KERNEL, not GFP_USER. Even in the case of GFP_USER i

My bad, you're right I was talking about GFP_USER indeed.

But even for GFP_KERNEL allocations, like the init of a module or any other
thing that is static-sized during production, just checking the retval looks
to be OK.

> believe the right place to oom is via a signal, not in the gfp() case.

Signals can be trapped and ignored by a malicious task. We had that security
problem until 2.2.14 IIRC.

> (because oom situation in the gfp() case is a completely random and
> statistical event, which might have no connection at all to the behavior
> of that given process.)

I agree we should have more information about the behaviour of the system,
and I think a per-task page fault rate would work in practice.

But my question isn't what you do when you're OOM, but is _how_ do you
notice that you're OOM?

In the GFP_USER case simply checking when GFP fails looks right to me.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Swap on RAID; was: Re: the new VM

2000-09-25 Thread parsley

Ingo Molnar wrote:

> this is fixed in 2.4. The 2.2 RAID code is frozen, and has known
> limitations (ie. due to the above RAID1 cannot be used as a swap-device).

Eh, just to be clear about this: does this apply to the RAID 0.90 code
as commonly patched in by RedHat?  Should I instead use a swap file for
a machine that should be fault-tolerant against a drive failure?

regards,
David
-- 
David L. Parsley
Network Administrator
Roanoke College
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Swap on RAID; was: Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000 [EMAIL PROTECTED] wrote:

> > this is fixed in 2.4. The 2.2 RAID code is frozen, and has known
> > limitations (ie. due to the above RAID1 cannot be used as a swap-device).
>
> as commonly patched in by RedHat?  Should I instead use a swap file
> for a machine that should be fault-tolerant against a drive failure?

the answer is yes. RAID5 will not deadlock due to VM problems, but RAID5
might have other problems if the device is being reconstructed *and* used
for swap.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Andrea Arcangeli wrote:

> > i think the GFP_USER case should do the oom logic within __alloc_pages(),
>
> What's the difference of implementing the logic outside alloc_pages?
> Putting the logic inside doesn't look like clean design to me.

it gives consistency and simplicity. The allocators themselves do not have
to care about oom.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: the new VM

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Alan Cox wrote:

> Unless I'm missing something here think about this case
>
> 2 active processes, no swap
>
> #1            #2
> kmalloc 32K   kmalloc 16K
> OK            OK
> kmalloc 16K   kmalloc 32K
> block         block
>
> so GFP_KERNEL has to be able to fail - it can wait for I/O in some
> cases with care, but when we have no pages left something has to give

you are right, i agree that synchronous OOM for higher-order allocations
must be preserved (just like ATOMIC allocations). But the overwhelming
majority of allocations is done at page granularity.

with multi-page allocations and the need for physically contiguous
buffers, the problem cannot be solved.

Ingo




Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 05:16:06PM +0200, Ingo Molnar wrote:
> situation is just 1% RAM away from the 'root cannot log in' situation.

The "root cannot log in" case is a little different: there you only need to
press SYSRQ+E (or at worst +I).

If all tasks in the system are hanging in the GFP loop, SYSRQ+I won't solve
the deadlock.

Ok, you could add a signal check in the memory balancing code, but that looks
like an ugly hack, and it shows the difference between the two cases: the one
Alan pointed out is a real deadlock, while the current one is a kind of
livelock that can go away at any time; the deadlock can reach the point where
it can't be recovered without a hack from an irq somewhere.

Andrea



Re: the new VM

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 04:40:44PM +0100, Stephen C. Tweedie wrote:
> Allowing GFP_ATOMIC to eat PF_MEMALLOC's last-chance pages is the
> wrong thing to do if we want to guarantee swapper progress under
> extreme load.

You're definitely right. We at least need a guarantee of the memory to
allocate the bhs on top of the swap cache while we attempt to swap out one
page (that path can't fail at the moment).

Andrea



Re: the new VM

2000-09-25 Thread Rik van Riel

On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
> On Mon, Sep 25, 2000 at 04:27:24PM +0200, Ingo Molnar wrote:
> > i think an application should not fail due to other applications
> > allocating too much RAM. OOM behavior should be a central thing and based
>
> At least Linus's point is that doing perfect accounting (at
> least on the userspace allocation side) may cause you to waste
> resources, failing even if you could still run and I tend to
> agree with him. We're lazy on that side and that's global win in
> most cases.

OK, so do you guys want my OOM-killer selection code
in 2.4? ;)

(that will fix the OOM case in the rare situations
where it occurs and do the expected thing most of the
time)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: the new VM

2000-09-25 Thread yodaiken

On Mon, Sep 25, 2000 at 05:26:59PM +0200, Ingo Molnar wrote:
>
> On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
>
> > > i think the GFP_USER case should do the oom logic within __alloc_pages(),
> >
> > What's the difference of implementing the logic outside alloc_pages?
> > Putting the logic inside looks not clean design to me.
>
> it gives consistency and simplicity. The allocators themselves do not have
> to care about oom.


There are many cases where it is simple to do:

  if( alloc(r1) == fail) goto freeall
  if( alloc(r2) == fail) goto freeall
  if( alloc(r3) == fail) goto freeall

And the alloc functions don't know how to "freeall".

Perhaps it would be good to do an alloc_vec allocation in these cases.
  alloc_vec[0].size = n;
  ..
  alloc_vec[n].size = 0;
  if(kmalloc_all(alloc_vec) == FAIL) return -ENOMEM;
  else  alloc_vec[i].ptr is the pointer.
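
[Editorial note: the alloc_vec idea above can be sketched in userspace C. kmalloc_all and kfree_all are hypothetical names taken from the email, implemented here with plain malloc/free rather than the kernel allocators; the point is the all-or-nothing semantics, so callers never hand-code the "freeall" ladder.]

```c
#include <stdlib.h>

/* Hypothetical sketch of the alloc_vec idea, using plain malloc/free
 * instead of the kernel allocators.  A zero-sized entry terminates
 * the vector. */
struct alloc_vec {
	size_t size;	/* 0 terminates the vector */
	void *ptr;	/* filled in on success */
};

/* All-or-nothing: returns 0 with every ptr filled in, or -1 with
 * every earlier allocation already rolled back. */
int kmalloc_all(struct alloc_vec *vec)
{
	size_t i;

	for (i = 0; vec[i].size != 0; i++) {
		vec[i].ptr = malloc(vec[i].size);
		if (vec[i].ptr == NULL) {
			/* the "freeall" callers would otherwise hand-code */
			while (i-- > 0) {
				free(vec[i].ptr);
				vec[i].ptr = NULL;
			}
			return -1;
		}
	}
	return 0;
}

void kfree_all(struct alloc_vec *vec)
{
	for (size_t i = 0; vec[i].size != 0; i++) {
		free(vec[i].ptr);
		vec[i].ptr = NULL;
	}
}
```

On any failure the partial allocations are rolled back inside kmalloc_all, so the caller sees either a fully populated vector or a clean failure.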




-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: [PATCH *] new VM patch for 2.4.0-test8

2000-09-15 Thread Rik van Riel

On Fri, 15 Sep 2000, James Lewis Nance wrote:
> On Fri, Sep 15, 2000 at 10:09:57PM -0300, Rik van Riel wrote:
> > Hi,
> > 
> > today I released a new VM patch with 4 small improvements:
> 
> Are these 4 improvements in the code test9-pre1 patch that Linus
> just released?

No. But I have a patch (that I will mail to the list
once I've tested it a bit more).

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [PATCH *] new VM patch for 2.4.0-test8

2000-09-15 Thread Rik van Riel

On Fri, 15 Sep 2000, James Lewis Nance wrote:
> On Fri, Sep 15, 2000 at 10:09:57PM -0300, Rik van Riel wrote:
> > Hi,
> > 
> > today I released a new VM patch with 4 small improvements:
> 
> Are these 4 improvements in the code test9-pre1 patch that Linus
> just released?

Oh well, I may as well give it now ;)

The patch below upgrades 2.4.0-test9-pre1 VM to a
VM with the 4 changes...

They /should/ be stable, but I'd really appreciate
a bit more testing before I give the patch to Linus.

(I know the VM patch included in 2.4.0-test9-pre1 is
stable, that one got a heavier testing than any VM patch
I ever made. I was testing the system so heavily that I
had to upgrade my 8139too driver and other things to keep
the system from crashing ;))

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/


--- linux-2.4.8-test9-pre1/fs/buffer.c.orig Fri Sep 15 23:23:09 2000
+++ linux-2.4.8-test9-pre1/fs/buffer.c  Fri Sep 15 23:26:24 2000
@@ -705,7 +705,6 @@
 static void refill_freelist(int size)
 {
if (!grow_buffers(size)) {
-   //wakeup_bdflush(1);
balance_dirty(NODEV);
wakeup_kswapd(1);
}
@@ -863,15 +862,14 @@
 
dirty = size_buffers_type[BUF_DIRTY] >> PAGE_SHIFT;
tot = nr_free_buffer_pages();
-// tot -= size_buffers_type[BUF_PROTECTED] >> PAGE_SHIFT;
 
dirty *= 200;
soft_dirty_limit = tot * bdf_prm.b_un.nfract;
hard_dirty_limit = soft_dirty_limit * 2;
 
/* First, check for the "real" dirty limit. */
-   if (dirty > soft_dirty_limit || inactive_shortage()) {
-   if (dirty > hard_dirty_limit)
+   if (dirty > soft_dirty_limit) {
+   if (dirty > hard_dirty_limit || inactive_shortage())
return 1;
return 0;
}
@@ -2279,7 +2277,9 @@
 {
struct buffer_head * tmp, * bh = page->buffers;
int index = BUFSIZE_INDEX(bh->b_size);
+   int loop = 0;
 
+cleaned_buffers_try_again:
	spin_lock(&lru_list_lock);
	write_lock(&hash_table_lock);
	spin_lock(&free_list[index].lock);
@@ -2325,8 +2325,14 @@
	spin_unlock(&free_list[index].lock);
	write_unlock(&hash_table_lock);
	spin_unlock(&lru_list_lock);
-   if (wait)
+   if (wait) {
sync_page_buffers(bh, wait);
+   /* We waited synchronously, so we can free the buffers. */
+   if (wait > 1 && !loop) {
+   loop = 1;
+   goto cleaned_buffers_try_again;
+   }
+   }
return 0;
 }
 
--- linux-2.4.8-test9-pre1/mm/swap.c.orig   Fri Sep 15 23:23:11 2000
+++ linux-2.4.8-test9-pre1/mm/swap.cFri Sep 15 23:24:23 2000
@@ -161,14 +161,19 @@
 * Don't touch it if it's not on the active list.
 * (some pages aren't on any list at all)
 */
-   if (PageActive(page) && (page_count(page) == 1 || page->buffers) &&
+   if (PageActive(page) && (page_count(page) <= 2 || page->buffers) &&
!page_ramdisk(page)) {
 
/*
 * We can move the page to the inactive_dirty list
 * if we know there is backing store available.
+*
+* We also move pages here that we cannot free yet,
+* but may be able to free later - because most likely
+* we're holding an extra reference on the page which
+* will be dropped right after deactivate_page().
 */
-   if (page->buffers) {
+   if (page->buffers || page_count(page) == 2) {
del_page_from_active_list(page);
add_page_to_inactive_dirty_list(page);
/*
@@ -181,8 +186,7 @@
add_page_to_inactive_clean_list(page);
}
/*
-* ELSE: no backing store available, leave it on
-* the active list.
+* OK, we cannot free the page. Leave it alone.
 */
}
 }  
--- linux-2.4.8-test9-pre1/mm/vmscan.c.orig Fri Sep 15 23:23:11 2000
+++ linux-2.4.8-test9-pre1/mm/vmscan.c  Fri Sep 15 23:32:10 2000
@@ -103,8 +103,8 @@
UnlockPage(page);
vma->vm_mm->rss--;
flush_tlb_page(vma, address);
-   page_cache_release(page);
deactivate_page(page);
+   page_cache_release(page);
goto out_failed;
}
 
@@ -681,19 +681,26 @@
if (freed_page && !free_shortage())
break;
continue;
+   } else if (page->mapping && !PageDirty(page)) {
+   /*
+* If a page had an extra reference in
+* deactivate_page(), we will 

Re: [PATCH *] new VM patch for 2.4.0-test8

2000-09-15 Thread James Lewis Nance

On Fri, Sep 15, 2000 at 10:09:57PM -0300, Rik van Riel wrote:
> Hi,
> 
> today I released a new VM patch with 4 small improvements:

Are these 4 improvements in the code test9-pre1 patch that Linus just
released?

Jim


