Re: VM in 2.4.7-pre hurts...

2001-07-08 Thread Linus Torvalds


On Sun, 8 Jul 2001, Rik van Riel wrote:
>
> If __wait_on_buffer and ___wait_on_page get stuck, this could
> mean a page doesn't get unlocked.  When this is happening, we
> may well be running into dozens of pages which aren't getting
> properly unlocked on IO completion.

Absolutely. But that, in turn, should cause just others getting stuck, not
running, no?

Anyway, having looked at the buffer case, I think I found a potentially
nasty bug: "unlock_buffer()" with a buffer count of zero.

Why is this nasty? unlock_buffer() does:

extern inline void unlock_buffer(struct buffer_head *bh)
{
	clear_bit(BH_Lock, &bh->b_state);
	smp_mb__after_clear_bit();
	if (waitqueue_active(&bh->b_wait))
		wake_up(&bh->b_wait);
}

but by doing the "clear_bit()", it also potentially frees the buffer, so
an interrupt coming in (or another CPU) can end up doing a kfree() on the
bh.

At which point the "waitqueue_active()" and the wakeup call are operating
on random memory.
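
To make the window concrete, here is a hedged sketch (illustrative only, not
the fix that actually went into the tree) of one way to keep the bh alive
across the wakeup by pinning b_count before the lock bit is dropped; the
function name is made up:

static inline void unlock_buffer_pinned(struct buffer_head *bh)
{
	atomic_inc(&bh->b_count);		/* hold a private reference */
	clear_bit(BH_Lock, &bh->b_state);	/* others may now try to free the bh */
	smp_mb__after_clear_bit();
	if (waitqueue_active(&bh->b_wait))
		wake_up(&bh->b_wait);
	atomic_dec(&bh->b_count);		/* drop our reference only after the wakeup */
}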

This does not explain __wait_on_buffer(), but it's a bug nonetheless.

Can anybody find anything else fishy with buffer handling?

Linus




Re: VM in 2.4.7-pre hurts...

2001-07-08 Thread Rik van Riel

On Sun, 8 Jul 2001, Linus Torvalds wrote:


>  (a) _had_ the page been on any of the aging lists, it would have been
>  aged down every time we passed it, and
>  (b) it's obviously been aged up every time we passed it in the VM so far
>  (because it hadn't been added to the swap cache earlier).

>  - an anonymous page, by the time we add it to the swap cache, would have
>been aged down and up roughly the same number of times.

Hmmm, indeed.  I guess this also means page aging in its
current form cannot even work well with exponential down
aging since the down aging on the pageout list always
cancels out the up aging in swap_out() ...

I guess it's time we found some volunteers to experiment
with linear down aging (page->age--;) since that one will
be able to withstand pages being referenced only in the
page tables.
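
As a rough illustration of the difference, here is a userspace toy (assumed
constants, not the kernel's aging code): a page that swap_out() keeps aging
up settles at a small fixed point under exponential down aging, while linear
down aging lets the same page hold on to a high age.

#include <stdio.h>

#define PAGE_AGE_START	2	/* assumed */
#define PAGE_AGE_ADV	3	/* assumed up-aging step */
#define PAGE_AGE_MAX	64	/* assumed cap */

static int age_up(int age)
{
	age += PAGE_AGE_ADV;
	return age > PAGE_AGE_MAX ? PAGE_AGE_MAX : age;
}

int main(void)
{
	int exp_age = PAGE_AGE_START, lin_age = PAGE_AGE_START;
	int pass;

	for (pass = 1; pass <= 10; pass++) {
		/* each pass: referenced via the page tables, then scanned */
		exp_age = age_up(exp_age) / 2;		/* exponential down aging */
		lin_age = age_up(lin_age) - 1;		/* linear down aging */
		printf("pass %2d: exponential=%2d  linear=%2d\n",
		       pass, exp_age, lin_age);
	}
	return 0;
}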

(now, off to a project 4000 km from home for the next 2
weeks ... bbl)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)




Re: VM in 2.4.7-pre hurts...

2001-07-08 Thread Rik van Riel

On Sun, 8 Jul 2001, Linus Torvalds wrote:
> On Sun, 8 Jul 2001, Rik van Riel wrote:
> >
> > ... Bingo.  You hit the infamous __wait_on_buffer / ___wait_on_page
> > bug. I've seen this for quite a while now on our quad xeon test
> > machine, with some kernel versions it can be reproduced in minutes,
> > with others it won't trigger at all.
>
> Hmm.. That would explain why the "tar" gets stuck, but why does the whole
> machine grind to a halt with all other processes being marked runnable?

If __wait_on_buffer and ___wait_on_page get stuck, this could
mean a page doesn't get unlocked.  When this is happening, we
may well be running into dozens of pages which aren't getting
properly unlocked on IO completion.

This in turn would get the rest of the system stuck in the
pageout code path, eating CPU like crazy.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)




Re: VM in 2.4.7-pre hurts...

2001-07-08 Thread Linus Torvalds


On Sun, 8 Jul 2001, Rik van Riel wrote:
>
> ... Bingo.  You hit the infamous __wait_on_buffer / ___wait_on_page
> bug. I've seen this for quite a while now on our quad xeon test
> machine, with some kernel versions it can be reproduced in minutes,
> with others it won't trigger at all.

Hmm.. That would explain why the "tar" gets stuck, but why does the whole
machine grind to a halt with all other processes being marked runnable?

> I hope there is somebody out there who can RELIABLY trigger
> this bug, so we have a chance of tracking it down.
>
> > tar
> > Trace; c012f2da <__wait_on_buffer+6a/8c>
> > Trace; c01303c9 <bread+45/64>

I wonder if "getblk()" returned a locked not-up-to-date buffer.. That
would explain how the buffer stays locked forever - the "ll_rw_block()"
will not actually submit any IO on a locked buffer, so there won't be any
IO to release it.
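
For reference, a hedged paraphrase of the 2.4-era ll_rw_block() submission
logic he is pointing at (simplified and written from memory, so details may
differ from the exact source):

void ll_rw_block(int rw, int nr, struct buffer_head *bhs[])
{
	int i;

	for (i = 0; i < nr; i++) {
		struct buffer_head *bh = bhs[i];

		/* Whoever already holds the lock owns the I/O: a buffer that
		 * arrives here locked is skipped, so if it was never actually
		 * submitted, nothing will ever come back to unlock it. */
		if (test_and_set_bit(BH_Lock, &bh->b_state))
			continue;

		if (rw == READ && buffer_uptodate(bh)) {
			unlock_buffer(bh);	/* already valid, nothing to read */
			continue;
		}

		atomic_inc(&bh->b_count);	/* pin across the request */
		submit_bh(rw, bh);		/* queue the actual I/O */
	}
}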

And it's interesting to see that this happens for an _inode_ block, not a
data block - which could easily have been dirty and scheduled for a
write-out. So I wonder if there is some race between "write buffer and try
to free it" and "getblk()".

The locking in "try_to_free_buffers()" is rather anal, so I don't see how
this could happen, but..

That still doesn't explain why everybody is busy running. I'd have
expected all the processes to end up waiting for the page or buffer, not
stuck in a live-lock.

Linus




Re: VM in 2.4.7-pre hurts...

2001-07-08 Thread Mike Galbraith

On Sun, 8 Jul 2001, Rik van Riel wrote:

> On Sun, 8 Jul 2001, Mike Galbraith wrote:
>
> > is very oom with no disk activity.  It _looks_ (xmm and vmstat) like
> > it just ran out of cleanable dirty pages.  With or without swap,
>
> ... Bingo.  You hit the infamous __wait_on_buffer / ___wait_on_page
> bug. I've seen this for quite a while now on our quad xeon test
> machine, with some kernel versions it can be reproduced in minutes,
> with others it won't trigger at all.
>
> And after a recompile it's usually gone ...
>
> I hope there is somebody out there who can RELIABLY trigger
> this bug, so we have a chance of tracking it down.

Well, my box seems to think I'm a somebody.  If it changes its mind,
I'll let you know.  I'll throw whatever rocks I can find at it to get
it all angry and confused.  You sneak up behind it and do the stake and
mallet number.

tar -rvf /dev/null /usr/local (10 gig of.. mess) with X/KDE running
seems 100% repeatable here.

'scuse me while I go recompile again and hope it just goes away ;-)

-Mike




Re: VM in 2.4.7-pre hurts...

2001-07-08 Thread Linus Torvalds


On Sat, 7 Jul 2001, Rik van Riel wrote:
>
> Not quite. The more a page has been used, the higher the
> page->age will be. This means the system has a way to
> distinguish between anonymous pages which were used once
> and anonymous pages which are used lots of times.

Wrong.

We already _have_ that aging: it's called "do not add anonymous pages to
the page cache unless they are old".

Pages that are used lots of times won't ever _get_ to the point where they
get added to the swap cache, because they are always marked young.
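
(For context, a hedged sketch of the gate being described - paraphrased from
the 2.4-era try_to_swap_out(), not the exact source: a page whose pte was
referenced since the last pass is aged up and skipped, so only pages that
have gone "old" ever fall through to the add-to-swap-cache path.)

	if (ptep_test_and_clear_young(page_table)) {
		page->age += PAGE_AGE_ADV;	/* still in use: age it up */
		if (page->age > PAGE_AGE_MAX)
			page->age = PAGE_AGE_MAX;
		return;				/* ...and leave it mapped */
	}
	/* only an old, unreferenced page continues on to be unmapped
	 * and entered into the swap cache */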

So by the time we get to this point, we _know_ what the age should be. I
tried to explain this to you earlier. We should NOT use the old
"page->age", because that one is 100% and totally bogus. It has _nothing_
to do with the page age. It's been randomly incremented, without ever
having been on any of the aging lists, and as such it is a totally bogus
number.

In comparison, just setting page->age to PAGE_AGE_START is _not_ a random
number. It's a reasonable number that depends on the _knowledge_ that

 (a) _had_ the page been on any of the aging lists, it would have been
 aged down every time we passed it, and
 (b) it's obviously been aged up every time we passed it in the VM so far
 (because it hadn't been added to the swap cache earlier).

Are you with me?

Now, add to the above two _facts_, the knowledge that the aging of the VM
space is done roughly at the same rate as the aging of the active lists
(we call "swap_out()" every time we age the active list when under memory
pressure, and they go through similar percentages of their respective
address spaces), and you get

 - an anonymous page, by the time we add it to the swap cache, would have
   been aged down and up roughly the same number of times.

Ergo, its age _should_ be the same as PAGE_AGE_START.

> > That would certainly help explain why aging doesn't work for some people.
>
> As would your patch ;)

No. Do the math. My patch gives the page the _right_ age. Previously, it
had a completely random age that had _nothing_ to do with any other page
age.
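
Doing that math as a toy loop (assumed constants, a sketch rather than kernel
code): the page that was never on an aging list keeps every age-up that
swap_out() gave it, while the counterfactual page that also got aged down
once per pass ends up right back around PAGE_AGE_START.

#include <stdio.h>

#define PAGE_AGE_START	2	/* assumed */
#define PAGE_AGE_ADV	3	/* assumed up-aging step */
#define PAGE_AGE_MAX	64	/* assumed cap */

int main(void)
{
	int never_listed = PAGE_AGE_START;	/* what the old code ended up using */
	int had_been_listed = PAGE_AGE_START;	/* had it been on an aging list */
	int pass;

	for (pass = 0; pass < 10; pass++) {
		/* swap_out() ages the mapped page up on every pass... */
		never_listed += PAGE_AGE_ADV;
		if (never_listed > PAGE_AGE_MAX)
			never_listed = PAGE_AGE_MAX;

		/* ...and a listed page would have been aged down at
		 * roughly the same rate, cancelling the increments out */
		had_been_listed += PAGE_AGE_ADV;
		if (had_been_listed > PAGE_AGE_MAX)
			had_been_listed = PAGE_AGE_MAX;
		had_been_listed /= 2;
	}

	printf("age actually accumulated: %d\n", never_listed);	  /* inflated, meaningless */
	printf("age had it been listed:  %d\n", had_been_listed); /* ~PAGE_AGE_START */
	return 0;
}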

Linus




Re: VM in 2.4.7-pre hurts...

2001-07-08 Thread Rik van Riel

On Sun, 8 Jul 2001, Mike Galbraith wrote:

> is very oom with no disk activity.  It _looks_ (xmm and vmstat) like
> it just ran out of cleanable dirty pages.  With or without swap,

... Bingo.  You hit the infamous __wait_on_buffer / ___wait_on_page
bug. I've seen this for quite a while now on our quad xeon test
machine, with some kernel versions it can be reproduced in minutes,
with others it won't trigger at all.

And after a recompile it's usually gone ...

I hope there is somebody out there who can RELIABLY trigger
this bug, so we have a chance of tracking it down.

> tar
> Trace; c012f2da <__wait_on_buffer+6a/8c>
> Trace; c01303c9 <bread+45/64>
> Trace; c01500ea <ext2_read_inode+fe/3c8>
> Trace; c01411f5 <get_new_inode+d1/15c>
> Trace; c0141416 <iget4+c2/d4>
> Trace; c0150b03 <ext2_lookup+43/68>
> Trace; c0138401 <path_walk+529/748>
> Trace; c0137aed <getname+5d/9c>
> Trace; c01389d8 <__user_walk+3c/58>
> Trace; c0135cc6 <sys_lstat64+16/70>
> Trace; c0106ae3 <system_call+33/38>

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)




Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Rik van Riel

On Sat, 7 Jul 2001, Alan Cox wrote:

> > > Its certainly misleading. I got Jeff to try making oom return
> > > 4999 out of 5000 times regardless.
> >
> > In that case, he _is_ OOM.  ;)
>
> Hardly
>
> > 1) (almost) no free memory
> > 2) no free swap
> > 3) very little pagecache + buffer cache
>
> Large amounts of cache, which went away when the OOM code was neutered

So Jeff backed out my patch before testing yours? ;)

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Alan Cox

> > Its certainly misleading. I got Jeff to try making oom return
> > 4999 out of 5000 times regardless.
> 
> In that case, he _is_ OOM.  ;)

Hardly

> 1) (almost) no free memory
> 2) no free swap
> 3) very little pagecache + buffer cache

Large amounts of cache, which went away when the OOM code was neutered




Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Jeff Garzik

Rik van Riel wrote:
> 
> On Sat, 7 Jul 2001, Alan Cox wrote:
> 
> > > instead. That way the vmstat output might be more useful, although vmstat
> > > obviously won't know about the new "SwapCache:" field..
> > >
> > > Can you try that, and see if something else stands out once the misleading
> > > accounting is taken care of?
> >
> > Its certainly misleading. I got Jeff to try making oom return
> > 4999 out of 5000 times regardless.
> 
> In that case, he _is_ OOM.  ;)
> 
> 1) (almost) no free memory
> 2) no free swap
> 3) very little pagecache + buffer cache

It got -considerably- farther after Alan's suggested hack to the OOM
killer; so at least in this instance, OOM killer appeared to me to be
killing too early...

-- 
Jeff Garzik  | A recent study has shown that too much soup
Building 1024| can cause malaise in laboratory mice.
MandrakeSoft |



Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Alan Cox

> But neutering the OOM killer like Alan suggested may be a rather valid
> approach anyway. Delaying the killing sounds valid: if we're truly
> livelocked on the VM, we'll be calling down to the OOM killer so much that
> it's probably quite valid to say "only return 1 after X iterations".

Its hiding the real accounting screw up with a 'goes bang at random less 
often' - nice hack, but IMHO bad long term approach. We need to get the maths
right. We had similar 2.2 problems the other way (with nasty deadlocks)
until Andrea fixed that





Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Rik van Riel

On Sat, 7 Jul 2001, Alan Cox wrote:

> > instead. That way the vmstat output might be more useful, although vmstat
> > obviously won't know about the new "SwapCache:" field..
> >
> > Can you try that, and see if something else stands out once the misleading
> > accounting is taken care of?
>
> Its certainly misleading. I got Jeff to try making oom return
> 4999 out of 5000 times regardless.

In that case, he _is_ OOM.  ;)

1) (almost) no free memory
2) no free swap
3) very little pagecache + buffer cache

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Alan Cox

> instead. That way the vmstat output might be more useful, although vmstat
> obviously won't know about the new "SwapCache:" field..
> 
> Can you try that, and see if something else stands out once the misleading
> accounting is taken care of?

Its certainly misleading. I got Jeff to try making oom return 4999 out of 5000
times regardless.



Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Rik van Riel

On Sat, 7 Jul 2001, Linus Torvalds wrote:

> In fact, I do not see any part of the whole path that sets the
> page age at all, so we're basically using a completely
> uninitialized field here (it's been initialized way back when
> the page was allocated, but because it hasn't been part of the
> normal aging scheme it has only been aged up, never down, so the
> value is pretty much random by the time we actually add it to
> the swap cache pool).

Not quite. The more a page has been used, the higher the
page->age will be. This means the system has a way to
distinguish between anonymous pages which were used once
and anonymous pages which are used lots of times.


> Suggested fix:

[snip disabling of page aging for anonymous memory]

> That would certainly help explain why aging doesn't work for some people.

As would your patch ;)

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Linus Torvalds


On Sat, 7 Jul 2001, Rik van Riel wrote:
>
> Not at all. Note that try_to_swap_out() will happily
> create swap cache pages with a very high page->age,
> pages which are in absolutely no danger of being
> evicted from memory...

That seems to be a bug in "add_to_swap_cache()".

In fact, I do not see any part of the whole path that sets the page age at
all, so we're basically using a completely uninitialized field here (it's
been initialized way back when the page was allocated, but because it
hasn't been part of the normal aging scheme it has only been aged up,
never down, so the value is pretty much random by the time we actually add
it to the swap cache pool).

Suggested fix:

--- v2.4.6/linux/mm/swap_state.c	Tue Jul  3 17:08:22 2001
+++ linux/mm/swap_state.c	Sat Jul  7 11:49:13 2001
@@ -81,6 +81,7 @@
 		BUG();
 	flags = page->flags & ~((1 << PG_error) | (1 << PG_arch_1));
 	page->flags = flags | (1 << PG_uptodate);
+	page->age = PAGE_AGE_START;
 	add_to_page_cache_locked(page, &swapper_space, entry.val);
 }

Does that make a difference for people?

That would certainly help explain why aging doesn't work for some people.

Linus




Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Rik van Riel

On Sat, 7 Jul 2001, Jeff Garzik wrote:

> Sigh.  since I am a VM ignoramus I doubt my opinion matters much
> at all here... but it would be nice if oddball configurations
> like 384MB with 50MB swap could be supported.

It would be fun if we had 48 hours in a day, too ;)

This particular thing has been on the TODO list of the
VM developers for a while, but we just haven't gotten
around to it.

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Jeff Garzik

Linus Torvalds wrote:
> 
> On Sat, 7 Jul 2001, Jeff Garzik wrote:
> > Linus Torvalds wrote:
> > >
> > > Now, the fact that the system appears unusable does obviously mean that
> > > something is wrong. But you're barking up the wrong tree.
> >
> > Two more additional data points,
> >
> > 1) partially kernel-unrelated.  MDK's "make" macro didn't support
> > alpha's /proc/cpuinfo output, "make -j$numprocs" became "make -j" and
> > fun ensued.
> 
> Ahh, well..
> 
> The kernel source code is set up to scale quite well, so yes a "make -j"
> will parallelise a bit too well for most machines, and you'll certainly
> run out of memory on just about anything (I routinely get load averages of
> 30+, and yes, you need at least half a GB of RAM for it to not be
> unpleasant - and probably more like a full gigabyte on an alpha).

"make -j" is a lot of fun on a dual athlon w/ 512mb :)

> So I definitely think the kernel likely did the right thing. It's not even
> clear that the OOM killer might not have been right - due to the 2.4.x
> swap space allocation, 256MB of swap-space is a bit tight on a 384MB
> machine that actually wants to use a lot of memory.

Sigh.  since I am a VM ignoramus I doubt my opinion matters much at all
here... but it would be nice if oddball configurations like 384MB with
50MB swap could be supported.  I don't ask that it perform optimally at
all, but at least the machine should behave predictably...

This type of swap configuration makes sense for, "my working set is
pretty much always in RAM, including i/dcache, but let's have some swap
just-in-case"


> > 2) I agree that 200MB into swap and 200MB into cache isn't bad per se,
> > but when it triggers the OOM killer it is bad.
> 
> Note that it might easily have been 256MB into swap (ie it had eaten _all_
> of your swap) at some stage - and you just didn't see it in the vmstat
> output because obviously at that point the machine was a bit loaded.

I'm pretty sure swap was 100% full.  I should have sysrq'd and checked
but I forgot.


> But neutering the OOM killer like Alan suggested may be a rather valid
> approach anyway. Delaying the killing sounds valid: if we're truly
> livelocked on the VM, we'll be calling down to the OOM killer so much that
> it's probably quite valid to say "only return 1 after X iterations".

cnt % 5000 may have been a bit extreme but it was fun to see it thrash. 
sysrq was pretty much the only talking point into the system.

-- 
Jeff Garzik  | A recent study has shown that too much soup
Building 1024| can cause malaise in laboratory mice.
MandrakeSoft |



Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Rik van Riel

On Sat, 7 Jul 2001, Jeff Garzik wrote:

> 2) I agree that 200MB into swap and 200MB into cache isn't bad
> per se, but when it triggers the OOM killer it is bad.

Please read my patch for the OOM killer. It subtracts the
swap cache from the cache figure you quote and ONLY goes
into oom_kill() if the page & buffer cache together take
less than 4% of memory (see /proc/sys/vm/{buffermem,pagecache}).
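
A hedged sketch of that check (the field names and the hard-coded 4% are
written out from the description above, not lifted from the actual patch):

/* illustrative only: "are we really OOM?" */
static int really_out_of_memory(void)
{
	unsigned long cache;

	cache  = atomic_read(&page_cache_size);
	cache -= swapper_space.nrpages;			/* ignore the swap cache */
	cache += atomic_read(&buffermem_pages);		/* add the buffer cache */

	/* plenty of reclaimable cache left -> not OOM yet */
	if (cache > num_physpages * 4 / 100)
		return 0;

	/* free memory and free swap checks would follow here */
	return 1;
}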

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Linus Torvalds


On Sat, 7 Jul 2001, Jeff Garzik wrote:
> Linus Torvalds wrote:
> >
> > Now, the fact that the system appears unusable does obviously mean that
> > something is wrong. But you're barking up the wrong tree.
>
> Two more additional data points,
>
> 1) partially kernel-unrelated.  MDK's "make" macro didn't support
> alpha's /proc/cpuinfo output, "make -j$numprocs" became "make -j" and
> fun ensued.

Ahh, well..

The kernel source code is set up to scale quite well, so yes a "make -j"
will parallelise a bit too well for most machines, and you'll certainly
run out of memory on just about anything (I routinely get load averages of
30+, and yes, you need at least half a GB of RAM for it to not be
unpleasant - and probably more like a full gigabyte on an alpha).

So I definitely think the kernel likely did the right thing. It's not even
clear that the OOM killer might not have been right - due to the 2.4.x
swap space allocation, 256MB of swap-space is a bit tight on a 384MB
machine that actually wants to use a lot of memory.

> 2) I agree that 200MB into swap and 200MB into cache isn't bad per se,
> but when it triggers the OOM killer it is bad.

Note that it might easily have been 256MB into swap (ie it had eaten _all_
of your swap) at some stage - and you just didn't see it in the vmstat
output because obviously at that point the machine was a bit loaded.

But neutering the OOM killer like Alan suggested may be a rather valid
approach anyway. Delaying the killing sounds valid: if we're truly
livelocked on the VM, we'll be calling down to the OOM killer so much that
it's probably quite valid to say "only return 1 after X iterations".

Linus




Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Jeff Garzik

Linus Torvalds wrote:
> 
> On Sat, 7 Jul 2001, Jeff Garzik wrote:
> >
> > When building gcc-2.96 RPM using gcc-2.96 under kernel 2.4.7 on alpha,
> > the system goes --deeply-- into swap.  Not pretty at all.  The system
> > will be 200MB+ into swap, with 200MB+ in cache!  I presume this affects
> > 2.4.7-release also.
> 
> Note that "200MB+ into swap, with 200MB+ in cache" is NOT bad in itself.
> 
> It only means that we have scanned the VM, and allocated swap-space for
> 200MB worth of VM space. It does NOT necessarily mean that any actual
> swapping has been taking place: you should realize that the "cache" is
> likely to be at least partly the _swap_ cache that hasn't been written
> out.
> 
> This is an accounting problem, nothing more. It looks strange, but it's
> normal.
> 
> Now, the fact that the system appears unusable does obviously mean that
> something is wrong. But you're barking up the wrong tree.

Two more additional data points,

1) partially kernel-unrelated.  MDK's "make" macro didn't support
alpha's /proc/cpuinfo output, "make -j$numprocs" became "make -j" and
fun ensued.

2) I agree that 200MB into swap and 200MB into cache isn't bad per se,
but when it triggers the OOM killer it is bad.

Alan suggested that I insert the following into the OOM killer code, as
the last test before returning 1.

cnt++;
if ((cnt % 5000) != 0)
return 0;

I did this, and while watching "vmstat 3", the cache was indeed being
trimmed, whereas it was not before.

So, the OOM killer appears to be getting triggered early, but the rest
of the report was my screwup not the kernel.

-- 
Jeff Garzik  | A recent study has shown that too much soup
Building 1024| can cause malaise in laboratory mice.
MandrakeSoft |



Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Linus Torvalds


On Sat, 7 Jul 2001, Jeff Garzik wrote:
>
> When building gcc-2.96 RPM using gcc-2.96 under kernel 2.4.7 on alpha,
> the system goes --deeply-- into swap.  Not pretty at all.  The system
> will be 200MB+ into swap, with 200MB+ in cache!  I presume this affects
> 2.4.7-release also.

Note that "200MB+ into swap, with 200MB+ in cache" is NOT bad in itself.

It only means that we have scanned the VM, and allocated swap-space for
200MB worth of VM space. It does NOT necessarily mean that any actual
swapping has been taking place: you should realize that the "cache" is
likely to be at least partly the _swap_ cache that hasn't been written
out.

This is an accounting problem, nothing more. It looks strange, but it's
normal.

Now, the fact that the system appears unusable does obviously mean that
something is wrong. But you're barking up the wrong tree.

Although it might be the "right tree" in the sense that we might want to
remove the swap cache from the "cached" output in /proc/meminfo. It might
be more useful to separate out "Cached" and "SwapCache": add a new line to
/proc/meminfo that is "swapper_space.nr_pages", and make the current code
that does

atomic_read(&page_cache_size)

do

(atomic_read(&page_cache_size) - swapper_space.nrpages)

instead. That way the vmstat output might be more useful, although vmstat
obviously won't know about the new "SwapCache:" field..
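
For concreteness, a sketch of what that might look like in the 2.4
meminfo_read_proc() output (a hedged approximation, not an actual patch;
the surrounding fields and formatting are assumed):

	unsigned long swapcache = swapper_space.nrpages;
	unsigned long cached = atomic_read(&page_cache_size) - swapcache;

	len += sprintf(page + len,
		"Cached:    %8lu kB\n"
		"SwapCache: %8lu kB\n",
		cached << (PAGE_SHIFT - 10),
		swapcache << (PAGE_SHIFT - 10));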

Can you try that, and see if something else stands out once the misleading
accounting is taken care of?

Linus




Re: VM in 2.4.7-pre hurts...

2001-07-07 Thread Jeff Garzik

Jeff Garzik wrote:
> 
> Oh this is a fun one :)
> 
> When building gcc-2.96 RPM using gcc-2.96 under kernel 2.4.7 on alpha,
> the system goes --deeply-- into swap.  Not pretty at all.  The system
> will be 200MB+ into swap, with 200MB+ in cache!  I presume this affects
> 2.4.7-release also.
> 
> System has 256MB of swap, and 384MB of RAM.
> 
> Only patches applied are Rik's recent OOM killer friendliness patch, and
> Andrea's ksoftirq patch.
> 
> I ran "vmstat 3" throughout the build, and that output is attached.  I
> also manually ran "ps wwwaux >> ps.txt" periodically.  This second
> output is not overly helpful, because the system was swapping and
> unusable for the times when the 'ps' output would be most useful.

Sorry, I forgot to mention that OOM killer kicked in twice.  You can
probably pick out the points where it kicked in, in the vmstat output.

-- 
Jeff Garzik  | A recent study has shown that too much soup
Building 1024| can cause malaise in laboratory mice.
MandrakeSoft |



VM in 2.4.7-pre hurts...

2001-07-07 Thread Jeff Garzik

Oh this is a fun one :)

When building gcc-2.96 RPM using gcc-2.96 under kernel 2.4.7 on alpha,
the system goes --deeply-- into swap.  Not pretty at all.  The system
will be 200MB+ into swap, with 200MB+ in cache!  I presume this affects
2.4.7-release also.

System has 256MB of swap, and 384MB of RAM.

Only patches applied are Rik's recent OOM killer friendliness patch, and
Andrea's ksoftirq patch.

I ran "vmstat 3" throughout the build, and that output is attached.  I
also manually ran "ps wwwaux >> ps.txt" periodically.  This second
output is not overly helpful, because the system was swapping and
unusable for the times when the 'ps' output would be most useful.

Both outputs are attached, as they compressed pretty nicely.

-- 
Jeff Garzik  | A recent study has shown that too much soup
Building 1024| can cause malaise in laboratory mice.
MandrakeSoft |
 vmstat.txt.bz2
 ps.txt.bz2

