Re: VM in 2.4.7-pre hurts...
On Sun, 8 Jul 2001, Rik van Riel wrote:
>
> If __wait_on_buffer and ___wait_on_page get stuck, this could
> mean a page doesn't get unlocked. When this is happening, we
> may well be running into dozens of pages which aren't getting
> properly unlocked on IO completion.

Absolutely. But that, in turn, should cause just others getting stuck, not running, no?

Anyway, having looked at the buffer case, I think I found a potentially nasty bug: "unlock_buffer()" with a buffer count of zero.

Why is this nasty? unlock_buffer() does:

	extern inline void unlock_buffer(struct buffer_head *bh)
	{
		clear_bit(BH_Lock, &bh->b_state);
		smp_mb__after_clear_bit();
		if (waitqueue_active(&bh->b_wait))
			wake_up(&bh->b_wait);
	}

but by doing the "clear_bit()", it also potentially frees the buffer, so an interrupt coming in (or another CPU) can end up doing a kfree() on the bh. At which point the "waitqueue_active()" and the wakeup call are operating on random memory.

This does not explain __wait_on_buffer(), but it's a bug nonetheless.

Can anybody find anything else fishy with buffer handling?

		Linus
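For context, a minimal sketch of one way the window could be closed - not necessarily the fix that actually went in - is to pin the buffer_head before clearing BH_Lock. While BH_Lock is still set the bh cannot be freed (that is the premise of the bug), so a b_count reference taken before the clear_bit() keeps it alive across the wakeup:

	/* Hedged sketch only: pin the bh across the wakeup so a concurrent
	 * try_to_free_buffers() / kfree() cannot reuse it.  Taking the ref
	 * while BH_Lock is still set is what makes the pin race-free. */
	extern inline void unlock_buffer(struct buffer_head *bh)
	{
		atomic_inc(&bh->b_count);		/* bh may have b_count == 0 */
		clear_bit(BH_Lock, &bh->b_state);	/* bh is now freeable... */
		smp_mb__after_clear_bit();
		if (waitqueue_active(&bh->b_wait))	/* ...but our ref keeps it */
			wake_up(&bh->b_wait);
		atomic_dec(&bh->b_count);		/* drop the pin */
	}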
Re: VM in 2.4.7-pre hurts...
On Sun, 8 Jul 2001, Linus Torvalds wrote:
>  (a) _had_ the page been on any of the aging lists, it would have been
>      aged down every time we passed it, and
>  (b) it's obviously been aged up every time we passed it in the VM so far
>      (because it hadn't been added to the swap cache earlier).
>
>  - an anonymous page, by the time we add it to the swap cache, would have
>    been aged down and up roughly the same number of times.

Hmmm, indeed. I guess this also means page aging in its current form cannot even work well with exponential down aging, since the down aging on the pageout list always cancels out the up aging in swap_out() ...

I guess it's time we found some volunteers to experiment with linear down aging (page->age--;) since that one will be able to withstand pages being referenced only in the page tables.

(now, off to a project 4000 km from home for the next 2 weeks ... bbl)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
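To make the two policies concrete, here is a sketch modeled on the 2.4-era aging helpers in mm/swap.c; the constants and the exact helper names are illustrative rather than quoted from any particular release:

	#define PAGE_AGE_ADV	3	/* illustrative values */
	#define PAGE_AGE_MAX	64

	/* Up aging, done when a reference is seen (e.g. in swap_out()). */
	void age_page_up(struct page *page)
	{
		unsigned age = page->age + PAGE_AGE_ADV;
		if (age > PAGE_AGE_MAX)
			age = PAGE_AGE_MAX;
		page->age = age;
	}

	/* Exponential down aging: one pass over the pageout list halves
	 * the age, wiping out a whole run of up-agings - the cancellation
	 * Rik is describing. */
	void age_page_down_exponential(struct page *page)
	{
		page->age /= 2;
	}

	/* Linear down aging, the proposed experiment: one down-aging
	 * cancels exactly one up-aging, so a page referenced only through
	 * the page tables can still hold on to accumulated age. */
	void age_page_down_linear(struct page *page)
	{
		if (page->age)
			page->age--;
	}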
Re: VM in 2.4.7-pre hurts...
On Sun, 8 Jul 2001, Linus Torvalds wrote:
> On Sun, 8 Jul 2001, Rik van Riel wrote:
> >
> > ... Bingo. You hit the infamous __wait_on_buffer / ___wait_on_page
> > bug. I've seen this for quite a while now on our quad xeon test
> > machine, with some kernel versions it can be reproduced in minutes,
> > with others it won't trigger at all.
>
> Hmm.. That would explain why the "tar" gets stuck, but why does the whole
> machine grind to a halt with all other processes being marked runnable?

If __wait_on_buffer and ___wait_on_page get stuck, this could mean a page doesn't get unlocked. When this is happening, we may well be running into dozens of pages which aren't getting properly unlocked on IO completion.

This in turn would get the rest of the system stuck in the pageout code path, eating CPU like crazy.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: VM in 2.4.7-pre hurts...
On Sun, 8 Jul 2001, Rik van Riel wrote:
>
> ... Bingo. You hit the infamous __wait_on_buffer / ___wait_on_page
> bug. I've seen this for quite a while now on our quad xeon test
> machine, with some kernel versions it can be reproduced in minutes,
> with others it won't trigger at all.

Hmm.. That would explain why the "tar" gets stuck, but why does the whole machine grind to a halt with all other processes being marked runnable?

> I hope there is somebody out there who can RELIABLY trigger
> this bug, so we have a chance of tracking it down.
>
> > tar
> > Trace; c012f2da <__wait_on_buffer+6a/8c>
> > Trace; c01303c9 <bread+45/64>

I wonder if "getblk()" returned a locked not-up-to-date buffer.. That would explain how the buffer stays locked forever - the "ll_rw_block()" will not actually submit any IO on a locked buffer, so there won't be any IO to release it.

And it's interesting to see that this happens for an _inode_ block, not a data block - which could easily have been dirty and scheduled for a write-out. So I wonder if there is some race between "write buffer and try to free it" and "getblk()". The locking in "try_to_free_buffers()" is rather anal, so I don't see how this could happen, but..

That still doesn't explain why everybody is busy running. I'd have expected all the processes to end up waiting for the page or buffer, not stuck in a live-lock.

		Linus
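The skip Linus refers to is visible in the 2.4-era ll_rw_block() in fs/buffer.c; condensed here, with the dirty-bit bookkeeping and error paths omitted and only the locking test kept:

	void ll_rw_block(int rw, int nr, struct buffer_head *bhs[])
	{
		int i;

		for (i = 0; i < nr; i++) {
			struct buffer_head *bh = bhs[i];

			/* Only one thread can actually submit the I/O: a
			 * buffer that is already locked is silently skipped.
			 * If it was locked with no I/O in flight, nothing
			 * will ever complete, unlock it, and wake waiters. */
			if (test_and_set_bit(BH_Lock, &bh->b_state))
				continue;

			/* ... clear the dirty bit, set b_end_io ... */
			submit_bh(rw, bh);
		}
	}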
Re: VM in 2.4.7-pre hurts...
On Sun, 8 Jul 2001, Rik van Riel wrote:
> On Sun, 8 Jul 2001, Mike Galbraith wrote:
>
> > is very oom with no disk activity. It _looks_ (xmm and vmstat) like
> > it just ran out of cleanable dirty pages. With or without swap,
>
> ... Bingo. You hit the infamous __wait_on_buffer / ___wait_on_page
> bug. I've seen this for quite a while now on our quad xeon test
> machine, with some kernel versions it can be reproduced in minutes,
> with others it won't trigger at all.
>
> And after a recompile it's usually gone ...
>
> I hope there is somebody out there who can RELIABLY trigger
> this bug, so we have a chance of tracking it down.

Well, my box seems to think I'm a somebody. If it changes its mind, I'll let you know. I'll throw whatever rocks I can find at it to get it all angry and confused. You sneak up behind it and do the stake and mallet number.

tar -rvf /dev/null /usr/local (10 gig of.. mess) with X/KDE running seems 100% repeatable here.

'scuse me while I go recompile again and hope it just goes away ;-)

	-Mike
Re: VM in 2.4.7-pre hurts...
On Sat, 7 Jul 2001, Rik van Riel wrote:
>
> Not quite. The more a page has been used, the higher the
> page->age will be. This means the system has a way to
> distinguish between anonymous pages which were used once
> and anonymous pages which are used lots of times.

Wrong.

We already _have_ that aging: it's called "do not add anonymous pages to the page cache unless they are old". Pages that are used lots of times won't ever _get_ to the point where they get added to the swap cache, because they are always marked young.

So by the time we get to this point, we _know_ what the age should be.

I tried to explain this to you earlier. We should NOT use the old "page->age", because that one is 100% and totally bogus. It has _nothing_ to do with the page age. It's been randomly incremented, without ever having been on any of the aging lists, and as such it is a totally bogus number.

In comparison, just setting page->age to PAGE_AGE_START is _not_ a random number. It's a reasonable number that depends on the _knowledge_ that

 (a) _had_ the page been on any of the aging lists, it would have been
     aged down every time we passed it, and
 (b) it's obviously been aged up every time we passed it in the VM so far
     (because it hadn't been added to the swap cache earlier).

Are you with me?

Now, add to the above two _facts_ the knowledge that the aging of the VM space is done roughly at the same rate as the aging of the active lists (we call "swap_out()" every time we age the active list when under memory pressure, and they go through similar percentages of their respective address spaces), and you get

 - an anonymous page, by the time we add it to the swap cache, would have
   been aged down and up roughly the same number of times.

Ergo, its age _should_ be the same as PAGE_AGE_START.

> > That would certainly help explain why aging doesn't work for some people.
>
> As would your patch ;)

No. Do the math. My patch gives the age the _right_ age. Previously, it had a completely random age that had _nothing_ to do with any other page age.

		Linus
Re: VM in 2.4.7-pre hurts...
On Sun, 8 Jul 2001, Mike Galbraith wrote:

> is very oom with no disk activity. It _looks_ (xmm and vmstat) like
> it just ran out of cleanable dirty pages. With or without swap,

... Bingo. You hit the infamous __wait_on_buffer / ___wait_on_page bug. I've seen this for quite a while now on our quad xeon test machine, with some kernel versions it can be reproduced in minutes, with others it won't trigger at all.

And after a recompile it's usually gone ...

I hope there is somebody out there who can RELIABLY trigger this bug, so we have a chance of tracking it down.

> tar
> Trace; c012f2da <__wait_on_buffer+6a/8c>
> Trace; c01303c9 <bread+45/64>
> Trace; c01500ea <ext2_read_inode+fe/3c8>
> Trace; c01411f5 <get_new_inode+d1/15c>
> Trace; c0141416 <iget4+c2/d4>
> Trace; c0150b03 <ext2_lookup+43/68>
> Trace; c0138401 <path_walk+529/748>
> Trace; c0137aed <getname+5d/9c>
> Trace; c01389d8 <__user_walk+3c/58>
> Trace; c0135cc6 <sys_lstat64+16/70>
> Trace; c0106ae3 <system_call+33/38>

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: VM in 2.4.7-pre hurts...
On Sat, 7 Jul 2001, Alan Cox wrote:
> > > It's certainly misleading. I got Jeff to try making oom return
> > > 4999 out of 5000 times regardless.
> >
> > In that case, he _is_ OOM. ;)
>
> Hardly
>
> > 1) (almost) no free memory
> > 2) no free swap
> > 3) very little pagecache + buffer cache
>
> Large amounts of cache, which went away when the OOM code was neutered

So Jeff backed out my patch before testing yours? ;)

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/
Re: VM in 2.4.7-pre hurts...
> > It's certainly misleading. I got Jeff to try making oom return
> > 4999 out of 5000 times regardless.
>
> In that case, he _is_ OOM. ;)

Hardly

> 1) (almost) no free memory
> 2) no free swap
> 3) very little pagecache + buffer cache

Large amounts of cache, which went away when the OOM code was neutered
Re: VM in 2.4.7-pre hurts...
Rik van Riel wrote:
>
> On Sat, 7 Jul 2001, Alan Cox wrote:
> > > instead. That way the vmstat output might be more useful, although vmstat
> > > obviously won't know about the new "SwapCache:" field..
> > >
> > > Can you try that, and see if something else stands out once the misleading
> > > accounting is taken care of?
> >
> > It's certainly misleading. I got Jeff to try making oom return
> > 4999 out of 5000 times regardless.
>
> In that case, he _is_ OOM. ;)
>
> 1) (almost) no free memory
> 2) no free swap
> 3) very little pagecache + buffer cache

It got -considerably- farther after Alan's suggested hack to the OOM killer; so at least in this instance, the OOM killer appeared to me to be killing too early...

--
Jeff Garzik      | A recent study has shown that too much soup
Building 1024    | can cause malaise in laboratory mice.
MandrakeSoft     |
Re: VM in 2.4.7-pre hurts...
> But neutering the OOM killer like Alan suggested may be a rather valid
> approach anyway. Delaying the killing sounds valid: if we're truly
> livelocked on the VM, we'll be calling down to the OOM killer so much that
> it's probably quite valid to say "only return 1 after X iterations".

It's hiding the real accounting screw-up with a 'goes bang at random less often' - nice hack, but IMHO a bad long-term approach. We need to get the maths right. We had similar 2.2 problems the other way (with nasty deadlocks) until Andrea fixed that.
Re: VM in 2.4.7-pre hurts...
On Sat, 7 Jul 2001, Alan Cox wrote:
> > instead. That way the vmstat output might be more useful, although vmstat
> > obviously won't know about the new "SwapCache:" field..
> >
> > Can you try that, and see if something else stands out once the misleading
> > accounting is taken care of?
>
> It's certainly misleading. I got Jeff to try making oom return
> 4999 out of 5000 times regardless.

In that case, he _is_ OOM. ;)

1) (almost) no free memory
2) no free swap
3) very little pagecache + buffer cache

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/
Re: VM in 2.4.7-pre hurts...
> instead. That way the vmstat output might be more useful, although vmstat
> obviously won't know about the new "SwapCache:" field..
>
> Can you try that, and see if something else stands out once the misleading
> accounting is taken care of?

It's certainly misleading. I got Jeff to try making oom return 4999 out of 5000 times regardless.
Re: VM in 2.4.7-pre hurts...
On Sat, 7 Jul 2001, Linus Torvalds wrote:
> In fact, I do not see any part of the whole path that sets the
> page age at all, so we're basically using a completely
> uninitialized field here (it's been initialized way back when
> the page was allocated, but because it hasn't been part of the
> normal aging scheme it has only been aged up, never down, so the
> value is pretty much random by the time we actually add it to
> the swap cache pool).

Not quite. The more a page has been used, the higher the page->age will be. This means the system has a way to distinguish between anonymous pages which were used once and anonymous pages which are used lots of times.

> Suggested fix:

[snip disabling of page aging for anonymous memory]

> That would certainly help explain why aging doesn't work for some people.

As would your patch ;)

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/
Re: VM in 2.4.7-pre hurts...
On Sat, 7 Jul 2001, Rik van Riel wrote:
>
> Not at all. Note that try_to_swap_out() will happily
> create swap cache pages with a very high page->age,
> pages which are in absolutely no danger of being
> evicted from memory...

That seems to be a bug in "add_to_swap_cache()".

In fact, I do not see any part of the whole path that sets the page age at all, so we're basically using a completely uninitialized field here (it's been initialized way back when the page was allocated, but because it hasn't been part of the normal aging scheme it has only been aged up, never down, so the value is pretty much random by the time we actually add it to the swap cache pool).

Suggested fix:

--- v2.4.6/linux/mm/swap_state.c	Tue Jul  3 17:08:22 2001
+++ linux/mm/swap_state.c	Sat Jul  7 11:49:13 2001
@@ -81,6 +81,7 @@
 		BUG();
 	flags = page->flags & ~((1 << PG_error) | (1 << PG_arch_1));
 	page->flags = flags | (1 << PG_uptodate);
+	page->age = PAGE_AGE_START;
 	add_to_page_cache_locked(page, &swapper_space, entry.val);
 }

Does that make a difference for people? That would certainly help explain why aging doesn't work for some people.

		Linus
Re: VM in 2.4.7-pre hurts...
On Sat, 7 Jul 2001, Jeff Garzik wrote:
> Sigh. Since I am a VM ignoramus I doubt my opinion matters much
> at all here... but it would be nice if oddball configurations
> like 384MB with 50MB swap could be supported.

It would be fun if we had 48 hours in a day, too ;)

This particular thing has been on the TODO list of the VM developers for a while, but we just haven't gotten around to it.

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/
Re: VM in 2.4.7-pre hurts...
Linus Torvalds wrote:
>
> On Sat, 7 Jul 2001, Jeff Garzik wrote:
> > Linus Torvalds wrote:
> > >
> > > Now, the fact that the system appears unusable does obviously mean that
> > > something is wrong. But you're barking up the wrong tree.
> >
> > Two additional data points:
> >
> > 1) partially kernel-unrelated. MDK's "make" macro didn't support
> > alpha's /proc/cpuinfo output, "make -j$numprocs" became "make -j" and
> > fun ensued.
>
> Ahh, well..
>
> The kernel source code is set up to scale quite well, so yes a "make -j"
> will parallelise a bit too well for most machines, and you'll certainly
> run out of memory on just about anything (I routinely get load averages of
> 30+, and yes, you need at least half a GB of RAM for it to not be
> unpleasant - and probably more like a full gigabyte on an alpha).

"make -j" is a lot of fun on a dual athlon w/ 512mb :)

> So I definitely think the kernel likely did the right thing. It's not even
> clear that the OOM killer might not have been right - due to the 2.4.x
> swap space allocation, 256MB of swap-space is a bit tight on a 384MB
> machine that actually wants to use a lot of memory.

Sigh. Since I am a VM ignoramus I doubt my opinion matters much at all here... but it would be nice if oddball configurations like 384MB with 50MB swap could be supported. I don't ask that it perform optimally at all, but at least the machine should behave predictably... This type of swap configuration makes sense for, "my working set is pretty much always in RAM, including i/dcache, but let's have some swap just-in-case".

> > 2) I agree that 200MB into swap and 200MB into cache isn't bad per se,
> > but when it triggers the OOM killer it is bad.
>
> Note that it might easily have been 256MB into swap (ie it had eaten _all_
> of your swap) at some stage - and you just didn't see it in the vmstat
> output because obviously at that point the machine was a bit loaded.

I'm pretty sure swap was 100% full. I should have sysrq'd and checked but I forgot.

> But neutering the OOM killer like Alan suggested may be a rather valid
> approach anyway. Delaying the killing sounds valid: if we're truly
> livelocked on the VM, we'll be calling down to the OOM killer so much that
> it's probably quite valid to say "only return 1 after X iterations".

cnt % 5000 may have been a bit extreme but it was fun to see it thrash. sysrq was pretty much the only talking point into the system.

--
Jeff Garzik      | A recent study has shown that too much soup
Building 1024    | can cause malaise in laboratory mice.
MandrakeSoft     |
Re: VM in 2.4.7-pre hurts...
On Sat, 7 Jul 2001, Jeff Garzik wrote:
> 2) I agree that 200MB into swap and 200MB into cache isn't bad
> per se, but when it triggers the OOM killer it is bad.

Please read my patch for the OOM killer. It subtracts the swap cache from the cache figure you quote and ONLY goes into oom_kill() if the page & buffer cache together take less than 4% of memory (see /proc/sys/vm/{buffermem,pagecache}).

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/
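For readers without the patch at hand, the shape of the check Rik describes is roughly the following; the placement inside out_of_memory() and the 4% comparison are paraphrased from his description, not copied from the patch:

	/* Paraphrased sketch of the heuristic, not Rik's actual patch. */
	int out_of_memory(void)
	{
		unsigned long cache;

		if (nr_free_pages() > freepages.min)
			return 0;		/* still free memory */
		if (nr_swap_pages > 0)
			return 0;		/* still free swap */

		/* Swap cache pages are not reclaimable cache: subtract them. */
		cache = atomic_read(&page_cache_size) - swapper_space.nrpages
			+ atomic_read(&buffermem_pages);
		if (cache * 100 > num_physpages * 4)
			return 0;		/* >4% cache left: trim, don't kill */

		return 1;			/* genuinely out of memory */
	}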
Re: VM in 2.4.7-pre hurts...
On Sat, 7 Jul 2001, Jeff Garzik wrote:
> Linus Torvalds wrote:
> >
> > Now, the fact that the system appears unusable does obviously mean that
> > something is wrong. But you're barking up the wrong tree.
>
> Two additional data points:
>
> 1) partially kernel-unrelated. MDK's "make" macro didn't support
> alpha's /proc/cpuinfo output, "make -j$numprocs" became "make -j" and
> fun ensued.

Ahh, well..

The kernel source code is set up to scale quite well, so yes a "make -j" will parallelise a bit too well for most machines, and you'll certainly run out of memory on just about anything (I routinely get load averages of 30+, and yes, you need at least half a GB of RAM for it to not be unpleasant - and probably more like a full gigabyte on an alpha).

So I definitely think the kernel likely did the right thing. It's not even clear that the OOM killer might not have been right - due to the 2.4.x swap space allocation, 256MB of swap-space is a bit tight on a 384MB machine that actually wants to use a lot of memory.

> 2) I agree that 200MB into swap and 200MB into cache isn't bad per se,
> but when it triggers the OOM killer it is bad.

Note that it might easily have been 256MB into swap (ie it had eaten _all_ of your swap) at some stage - and you just didn't see it in the vmstat output because obviously at that point the machine was a bit loaded.

But neutering the OOM killer like Alan suggested may be a rather valid approach anyway. Delaying the killing sounds valid: if we're truly livelocked on the VM, we'll be calling down to the OOM killer so much that it's probably quite valid to say "only return 1 after X iterations".

		Linus
Re: VM in 2.4.7-pre hurts...
Linus Torvalds wrote:
>
> On Sat, 7 Jul 2001, Jeff Garzik wrote:
> >
> > When building gcc-2.96 RPM using gcc-2.96 under kernel 2.4.7 on alpha,
> > the system goes --deeply-- into swap. Not pretty at all. The system
> > will be 200MB+ into swap, with 200MB+ in cache! I presume this affects
> > 2.4.7-release also.
>
> Note that "200MB+ into swap, with 200MB+ in cache" is NOT bad in itself.
>
> It only means that we have scanned the VM, and allocated swap-space for
> 200MB worth of VM space. It does NOT necessarily mean that any actual
> swapping has been taking place: you should realize that the "cache" is
> likely to be at least partly the _swap_ cache that hasn't been written
> out.
>
> This is an accounting problem, nothing more. It looks strange, but it's
> normal.
>
> Now, the fact that the system appears unusable does obviously mean that
> something is wrong. But you're barking up the wrong tree.

Two additional data points:

1) partially kernel-unrelated. MDK's "make" macro didn't support alpha's /proc/cpuinfo output, "make -j$numprocs" became "make -j" and fun ensued.

2) I agree that 200MB into swap and 200MB into cache isn't bad per se, but when it triggers the OOM killer it is bad. Alan suggested that I insert the following into the OOM killer code, as the last test before returning 1.

	cnt++;
	if ((cnt % 5000) != 0)
		return 0;

I did this, and while watching "vmstat 3", the cache was indeed being trimmed, whereas it was not before.

So, the OOM killer appears to be getting triggered early, but the rest of the report was my screwup, not the kernel's.

--
Jeff Garzik      | A recent study has shown that too much soup
Building 1024    | can cause malaise in laboratory mice.
MandrakeSoft     |
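Spelled out with a little context (the surrounding function body is paraphrased, only the three inserted lines are Jeff's), the hack rate-limits the OOM verdict itself:

	static int out_of_memory(void)
	{
		static unsigned long cnt;	/* Alan's suggested counter */

		/* ... the usual free-memory / free-swap / cache checks,
		 *     each returning 0 while resources remain ... */

		/* Last test before declaring OOM: swallow 4999 out of every
		 * 5000 verdicts, so the VM keeps trying to reclaim instead
		 * of killing at the first livelocked moment. */
		cnt++;
		if ((cnt % 5000) != 0)
			return 0;

		return 1;
	}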
Re: VM in 2.4.7-pre hurts...
On Sat, 7 Jul 2001, Jeff Garzik wrote:
>
> When building gcc-2.96 RPM using gcc-2.96 under kernel 2.4.7 on alpha,
> the system goes --deeply-- into swap. Not pretty at all. The system
> will be 200MB+ into swap, with 200MB+ in cache! I presume this affects
> 2.4.7-release also.

Note that "200MB+ into swap, with 200MB+ in cache" is NOT bad in itself.

It only means that we have scanned the VM, and allocated swap-space for 200MB worth of VM space. It does NOT necessarily mean that any actual swapping has been taking place: you should realize that the "cache" is likely to be at least partly the _swap_ cache that hasn't been written out.

This is an accounting problem, nothing more. It looks strange, but it's normal.

Now, the fact that the system appears unusable does obviously mean that something is wrong. But you're barking up the wrong tree.

Although it might be the "right tree" in the sense that we might want to remove the swap cache from the "cached" output in /proc/meminfo. It might be more useful to separate out "Cached" and "SwapCache": add a new line to /proc/meminfo that is "swapper_space.nrpages", and make the current code that does

	atomic_read(&page_cache_size)

do

	(atomic_read(&page_cache_size) - swapper_space.nrpages)

instead. That way the vmstat output might be more useful, although vmstat obviously won't know about the new "SwapCache:" field..

Can you try that, and see if something else stands out once the misleading accounting is taken care of?

		Linus
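A sketch of what that change could look like against the 2.4-era meminfo_read_proc() in fs/proc/proc_misc.c; the surrounding fields and exact format strings are paraphrased, only the split Linus describes is the point:

	static int meminfo_read_proc(char *page, char **start, off_t off,
				     int count, int *eof, void *data)
	{
		int len = 0;

		/* ... MemTotal / MemFree / Buffers etc. unchanged ... */

		/* Sketch: stop counting the swap cache in "Cached" and
		 * give it its own line, so tools see real page cache. */
		{
			unsigned long swapcache = swapper_space.nrpages;
			unsigned long cached =
				atomic_read(&page_cache_size) - swapcache;

			len += sprintf(page + len,
				"Cached:       %8lu kB\n"
				"SwapCache:    %8lu kB\n",
				cached << (PAGE_SHIFT - 10),
				swapcache << (PAGE_SHIFT - 10));
		}

		/* ... remaining fields and return-length boilerplate ... */
		return len;
	}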
Re: VM in 2.4.7-pre hurts...
Jeff Garzik wrote:
>
> Oh this is a fun one :)
>
> When building gcc-2.96 RPM using gcc-2.96 under kernel 2.4.7 on alpha,
> the system goes --deeply-- into swap. Not pretty at all. The system
> will be 200MB+ into swap, with 200MB+ in cache! I presume this affects
> 2.4.7-release also.
>
> System has 256MB of swap, and 384MB of RAM.
>
> Only patches applied are Rik's recent OOM killer friendliness patch, and
> Andrea's ksoftirq patch.
>
> I ran "vmstat 3" throughout the build, and that output is attached. I
> also manually ran "ps wwwaux >> ps.txt" periodically. This second
> output is not overly helpful, because the system was swapping and
> unusable for the times when the 'ps' output would be most useful.

Sorry, I forgot to mention that the OOM killer kicked in twice. You can probably pick out the points where it kicked in, in the vmstat output.

--
Jeff Garzik      | A recent study has shown that too much soup
Building 1024    | can cause malaise in laboratory mice.
MandrakeSoft     |
VM in 2.4.7-pre hurts...
Oh this is a fun one :)

When building gcc-2.96 RPM using gcc-2.96 under kernel 2.4.7 on alpha, the system goes --deeply-- into swap. Not pretty at all. The system will be 200MB+ into swap, with 200MB+ in cache! I presume this affects 2.4.7-release also.

System has 256MB of swap, and 384MB of RAM.

Only patches applied are Rik's recent OOM killer friendliness patch, and Andrea's ksoftirq patch.

I ran "vmstat 3" throughout the build, and that output is attached. I also manually ran "ps wwwaux >> ps.txt" periodically. This second output is not overly helpful, because the system was swapping and unusable for the times when the 'ps' output would be most useful.

Both outputs are attached, as they compressed pretty nicely.

--
Jeff Garzik      | A recent study has shown that too much soup
Building 1024    | can cause malaise in laboratory mice.
MandrakeSoft     |

[Attachment: vmstat.txt.bz2]
[Attachment: ps.txt.bz2]