Re: scheduler went mad?
On Fri, 13 Apr 2001 01:02:21 +0200, Szabolcs Szakacsits said:

> Not __alloc_pages() calls oom_kill() however do_page_fault(). Not the
> same. After the system tried *really* hard to get *one* free page and
> couldn't managed why loop forever? To eat CPU and waiting for

For what it's worth, this *IS NOT* the case I'm getting bit by:

While kswapd was hung, I already had (from /proc/meminfo)

MemFree: 34064 kB

I suspect that kswapd is getting hung spinning on some *specific*
requirement that it's falling short on?

/Valdis
Re: scheduler went mad?
On Thu, 12 Apr 2001, Rik van Riel wrote:
> On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> > You mean without dropping out_of_memory() test in kswapd and calling
> > oom_kill() in page fault [i.e. without additional patch]?
> No. I think it's ok for __alloc_pages() to call oom_kill()
> IF we turn out to be out of memory, but that should not even
> be needed.

Not __alloc_pages() calls oom_kill() however do_page_fault(). Not the
same. After the system tried *really* hard to get *one* free page and
couldn't manage, why loop forever? To eat CPU and wait for
out_of_memory() to *guess* when the system is in OOM? I don't think so.
If processes can't progress because the system can't page in any of
their pages, somebody must go.

> Also, when a task in __alloc_pages() is OOM-killed, it will
> have PF_MEMALLOC set and will immediately break out of the
> loop. The rest of the system will spin around in the loop
> until the victim has exited and then their allocations will
> succeed.

Yes, I think this is a problem. In page fault, if OOM, a "bad" process
is selected, scheduled, killed and everybody runs happily without even
noticing the system is low on memory. Fast and graceful process killing
instead of slow, painful death IF out_of_memory() correctly detects
OOM.

Szaka

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: scheduler went mad?
On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:

> You mean without dropping out_of_memory() test in kswapd and calling
> oom_kill() in page fault [i.e. without additional patch]?

No. I think it's ok for __alloc_pages() to call oom_kill()
IF we turn out to be out of memory, but that should not even
be needed.

Also, when a task in __alloc_pages() is OOM-killed, it will
have PF_MEMALLOC set and will immediately break out of the
loop. The rest of the system will spin around in the loop
until the victim has exited and then their allocations will
succeed.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/
Re: scheduler went mad?
On Thu, 12 Apr 2001, Rik van Riel wrote:
> On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> > I still feel a bit uncomfortable about processes looping forever in
> > __alloc_pages and because of this oom_killer also can't be moved to
> > page fault handler where I think its place should be. I'm using the
> > patch below.
> It's BROKEN. This means that if you have one task using up
> all memory and you're waiting for the OOM kill of that task
> to have effect, your syslogd, etc... will have their allocations
> fail and will die.

You mean without dropping out_of_memory() test in kswapd and calling
oom_kill() in page fault [i.e. without additional patch]? Yes, you're
completely right, but I have the patch [see example below, 'm1' is the
bad guy]. I just didn't have time to test it extensively and don't know
whether there are side effects of getting rid of this infinite looping
in __alloc_pages(), but locked up processes apparently don't make
people very happy ;)

Szaka

Out of Memory: Killed process 830 (m1), saved process 696 (httpd)
   procs                      memory    swap          io     system
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs
 6  0  0      0   9492    100   1496   0   0  1386     2 2904  3877
 5  0  0      0   7812    104   1788   0   0   289     0  689    22
 5  0  0      0   6248    104   1788   0   0     0     0  108    19
 5  0  0      0   4748    108   1840   0   0    56     0  219    21
 5  0  0      0   3268    108   1868   0   0    28     0  165    23
 5  0  1      0   1864     76   1868   0   0     0     5  120    61
 5  0  1      0   1432     76   1252   0   0     0     0  108  1130
 5  0  1      0   1236     80    796   0   0    65     0  246  4588
 5  0  1      0   1236     80    668   0   0     0     0  110  8869
 6  0  1      0    948    112    696   0   0   805     0 1814  8231
Out of Memory: Killed process 858 (m1), saved process 811 (vmstat)
 5  0  1      0    924    152    444   0   0  1153     0 2731 18231
 4  0  1      0   1720    148    828   0   0   750     3 1711  1876
 5  0  1      0   1156    148    760   0   0   290     0  723  1967
 4  0  1      0   1152    132    664   0   0    70     0  277  7249
 4  0  1      0   1140    144    560   0   0    54     0  238  7942
 4  0  1      0   1140    144    460   0   0    32     0  212  7521
Out of Memory: Killed process 834 (m1), saved process 418 (identd)
Re: scheduler went mad?
On Thu, 12 Apr 2001 16:12:55 BST, Alan Cox said:

> Do you have > 800Mb of RAM ?

Following up - it just bit again (twice).

The first time, it was xmms/kswapd fighting for CPU, and xmms was again
immune to kill -9. Interestingly enough, several minutes later, I closed
'netscape', and xmms took the kill within a second or two.

10 minutes later, and another 2 programs that do audio got wedged up.
Oddly enough, I did an 'su', and they broke loose immediately.

I've ruled out i810_audio.c as a culprit - although I have programs that
do audio hanging, *those* programs are always writing their data down a
Unix socket to the actual process that writes to /dev/audio/dsp.

Hmm.. 'su' writes to syslog, and netscape has a few Unix sockets too.
Could the problem be related to running out of some resource related to
Unix-domain sockets, which clears up once some socket is closed?

Oddly enough, while I had 2 programs doing audio wedged, I was still
seeing (hearing actually ;) *new* processes open a connection to esd and
play sounds. Weird.

--
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech
Re: scheduler went mad?
On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
>
> > This patch is broken, ignore it.
> > Just removing wakeup_bdflush() is indeed correct.
> > We already wakeup bdflush at try_to_free_buffers() anyway.
>
> I still feel a bit unconfortable about processes looping forever in
> __alloc_pages and because of this oom_killer also can't be moved to
> page fault handler where I think its place should be. I'm using the
> patch below.

It's BROKEN. This means that if you have one task using up
all memory and you're waiting for the OOM kill of that task
to have effect, your syslogd, etc... will have their allocations
fail and will die.

regards,

Rik
Re: scheduler went mad?
On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

> This patch is broken, ignore it.
> Just removing wakeup_bdflush() is indeed correct.
> We already wakeup bdflush at try_to_free_buffers() anyway.

I still feel a bit uncomfortable about processes looping forever in
__alloc_pages, and because of this the oom_killer also can't be moved to
the page fault handler, where I think its place should be. I'm using the
patch below.

Szaka

--- mm/page_alloc.c.orig Sat Mar 31 19:07:22 2001
+++ mm/page_alloc.c Mon Apr 2 21:05:31 2001
@@ -453,8 +453,12 @@
          */
         if (gfp_mask & __GFP_WAIT) {
                 memory_pressure++;
-                try_to_free_pages(gfp_mask);
-                wakeup_bdflush(0);
+                if (!try_to_free_pages(gfp_mask));
+                        return NULL;
                 goto try_again;
         }
 }
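The intent described above (stop retrying once reclaim makes no progress, instead of looping forever) can be sketched as a small user-space model. This is only an illustration under stated assumptions: `toy_mm`, `toy_reclaim` and `toy_alloc` are hypothetical names, not kernel code, and the model ignores the objection raised elsewhere in the thread that failing allocations this way can take out innocent tasks such as syslogd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy memory state; hypothetical, not a kernel structure. */
struct toy_mm {
    unsigned free_pages;        /* pages available right now */
    unsigned reclaimable_pages; /* pages a reclaim pass could still free */
};

/* Stand-in for try_to_free_pages(): frees one page if any is reclaimable
 * and reports whether it made progress. */
static bool toy_reclaim(struct toy_mm *mm)
{
    if (mm->reclaimable_pages == 0)
        return false;
    mm->reclaimable_pages--;
    mm->free_pages++;
    return true;
}

/*
 * Allocate `want` pages. Retry after every reclaim pass that made progress
 * and give up as soon as a pass frees nothing, so the loop is bounded by
 * the amount of reclaimable memory instead of spinning forever.
 * Returns true on success; *passes counts the reclaim passes attempted.
 */
static bool toy_alloc(struct toy_mm *mm, unsigned want, unsigned *passes)
{
    assert(mm != NULL && passes != NULL);
    *passes = 0;
    for (;;) {
        if (mm->free_pages >= want) {
            mm->free_pages -= want;
            return true;
        }
        (*passes)++;
        if (!toy_reclaim(mm))
            return false; /* no progress: fail instead of looping */
    }
}
```

Calling `toy_alloc()` on a state with 1 free and 5 reclaimable pages satisfies a 4-page request after three reclaim passes; with only 2 reclaimable pages the same request fails on the third pass rather than spinning.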
Re: scheduler went mad?
On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

> This should fix it
>
> --- mm/page_alloc.c.orig Thu Apr 12 13:47:53 2001
> +++ mm/page_alloc.c Thu Apr 12 13:48:06 2001
> @@ -454,7 +454,7 @@
>          if (gfp_mask & __GFP_WAIT) {
>                  memory_pressure++;
>                  try_to_free_pages(gfp_mask);
> -                wakeup_bdflush(0);
> +                balance_dirty(NODEV);
>                  goto try_again;
>          }

Remember that we can ONLY do this if we have __GFP_IO ...

regards,

Rik
Re: scheduler went mad?
On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

>
> I did :)
>
> This should fix it
>
> --- mm/page_alloc.c.orig Thu Apr 12 13:47:53 2001
> +++ mm/page_alloc.c Thu Apr 12 13:48:06 2001
> @@ -454,7 +454,7 @@
>          if (gfp_mask & __GFP_WAIT) {
>                  memory_pressure++;
>                  try_to_free_pages(gfp_mask);
> -                wakeup_bdflush(0);
> +                balance_dirty(NODEV);
>                  goto try_again;
>          }

This patch is broken, ignore it.

Just removing wakeup_bdflush() is indeed correct.

We already wakeup bdflush at try_to_free_buffers() anyway.
Re: scheduler went mad?
On Thu, 12 Apr 2001, Rik van Riel wrote:

> On Thu, 12 Apr 2001, Alan Cox wrote:
>
> > > 2.4.3-pre6 quietly made a very significant change there:
> > > it used to say "if (!order) goto try_again;" and now just
> > > says "goto try_again;". Which seems very sensible since
> > > __GFP_WAIT is set, but I do wonder if it was a safe change.
> > > We have mechanisms for freeing pages (order 0), but whether
> > > any higher orders come out of that is a matter of chance.
> >
> > The fundamental problem is that it should say
> >
> > wait_for_mm_progress();
> > goto try_again;
> >
> > and we dont have that facility right now.
>
> From mm/page_alloc.c, around line 453:
>
>         if (gfp_mask & __GFP_WAIT) {
>                 memory_pressure++;
>                 try_to_free_pages(gfp_mask);
>                 wakeup_bdflush(0);
>                 goto try_again;
>         }
>
> I guess we should remove the wakeup_bdflush(0) ... who put it
> there anyway ?

I did :)

This should fix it

--- mm/page_alloc.c.orig Thu Apr 12 13:47:53 2001
+++ mm/page_alloc.c Thu Apr 12 13:48:06 2001
@@ -454,7 +454,7 @@
         if (gfp_mask & __GFP_WAIT) {
                 memory_pressure++;
                 try_to_free_pages(gfp_mask);
-                wakeup_bdflush(0);
+                balance_dirty(NODEV);
                 goto try_again;
         }
Re: scheduler went mad?
On Thu, 12 Apr 2001, Alan Cox wrote:

> > 2.4.3-pre6 quietly made a very significant change there:
> > it used to say "if (!order) goto try_again;" and now just
> > says "goto try_again;". Which seems very sensible since
> > __GFP_WAIT is set, but I do wonder if it was a safe change.
> > We have mechanisms for freeing pages (order 0), but whether
> > any higher orders come out of that is a matter of chance.
>
> The fundamental problem is that it should say
>
> wait_for_mm_progress();
> goto try_again;
>
> and we dont have that facility right now.

From mm/page_alloc.c, around line 453:

        if (gfp_mask & __GFP_WAIT) {
                memory_pressure++;
                try_to_free_pages(gfp_mask);
                wakeup_bdflush(0);
                goto try_again;
        }

I guess we should remove the wakeup_bdflush(0) ... who put it
there anyway ?

regards,

Rik
Re: scheduler went mad?
> 2.4.3-pre6 quietly made a very significant change there:
> it used to say "if (!order) goto try_again;" and now just
> says "goto try_again;". Which seems very sensible since
> __GFP_WAIT is set, but I do wonder if it was a safe change.
> We have mechanisms for freeing pages (order 0), but whether
> any higher orders come out of that is a matter of chance.

The fundamental problem is that it should say

        wait_for_mm_progress();
        goto try_again;

and we don't have that facility right now. At that point the looping on
failed allocations problem is ok as we will allow someone to make
progress.

That leaves the bounce buffers for > 800Mb RAM, which currently are
seriously horked and, by inspection, will loop and may even overflow
the stack.
Re: scheduler went mad?
On Wed, Apr 11 2001, Josh McKinney wrote:
> I had the almost exact same thing happen to me just yesterday, I started up
> xcdroast, and cdda2wav and kswapd went crazy, backed out of X and all was
> well, and still is.
>
> Same kernel as you too.

I can tell you why this happens. Earlier kernels allocated one cd frame
worth of data for cdda ripping, but it was recently bumped to allow
as many as the ripping program asks for (up to 8). This requires a
4-5 page allocation on x86, which is of course not reliable. cdrom.c
adjusts for failed allocations and drops to fewer number of frames
(8 -> 4 -> 2 and then just 1), but apparently the vm isn't handling
this too well if kswapd is going crazy.

I can switch to a static 8 frame allocation, but IMHO the vm should
be able to handle situations like this. It's not that unusual for
a driver to ask for a bigger chunk of memory if it can go faster that
way, and then be prepared to settle for less if need be. For cdda
ripping, it really does make a difference.

However, I can change ide-cd to do scatter gather in this case. It's
the nicer thing to do anyway.

Does cdda2wav have some sort of 'do X number of frames at the time'
option? If so, use 1 and there should be no problems.

--
Jens Axboe
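The "ask for 8 frames, settle for 4, 2, then 1" behaviour described above can be sketched in user space roughly as follows. This is not the cdrom.c code: `alloc_frames`, `alloc_fn` and `stingy_alloc` are hypothetical names, and malloc() stands in for the kernel allocator. The frame size of 2352 bytes is the raw CD-DA frame size, so 8 frames come to 18816 bytes, a bit over four and a half 4 KiB pages, which matches the "4-5 page allocation" mentioned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Size of one raw CD-DA frame in bytes. */
#define FRAME_BYTES 2352

/* Allocator hook so the fallback can be exercised with a failing allocator. */
typedef void *(*alloc_fn)(size_t bytes);

/*
 * Try to get a buffer for `want` frames; on failure halve the request
 * (8 -> 4 -> 2 -> 1) until it succeeds or a single frame also fails.
 * Returns the buffer and stores the frame count obtained in *got,
 * or returns NULL with *got == 0.
 */
static void *alloc_frames(alloc_fn alloc, unsigned want, unsigned *got)
{
    assert(alloc != NULL && got != NULL && want > 0);

    for (unsigned n = want; n >= 1; n /= 2) {
        void *buf = alloc((size_t)n * FRAME_BYTES);
        if (buf != NULL) {
            *got = n;
            return buf;
        }
    }
    *got = 0;
    return NULL;
}

/* Hypothetical allocator standing in for a fragmented system:
 * anything larger than two frames fails. */
static void *stingy_alloc(size_t bytes)
{
    return bytes > 2 * FRAME_BYTES ? NULL : malloc(bytes);
}
```

With `stingy_alloc` the caller ends up with a 2-frame buffer after two failed attempts; with plain `malloc` the full 8 frames are normally granted on the first try. The cost of the pattern is exactly those failed attempts, which is why it matters how the allocator behaves when a large request cannot be met.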
Re: scheduler went mad?
On Thu, 12 Apr 2001 16:12:55 BST, Alan Cox said:
> > I've seen the same scenario about 2-3 times a week. kswapd and one or
> > more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung
> > on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
> > The 'hung' processes are consistently immune to kill -9, even as root, which
> > indicates to me that they're hung inside a kernel call or something.
>
> Do you have > 800Mb of RAM ?

256M of RAM, 256M of swap. Here's /proc/meminfo as I type:

[~]3 cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  260276224 246419456 13856768        0  8347648 75317248
Swap: 271392768 58589184 212803584
MemTotal:       254176 kB
MemFree:         13532 kB
MemShared:           0 kB
Buffers:          8152 kB
Cached:          73552 kB
Active:          49716 kB
Inact_dirty:     28800 kB
Inact_clean:      3188 kB
Inact_target:      212 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       254176 kB
LowFree:         13532 kB
SwapTotal:      265032 kB
SwapFree:       207816 kB
[~]3

> > would explain the high context-switch rate. I'm not clear on how kswapd
> > can end up getting stuck and failing to free up something - unless it ends
> > up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't
> > enough to get it the memory it needs, causing a deadlock/loop between
> > kswapd and __alloc_pages/wakeup_kswapd().
>
> bounce buffers for one

It's a Dell Optiplex GX110, using IDE. Grepping for 'bounce buffer' in
the source shows most hits in the SCSI code, and nothing obviously
jumping out at me...

Is it possible that i810_audio.c is to blame? I'm looking at
alloc_dmabuf() in there, and it tries to grab a big chunk of memory for
a DMA buffer (starting at order-4), which probably explains my
__alloc_pages messages. In addition, I run Enlightenment with audio
enabled - so it's quite possible that xscreensaver will generate a
'click' sound when it pops up its dialog window - again tossing us into
i810_audio.

(scenario there - mouse event happens while screen locked, xscreensaver
wakes up and starts mapping a window - E plays the sound, hosing the
i810_audio driver, and then when xscreensaver gets the CPU back, its
next call for a page gets wedged up.)

Would it be worth applying Ed Tomlinson's icache/dcache patches and
seeing if that helps?

--
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech
Re: scheduler went mad?
On Thu, 12 Apr 2001 [EMAIL PROTECTED] wrote:
> I've seen the same scenario about 2-3 times a week. kswapd and one or
> more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung
> on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
> The 'hung' processes are consistently immune to kill -9, even as root, which
> indicates to me that they're hung inside a kernel call or something.
[snip]
> __alloc_pages: 4-order allocation failed.
> __alloc_pages: 3-order allocation failed.
[snip]
> In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will
> cause it to loop around and try to get more memory. I'm wondering if
[snip]
> I'm running the 2.4.3 kernel

2.4.3-pre6 quietly made a very significant change there:
it used to say "if (!order) goto try_again;" and now just
says "goto try_again;". Which seems very sensible since
__GFP_WAIT is set, but I do wonder if it was a safe change.
We have mechanisms for freeing pages (order 0), but whether
any higher orders come out of that is a matter of chance.

(But of course, this may not be related to your problem,
and your "N-order allocation failed" messages must have
been from other instances than stuck in this loop.)

Hugh
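To make the "matter of chance" point concrete: an order-N request needs 2^N physically contiguous, naturally aligned pages (order 4 is 16 pages, 64 KiB with 4 KiB pages), so freeing individual order-0 pages only helps if the freed pages happen to line up. The sketch below is a simplified user-space illustration, not the kernel's buddy allocator; `largest_free_order` and the boolean free map are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if pages [start, start + 2^order) are all free. */
static bool block_is_free(const bool *free_map, size_t start, unsigned order)
{
    for (size_t i = 0; i < ((size_t)1 << order); i++)
        if (!free_map[start + i])
            return false;
    return true;
}

/*
 * Largest order for which some naturally aligned block of 2^order pages
 * is entirely free, or -1 if no page is free. Only aligned power-of-two
 * blocks count, so scattered free pages do not add up to a larger block.
 */
static int largest_free_order(const bool *free_map, size_t npages)
{
    assert(free_map != NULL);

    int best = -1;
    for (unsigned order = 0; ((size_t)1 << order) <= npages; order++) {
        size_t step = (size_t)1 << order;
        for (size_t start = 0; start + step <= npages; start += step) {
            if (block_is_free(free_map, start, order)) {
                best = (int)order;
                break;
            }
        }
    }
    return best;
}
```

In a 16-page map, eight free pages at every other position give a largest order of 0, eight free pages at positions 8..15 give order 3, and eight free pages at positions 4..11 give only order 2 because the run is not aligned to an 8-page boundary. The same number of free pages can therefore satisfy very different requests depending on where they fall.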
Re: scheduler went mad?
> I've seen the same scenario about 2-3 times a week. kswapd and one or
> more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung
> on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
> The 'hung' processes are consistently immune to kill -9, even as root, which
> indicates to me that they're hung inside a kernel call or something.

Do you have > 800Mb of RAM ?

> In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will
> cause it to loop around and try to get more memory. I'm wondering if

Even outside of that certain drivers also loop on alloc failures as does
TCP.

> would explain the high context-switch rate. I'm not clear on how kswapd
> can end up getting stuck and failing to free up something - unless it ends
> up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't
> enough to get it the memory it needs, causing a deadlock/loop between
> kswapd and __alloc_pages/wakeup_kswapd().

bounce buffers for one
Re: scheduler went mad?
I've seen the same scenario about 2-3 times a week. kswapd and one or
more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung
on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
The 'hung' processes are consistently immune to kill -9, even as root, which
indicates to me that they're hung inside a kernel call or something.

Sometimes, something *else* will exit, and everything will 'break loose'
and return to normal after a minute or so.

It *may* not be related, but I also have a lot of this in 'dmesg':

__alloc_pages: 4-order allocation failed.
__alloc_pages: 3-order allocation failed.
i810_audio: DMA overrun on send

There was a recent posting re: the i810_audio driver amounting to "I've
got one bug to fix and then I'll put up a patch" for the 'dma overrun'
message. __alloc_pages doesn't give much information on who its caller
was, so that's somewhat of a dead end...

In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will
cause it to loop around and try to get more memory. I'm wondering if
the "hung" process is entering __alloc_pages(), and gets wedged in the
'try_again' loop - which has a call to wakeup_kswapd() inside it, which
would explain the high context-switch rate. I'm not clear on how kswapd
can end up getting stuck and failing to free up something - unless it ends
up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't
enough to get it the memory it needs, causing a deadlock/loop between
kswapd and __alloc_pages/wakeup_kswapd().

Unfortunately, I've just exhausted my ability to debug this one here.. ;)

I'm running the 2.4.3 kernel, with the following patches:

Reiserfs: 2.4.3-3.6.25.quota.bz2
          linux-2.4.3-knfsd-6.g.patch.gz
          linux-2.4.3-reiserfs-20010327.patch.bz2
IPv6:     linux24-2.4.3-usagi-20010406.patch.gz
Crypto:   patch-int-2.4.3.1

I am using ReiserFS-on-LVM for basically all filesystems, if that matters...

--
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech
Re: scheduler went mad?
I've seen the same scenario about 2-3 times a week. kswapd and one or more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung on several occasions, and 2-3 times it's been xscreensaver (3.29) hung. The 'hung' processes are consistently immune to kill -9, even as root, which indicates to me that they're hung inside a kernel call or something. Do you have 800Mb of RAM ? In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will cause it to loop around and try to get more memory. I'm wondering if Even outside of that certain drivers also loop on alloc failures as does TCP. would explain the high context-switch rate. I'm not clear on how kswapd can end up getting stuck and failing to free up something - unless it ends up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't enough to get it the memory it needs, causing a deadlock/loop between kswapd and __alloc_pages/wakeup_kswapd(). bounce buffers for one - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: scheduler went mad?
On Thu, 12 Apr 2001 [EMAIL PROTECTED] wrote: I've seen the same scenario about 2-3 times a week. kswapd and one or more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung on several occasions, and 2-3 times it's been xscreensaver (3.29) hung. The 'hung' processes are consistently immune to kill -9, even as root, which indicates to me that they're hung inside a kernel call or something. [snip] __alloc_pages: 4-order allocation failed. __alloc_pages: 3-order allocation failed. [snip] In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will cause it to loop around and try to get more memory. I'm wondering if [snip] I'm running the 2.4.3 kernel 2.4.3-pre6 quietly made a very significant change there: it used to say "if (!order) goto try_again;" and now just says "goto try_again;". Which seems very sensible since __GFP_WAIT is set, but I do wonder if it was a safe change. We have mechanisms for freeing pages (order 0), but whether any higher orders come out of that is a matter of chance. (But of course, this may not be related to your problem, and your "N-order allocation failed" messages must have been from other instances than stuck in this loop.) Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: scheduler went mad?
On Thu, 12 Apr 2001 16:12:55 BST, Alan Cox said: I've seen the same scenario about 2-3 times a week. kswapd and one or more processes all CPU bound, totalling to 100%. I've had 'esdplay' hung on several occasions, and 2-3 times it's been xscreensaver (3.29) hung. The 'hung' processes are consistently immune to kill -9, even as root, which indicates to me that they're hung inside a kernel call or something. Do you have 800Mb of RAM ? 256M of RAM, 256M of swap. Here's /proc/meminfo as I type: [~]3 cat /proc/meminfo total:used:free: shared: buffers: cached: Mem: 260276224 246419456 138567680 8347648 75317248 Swap: 271392768 58589184 212803584 MemTotal: 254176 kB MemFree: 13532 kB MemShared: 0 kB Buffers: 8152 kB Cached: 73552 kB Active: 49716 kB Inact_dirty: 28800 kB Inact_clean: 3188 kB Inact_target: 212 kB HighTotal: 0 kB HighFree:0 kB LowTotal: 254176 kB LowFree: 13532 kB SwapTotal: 265032 kB SwapFree: 207816 kB [~]3 would explain the high context-switch rate. I'm not clear on how kswapd can end up getting stuck and failing to free up something - unless it ends up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't enough to get it the memory it needs, causing a deadlock/loop between kswapd and __alloc_pages/wakeup_kswapd(). bounce buffers for one It's a Dell Optiplex GX110, using IDE. Grepping for 'bounce buffer' in the source shows most hits in the SCSI code, and nothing obviously jumping out at me... just speculating Is it possible that i810_audio.c is to blame? I'm looking at alloc_dmabuf() in there, and it tries to grab a big chunk of memory for a DMA buffer (starting at order-4), which probably explains my __alloc_pages messages. In addition, I run Enlightenment with audio enabled - so it's quite possible that xscreensaver will generate a 'click' sound when it pops up its dialog window - again tossing us into i810_audio. 
(scenario there - mouse event happens while screen locked, xscreensaver wakes up and starts mapping a window - E plays the sound, hosing the i810_audio driver, and then when xscreensaver gets the CPU back, its next call for a page gets wedged up. Would it be worth applying Ed Tomlinson's icache/dcache patches and seeing if that helps? -- Valdis Kletnieks Operating Systems Analyst Virginia Tech PGP signature
Re: scheduler went mad?
On Wed, Apr 11 2001, Josh McKinney wrote: I had the almost exact same thing happen to me just yesterday, I started up xcdroast, and cdda2wav and kswapd went crazy, backed out of X and all was well, and still is. Same kernel as you too. I can tell you why this happens. Earlier kernels allocated one cd frame worth of data for cdda ripping, but it was recently bumped to allow as many as the ripping program asks for (up to 8). This requires a 4-5 page allocation on x86, which is of course not reliable. cdrom.c adjusts for failed allocations and drops to fewer number of frames (8 - 4 - 2 and then just 1), but apparently the vm isn't handling this too well if kswapd is going crazy. I can switch to a static 8 frame allocation, but IMHO the vm should be able to handle situations like this. It's not that unusual for a driver to ask for a bigger chunk of memory if it can go faster that way, and then be prepared to settle for less if need be. For cdda ripping, it really does make a difference. However, I can change ide-cd to do scatter gather in this case. It's the nicer thing to do anyway. Does cdda2wav have some sort of 'do X number of frames at the time' option? If so, use 1 and there should be no problems. -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: scheduler went mad?
2.4.3-pre6 quietly made a very significant change there: it used to say "if (!order) goto try_again;" and now just says "goto try_again;". Which seems very sensible since __GFP_WAIT is set, but I do wonder if it was a safe change. We have mechanisms for freeing pages (order 0), but whether any higher orders come out of that is a matter of chance. The fundamental problem is that it should say wait_for_mm_progress(); goto try_again; and we dont have that facility right now. At that point the looping on failed allocations problem is ok as we will allow someone to make progress. That leaves the bounce buffers for 800Mb RAM which currently are seriously horked and will loop and may even stack overflow by inspection - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: scheduler went mad?
On Thu, 12 Apr 2001, Alan Cox wrote: 2.4.3-pre6 quietly made a very significant change there: it used to say "if (!order) goto try_again;" and now just says "goto try_again;". Which seems very sensible since __GFP_WAIT is set, but I do wonder if it was a safe change. We have mechanisms for freeing pages (order 0), but whether any higher orders come out of that is a matter of chance. The fundamental problem is that it should say wait_for_mm_progress(); goto try_again; and we dont have that facility right now. From mm/page_alloc.c, around line 453: if (gfp_mask __GFP_WAIT) { memory_pressure++; try_to_free_pages(gfp_mask); wakeup_bdflush(0); goto try_again; } I guess we should remove the wakeup_bdflush(0) ... who put it there anyway ? regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: scheduler went mad?
On Thu, 12 Apr 2001, Rik van Riel wrote:

> On Thu, 12 Apr 2001, Alan Cox wrote:
>
> > 2.4.3-pre6 quietly made a very significant change there: it used to say
> > "if (!order) goto try_again;" and now just says "goto try_again;". Which
> > seems very sensible since __GFP_WAIT is set, but I do wonder if it was a
> > safe change. We have mechanisms for freeing pages (order 0), but whether
> > any higher orders come out of that is a matter of chance.
> >
> > The fundamental problem is that it should say
> >
> >     wait_for_mm_progress();
> >     goto try_again;
> >
> > and we don't have that facility right now.
>
> From mm/page_alloc.c, around line 453:
>
>     if (gfp_mask & __GFP_WAIT) {
>         memory_pressure++;
>         try_to_free_pages(gfp_mask);
>         wakeup_bdflush(0);
>         goto try_again;
>     }
>
> I guess we should remove the wakeup_bdflush(0) ... who put it there
> anyway ?

I did :)

This should fix it

--- mm/page_alloc.c.orig Thu Apr 12 13:47:53 2001
+++ mm/page_alloc.c Thu Apr 12 13:48:06 2001
@@ -454,7 +454,7 @@
         if (gfp_mask & __GFP_WAIT) {
             memory_pressure++;
             try_to_free_pages(gfp_mask);
-            wakeup_bdflush(0);
+            balance_dirty(NODEV);
             goto try_again;
         }
Re: scheduler went mad?
On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

> This should fix it
>
> --- mm/page_alloc.c.orig Thu Apr 12 13:47:53 2001
> +++ mm/page_alloc.c Thu Apr 12 13:48:06 2001
> @@ -454,7 +454,7 @@
>          if (gfp_mask & __GFP_WAIT) {
>              memory_pressure++;
>              try_to_free_pages(gfp_mask);
> -            wakeup_bdflush(0);
> +            balance_dirty(NODEV);
>              goto try_again;
>          }

Remember that we can ONLY do this if we have __GFP_IO ...

regards,

Rik
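The __GFP_IO caveat amounts to a second flag test nested inside the __GFP_WAIT one. A minimal sketch of that decision follows; the `TOY_` flag values and the `toy_slow_path` helper are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>

/* Illustrative bit values only; the real ones live in the kernel headers. */
#define TOY_GFP_WAIT 0x01u
#define TOY_GFP_IO   0x02u

/* What the (hypothetical) slow path is allowed to do for a given mask. */
enum toy_action {
    TOY_FAIL_FAST,             /* caller may not sleep at all */
    TOY_RECLAIM_ONLY,          /* may sleep and reclaim, but must not start I/O */
    TOY_RECLAIM_AND_WRITEBACK  /* may also do something like balance_dirty() */
};

enum toy_action toy_slow_path(unsigned gfp_mask)
{
    if (!(gfp_mask & TOY_GFP_WAIT))
        return TOY_FAIL_FAST;
    if (!(gfp_mask & TOY_GFP_IO))
        return TOY_RECLAIM_ONLY;
    return TOY_RECLAIM_AND_WRITEBACK;
}
```

The point of the ordering is that a caller without the I/O bit still gets the reclaim pass, it just never reaches the writeback step.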
Re: scheduler went mad?
On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

> This patch is broken, ignore it.
>
> Just removing wakeup_bdflush() is indeed correct.
>
> We already wakeup bdflush at try_to_free_buffers() anyway.

I still feel a bit uncomfortable about processes looping forever in
__alloc_pages and because of this oom_killer also can't be moved to page
fault handler where I think its place should be. I'm using the patch
below.

    Szaka

--- mm/page_alloc.c.orig Sat Mar 31 19:07:22 2001
+++ mm/page_alloc.c Mon Apr 2 21:05:31 2001
@@ -453,8 +453,12 @@
          */
         if (gfp_mask & __GFP_WAIT) {
             memory_pressure++;
-            try_to_free_pages(gfp_mask);
-            wakeup_bdflush(0);
+            if (!try_to_free_pages(gfp_mask));
+                return NULL;
             goto try_again;
         }
     }
Re: scheduler went mad?
On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
>
> > This patch is broken, ignore it.
> > Just removing wakeup_bdflush() is indeed correct.
> > We already wakeup bdflush at try_to_free_buffers() anyway.
>
> I still feel a bit uncomfortable about processes looping forever in
> __alloc_pages and because of this oom_killer also can't be moved to page
> fault handler where I think its place should be. I'm using the patch
> below.

It's BROKEN. This means that if you have one task using up all memory
and you're waiting for the OOM kill of that task to have effect, your
syslogd, etc... will have their allocations fail and will die.

regards,

Rik
Re: scheduler went mad?
On Thu, 12 Apr 2001 16:12:55 BST, Alan Cox said:

> Do you have 800Mb of RAM ?

Following up - it just bit again (twice)

The first time, it was xmms/kswapd fighting for CPU, and xmms was again
immune to kill -9. Interestingly enough, several minutes later, I closed
'netscape', and xmms took the kill within a second or two.

10 minutes later, and another 2 programs that do audio got wedged up.
Oddly enough, I did an 'su', and they broke loose immediately.

I've ruled out i810_audio.c as a culprit - although I have programs that
do audio hanging, *those* programs are always writing their data down a
Unix socket to the actual process that writes to /dev/audio/dsp.

Hmm.. 'su' writes to syslog, and netscape has a few Unix sockets too.
Could the problem be related to running out of some resource related to
Unix-domain sockets, which clears up once some socket is closed?

Oddly enough, while I had 2 programs doing audio wedged, I was still
seeing (hearing actually ;) *new* processes open a connection to esd and
play sounds. Weird.

--
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech
Re: scheduler went mad?
On Thu, 12 Apr 2001, Rik van Riel wrote:
> On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
>
> > I still feel a bit uncomfortable about processes looping forever in
> > __alloc_pages and because of this oom_killer also can't be moved to page
> > fault handler where I think its place should be. I'm using the patch
> > below.
>
> It's BROKEN. This means that if you have one task using up all memory
> and you're waiting for the OOM kill of that task to have effect, your
> syslogd, etc... will have their allocations fail and will die.

You mean without dropping out_of_memory() test in kswapd and calling
oom_kill() in page fault [i.e. without additional patch]?

Yes, you're completely true but I have the patch [see example below, 'm1'
is the bad guy] just didn't have time to extensively test it and don't
know whether there are side effects getting rid of this infinite looping
in __alloc_pages() but locked up processes apparently don't make people
very happy ;)

    Szaka

Out of Memory: Killed process 830 (m1), saved process 696 (httpd)
   procs                      memory    swap          io     system
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs
 6  0  0      0   9492    100   1496   0   0  1386     2 2904  3877
 5  0  0      0   7812    104   1788   0   0   289     0  689    22
 5  0  0      0   6248    104   1788   0   0     0     0  108    19
 5  0  0      0   4748    108   1840   0   0    56     0  219    21
 5  0  0      0   3268    108   1868   0   0    28     0  165    23
 5  0  1      0   1864     76   1868   0   0     0     5  120    61
 5  0  1      0   1432     76   1252   0   0     0     0  108  1130
 5  0  1      0   1236     80    796   0   0    65     0  246  4588
 5  0  1      0   1236     80    668   0   0     0     0  110  8869
 6  0  1      0    948    112    696   0   0   805     0 1814  8231
Out of Memory: Killed process 858 (m1), saved process 811 (vmstat)
 5  0  1      0    924    152    444   0   0  1153     0 2731 18231
 4  0  1      0   1720    148    828   0   0   750     3 1711  1876
 5  0  1      0   1156    148    760   0   0   290     0  723  1967
 4  0  1      0   1152    132    664   0   0    70     0  277  7249
 4  0  1      0   1140    144    560   0   0    54     0  238  7942
 4  0  1      0   1140    144    460   0   0    32     0  212  7521
Out of Memory: Killed process 834 (m1), saved process 418 (identd)
Re: scheduler went mad?
On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:

> You mean without dropping out_of_memory() test in kswapd and calling
> oom_kill() in page fault [i.e. without additional patch]?

No. I think it's ok for __alloc_pages() to call oom_kill()
IF we turn out to be out of memory, but that should not even
be needed.

Also, when a task in __alloc_pages() is OOM-killed, it will
have PF_MEMALLOC set and will immediately break out of the
loop. The rest of the system will spin around in the loop
until the victim has exited and then their allocations will
succeed.

regards,

Rik
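The behaviour described here (the victim leaves the loop at once, everybody else spins until the victim's memory comes back) can be played through in a toy model. Everything below is a hypothetical user-space sketch: the `toy_` names, the flag value and the "ticks until the victim has exited" counter are assumptions made for illustration, not kernel structures.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PF_MEMALLOC 0x1u /* illustrative value, not the kernel's */

enum toy_result { TOY_GOT_PAGE, TOY_BAILED_OUT, TOY_STUCK };

struct toy_task {
    unsigned flags;
};

struct toy_system {
    unsigned free_pages;    /* pages free right now */
    unsigned victim_pages;  /* pages the OOM victim still holds */
    unsigned ticks_to_exit; /* loop iterations until the victim has exited */
};

/* Let time pass: once the victim has exited, its pages become free. */
void toy_tick(struct toy_system *s)
{
    if (s->ticks_to_exit > 0 && --s->ticks_to_exit == 0) {
        s->free_pages += s->victim_pages;
        s->victim_pages = 0;
    }
}

/* Allocation loop; *spins counts how often the caller went around. */
enum toy_result toy_alloc_page(const struct toy_task *t,
                               struct toy_system *s, unsigned *spins)
{
    assert(t != NULL && s != NULL && spins != NULL);
    *spins = 0;
    for (;;) {
        if (s->free_pages > 0) {
            s->free_pages--;
            return TOY_GOT_PAGE;
        }
        if (t->flags & TOY_PF_MEMALLOC)
            return TOY_BAILED_OUT; /* the killed task leaves immediately */
        if (s->ticks_to_exit == 0)
            return TOY_STUCK;      /* no victim pending: toy-only escape hatch */
        toy_tick(s);
        (*spins)++;
    }
}
```

The `TOY_STUCK` case exists only so the toy terminates; it corresponds to the situation the other side of the argument worries about, where nothing is on its way out and the loop would otherwise never end.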
Re: scheduler went mad?
I had the almost exact same thing happen to me just yesterday, I started
up xcdroast, and cdda2wav and kswapd went crazy, backed out of X and all
was well, and still is. Same kernel as you too.

On approximately Wed, Apr 11, 2001 at 04:24:48PM +0200, Priit Randla wrote:
> Hi,
>
> Yesterday i tried to start cdda2wav but somehow it didn't do anything.
> It didn't die to kill -9 too. Machine was slow but usable.
> vmstat 10 output:
>
>    procs                      memory    swap          io     system         cpu
>  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
>  2  0  1   2972  40916    108  18292   0   0     0     0  121 12735   0 100   0
>  2  0  1   2972  40492    108  18292   0   0     0     0  109 12740   1  99   0
>  2  0  1   2972  40492    108  18292   0   0     0     0  103 12996   0 100   0
>  3  0  0   2972  40492    108  18292   0   0     0     0  102 12932   0 100   0
>  3  0  1   2972  40492    108  18292   0   0     0     0  131 12652   1  99   0
>  2  0  0   2972  40496    108  18292   0   0     0     0  142 12562   1  99   0
>  2  0  0   2972  40500    108  18292   0   0     0     0  120 12684   0 100   0
>  2  0  1   2972  40496    108  18292   0   0     0     0  140 12480   1  99   0
>  2  0  0   2972  39952    108  18292   0   0     0     0  160 11445   7  93   0
>  3  0  0   2972  39952    108  18292   0   0     0     0  178 12295   2  98   0
>  2  0  0   2972  39956    108  18292   0   0     0     0  214 11958   2  98   0
>  3  0  1   2972  39952    108  18292   0   0     0     0  138 12579   1  99   0
>
> cs field is absolutely ridiculous for my machine.
>
> ps showed cdda2wav & kswapd eating all of processor time. When i tried
> to close netscape, it hang too and joined cdda2wav and kswapd:
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT LIB %CPU %MEM   TIME COMMAND
>  9990 priitr    17   0 42380  41M  9928 R      0 32.5 33.4  21:47 netscape-commun
>     3 root      17   0     0    0     0 SW     0 32.3  0.0  11:12 kswapd
> 10538 priitr    16 0848               0 R      0 32.3  0.0  11:09 cdda2wav
>     5 root       9   0     0    0     0 SW     0  1.5  0.0   0:19 bdflush
> 10616 priitr    13   0   856  856   668 R      0  0.7  0.6   0:00 top
>   657 root       9   0 21160  20M  1668 S      0  0.1 16.7  29:36 X
>
> I couldn't leave X and had to kill it. After that, both netscape and
> cdda2wav were gone and everything looks normal since then.
> I'm running 2.4.3ac3 right now.
>
> dmesg:
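For anyone wanting to spot this pattern in a log of vmstat output, a row can be parsed and checked mechanically. The sketch below assumes the 16-column layout quoted above (r b w swpd free buff cache si so bi bo in cs us sy id); the `toy_` names and the thresholds in `toy_looks_like_spin` are arbitrary illustrative choices, not anything vmstat or the kernel defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One vmstat sample in the 16-column layout assumed above. */
struct toy_vmstat_row {
    unsigned r, b, w, swpd, free, buff, cache;
    unsigned si, so, bi, bo, in, cs, us, sy, id;
};

/* Parse one whitespace-separated data row; false if a field is missing. */
bool toy_parse_vmstat(const char *line, struct toy_vmstat_row *row)
{
    int n;

    assert(line != NULL && row != NULL);
    n = sscanf(line, "%u %u %u %u %u %u %u %u %u %u %u %u %u %u %u %u",
               &row->r, &row->b, &row->w, &row->swpd, &row->free,
               &row->buff, &row->cache, &row->si, &row->so, &row->bi,
               &row->bo, &row->in, &row->cs, &row->us, &row->sy, &row->id);
    return n == 16;
}

/*
 * Heuristic (an assumption of this sketch): many context switches, almost
 * all CPU time in the kernel, and no swap or block I/O at all.
 */
bool toy_looks_like_spin(const struct toy_vmstat_row *row, unsigned cs_limit)
{
    return row->cs > cs_limit && row->sy >= 90 &&
           row->si == 0 && row->so == 0 && row->bi == 0 && row->bo == 0;
}
```

Fed the first sample from the report with a limit of, say, 5000 switches per interval, the check fires: the machine is burning system time and switching contexts without moving a single block.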