Re: scheduler went mad?

2001-04-12 Thread Valdis . Kletnieks

On Fri, 13 Apr 2001 01:02:21 +0200, Szabolcs Szakacsits said:

> It is not __alloc_pages() that calls oom_kill() but do_page_fault(). Not the
> same. After the system has tried *really* hard to get *one* free page and
> couldn't manage it, why loop forever? To eat CPU while waiting for

For what it's worth, this *IS NOT* the case I'm getting bit by:

While kswapd was hung, I already had (from /proc/meminfo)

MemFree: 34064 kB

I suspect that kswapd is getting hung spinning on some *specific*
requirement that it's falling short on?

/Valdis



Re: scheduler went mad?

2001-04-12 Thread Szabolcs Szakacsits


On Thu, 12 Apr 2001, Rik van Riel wrote:
> On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> > You mean without dropping out_of_memory() test in kswapd and calling
> > oom_kill() in page fault [i.e. without additional patch]?
> No.  I think it's ok for __alloc_pages() to call oom_kill()
> IF we turn out to be out of memory, but that should not even
> be needed.

It is not __alloc_pages() that calls oom_kill() but do_page_fault(). Not the
same. After the system has tried *really* hard to get *one* free page and
couldn't manage it, why loop forever? To eat CPU while waiting for
out_of_memory() to *guess* when the system is in OOM? I don't think so: if
processes can't progress because the system can't page in any of their
pages, somebody must go.

> Also, when a task in __alloc_pages() is OOM-killed, it will
> have PF_MEMALLOC set and will immediately break out of the
> loop. The rest of the system will spin around in the loop
> until the victim has exited and then their allocations will
> succeed.

Yes, I think this is a problem. With the kill done in the page fault path,
on OOM the "bad" process is selected, scheduled and killed, and everybody
else runs happily without even noticing that the system is low on memory.
That is fast and graceful process killing instead of a slow, painful death
that depends on out_of_memory() correctly detecting OOM.

Szaka

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: scheduler went mad?

2001-04-12 Thread Rik van Riel

On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:

> You mean without dropping out_of_memory() test in kswapd and calling
> oom_kill() in page fault [i.e. without additional patch]?

No.  I think it's ok for __alloc_pages() to call oom_kill()
IF we turn out to be out of memory, but that should not even
be needed.

Also, when a task in __alloc_pages() is OOM-killed, it will
have PF_MEMALLOC set and will immediately break out of the
loop. The rest of the system will spin around in the loop
until the victim has exited and then their allocations will
succeed.
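
The behaviour described in the paragraph above can be modelled in a few lines of user-space C. This is only a sketch of the description, not the kernel's code: `SKETCH_PF_MEMALLOC`, `struct sketch_task` and `sketch_alloc_pass()` are hypothetical names, and the flag value is made up.

```c
#include <assert.h>

#define SKETCH_PF_MEMALLOC 0x1u   /* hypothetical flag value, not the kernel's */

struct sketch_task {
    unsigned int flags;
};

/* Outcome of one pass through the modelled retry loop. */
enum sketch_result { SKETCH_GOT_PAGE, SKETCH_BAIL_OUT, SKETCH_RETRY };

static enum sketch_result sketch_alloc_pass(const struct sketch_task *t,
                                            int free_pages)
{
    assert(t != 0 && free_pages >= 0);
    if (free_pages > 0)
        return SKETCH_GOT_PAGE;          /* allocation succeeds */
    if (t->flags & SKETCH_PF_MEMALLOC)
        return SKETCH_BAIL_OUT;          /* the OOM victim leaves the loop */
    return SKETCH_RETRY;                 /* everyone else goes round again */
}
```

With no free pages, a task carrying the flag gets `SKETCH_BAIL_OUT` while every other task gets `SKETCH_RETRY`, which is the "rest of the system will spin" part; once the victim's pages are back, the same call returns `SKETCH_GOT_PAGE`.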

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: scheduler went mad?

2001-04-12 Thread Szabolcs Szakacsits


On Thu, 12 Apr 2001, Rik van Riel wrote:
> On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> > I still feel a bit uncomfortable about processes looping forever in
> > __alloc_pages, and because of this the oom_killer also can't be moved to
> > the page fault handler, where I think its place should be. I'm using the
> > patch below.
> It's BROKEN.  This means that if you have one task using up
> all memory and you're waiting for the OOM kill of that task
> to have effect, your syslogd, etc... will have their allocations
> fail and will die.

You mean without dropping out_of_memory() test in kswapd and calling
oom_kill() in page fault [i.e. without additional patch]? Yes, you're
completely right, but I have the patch [see example below, 'm1' is the bad
guy]. I just didn't have time to test it extensively and don't know whether
there are side effects of getting rid of this infinite looping in
__alloc_pages(), but locked-up processes apparently don't make people
very happy ;)

Szaka

Out of Memory: Killed process 830 (m1), saved process 696 (httpd)
   procs                      memory    swap          io     system
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs
 6  0  0      0   9492    100   1496   0   0  1386     2 2904  3877
 5  0  0      0   7812    104   1788   0   0   289     0  689    22
 5  0  0      0   6248    104   1788   0   0     0     0  108    19
 5  0  0      0   4748    108   1840   0   0    56     0  219    21
 5  0  0      0   3268    108   1868   0   0    28     0  165    23
 5  0  1      0   1864     76   1868   0   0     0     5  120    61
 5  0  1      0   1432     76   1252   0   0     0     0  108  1130
 5  0  1      0   1236     80    796   0   0    65     0  246  4588
 5  0  1      0   1236     80    668   0   0     0     0  110  8869
 6  0  1      0    948    112    696   0   0   805     0 1814  8231
Out of Memory: Killed process 858 (m1), saved process 811 (vmstat)
 5  0  1      0    924    152    444   0   0  1153     0 2731 18231
 4  0  1      0   1720    148    828   0   0   750     3 1711  1876
 5  0  1      0   1156    148    760   0   0   290     0  723  1967
 4  0  1      0   1152    132    664   0   0    70     0  277  7249
 4  0  1      0   1140    144    560   0   0    54     0  238  7942
 4  0  1      0   1140    144    460   0   0    32     0  212  7521
Out of Memory: Killed process 834 (m1), saved process 418 (identd)




Re: scheduler went mad?

2001-04-12 Thread Valdis . Kletnieks

On Thu, 12 Apr 2001 16:12:55 BST, Alan Cox said:

> Do you have > 800Mb of RAM ?

Following up - it just bit again (twice)

The first time, it was xmms/kswapd fighting for CPU, and xmms was again immune
to kill -9.  Interestingly enough, several minutes later, I closed 'netscape',
and xmms took the kill within a second or two.

10 minutes later, and another 2 programs that do audio got
wedged up. Oddly enough, I did an 'su', and they broke loose immediately.

I've ruled out i810_audio.c as a culprit - although I have programs that
do audio hanging, *those* programs are always writing their data down
a Unix socket to the actual process that writes to /dev/audio/dsp.
Hmm.. 'su' writes to syslog, and netscape has a few Unix sockets too.
Could the problem be related to running out of some resource related
to Unix-domain sockets, which clears up once some socket is closed?

Oddly enough, while I had 2 programs doing audio wedged, I was still
seeing (hearing actually ;) *new* processes open a connection to esd
and play sounds.  Weird.  


-- 
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech






Re: scheduler went mad?

2001-04-12 Thread Rik van Riel

On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
> 
> > This patch is broken, ignore it.
> > Just removing wakeup_bdflush() is indeed correct.
> > We already wakeup bdflush at try_to_free_buffers() anyway.
> 
> I still feel a bit uncomfortable about processes looping forever in
> __alloc_pages, and because of this the oom_killer also can't be moved to
> the page fault handler, where I think its place should be. I'm using the
> patch below.

It's BROKEN.  This means that if you have one task using up
all memory and you're waiting for the OOM kill of that task
to have effect, your syslogd, etc... will have their allocations
fail and will die.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: scheduler went mad?

2001-04-12 Thread Szabolcs Szakacsits


On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

> This patch is broken, ignore it.
> Just removing wakeup_bdflush() is indeed correct.
> We already wakeup bdflush at try_to_free_buffers() anyway.

I still feel a bit uncomfortable about processes looping forever in
__alloc_pages, and because of this the oom_killer also can't be moved to
the page fault handler, where I think its place should be. I'm using the
patch below.

Szaka

--- mm/page_alloc.c.orig  Sat Mar 31 19:07:22 2001
+++ mm/page_alloc.c Mon Apr  2 21:05:31 2001
@@ -453,8 +453,12 @@
                  */
                 if (gfp_mask & __GFP_WAIT) {
                         memory_pressure++;
-                        try_to_free_pages(gfp_mask);
-                        wakeup_bdflush(0);
+                        if (!try_to_free_pages(gfp_mask));
+                                return NULL;
                         goto try_again;
                 }
         }
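
One detail in the hunk above is worth a second look: as it appears in the archive, the added `if (!try_to_free_pages(gfp_mask));` line ends in a semicolon. In C that semicolon is the whole body of the `if`, so the `return NULL;` on the next line is no longer conditional. Whether the semicolon was in the patch as tested or crept in when it was pasted into the mail is not clear from the thread. A minimal user-space sketch of the difference follows; `fake_try_to_free_pages()` and both `alloc_*` functions are hypothetical stand-ins, not kernel code, and the as-posted variant is written out in its equivalent form so that it compiles without warnings.

```c
#include <assert.h>
#include <stddef.h>

static int page;   /* dummy object standing in for an allocated page */

/* Hypothetical stand-in: reports how many pages reclaim managed to free. */
static int fake_try_to_free_pages(int freed)
{
    assert(freed >= 0);
    return freed;
}

/*
 * Equivalent of "if (!f()); return NULL;": the call is made, its result
 * is tested against an empty statement, and NULL is returned regardless.
 */
static void *alloc_as_posted(int freed)
{
    (void)fake_try_to_free_pages(freed);
    return NULL;
}

/* Presumably intended: give up only when nothing could be freed. */
static void *alloc_as_intended(int freed)
{
    if (!fake_try_to_free_pages(freed))
        return NULL;
    return &page;   /* stands in for "goto try_again" eventually succeeding */
}
```

Calling both with a non-zero `freed` shows the split: `alloc_as_posted(5)` still returns NULL, while `alloc_as_intended(5)` returns a page.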





Re: scheduler went mad?

2001-04-12 Thread Rik van Riel

On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

> This should fix it 
> 
> --- mm/page_alloc.c.orig   Thu Apr 12 13:47:53 2001
> +++ mm/page_alloc.c        Thu Apr 12 13:48:06 2001
> @@ -454,7 +454,7 @@
>                  if (gfp_mask & __GFP_WAIT) {
>                          memory_pressure++;
>                          try_to_free_pages(gfp_mask);
> -                        wakeup_bdflush(0);
> +                        balance_dirty(NODEV);
>                          goto try_again;
>                  }

Remember that we can ONLY do this if we have __GFP_IO ...
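
A minimal sketch of the kind of guard this reminder asks for, assuming the point is that writeback must not be started on behalf of a caller that did not pass `__GFP_IO`. The bit values and the `sketch_*` names are hypothetical; the kernel defines its own masks, and `sketch_balance_dirty()` only counts calls.

```c
#include <assert.h>

/* Hypothetical bit values for illustration; the kernel defines its own. */
#define SKETCH_GFP_WAIT 0x1u
#define SKETCH_GFP_IO   0x2u

static int sketch_writeback_calls;   /* counts modelled balance_dirty() calls */

static void sketch_balance_dirty(void)
{
    sketch_writeback_calls++;
}

/* Returns 1 if the caller should retry, 0 if it is not allowed to wait. */
static int sketch_slow_path(unsigned int gfp_mask)
{
    assert((gfp_mask & ~(SKETCH_GFP_WAIT | SKETCH_GFP_IO)) == 0);
    if (!(gfp_mask & SKETCH_GFP_WAIT))
        return 0;
    if (gfp_mask & SKETCH_GFP_IO)    /* start I/O only when the caller allows it */
        sketch_balance_dirty();
    return 1;
}
```

A caller passing only `SKETCH_GFP_WAIT` is told to retry but never triggers the modelled writeback; adding `SKETCH_GFP_IO` does.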

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: scheduler went mad?

2001-04-12 Thread Marcelo Tosatti




On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
> 
> I did :)
> 
> This should fix it 
> 
> --- mm/page_alloc.c.orig   Thu Apr 12 13:47:53 2001
> +++ mm/page_alloc.c        Thu Apr 12 13:48:06 2001
> @@ -454,7 +454,7 @@
>                  if (gfp_mask & __GFP_WAIT) {
>                          memory_pressure++;
>                          try_to_free_pages(gfp_mask);
> -                        wakeup_bdflush(0);
> +                        balance_dirty(NODEV);
>                          goto try_again;
>                  }

This patch is broken, ignore it. 

Just removing wakeup_bdflush() is indeed correct. 

We already wakeup bdflush at try_to_free_buffers() anyway.




Re: scheduler went mad?

2001-04-12 Thread Marcelo Tosatti



On Thu, 12 Apr 2001, Rik van Riel wrote:

> On Thu, 12 Apr 2001, Alan Cox wrote:
> 
> > > 2.4.3-pre6 quietly made a very significant change there:
> > > it used to say "if (!order) goto try_again;" and now just
> > > says "goto try_again;".  Which seems very sensible since
> > > __GFP_WAIT is set, but I do wonder if it was a safe change.
> > > We have mechanisms for freeing pages (order 0), but whether
> > > any higher orders come out of that is a matter of chance.
> > 
> > The fundamental problem is that it should say
> > 
> > wait_for_mm_progress();
> > goto try_again;
> > 
> > and we don't have that facility right now.
> 
> From mm/page_alloc.c, around line 453:
> 
>                 if (gfp_mask & __GFP_WAIT) {
>                         memory_pressure++;
>                         try_to_free_pages(gfp_mask);
>                         wakeup_bdflush(0);
>                         goto try_again;
>                 }
> 
> I guess we should remove the wakeup_bdflush(0) ... who put it
> there anyway ?

I did :)

This should fix it 

--- mm/page_alloc.c.orig   Thu Apr 12 13:47:53 2001
+++ mm/page_alloc.c        Thu Apr 12 13:48:06 2001
@@ -454,7 +454,7 @@
                 if (gfp_mask & __GFP_WAIT) {
                         memory_pressure++;
                         try_to_free_pages(gfp_mask);
-                        wakeup_bdflush(0);
+                        balance_dirty(NODEV);
                         goto try_again;
                 }





Re: scheduler went mad?

2001-04-12 Thread Rik van Riel

On Thu, 12 Apr 2001, Alan Cox wrote:

> > 2.4.3-pre6 quietly made a very significant change there:
> > it used to say "if (!order) goto try_again;" and now just
> > says "goto try_again;".  Which seems very sensible since
> > __GFP_WAIT is set, but I do wonder if it was a safe change.
> > We have mechanisms for freeing pages (order 0), but whether
> > any higher orders come out of that is a matter of chance.
> 
> The fundamental problem is that it should say
> 
>   wait_for_mm_progress();
>   goto try_again;
> 
> and we don't have that facility right now.

From mm/page_alloc.c, around line 453:

                if (gfp_mask & __GFP_WAIT) {
                        memory_pressure++;
                        try_to_free_pages(gfp_mask);
                        wakeup_bdflush(0);
                        goto try_again;
                }

I guess we should remove the wakeup_bdflush(0) ... who put it
there anyway ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: scheduler went mad?

2001-04-12 Thread Alan Cox

> 2.4.3-pre6 quietly made a very significant change there:
> it used to say "if (!order) goto try_again;" and now just
> says "goto try_again;".  Which seems very sensible since
> __GFP_WAIT is set, but I do wonder if it was a safe change.
> We have mechanisms for freeing pages (order 0), but whether
> any higher orders come out of that is a matter of chance.

The fundamental problem is that it should say

wait_for_mm_progress();
goto try_again;

and we don't have that facility right now. With it, looping on failed
allocations is ok, as we will allow someone to make progress.
That leaves the bounce buffers for > 800Mb RAM, which currently are seriously
horked: by inspection they will loop and may even overflow the stack.
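
Since the thread says no such facility exists, the following is only a guess at the bookkeeping a `wait_for_mm_progress()` would need: a counter that reclaim bumps whenever it frees something, so that a waiter can tell whether retrying is worthwhile. All names here (`struct sketch_mm`, `sketch_reclaim()`, `sketch_should_retry()`) are hypothetical. A real version would sleep on a wait queue instead of being polled; this single-threaded sketch leaves the sleeping out.

```c
#include <assert.h>

struct sketch_mm {
    unsigned long progress;   /* bumped whenever pages are freed */
    int free_pages;
};

/* Modelled reclaim step: frees n pages and records progress if n > 0. */
static void sketch_reclaim(struct sketch_mm *mm, int n)
{
    assert(mm != 0 && n >= 0);
    if (n > 0) {
        mm->free_pages += n;
        mm->progress++;
    }
}

/* Retrying is worthwhile only if progress was made since 'seen'. */
static int sketch_should_retry(const struct sketch_mm *mm, unsigned long seen)
{
    return mm->progress != seen;
}
```

An allocator would sample `progress` before giving up its turn and loop back to `try_again` only after `sketch_should_retry()` turns true, rather than spinning while nothing has changed.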




Re: scheduler went mad?

2001-04-12 Thread Jens Axboe

On Wed, Apr 11 2001, Josh McKinney wrote:
> I had the almost exact same thing happen to me just yesterday, I started up
> xcdroast, and cdda2wav and kswapd went crazy, backed out of X and all was 
> well, and still is.
> 
> Same kernel as you too.

I can tell you why this happens. Earlier kernels allocated one cd frame
worth of data for cdda ripping, but it was recently bumped to allow as
many as the ripping program asks for (up to 8). This requires a 4-5 page
allocation on x86, which is of course not reliable. cdrom.c adjusts for
failed allocations and drops to a smaller number of frames (8 -> 4 -> 2 and
then just 1), but apparently the vm isn't handling this too well if
kswapd is going crazy.
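
For scale: a raw CD audio frame is 2352 bytes, so 8 frames come to 18816 bytes, which is between 4 and 5 pages of 4096 bytes. The fallback described above can be sketched in user space roughly as follows. This is not the cdrom.c code; `sketch_alloc()` is a hypothetical allocator that simply refuses anything above a byte limit, standing in for a kernel allocation that fails for large contiguous requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_FRAME_BYTES 2352   /* raw CD audio frame size */

/* Hypothetical allocator that refuses requests above 'limit' bytes. */
static void *sketch_alloc(size_t bytes, size_t limit)
{
    return bytes <= limit ? malloc(bytes) : NULL;
}

/*
 * Try nframes, then half of that, and so on down to 1. On success returns
 * the buffer and stores the frame count actually obtained in *got.
 */
static void *sketch_alloc_frames(int nframes, size_t limit, int *got)
{
    assert(nframes >= 1 && got != NULL);
    for (; nframes >= 1; nframes /= 2) {
        void *buf = sketch_alloc((size_t)nframes * SKETCH_FRAME_BYTES, limit);
        if (buf) {
            *got = nframes;
            return buf;
        }
    }
    *got = 0;
    return NULL;
}
```

With a limit of one page (4096 bytes) a request for 8 frames settles at 1 frame; with a limit of 10000 bytes it settles at 4. The caller has to read `*got` and size its transfer accordingly.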

I can switch to a static 8 frame allocation, but IMHO the vm should be
able to handle situations like this. It's not that unusual for a driver
to ask for a bigger chunk of memory if it can go faster that way, and
then be prepared to settle for less if need be. For cdda ripping, it
really does make a difference.

However, I can change ide-cd to do scatter gather in this case. It's the
nicer thing to do anyway. Does cdda2wav have some sort of 'do X number
of frames at a time' option? If so, use 1 and there should be no
problems.

-- 
Jens Axboe




Re: scheduler went mad?

2001-04-12 Thread Valdis . Kletnieks

On Thu, 12 Apr 2001 16:12:55 BST, Alan Cox said:
> > I've seen the same scenario about 2-3 times a week.  kswapd and one or
> > more processes all CPU bound, totalling to 100%.  I've had 'esdplay' hung
> > on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
> > The 'hung' processes are consistently immune to kill -9, even as root, which
> > indicates to me that they're hung inside a kernel call or something.
> 
> Do you have > 800Mb of RAM ?

256M of RAM, 256M of swap.

Here's /proc/meminfo as I type:
[~]3 cat /proc/meminfo 
        total:    used:    free:  shared: buffers:  cached:
Mem:  260276224 246419456 13856768        0  8347648 75317248
Swap: 271392768 58589184 212803584
MemTotal:       254176 kB
MemFree:         13532 kB
MemShared:           0 kB
Buffers:          8152 kB
Cached:          73552 kB
Active:          49716 kB
Inact_dirty:     28800 kB
Inact_clean:      3188 kB
Inact_target:      212 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       254176 kB
LowFree:         13532 kB
SwapTotal:      265032 kB
SwapFree:       207816 kB
[~]3 

> > would explain the high context-switch rate.  I'm not clear on how kswapd
> > can end up getting stuck and failing to free up something - unless it ends
> > up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't
> > enough to get it the memory it needs, causing a deadlock/loop between
> > kswapd and __alloc_pages/wakeup_kswapd().
> 
> bounce buffers for one

It's a Dell Optiplex GX110, using IDE.  Grepping for 'bounce buffer' in
the source shows most hits in the SCSI code, and nothing obviously jumping
out at me...

<just speculating> Is it possible that i810_audio.c is to blame? I'm looking
at alloc_dmabuf() in there, and it tries to grab a big chunk of memory
for a DMA buffer (starting at order-4), which probably explains my __alloc_pages
messages.  In addition, I run Enlightenment with audio enabled - so it's
quite possible that xscreensaver will generate a 'click' sound when it
pops up its dialog window - again tossing us into i810_audio. (Scenario
there - mouse event happens while screen locked, xscreensaver wakes up and
starts mapping a window - E plays the sound, hosing the i810_audio driver,
and then when xscreensaver gets the CPU back, its next call for a page
gets wedged up.)

Would it be worth applying Ed Tomlinson's icache/dcache patches and seeing
if that helps?


-- 
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech





Re: scheduler went mad?

2001-04-12 Thread Hugh Dickins

On Thu, 12 Apr 2001 [EMAIL PROTECTED] wrote:
> I've seen the same scenario about 2-3 times a week.  kswapd and one or
> more processes all CPU bound, totalling to 100%.  I've had 'esdplay' hung
> on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
> The 'hung' processes are consistently immune to kill -9, even as root, which
> indicates to me that they're hung inside a kernel call or something.
[snip]
> __alloc_pages: 4-order allocation failed.
> __alloc_pages: 3-order allocation failed.
[snip]
> In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will
> cause it to loop around and try to get more memory.  I'm wondering if
[snip]
> I'm running the 2.4.3 kernel

2.4.3-pre6 quietly made a very significant change there:
it used to say "if (!order) goto try_again;" and now just
says "goto try_again;".  Which seems very sensible since
__GFP_WAIT is set, but I do wonder if it was a safe change.
We have mechanisms for freeing pages (order 0), but whether
any higher orders come out of that is a matter of chance.
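
The point about higher orders being "a matter of chance" can be illustrated with a toy free-page map: plenty of single pages can be free while no aligned run of 2^order pages exists. This is a simplified sketch, not the kernel's buddy allocator; `sketch_count_free()` and `sketch_has_block()` are hypothetical helpers over a plain array of booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count free pages in a map where free_map[i] is true for a free page. */
static size_t sketch_count_free(const bool *free_map, size_t npages)
{
    size_t n = 0;
    for (size_t i = 0; i < npages; i++)
        n += free_map[i];
    return n;
}

/* True if some naturally aligned run of 2^order pages is entirely free. */
static bool sketch_has_block(const bool *free_map, size_t npages,
                             unsigned int order)
{
    size_t len = (size_t)1 << order;
    assert(order < 16);
    for (size_t start = 0; start + len <= npages; start += len) {
        size_t i = 0;
        while (i < len && free_map[start + i])
            i++;
        if (i == len)
            return true;
    }
    return false;
}
```

With every second page of a 16-page map free, half the memory is available, yet not even an order-1 request can be satisfied. Freeing more order-0 pages helps only if they happen to land next to each other.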

(But of course, this may not be related to your problem,
and your "N-order allocation failed" messages must have
been from other instances than stuck in this loop.)

Hugh




Re: scheduler went mad?

2001-04-12 Thread Alan Cox

> I've seen the same scenario about 2-3 times a week.  kswapd and one or
> more processes all CPU bound, totalling to 100%.  I've had 'esdplay' hung
> on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
> The 'hung' processes are consistently immune to kill -9, even as root, which
> indicates to me that they're hung inside a kernel call or something.

Do you have > 800Mb of RAM ?

> In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will
> cause it to loop around and try to get more memory.  I'm wondering if

Even outside of that certain drivers also loop on alloc failures as does 
TCP.

> would explain the high context-switch rate.  I'm not clear on how kswapd
> can end up getting stuck and failing to free up something - unless it ends
> up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't
> enough to get it the memory it needs, causing a deadlock/loop between
> kswapd and __alloc_pages/wakeup_kswapd().

bounce buffers for one




Re: scheduler went mad?

2001-04-12 Thread Valdis . Kletnieks

I've seen the same scenario about 2-3 times a week.  kswapd and one or
more processes all CPU bound, totalling to 100%.  I've had 'esdplay' hung
on several occasions, and 2-3 times it's been xscreensaver (3.29) hung.
The 'hung' processes are consistently immune to kill -9, even as root, which
indicates to me that they're hung inside a kernel call or something.

Sometimes, something *else* will exit, and everything will 'break loose'
and return to normal after a minute or so.

It *may* not be related, but I also have a lot of this in 'dmesg':

__alloc_pages: 4-order allocation failed.
__alloc_pages: 3-order allocation failed.
i810_audio: DMA overrun on send

There was a recent posting re: the i810_audio driver amounting to "I've got
one bug to fix and then I'll put up a patch" for the 'dma overrun' message.
__alloc_pages doesn't give much information on who its caller was, so
that's somewhat of a dead end...

In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will
cause it to loop around and try to get more memory.  I'm wondering if
the "hung" process is entering __alloc_pages(), and gets wedged in the
'try_again' loop - which has a call to wakeup_kswapd() inside it, which
would explain the high context-switch rate.  I'm not clear on how kswapd
can end up getting stuck and failing to free up something - unless it ends
up calling __alloc_pages itself indirectly and the PF_MEMALLOC bit isn't
enough to get it the memory it needs, causing a deadlock/loop between
kswapd and __alloc_pages/wakeup_kswapd().

Unfortunately, I've just exhausted my ability to debug this one here.. ;) 

I'm running the 2.4.3 kernel, with the following patches:

Reiserfs: 2.4.3-3.6.25.quota.bz2
linux-2.4.3-knfsd-6.g.patch.gz
linux-2.4.3-reiserfs-20010327.patch.bz2

IPv6: linux24-2.4.3-usagi-20010406.patch.gz
Crypto: patch-int-2.4.3.1

am using ReiserFS-on-LVM for basically all filesystems, if that matters...

-- 
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech




Re: scheduler went mad?

2001-04-12 Thread Alan Cox

> 2.4.3-pre6 quietly made a very significant change there:
> it used to say "if (!order) goto try_again;" and now just
> says "goto try_again;".  Which seems very sensible since
> __GFP_WAIT is set, but I do wonder if it was a safe change.
> We have mechanisms for freeing pages (order 0), but whether
> any higher orders come out of that is a matter of chance.

The fundamental problem is that it should say

wait_for_mm_progress();
goto try_again;

and we dont have that facility right now. At that point the looping on
failed allocations problem is ok as we will allow someone to make progress.
That leaves the bounce buffers for > 800Mb RAM which currently are seriously
horked and will loop and may even stack overflow by inspection
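
As a toy illustration of that idea, the sketch below models a retry loop that only goes round again after somebody else has made progress. Everything in it is an assumption made for the example: struct mm_sim, alloc_pages_sim() and the body of wait_for_mm_progress() are invented, the "other task" is simulated by a counter, and giving up when nothing moved is just one possible policy, not something the kernel does.

```c
#include <assert.h>

/* Toy model of the memory state; all fields are invented for this sketch. */
struct mm_sim {
    int free_pages;          /* pages free right now */
    unsigned long progress;  /* bumped every time somebody frees a page */
    int pending_frees;       /* pages other tasks will free when they run */
};

/*
 * Hypothetical stand-in for the missing facility.  A real one would sleep
 * until progress is made; here "another task" simply gets to run once.
 * Returns non-zero if progress was made while we waited.
 */
static int wait_for_mm_progress(struct mm_sim *mm)
{
    unsigned long seen = mm->progress;

    if (mm->pending_frees > 0) {
        mm->pending_frees--;
        mm->free_pages++;
        mm->progress++;
    }
    return mm->progress != seen;
}

/* Returns 1 on success, 0 if no progress was possible; counts the waits. */
static int alloc_pages_sim(struct mm_sim *mm, int npages, int *waits)
{
    assert(mm != 0 && waits != 0 && npages > 0);
    *waits = 0;
try_again:
    if (mm->free_pages >= npages) {
        mm->free_pages -= npages;
        return 1;
    }
    if (!wait_for_mm_progress(mm))
        return 0;   /* nothing changed, so retrying would only spin */
    (*waits)++;
    goto try_again;
}
```

The point of the sketch is only that each trip through try_again is paid for by real progress elsewhere, instead of by burning CPU.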




Re: scheduler went mad?

2001-04-12 Thread Rik van Riel

On Thu, 12 Apr 2001, Alan Cox wrote:

> > 2.4.3-pre6 quietly made a very significant change there:
> > it used to say "if (!order) goto try_again;" and now just
> > says "goto try_again;".  Which seems very sensible since
> > __GFP_WAIT is set, but I do wonder if it was a safe change.
> > We have mechanisms for freeing pages (order 0), but whether
> > any higher orders come out of that is a matter of chance.
>
> The fundamental problem is that it should say
>
> 	wait_for_mm_progress();
> 	goto try_again;
>
> and we dont have that facility right now.

From mm/page_alloc.c, around line 453:

	if (gfp_mask & __GFP_WAIT) {
		memory_pressure++;
		try_to_free_pages(gfp_mask);
		wakeup_bdflush(0);
		goto try_again;
	}

I guess we should remove the wakeup_bdflush(0) ... who put it
there anyway ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: scheduler went mad?

2001-04-12 Thread Marcelo Tosatti



On Thu, 12 Apr 2001, Rik van Riel wrote:

> On Thu, 12 Apr 2001, Alan Cox wrote:
>
> > > 2.4.3-pre6 quietly made a very significant change there:
> > > it used to say "if (!order) goto try_again;" and now just
> > > says "goto try_again;".  Which seems very sensible since
> > > __GFP_WAIT is set, but I do wonder if it was a safe change.
> > > We have mechanisms for freeing pages (order 0), but whether
> > > any higher orders come out of that is a matter of chance.
> >
> > The fundamental problem is that it should say
> >
> > 	wait_for_mm_progress();
> > 	goto try_again;
> >
> > and we dont have that facility right now.
>
> From mm/page_alloc.c, around line 453:
>
> 	if (gfp_mask & __GFP_WAIT) {
> 		memory_pressure++;
> 		try_to_free_pages(gfp_mask);
> 		wakeup_bdflush(0);
> 		goto try_again;
> 	}
>
> I guess we should remove the wakeup_bdflush(0) ... who put it
> there anyway ?

I did :)

This should fix it 

--- mm/page_alloc.c.orig   Thu Apr 12 13:47:53 2001
+++ mm/page_alloc.c        Thu Apr 12 13:48:06 2001
@@ -454,7 +454,7 @@
 		if (gfp_mask & __GFP_WAIT) {
 			memory_pressure++;
 			try_to_free_pages(gfp_mask);
-			wakeup_bdflush(0);
+			balance_dirty(NODEV);
 			goto try_again;
 		}





Re: scheduler went mad?

2001-04-12 Thread Rik van Riel

On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

> This should fix it
>
> --- mm/page_alloc.c.orig   Thu Apr 12 13:47:53 2001
> +++ mm/page_alloc.c        Thu Apr 12 13:48:06 2001
> @@ -454,7 +454,7 @@
>  		if (gfp_mask & __GFP_WAIT) {
>  			memory_pressure++;
>  			try_to_free_pages(gfp_mask);
> -			wakeup_bdflush(0);
> +			balance_dirty(NODEV);
>  			goto try_again;
>  		}

Remember that we can ONLY do this if we have __GFP_IO ...
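
Spelled out as a sketch, the constraint is that reclaim which may start I/O has to be guarded by both flags. The bit values and the names DEMO_GFP_WAIT, DEMO_GFP_IO and pick_reclaim() below are invented for this userspace example and are not the kernel's definitions; only the shape of the test is the point.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's real flag values differ. */
#define DEMO_GFP_WAIT 0x01u   /* caller is allowed to sleep */
#define DEMO_GFP_IO   0x02u   /* caller is allowed to start I/O */

enum reclaim_action {
    RECLAIM_NONE,      /* caller cannot sleep: do not try to reclaim */
    RECLAIM_NO_IO,     /* may sleep, but must not start any I/O */
    RECLAIM_WITH_IO    /* may sleep and may flush dirty buffers */
};

/* Hypothetical helper: decide how far reclaim may go for this mask. */
static enum reclaim_action pick_reclaim(unsigned int gfp_mask)
{
    if (!(gfp_mask & DEMO_GFP_WAIT))
        return RECLAIM_NONE;
    if (!(gfp_mask & DEMO_GFP_IO))
        return RECLAIM_NO_IO;
    return RECLAIM_WITH_IO;
}
```

In these terms, a call like balance_dirty() would only be reachable from the RECLAIM_WITH_IO branch.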

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: scheduler went mad?

2001-04-12 Thread Szabolcs Szakacsits


On Thu, 12 Apr 2001, Marcelo Tosatti wrote:

> This patch is broken, ignore it.
> Just removing wakeup_bdflush() is indeed correct.
> We already wakeup bdflush at try_to_free_buffers() anyway.

I still feel a bit uncomfortable about processes looping forever in
__alloc_pages, and because of this the oom_killer also can't be moved to
the page fault handler, where I think it belongs. I'm using the patch
below.

Szaka

--- mm/page_alloc.c.orig  Sat Mar 31 19:07:22 2001
+++ mm/page_alloc.c Mon Apr  2 21:05:31 2001
@@ -453,8 +453,12 @@
 		 */
 		if (gfp_mask & __GFP_WAIT) {
 			memory_pressure++;
-			try_to_free_pages(gfp_mask);
-			wakeup_bdflush(0);
+			if (!try_to_free_pages(gfp_mask))
+				return NULL;
 			goto try_again;
 		}
 	}





Re: scheduler went mad?

2001-04-12 Thread Rik van Riel

On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
>
> > This patch is broken, ignore it.
> > Just removing wakeup_bdflush() is indeed correct.
> > We already wakeup bdflush at try_to_free_buffers() anyway.
>
> I still feel a bit unconfortable about processes looping forever in
> __alloc_pages and because of this oom_killer also can't be moved to
> page fault handler where I think its place should be. I'm using the
> patch below.

It's BROKEN.  This means that if you have one task using up
all memory and you're waiting for the OOM kill of that task
to have effect, your syslogd, etc... will have their allocations
fail and will die.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: scheduler went mad?

2001-04-12 Thread Valdis . Kletnieks

On Thu, 12 Apr 2001 16:12:55 BST, Alan Cox said:

> Do you have > 800Mb of RAM ?

Following up - it just bit again (twice)

The first time, it was xmms/kswapd fighting for CPU, and xmms was again immune
to kill -9.  Interestingly enough, several minutes later, I closed 'netscape',
and xmms took the kill within a second or two.

10 minutes later, and another 2 programs that do audio got
wedged up. Oddly enough, I did an 'su', and they broke loose immediately.

I've ruled out i810_audio.c as a culprit - although I have programs that
do audio hanging, *those* programs are always writing their data down
a Unix socket to the actual process that writes to /dev/audio/dsp.
Hmm.. 'su' writes to syslog, and netscape has a few Unix sockets too.
Could the problem be related to running out of some resource related
to Unix-domain sockets, which clears up once some socket is closed?

Oddly enough, while I had 2 programs doing audio wedged, I was still
seeing (hearing actually ;) *new* processes open a connection to esd
and play sounds.  Weird.  


-- 
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech






Re: scheduler went mad?

2001-04-12 Thread Szabolcs Szakacsits


On Thu, 12 Apr 2001, Rik van Riel wrote:
> On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> > I still feel a bit unconfortable about processes looping forever in
> > __alloc_pages and because of this oom_killer also can't be moved to
> > page fault handler where I think its place should be. I'm using the
> > patch below.
> It's BROKEN.  This means that if you have one task using up
> all memory and you're waiting for the OOM kill of that task
> to have effect, your syslogd, etc... will have their allocations
> fail and will die.

You mean without dropping the out_of_memory() test in kswapd and calling
oom_kill() in page fault [i.e. without the additional patch]? Yes, you're
completely right, but I have the patch [see example below, 'm1' is the bad
guy]. I just didn't have time to test it extensively and don't know whether
there are side effects of getting rid of this infinite looping in
__alloc_pages(), but locked up processes apparently don't make people
very happy ;)

Szaka

Out of Memory: Killed process 830 (m1), saved process 696 (httpd)
   procs                      memory    swap          io     system
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs
 6  0  0      0   9492    100   1496   0   0  1386     2 2904  3877
 5  0  0      0   7812    104   1788   0   0   289     0  689    22
 5  0  0      0   6248    104   1788   0   0     0     0  108    19
 5  0  0      0   4748    108   1840   0   0    56     0  219    21
 5  0  0      0   3268    108   1868   0   0    28     0  165    23
 5  0  1      0   1864     76   1868   0   0     0     5  120    61
 5  0  1      0   1432     76   1252   0   0     0     0  108  1130
 5  0  1      0   1236     80    796   0   0    65     0  246  4588
 5  0  1      0   1236     80    668   0   0     0     0  110  8869
 6  0  1      0    948    112    696   0   0   805     0 1814  8231
Out of Memory: Killed process 858 (m1), saved process 811 (vmstat)
 5  0  1      0    924    152    444   0   0  1153     0 2731 18231
 4  0  1      0   1720    148    828   0   0   750     3 1711  1876
 5  0  1      0   1156    148    760   0   0   290     0  723  1967
 4  0  1      0   1152    132    664   0   0    70     0  277  7249
 4  0  1      0   1140    144    560   0   0    54     0  238  7942
 4  0  1      0   1140    144    460   0   0    32     0  212  7521
Out of Memory: Killed process 834 (m1), saved process 418 (identd)




Re: scheduler went mad?

2001-04-12 Thread Rik van Riel

On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:

> You mean without dropping out_of_memory() test in kswapd and calling
> oom_kill() in page fault [i.e. without additional patch]?

No.  I think it's ok for __alloc_pages() to call oom_kill()
IF we turn out to be out of memory, but that should not even
be needed.

Also, when a task in __alloc_pages() is OOM-killed, it will
have PF_MEMALLOC set and will immediately break out of the
loop. The rest of the system will spin around in the loop
until the victim has exited and then their allocations will
succeed.
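
A toy model of that behaviour, for what it is worth: the killed task carries a flag and leaves the loop at once, while everybody else keeps going round until the victim's pages come back. DEMO_PF_MEMALLOC, struct sim_state and alloc_loop_sim() are names invented for this userspace sketch, the flag value is arbitrary, and the max_spins bound exists only so that the demo terminates.

```c
#include <assert.h>

#define DEMO_PF_MEMALLOC 0x1u   /* illustrative value, not the kernel's */

enum alloc_result { ALLOC_OK, ALLOC_BROKE_OUT, ALLOC_GAVE_UP };

/* Invented model of the system while an OOM kill is in flight. */
struct sim_state {
    int free_pages;      /* pages available right now */
    int victim_pages;    /* pages released once the victim has exited */
    int victim_exit_at;  /* loop iteration at which the victim is gone */
};

/*
 * One task trying to get one page.  The number of completed loop
 * iterations before returning is reported through *spins.
 */
static enum alloc_result alloc_loop_sim(unsigned int task_flags,
                                        struct sim_state *s,
                                        int max_spins, int *spins)
{
    assert(s != 0 && spins != 0);
    for (*spins = 0; *spins < max_spins; (*spins)++) {
        if (*spins == s->victim_exit_at) {
            /* The victim has exited: its memory becomes available. */
            s->free_pages += s->victim_pages;
            s->victim_pages = 0;
        }
        if (s->free_pages > 0) {
            s->free_pages--;
            return ALLOC_OK;
        }
        if (task_flags & DEMO_PF_MEMALLOC)
            return ALLOC_BROKE_OUT;   /* the killed task does not loop */
    }
    return ALLOC_GAVE_UP;
}
```

Run with the flag set, the task leaves on the first pass; run without it, the task spins until the iteration at which the victim's pages are released and then succeeds.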

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: scheduler went mad?

2001-04-12 Thread Szabolcs Szakacsits


On Thu, 12 Apr 2001, Rik van Riel wrote:
> On Thu, 12 Apr 2001, Szabolcs Szakacsits wrote:
> > You mean without dropping out_of_memory() test in kswapd and calling
> > oom_kill() in page fault [i.e. without additional patch]?
> No.  I think it's ok for __alloc_pages() to call oom_kill()
> IF we turn out to be out of memory, but that should not even
> be needed.

It is not __alloc_pages() that calls oom_kill() but do_page_fault(). Not
the same. After the system has tried *really* hard to get *one* free page
and couldn't manage it, why loop forever? To eat CPU while waiting for
out_of_memory() to *guess* when the system is in OOM? I don't think so. If
processes can't progress because the system can't page in any of their
pages, somebody must go.

> Also, when a task in __alloc_pages() is OOM-killed, it will
> have PF_MEMALLOC set and will immediately break out of the
> loop. The rest of the system will spin around in the loop
> until the victim has exited and then their allocations will
> succeed.

Yes, I think this is a problem. In the page fault path, if OOM, a "bad"
process is selected, scheduled and killed, and everybody else runs happily
without even noticing that the system is low on memory. Fast and graceful
process killing instead of a slow, painful death, IF out_of_memory()
correctly detects OOM.

Szaka




Re: scheduler went mad?

2001-04-12 Thread Valdis . Kletnieks

On Fri, 13 Apr 2001 01:02:21 +0200, Szabolcs Szakacsits said:

> Not __alloc_pages() calls oom_kill() however do_page_fault(). Not the
> same. After the system tried *really* hard to get *one* free page and
> couldn't managed why loop forever? To eat CPU and waiting for

For what it's worth, this *IS NOT* the case I'm getting bit by:

While kswapd was hung, I already had (from /proc/meminfo)

MemFree: 34064 kB

I suspect that kswapd is getting hung spinning on some *specific*
requirement that it's falling short on?
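
For anyone who wants to log the same figure while the hang is in progress, a minimal sketch in C follows. It assumes the "MemFree: <number> kB" line format shown above; parse_memfree_kb() and read_memfree_kb() are names made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the value in kB from a "MemFree:" line, or -1 for other lines. */
static long parse_memfree_kb(const char *line)
{
    long kb;

    assert(line != NULL);
    if (strncmp(line, "MemFree:", 8) != 0)
        return -1;
    if (sscanf(line + 8, "%ld", &kb) != 1)
        return -1;
    return kb;
}

/* Scans a meminfo-style file for MemFree; returns -1 if it is not found. */
static long read_memfree_kb(const char *path)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        kb = parse_memfree_kb(line);
        if (kb >= 0)
            break;
    }
    fclose(f);
    return kb;
}
```

Calling read_memfree_kb("/proc/meminfo") in a loop next to vmstat would show whether free memory really stays in the tens of megabytes while kswapd spins.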

/Valdis



Re: scheduler went mad?

2001-04-11 Thread Josh McKinney

I had the almost exact same thing happen to me just yesterday, I started up
xcdroast, and cdda2wav and kswapd went crazy, backed out of X and all was 
well, and still is.

Same kernel as you too.

On approximately Wed, Apr 11, 2001 at 04:24:48PM +0200, Priit Randla wrote:
> 
> 
> Hi,
>
> 
> Yesterday I tried to start cdda2wav but somehow it didn't do anything.
> It didn't die to kill -9 either. Machine was slow but usable.
> vmstat 10 output:
> 
>    procs                      memory    swap          io     system         cpu
>  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
>  2  0  1   2972  40916    108  18292   0   0     0     0  121 12735   0 100   0
>  2  0  1   2972  40492    108  18292   0   0     0     0  109 12740   1  99   0
>  2  0  1   2972  40492    108  18292   0   0     0     0  103 12996   0 100   0
>  3  0  0   2972  40492    108  18292   0   0     0     0  102 12932   0 100   0
>  3  0  1   2972  40492    108  18292   0   0     0     0  131 12652   1  99   0
>  2  0  0   2972  40496    108  18292   0   0     0     0  142 12562   1  99   0
>  2  0  0   2972  40500    108  18292   0   0     0     0  120 12684   0 100   0
>  2  0  1   2972  40496    108  18292   0   0     0     0  140 12480   1  99   0
>  2  0  0   2972  39952    108  18292   0   0     0     0  160 11445   7  93   0
>  3  0  0   2972  39952    108  18292   0   0     0     0  178 12295   2  98   0
>  2  0  0   2972  39956    108  18292   0   0     0     0  214 11958   2  98   0
>  3  0  1   2972  39952    108  18292   0   0     0     0  138 12579   1  99   0
> 
> cs field is absolutely ridiculous for my machine.
>  
>   
> 
> ps showed cdda2wav & kswapd eating all of processor time. When I tried to
> close netscape, it hung too and joined cdda2wav and kswapd:
>
>   PID USER PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
>  9990 priitr 17   0 42380  41M  9928 R   0 32.5 33.4  21:47 netscape-commun
> 3 root  17   0 00 0 SW  0 32.3  0.0  11:12 kswapd
> 10538 priitr 16   0848 0 R   0 32.3  0.0  11:09 cdda2wav
> 5 root   9   0 00 0 SW  0  1.5  0.0   0:19 bdflush
> 10616 priitr 13   0   856  856   668 R   0  0.7  0.6   0:00 top
>   657 root   9   0 21160  20M  1668 S   0  0.1 16.7  29:36 X
> 
> 
> I couldn't leave X and had to kill it. After that, both netscape and
> cdda2wav were gone and everything looks normal since then.
> I'm running 2.4.3ac3 right now.
> 
> dmesg:


