Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-05 Thread Mark Lord
Arjan van de Ven wrote: On Mon, 3 Dec 2007 11:27:15 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken. What should it do when the NFS server doesn't answer anymore or when the network to the SAN RAID array located a few hundred KM

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: > > > Er, it won't play well if that happen when tasks are frozen for > > > suspend. > > > > right now any suspend attempt times out after 20 seconds: > > > > $ grep TIMEOUT kernel/power/process.c > > #define TIMEOUT (20 * HZ) > >

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Rafael J. Wysocki
On Monday, 3 of December 2007, Ingo Molnar wrote: > > * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: > > > > This feature will save one full reporter-developer round-trip during > > > investigation of a significant number of bug reports. > > > > > > It might be more practical if it were to

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: > > This feature will save one full reporter-developer round-trip during > > investigation of a significant number of bug reports. > > > > It might be more practical if it were to dump the traces for _all_ > > D-state processes when it fires -

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Rafael J. Wysocki
On Monday, 3 of December 2007, Andrew Morton wrote: > On Mon, 3 Dec 2007 15:19:25 +0100 > Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > this patch extends the soft-lockup detector to automatically > > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are > > printed the following way: > > >

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andrew Morton
On Mon, 3 Dec 2007 15:19:25 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote: > this patch extends the soft-lockup detector to automatically > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are > printed the following way: > > --> > INFO: task prctl:3042 blocked for more than 120

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ray Lee
On Dec 3, 2007 6:17 AM, Andi Kleen <[EMAIL PROTECTED]> wrote: > That won't address my concerns about already "breaking" (as in > frightening the user etc.) common error handling scenarios by default. Andi, may I respectfully submit that you're not understanding real users here? Real users

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
> the scsi layer will have the IO totally aborted within that time anyway; > the retry timeout for disks is 30 seconds after all. There are blocking waits who wait for multiple IOs. Also i think the SCSI driver can tune this anyways and I suspect iSCSI and friends increase it (?) -Andi -- To

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Arjan van de Ven
On Mon, 3 Dec 2007 11:27:15 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly > > broken. > > What should it do when the NFS server doesn't answer anymore or > when the network to the SAN RAID array located a few hundred KM away >

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > > debugging feature can be disabled/enabled on a wide scale already: > > > > - in the .config > > > > - runtime, temporarily, via: > > > > echo 0 > /proc/sys/kernel/hung_task_timeout_secs > > That won't address my concerns about already

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > Now Ingo's latest unreleased version with single line messages might > be actually ok if he turns off the backtraces by default. > Unfortunately I wasn't able to find out so far if he has done that or > not, he always cuts away these parts of the

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 02:55:47PM +0100, Ingo Molnar wrote: > > * Andi Kleen <[EMAIL PROTECTED]> wrote: > > > I would still appreciate if you could state what default value you > > plan to set the backtrace sysctl to in the submitted patch. > > there's no "backtrace sysctl" planned for the

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 02:59:16PM +0100, Ingo Molnar wrote: > Andi, is that true? If yes, why didnt Andi state this concern outright, > instead of pooh-pooh-ing the patch on various other grounds? No of course not. Radoslaw is talking nonsense. -Andi -- To unsubscribe from this list: send the

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
> It's more like "lets warn about it and fix the problems when we find > some." It is already known there are lots of problems. I won't repeat them because I already wrote too much about them. Feel free to read back in the thread. Now if all the known problems are fixed and only some hard to

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Radoslaw Szkodzinski <[EMAIL PROTECTED]> wrote: > On Mon, 3 Dec 2007 14:29:56 +0100 > > * Andi Kleen <[EMAIL PROTECTED]> wrote: > > > > > > feedback about an impending catastrophy has been duly noted > > > > > > The point was less about an impending catastrophe, but more of a > > > timebomb

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Pekka Enberg <[EMAIL PROTECTED]> wrote: > Hi, > > On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote: > > > "audit thousands of callsites in 8 million lines of code first" is a > > > nice euphemism for hiding from the blame forever. We had 10 years for it > > On Dec 3, 2007 2:13 PM,

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > I would still appreciate if you could state what default value you > plan to set the backtrace sysctl to in the submitted patch. there's no "backtrace sysctl" planned for the moment. This "hung tasks" debugging feature can be disabled/enabled on a

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Pekka Enberg
Hi, On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote: > > "audit thousands of callsites in 8 million lines of code first" is a > > nice euphemism for hiding from the blame forever. We had 10 years for it On Dec 3, 2007 2:13 PM, Andi Kleen <[EMAIL PROTECTED]> wrote: > Ok your approach

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
I would still appreciate if you could state what default value you plan to set the backtrace sysctl to in the submitted patch. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread AstralStorm
On Mon, 3 Dec 2007 14:29:56 +0100 > * Andi Kleen <[EMAIL PROTECTED]> wrote: > > > > feedback about an impending catastrophy has been duly noted > > > > The point was less about an impending catastrophe, but more of a > > timebomb ticking until the next widely used release. I think I know why

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
> negative I would consider it positive, but ok. If I was negative I would probably not care and just make always sure to disable SOFTLOCKUP in the kernels I use. > feedback about an impending catastrophy has been duly noted The point was less about an impending catastrophe, but more of a

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > > you are over-designing it way too much - a backtrace is obviously > > very helpful and it must be printed by default. There's enough > > configurability in it already so that you can turn it off if you > > want. > > So it will hit everybody first

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 01:28:33PM +0100, Ingo Molnar wrote: > > > On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote: > > > no. (that's why i added the '(or a kill -9)' qualification above - if > > > NFS is mounted noninterruptible then standard signals (such as Ctrl-C) > > > should

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote: > > no. (that's why i added the '(or a kill -9)' qualification above - if > > NFS is mounted noninterruptible then standard signals (such as Ctrl-C) > > should not have an interrupting

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote: > no. (that's why i added the '(or a kill -9)' qualification above - if > NFS is mounted noninterruptible then standard signals (such as Ctrl-C) > should not have an interrupting effect.) NFS is already interruptible with umount -f (I

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote: > > > > * Andi Kleen <[EMAIL PROTECTED]> wrote: > > > > > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken. > > > > > > What should it do when the NFS server doesn't

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote: > > * Andi Kleen <[EMAIL PROTECTED]> wrote: > > > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken. > > > > What should it do when the NFS server doesn't answer anymore or when > > the network to the SAN RAID

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken. > > What should it do when the NFS server doesn't answer anymore or when > the network to the SAN RAID array located a few hundred KM away > develops some hickup? [...] maybe:

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
> Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken. What should it do when the NFS server doesn't answer anymore or when the network to the SAN RAID array located a few hundred KM away develops some hickup? Or just the SCSI driver decides to do lengthy error recovery --

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar
* Radoslaw Szkodzinski <[EMAIL PROTECTED]> wrote: > > iirc TASK_KILLABLE fixed NFS only. While that's a good thing there > > are unfortunately a lot more subsystems that would need the same > > treatment. > > Yes, that's exactly why the patch is needed - to find the bugs and fix > them.

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread AstralStorm
On Mon, 3 Dec 2007 10:55:01 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > On Sun, Dec 02, 2007 at 04:59:13PM -0800, Arjan van de Ven wrote: > > On Mon, 3 Dec 2007 01:07:41 +0100 > > Andi Kleen <[EMAIL PROTECTED]> wrote: > > > > > This patch will likely work against that by breaking error paths.

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Sun, Dec 02, 2007 at 04:59:13PM -0800, Arjan van de Ven wrote: > On Mon, 3 Dec 2007 01:07:41 +0100 > Andi Kleen <[EMAIL PROTECTED]> wrote: > > > > We really need to get better diagnostics for the > > > bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to > > > get to the scenario

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Mon, 3 Dec 2007 01:07:41 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > > We really need to get better diagnostics for the > > bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to > > get to the scenario where we have a more or less robust measure of > > kernel quality (and we're

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
> We really need to get better diagnostics for the > bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to get > to the scenario where we have a more or less robust measure of kernel > quality (and we're not all that far off for several cases), one thing One measure to kernel quality

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
> Delay accounting (or the /proc/<PID>/sched fields that i added recently) > only get updated once a task has finished its unreasonably long delay > and has scheduled. If it is stuck forever then you can just use sysrq-t If it recovers delay accounting will catch it. > detected_ this way. This is

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Sun, 2 Dec 2007 21:47:25 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > > Out of direct experience, 95% of the "too long delay" cases are > > plain old bugs. The rest we can (and must!) convert to > > TASK_KILLABLE or could > > I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > > do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ > > something that most humans consider as "buggy" in the overwhelming > > majority of cases, regardless of the reason? Yes, there are and will > > be some exceptions, but not

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > > Until now users had little direct recourse to get such problems > > fixed. (we had sysrq-t, but that included no real metric of how long > > a task was > > Actually task delay accounting can measure this now. iirc someone had > a latencytop based

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Sun, 2 Dec 2007 22:19:25 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > > > > Until now users had little direct recourse to get such problems > > fixed. (we had sysrq-t, but that included no real metric of how > > long a task was > > Actually task delay accounting can measure this now. iirc

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
Ingo Molnar <[EMAIL PROTECTED]> writes: > > do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ > something that most humans consider as "buggy" in the overwhelming > majority of cases, regardless of the reason? Yes, there are and will be > some exceptions, but not nearly as

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote: > > what if you considered - just for a minute - the possibility of this > > debug tool being the thing that actually animates developers to fix such > > long delay bugs that have bothered

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote: > what if you considered - just for a minute - the possibility of this > debug tool being the thing that actually animates developers to fix such > long delay bugs that have bothered users for almost a decade meanwhile? Throwing

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > > Out of direct experience, 95% of the "too long delay" cases are plain > > old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could > > I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs). It > would be pretty bad to

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
> Out of direct experience, 95% of the "too long delay" cases are plain > old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs). It would be pretty bad to merge this patch without converting them to

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > > .. and it's even a tool to show where we missed making something > > TASK_KILLABLE... anything that triggers from NFS and the like really > > ought to be TASK_KILLABLE after all. This patch will point any > > omissions out quite nicely without

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Arjan van de Ven <[EMAIL PROTECTED]> wrote: > > TASK_KILLABLE should be the right solution i think. > > .. and it's even a tool to show where we missed making something > TASK_KILLABLE... anything that triggers from NFS and the like really > ought to be TASK_KILLABLE after all. This patch

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
> .. and it's even a tool to show where we missed making something > TASK_KILLABLE... anything that triggers from NFS and the like really > ought to be TASK_KILLABLE after all. This patch will point any > omissions out quite nicely without having to do any kind of destructive > testing. It would

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Sun, 2 Dec 2007 19:59:45 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote: > > * Andi Kleen <[EMAIL PROTECTED]> wrote: > > > Ingo Molnar <[EMAIL PROTECTED]> writes: > > > > > this patch extends the soft-lockup detector to automatically > > > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote: > Ingo Molnar <[EMAIL PROTECTED]> writes: > > > this patch extends the soft-lockup detector to automatically > > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are > > printed the following way: > > That will likely trigger anytime a hard

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
Ingo Molnar <[EMAIL PROTECTED]> writes: > this patch extends the soft-lockup detector to automatically > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are > printed the following way: That will likely trigger anytime a hard nfs/cifs mount loses its server for 120s. To make this work

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread David Rientjes
On Sun, 2 Dec 2007, Ingo Oeser wrote: > > maybe, but we'd have to see how often this gets triggered. An OOM is > > something that could happen in any overloaded system - while a hung task > > is likely due to a kernel bug. > > What about a client using hard mounted NFS shares here? That

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Ingo Oeser <[EMAIL PROTECTED]> wrote: > On Saturday 01 December 2007, Ingo Molnar wrote: > > maybe, but we'd have to see how often this gets triggered. An OOM is > > something that could happen in any overloaded system - while a hung task > > is likely due to a kernel bug. > > What about a

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Ingo Oeser [EMAIL PROTECTED] wrote: On Saturday 01 December 2007, Ingo Molnar wrote: maybe, but we'd have to see how often this gets triggered. An OOM is something that could happen in any overloaded system - while a hung task is likely due to a kernel bug. What about a client

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread David Rientjes
On Sun, 2 Dec 2007, Ingo Oeser wrote: maybe, but we'd have to see how often this gets triggered. An OOM is something that could happen in any overloaded system - while a hung task is likely due to a kernel bug. What about a client using hard mounted NFS shares here? That shouldn't be

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
Ingo Molnar [EMAIL PROTECTED] writes: this patch extends the soft-lockup detector to automatically detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are printed the following way: That will likely trigger anytime a hard nfs/cifs mount loses its server for 120s. To make this work you

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen [EMAIL PROTECTED] wrote: Ingo Molnar [EMAIL PROTECTED] writes: this patch extends the soft-lockup detector to automatically detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are printed the following way: That will likely trigger anytime a hard nfs/cifs mount loses

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Sun, 2 Dec 2007 19:59:45 +0100 Ingo Molnar [EMAIL PROTECTED] wrote: * Andi Kleen [EMAIL PROTECTED] wrote: Ingo Molnar [EMAIL PROTECTED] writes: this patch extends the soft-lockup detector to automatically detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are printed

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen [EMAIL PROTECTED] wrote: .. and it's even a tool to show where we missed making something TASK_KILLABLE... anything that triggers from NFS and the like really ought to be TASK_KILLABLE after all. This patch will point any omissions out quite nicely without having to do

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Arjan van de Ven [EMAIL PROTECTED] wrote: TASK_KILLABLE should be the right solution i think. .. and it's even a tool to show where we missed making something TASK_KILLABLE... anything that triggers from NFS and the like really ought to be TASK_KILLABLE after all. This patch will

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
Out of direct experience, 95% of the too long delay cases are plain old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs). It would be pretty bad to merge this patch without converting them to TASK_KILLABLE

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
.. and it's even a tool to show where we missed making something TASK_KILLABLE... anything that triggers from NFS and the like really ought to be TASK_KILLABLE after all. This patch will point any omissions out quite nicely without having to do any kind of destructive testing. It would be

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote: what if you considered - just for a minute - the possibility of this debug tool being the thing that actually animates developers to fix such long delay bugs that have bothered users for almost a decade meanwhile? Throwing frequent

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen [EMAIL PROTECTED] wrote: On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote: what if you considered - just for a minute - the possibility of this debug tool being the thing that actually animates developers to fix such long delay bugs that have bothered users for

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
Ingo Molnar [EMAIL PROTECTED] writes: do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ something that most humans consider as buggy in the overwhelming majority of cases, regardless of the reason? Yes, there are and will be some exceptions, but not nearly as countless as

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen [EMAIL PROTECTED] wrote: Out of direct experience, 95% of the too long delay cases are plain old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs). It would be pretty bad to merge this

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen [EMAIL PROTECTED] wrote: do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ something that most humans consider as buggy in the overwhelming majority of cases, regardless of the reason? Yes, there are and will be some exceptions, but not nearly as

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Sun, 2 Dec 2007 22:19:25 +0100 Andi Kleen [EMAIL PROTECTED] wrote: Until now users had little direct recourse to get such problems fixed. (we had sysrq-t, but that included no real metric of how long a task was Actually task delay accounting can measure this now. iirc someone

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar
* Andi Kleen [EMAIL PROTECTED] wrote: Until now users had little direct recourse to get such problems fixed. (we had sysrq-t, but that included no real metric of how long a task was Actually task delay accounting can measure this now. iirc someone had a latencytop based on it
