Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-05 Thread Mark Lord

Arjan van de Ven wrote:
> On Mon, 3 Dec 2007 11:27:15 +0100
> Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly
> > > broken.
> > 
> > What should it do when the NFS server doesn't answer anymore or 
> > when the network to the SAN RAID array located a few hundred KM away
> > develops some hickup?  Or just the SCSI driver decides to do lengthy
> > error recovery  -- you could argue that is broken if it takes longer 
> > than 2 minutes, but in practice these things are hard to test
> > and to fix.
> 
> the scsi layer will have the IO totally aborted within that time anyway;
> the retry timeout for disks is 30 seconds after all.

..

Mmm.. but the SCSI layer may do many retries, each with 30sec timeouts..
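
To put rough numbers on the retry point, here is a small sketch in C. It
assumes, purely for illustration, that attempts run back to back and that
each one hits its full timeout; the attempt count is a hypothetical
parameter, since the thread gives only the 30-second per-attempt figure and
the detector's 120-second default. The function names are made up for this
example.

```c
#include <assert.h>
#include <limits.h>

/* Figures quoted in this thread. */
#define DISK_TIMEOUT_SECS      30u   /* per-attempt disk timeout */
#define HUNG_TASK_TIMEOUT_SECS 120u  /* detector default */

/*
 * Worst-case time a task could stay blocked if every attempt runs into
 * its full timeout. `attempts` is hypothetical: the real number depends
 * on the driver and is not stated in the thread.
 */
static unsigned int worst_case_wait_secs(unsigned int attempts,
					 unsigned int per_attempt_secs)
{
	/* Guard against overflow of the product. */
	assert(per_attempt_secs == 0 ||
	       attempts <= UINT_MAX / per_attempt_secs);
	return attempts * per_attempt_secs;
}

/* Nonzero if the wait would outlast the detector's threshold. */
static int would_trigger_warning(unsigned int wait_secs,
				 unsigned int threshold_secs)
{
	/* A threshold of 0 disables the message, per the patch text. */
	return threshold_secs != 0 && wait_secs > threshold_secs;
}
```

Under these assumptions, four attempts (120 seconds) sit exactly at the
threshold and do not trigger the "more than 120 seconds" message, while a
fifth attempt (150 seconds) does.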
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:

> > > Er, it won't play well if that happen when tasks are frozen for 
> > > suspend.
> > 
> > right now any suspend attempt times out after 20 seconds:
> > 
> >   $ grep TIMEOUT kernel/power/process.c
> >   #define TIMEOUT (20 * HZ)
> >   end_time = jiffies + TIMEOUT;
> 
> This is the timeout for freezing tasks, but if the freezing succeeds, 
> they can stay in TASK_UNINTERRUPTIBLE for quite some more time, 
> especially during a hibernation (the tasks stay frozen until we power 
> off the system after saving the image).

ah, ok. So this was a live bug - thanks for the clarification.

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Rafael J. Wysocki
On Monday, 3 of December 2007, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> 
> > > This feature will save one full reporter-developer round-trip during 
> > > investigation of a significant number of bug reports.
> > > 
> > > It might be more practical if it were to dump the traces for _all_ 
> > > D-state processes when it fires - basically an auto-triggered 
> > > sysrq-W.
> > 
> > Er, it won't play well if that happen when tasks are frozen for 
> > suspend.
> 
> right now any suspend attempt times out after 20 seconds:
> 
>   $ grep TIMEOUT kernel/power/process.c
>   #define TIMEOUT (20 * HZ)
>   end_time = jiffies + TIMEOUT;

This is the timeout for freezing tasks, but if the freezing succeeds, they
can stay in TASK_UNINTERRUPTIBLE for quite some more time, especially during
a hibernation (the tasks stay frozen until we power off the system after saving
the image).

> which should be well before the 120 seconds timeout that the detector 
> uses. But indeed you are right in that the refrigerator() works via 
> TASK_UNINTERRUPTIBLE too. I've updated the patch to exclude PF_FROZEN - 
> attached below. That should solve this particular issue, even if the 
> timeout increased to above 20 secs, right?

Sure.

Thanks,
Rafael


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:

> > This feature will save one full reporter-developer round-trip during 
> > investigation of a significant number of bug reports.
> > 
> > It might be more practical if it were to dump the traces for _all_ 
> > D-state processes when it fires - basically an auto-triggered 
> > sysrq-W.
> 
> Er, it won't play well if that happen when tasks are frozen for 
> suspend.

right now any suspend attempt times out after 20 seconds:

  $ grep TIMEOUT kernel/power/process.c
  #define TIMEOUT (20 * HZ)
  end_time = jiffies + TIMEOUT;

which should be well before the 120 seconds timeout that the detector 
uses. But indeed you are right in that the refrigerator() works via 
TASK_UNINTERRUPTIBLE too. I've updated the patch to exclude PF_FROZEN - 
attached below. That should solve this particular issue, even if the 
timeout increased to above 20 secs, right?
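
For readers following the logic, the exclusion described here can be
modelled in user space roughly as follows. This is a simplified sketch and
not the code from the patch: struct model_task, MODEL_PF_FROZEN, the state
enum and should_report_hung() are all hypothetical names, and the flag value
is arbitrary. It encodes only what the thread states: a task is reported
when it has been in uninterruptible sleep for more than the timeout, a
timeout of 0 disables the check, and frozen tasks are skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel flag; the value is illustrative. */
#define MODEL_PF_FROZEN 0x1u

enum model_state {
	MODEL_RUNNING,
	MODEL_INTERRUPTIBLE,
	MODEL_UNINTERRUPTIBLE
};

/* Minimal model of the per-task state the check needs. */
struct model_task {
	enum model_state state;
	unsigned int flags;
	unsigned long blocked_secs;	/* time since the task last ran */
};

/* Returns true if the modelled detector would print a warning. */
static bool should_report_hung(const struct model_task *t,
			       unsigned long timeout_secs)
{
	assert(t != 0);

	if (timeout_secs == 0)			/* 0 disables the check */
		return false;
	if (t->state != MODEL_UNINTERRUPTIBLE)	/* only D-state tasks count */
		return false;
	if (t->flags & MODEL_PF_FROZEN)		/* frozen for suspend: skip */
		return false;

	return t->blocked_secs > timeout_secs;
}
```

With this ordering a frozen task is never reported, no matter how long it
stays in the refrigerator, so the result does not depend on the freezer
timeout staying below the detector timeout.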

Ingo

-->
Subject: softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
From: Ingo Molnar <[EMAIL PROTECTED]>

this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:

 -->
 INFO: task prctl:3042 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
 prctl D fd5e3793 0  3042   2997
f6050f38 0046 0001 fd5e3793 0009 c06d8264 c06dae80 0286
f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 0001 f605 
f7e92d00 0286 f6050f18 c0489d1a f6050f40 6605  c0133a5b
 Call Trace:
  [] schedule_timeout+0x6d/0x8b
  [] schedule_timeout_uninterruptible+0x15/0x17
  [] msleep+0x10/0x16
  [] sys_prctl+0x30/0x1e2
  [] sysenter_past_esp+0x5f/0xa5
  ===
 2 locks held by prctl/3042:
 #0:  (>s_type->i_mutex_key#5){--..}, at: [] do_fsync+0x38/0x7a
 #1:  (jbd_handle){--..}, at: [] journal_start+0xc7/0xe9
 <--

the current default timeout is 120 seconds. Such messages are printed
up to 10 times per bootup. If the system has crashed already then the
messages are not printed.

if lockdep is enabled then all held locks are printed as well.

this feature is a natural extension to the softlockup-detector (kernel
locked up without scheduling) and to the NMI watchdog (kernel locked up
with IRQs disabled).

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_32.c |   15
 include/linux/debug_locks.h  |    5
 include/linux/sched.h        |   10
 kernel/fork.c                |    5
 kernel/lockdep.c             |   14
 kernel/sched.c               |    4
 kernel/softlockup.c          |  106
 kernel/sysctl.c              |   27
 8 files changed, 169 insertions(+), 17 deletions(-)

Index: linux/arch/x86/kernel/process_32.c
===================================================================
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -114,10 +114,19 @@ void default_idle(void)
 		smp_mb();
 
 		local_irq_disable();
-		if (!need_resched())
+		if (!need_resched()) {
+			ktime_t t0, t1;
+			u64 t0n, t1n;
+
+			t0 = ktime_get();
+			t0n = ktime_to_ns(t0);
 			safe_halt();	/* enables interrupts racelessly */
-		else
-			local_irq_enable();
+			local_irq_disable();
+			t1 = ktime_get();
+			t1n = ktime_to_ns(t1);
+			sched_clock_idle_wakeup_event(t1n - t0n);
+		}
+		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 	} else {
 		/* loop is done by the caller */
Index: linux/include/linux/debug_locks.h
===================================================================
--- linux.orig/include/linux/debug_locks.h
+++ linux/include/linux/debug_locks.h
@@ -47,6 +47,7 @@ struct task_struct;
 
 #ifdef CONFIG_LOCKDEP
 extern void debug_show_all_locks(void);
+extern void __debug_show_held_locks(struct task_struct *task);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
 extern void debug_check_no_locks_held(struct task_struct *task);
@@ -55,6 +56,10 @@ static inline void debug_show_all_locks(
 {
 }
 
+static inline void __debug_show_held_locks(struct task_struct *task)
+{
+}
+
 static inline void debug_show_held_locks(struct task_struct *task)
 {
 }
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ 

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Rafael J. Wysocki
On Monday, 3 of December 2007, Andrew Morton wrote:
> On Mon, 3 Dec 2007 15:19:25 +0100
> Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > this patch extends the soft-lockup detector to automatically
> > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> > printed the following way:
> > 
> >  -->
> >  INFO: task prctl:3042 blocked for more than 120 seconds.
> >  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
> >  prctl D fd5e3793 0  3042   2997
> > f6050f38 0046 0001 fd5e3793 0009 c06d8264 c06dae80 
> > 0286
> > f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 0001 f605 
> > 
> > f7e92d00 0286 f6050f18 c0489d1a f6050f40 6605  
> > c0133a5b
> >  Call Trace:
> >   [] schedule_timeout+0x6d/0x8b
> >   [] schedule_timeout_uninterruptible+0x15/0x17
> >   [] msleep+0x10/0x16
> >   [] sys_prctl+0x30/0x1e2
> >   [] sysenter_past_esp+0x5f/0xa5
> >   ===
> >  2 locks held by prctl/3042:
> >  #0:  (>s_type->i_mutex_key#5){--..}, at: [] 
> > do_fsync+0x38/0x7a
> >  #1:  (jbd_handle){--..}, at: [] journal_start+0xc7/0xe9
> >  <--
> > 
> > the current default timeout is 120 seconds. Such messages are printed
> > up to 10 times per bootup. If the system has crashed already then the
> > messages are not printed.
> > 
> > if lockdep is enabled then all held locks are printed as well.
> > 
> > this feature is a natural extension to the softlockup-detector (kernel
> > locked up without scheduling) and to the NMI watchdog (kernel locked up
> > with IRQs disabled).
> 
> This feature will save one full reporter-developer round-trip during
> investigation of a significant number of bug reports.
> 
> It might be more practical if it were to dump the traces for _all_
> D-state processes when it fires - basically an auto-triggered sysrq-W.

Er, it won't play well if that happen when tasks are frozen for suspend.


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andrew Morton
On Mon, 3 Dec 2007 15:19:25 +0100
Ingo Molnar <[EMAIL PROTECTED]> wrote:

> this patch extends the soft-lockup detector to automatically
> detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> printed the following way:
> 
>  -->
>  INFO: task prctl:3042 blocked for more than 120 seconds.
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
>  prctl D fd5e3793 0  3042   2997
> f6050f38 0046 0001 fd5e3793 0009 c06d8264 c06dae80 
> 0286
> f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 0001 f605 
> 
> f7e92d00 0286 f6050f18 c0489d1a f6050f40 6605  
> c0133a5b
>  Call Trace:
>   [] schedule_timeout+0x6d/0x8b
>   [] schedule_timeout_uninterruptible+0x15/0x17
>   [] msleep+0x10/0x16
>   [] sys_prctl+0x30/0x1e2
>   [] sysenter_past_esp+0x5f/0xa5
>   ===
>  2 locks held by prctl/3042:
>  #0:  (>s_type->i_mutex_key#5){--..}, at: [] do_fsync+0x38/0x7a
>  #1:  (jbd_handle){--..}, at: [] journal_start+0xc7/0xe9
>  <--
> 
> the current default timeout is 120 seconds. Such messages are printed
> up to 10 times per bootup. If the system has crashed already then the
> messages are not printed.
> 
> if lockdep is enabled then all held locks are printed as well.
> 
> this feature is a natural extension to the softlockup-detector (kernel
> locked up without scheduling) and to the NMI watchdog (kernel locked up
> with IRQs disabled).

This feature will save one full reporter-developer round-trip during
investigation of a significant number of bug reports.

It might be more practical if it were to dump the traces for _all_
D-state processes when it fires - basically an auto-triggered sysrq-W.


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ray Lee
On Dec 3, 2007 6:17 AM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> That won't address my concerns about already "breaking" (as in
> frightening the user etc.) common error handling scenarios by default.

Andi, may I respectfully submit that you're not understanding real users here?

Real users either:
  -  Never look at their syslog
  -  Or look, and aren't terribly scared by what's in there.

Perhaps I'm wrong, and you can point to a message on lkml where
someone has posted a backtrace with fear in their voice.


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
> the scsi layer will have the IO totally aborted within that time anyway;
> the retry timeout for disks is 30 seconds after all.

There are blocking waits that wait for multiple IOs.

Also I think the SCSI driver can tune this anyway, and I suspect
iSCSI and friends increase it (?)

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Arjan van de Ven
On Mon, 3 Dec 2007 11:27:15 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly
> > broken.
> 
> What should it do when the NFS server doesn't answer anymore or 
> when the network to the SAN RAID array located a few hundred KM away
> develops some hickup?  Or just the SCSI driver decides to do lengthy
> error recovery  -- you could argue that is broken if it takes longer 
> than 2 minutes, but in practice these things are hard to test
> and to fix.
> 

the scsi layer will have the IO totally aborted within that time anyway;
the retry timeout for disks is 30 seconds after all.




-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > debugging feature can be disabled/enabled on a wide scale already:
> > 
> >  - in the .config
> > 
> >  - runtime, temporarily, via:
> > 
> >  echo 0 > /proc/sys/kernel/hung_task_timeout_secs
> 
> That won't address my concerns about already "breaking" (as in 
> frightening the user etc.) common error handling scenarios by default.

wow, now you again use big scary words like "breaking" and 
"frightening", which is a nice addition to the ticking timebomb 
metaphor.

Andi, sadly you are not listening AT ALL, you are repeating the same 
argument again and again that has been disputed and countered. Either 
come up with a new argument or please stop wasting everyone's time.

It is undisputable that for most users an unexplained hang is far more 
"frightening" than a hang followed by an explanation in the syslog. That 
is a _hard fact_, and if you cannot accept that it's really your 
cognitive problem not ours...

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> Now Ingo's latest unreleased version with single line messages might 
> be actually ok if he turns off the backtraces by default. 
> Unfortunately I wasn't able to find out so far if he has done that or 
> not, he always cuts away these parts of the emails.

Andi, what's this crap supposed to mean?? I of course answered your 
question:

   http://lkml.org/lkml/2007/12/3/90

below is the latest patch.

Ingo

-->
Subject: softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
From: Ingo Molnar <[EMAIL PROTECTED]>

this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:

 -->
 INFO: task prctl:3042 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
 prctl D fd5e3793 0  3042   2997
f6050f38 0046 0001 fd5e3793 0009 c06d8264 c06dae80 0286
f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 0001 f605 
f7e92d00 0286 f6050f18 c0489d1a f6050f40 6605  c0133a5b
 Call Trace:
  [] schedule_timeout+0x6d/0x8b
  [] schedule_timeout_uninterruptible+0x15/0x17
  [] msleep+0x10/0x16
  [] sys_prctl+0x30/0x1e2
  [] sysenter_past_esp+0x5f/0xa5
  ===
 2 locks held by prctl/3042:
 #0:  (>s_type->i_mutex_key#5){--..}, at: [] do_fsync+0x38/0x7a
 #1:  (jbd_handle){--..}, at: [] journal_start+0xc7/0xe9
 <--

the current default timeout is 120 seconds. Such messages are printed
up to 10 times per bootup. If the system has crashed already then the
messages are not printed.

if lockdep is enabled then all held locks are printed as well.

this feature is a natural extension to the softlockup-detector (kernel
locked up without scheduling) and to the NMI watchdog (kernel locked up
with IRQs disabled).

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_32.c |   15
 include/linux/debug_locks.h  |    5
 include/linux/sched.h        |   10
 kernel/fork.c                |    5
 kernel/lockdep.c             |   14
 kernel/sched.c               |    4
 kernel/softlockup.c          |  103
 kernel/sysctl.c              |   27
 8 files changed, 166 insertions(+), 17 deletions(-)

Index: linux/arch/x86/kernel/process_32.c
===================================================================
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -114,10 +114,19 @@ void default_idle(void)
 		smp_mb();
 
 		local_irq_disable();
-		if (!need_resched())
+		if (!need_resched()) {
+			ktime_t t0, t1;
+			u64 t0n, t1n;
+
+			t0 = ktime_get();
+			t0n = ktime_to_ns(t0);
 			safe_halt();	/* enables interrupts racelessly */
-		else
-			local_irq_enable();
+			local_irq_disable();
+			t1 = ktime_get();
+			t1n = ktime_to_ns(t1);
+			sched_clock_idle_wakeup_event(t1n - t0n);
+		}
+		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 	} else {
 		/* loop is done by the caller */
Index: linux/include/linux/debug_locks.h
===================================================================
--- linux.orig/include/linux/debug_locks.h
+++ linux/include/linux/debug_locks.h
@@ -47,6 +47,7 @@ struct task_struct;
 
 #ifdef CONFIG_LOCKDEP
 extern void debug_show_all_locks(void);
+extern void __debug_show_held_locks(struct task_struct *task);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
 extern void debug_check_no_locks_held(struct task_struct *task);
@@ -55,6 +56,10 @@ static inline void debug_show_all_locks(
 {
 }
 
+static inline void __debug_show_held_locks(struct task_struct *task)
+{
+}
+
 static inline void debug_show_held_locks(struct task_struct *task)
 {
 }
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -261,12 +261,17 @@ extern void account_process_tick(struct 
 extern void update_process_times(int user);
 extern void scheduler_tick(void);
 
+extern void sched_show_task(struct task_struct *p);
+
 #ifdef CONFIG_DETECT_SOFTLOCKUP
 extern void softlockup_tick(void);
 extern void spawn_softlockup_task(void);
 extern void touch_softlockup_watchdog(void);
 extern void touch_all_softlockup_watchdogs(void);
 extern int 

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 02:55:47PM +0100, Ingo Molnar wrote:
> 
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > I would still appreciate if you could state what default value you 
> > plan to set the backtrace sysctl to in the submitted patch.
> 
> there's no "backtrace sysctl" planned for the moment. This "hung tasks" 

I hope you'll reconsider that at least before submitting 
this patch to mainline. For -mm it's probably ok.

> debugging feature can be disabled/enabled on a wide scale already:
> 
>  - in the .config
> 
>  - runtime, temporarily, via:
> 
>  echo 0 > /proc/sys/kernel/hung_task_timeout_secs

That won't address my concerns about already "breaking" (as in 
frightening the user etc.) common error handling scenarios by default.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 02:59:16PM +0100, Ingo Molnar wrote:
> Andi, is that true? If yes, why didnt Andi state this concern outright, 
> instead of pooh-pooh-ing the patch on various other grounds?

No of course not. Radoslaw is talking nonsense.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
> It's more like "lets warn about it and fix the problems when we find
> some." 

It is already known there are lots of problems. I won't repeat
them because I already wrote too much about them. Feel free
to read back in the thread.

Now if all the known problems are fixed and only some hard to know
hidden ones remain it might make sense to enable (but even then
a little dubious considering how important error handling robustness
is), but not at the current state (at least with the full default
backtraces) 

> Btw, how is this different from how the lockdep patches went
> in?

lockdep is clearly a "only enable if you're developing the kernel" 
feature (just alone because of its overhead and other problems).
So side effects etc. are expected.

Softlockup previously (before this patch) was a "can be always safely 
enabled" feature.  

Now Ingo's latest unreleased version with single
line messages might be actually ok if he turns off the backtraces
by default. Unfortunately I wasn't able to find out so far if he 
has done that or not, he always cuts away these parts of the emails.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Radoslaw Szkodzinski <[EMAIL PROTECTED]> wrote:

> On Mon, 3 Dec 2007 14:29:56 +0100
> > * Andi Kleen <[EMAIL PROTECTED]> wrote:
> > 
> > > > feedback about an impending catastrophy has been duly noted
> > > 
> > > The point was less about an impending catastrophe, but more of a 
> > > timebomb ticking until the next widely used release.
> 
> I think I know why Andi is so much against backtraces: he has some log 
> filter listening on netconsole mailing him with suspicious log 
> messages.

Andi, is that true? If yes, why didn't Andi state this concern outright, 
instead of pooh-pooh-ing the patch on various other grounds?

Ingo



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Pekka Enberg <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > > "audit thousands of callsites in 8 million lines of code first" is a
> > > nice euphemism for hiding from the blame forever. We had 10 years for it
> 
> On Dec 3, 2007 2:13 PM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Ok your approach is then to "let's warn about it and hope
> > it will go away"
> 
> It's more like "lets warn about it and fix the problems when we find 
> some." Btw, how is this different from how the lockdep patches went 
> in?

yeah, it's quite similar. (in fact this feature is expected to have a 
false positive rate lower than that of lockdep) The backtraces are 
essential as well, they help kernel developers find the bugs.

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> I would still appreciate if you could state what default value you 
> plan to set the backtrace sysctl to in the submitted patch.

there's no "backtrace sysctl" planned for the moment. This "hung tasks" 
debugging feature can be disabled/enabled on a wide scale already:

 - in the .config

 - runtime, temporarily, via:

 echo 0 > /proc/sys/kernel/hung_task_timeout_secs

 - runtime, permanently, via adding:

 kernel.hung_task_timeout_secs = 0

   to /etc/sysctl.conf.

the backtrace, as other posters have stated it in this thread too, is 
very useful, so i'll not remove it from the printout.
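
As a rough illustration of the sysctl.conf setting listed above, the sketch
below parses one configuration line in C. It is not taken from any existing
tool: parse_hung_task_timeout() and hung_task_detection_enabled() are
hypothetical helpers, and the only facts assumed from this thread are the
key name kernel.hung_task_timeout_secs and that a value of 0 disables the
message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a line of the form "kernel.hung_task_timeout_secs = N".
 * Returns 1 and stores N in *secs on success, 0 for any other line
 * (comments, other keys, empty lines, negative values).
 */
static int parse_hung_task_timeout(const char *line, unsigned int *secs)
{
	int value;

	assert(line && secs);

	/* Whitespace in the format matches zero or more blanks. */
	if (sscanf(line, " kernel.hung_task_timeout_secs = %d", &value) != 1)
		return 0;
	if (value < 0)
		return 0;

	*secs = (unsigned int)value;
	return 1;
}

/* A timeout of 0 turns the hung-task message off. */
static int hung_task_detection_enabled(unsigned int secs)
{
	return secs != 0;
}
```

A caller would feed each line of the configuration file to
parse_hung_task_timeout() and treat a parsed value of 0 as "detector off".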

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Pekka Enberg
Hi,

On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > "audit thousands of callsites in 8 million lines of code first" is a
> > nice euphemism for hiding from the blame forever. We had 10 years for it

On Dec 3, 2007 2:13 PM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> Ok your approach is then to "let's warn about it and hope
> it will go away"

It's more like "lets warn about it and fix the problems when we find
some." Btw, how is this different from how the lockdep patches went
in?

Pekka


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen

I would still appreciate if you could state what default value
you plan to set the backtrace sysctl to in the submitted patch.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread AstralStorm
On Mon, 3 Dec 2007 14:29:56 +0100
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > > feedback about an impending catastrophy has been duly noted
> > 
> > The point was less about an impending catastrophe, but more of a 
> > timebomb ticking until the next widely used release.

I think I know why Andi is so much against backtraces: he has some log
filter listening on netconsole mailing him with suspicious log messages.

Apparently, that log filter is much too stupid to discern an
OOPS or OOM from softlockup and other informative backtraces even when
given a clear hint what it is.

Or maybe he's just underestimating user's ability to read. ;)




Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
> negative 

I would consider it positive, but ok. If I was negative I would 
probably not care and just make always sure to disable SOFTLOCKUP
in the kernels I use.

> feedback about an impending catastrophy has been duly noted 

The point was less about an impending catastrophe, but more of a timebomb
ticking until the next widely used release.

> Can we now please move on to a 
> more productive stage?

Sure no more theory.

To what value do you want to set the backtrace sysctl by default?

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > you are over-designing it way too much - a backtrace is obviously 
> > very helpful and it must be printed by default. There's enough 
> > configurability in it already so that you can turn it off if you 
> > want.
> 
> So it will hit everybody first before they can figure out how to get 
> rid of it? That was the part I was objecting too.

you are apparently arguing in a circle. This is a debug mechanism. It 
goes the normal upstream acceptance process. I have booted this patch a 
few hundred times on a number of boxes and got not a single false 
positive so far. While this is in no way an exhaustive test, only more 
testing (in -mm, etc.) will tell us more, one way or another. Your 
negative feedback about an impending catastrophe has been duly noted 
(which vision of yours has not been shared by anyone else in this thread 
so far), and is given its due weight. Can we now please move on to a 
more productive stage?

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 01:28:33PM +0100, Ingo Molnar wrote:
> 
> > On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > > no. (that's why i added the '(or a kill -9)' qualification above - if 
> > > NFS is mounted noninterruptible then standard signals (such as Ctrl-C) 
> > > should not have an interrupting effect.)
> > 
> > NFS is already interruptible with umount -f (I use that all the 
> > time...), but softlockup won't know that and throw the warning 
> > anyways.
> 
> umount -f is a spectacularly unintelligent solution (it requires the 
> user to know precisely which path to umount, etc.),

lsof | grep programname

> TASK_KILLABLE is a lot more useful.

Not sure it is better on all measures.

One problem is how to distinguish again between program abort
(which only affects the program) and IO abort (which leaves
EIO marked pages in the page cache affecting other processes too).
umount -f does this at least.

I didn't think TASK_KILLABLE has solved that cleanly (although
I admit I haven't read the latest patchkit, perhaps that has changed
over the first iteration) 

But it also probably doesn't make things much worse than they were before.

> 
> > > your syslet snide comment aside (which is quite incomprehensible - a
> > 
> > For the record I have no principle problem with syslets, just I do 
> > consider them roughly equivalent in end result to a explicit retry 
> > based AIO implementation.
> 
> which suggests you have not really understood syslets. Syslets have no 

That's possible.

> "retry" component, they just process straight through the workflow. 
> Retry based AIO has a retry component, which - as its name suggests 
> already - retries operations instead of processing through the workload 
> intelligently. Depending on how "deep" the context of an operation the 
> retries might or might not make a noticeable difference in performance, 
> but it sure is an inferior approach.

Not sure what is so much less intelligent in retry (you're
referring to more CPU cycles needed?), but I admit I haven't 
thought very deeply about that.

> 
> > > retry based asynchonous IO model is clearly inferior even if it were 
> > > implemented everywhere), i do think that most if not all of these 
> > > supposedly "difficult to fix" codepaths are just on the backburner 
> > > out of lack of a clear blame vector.
> > 
> > Hmm. -ENOPARSE. Can you please clarify?
> 
> which bit was unclear to you? The retry bit i've explained above, lemme 
> know if there's any other unclarity.

The clear blame vector bit was unclear.

> > > nice euphemism for hiding from the blame forever. We had 10 years 
> > > for it
> > 
> > Ok your approach is then to "let's warn about it and hope it will go 
> > away"
> 
> s/hope//, but yes. Surprisingly, this works quite well :-) [as long as 
> the warnings are not excessively bogus, of course]

Well, I consider a backtrace excessively bogus.

> > Anyways I think I could live with it a one liner warning (if it's 
> > seriously rate limited etc.) and a sysctl to enable the backtraces; 
> > off by default. Or if you prefer that record the backtrace always in a 
> > buffer and make it available somewhere in /proc or /sys or /debug. 
> > Would that work for you?
> 
> you are over-designing it way too much - a backtrace is obviously very 
> helpful and it must be printed by default. There's enough 
> configurability in it already so that you can turn it off if you want. 

So it will hit everybody first before they can figure out how
to get rid of it? That was the part I was objecting to.

If it is decided to warn about something which is not 100% clearly a bug
(and I think I have established this for now -- at least you didn't 
object to many of my examples...) then the likely
false positives shouldn't be too obnoxious. Backtraces are unfortunately
obnoxious and always come at a high cost (worried user, Linux reputation
as a buggy OS, mailing list bandwidth, support load etc.) and having that 
for too many false positives is a bad thing.

> (And you said SLES has softlockup turned off already so it shouldnt 
> affect you anyway.)

My objection was not really for SLES, but for general Linux kernel
quality.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > no. (that's why i added the '(or a kill -9)' qualification above - if 
> > NFS is mounted noninterruptible then standard signals (such as Ctrl-C) 
> > should not have an interrupting effect.)
> 
> NFS is already interruptible with umount -f (I use that all the 
> time...), but softlockup won't know that and throw the warning 
> anyways.

umount -f is a spectacularly unintelligent solution (it requires the 
user to know precisely which path to umount, etc.), TASK_KILLABLE is a 
lot more useful.

> > your syslet snide comment aside (which is quite incomprehensible - a
> 
> For the record I have no principle problem with syslets, just I do 
> consider them roughly equivalent in end result to a explicit retry 
> based AIO implementation.

which suggests you have not really understood syslets. Syslets have no 
"retry" component, they just process straight through the workflow. 
Retry based AIO has a retry component, which - as its name suggests 
already - retries operations instead of processing through the workload 
intelligently. Depending on how "deep" the context of an operation the 
retries might or might not make a noticeable difference in performance, 
but it sure is an inferior approach.

> > retry based asynchonous IO model is clearly inferior even if it were 
> > implemented everywhere), i do think that most if not all of these 
> > supposedly "difficult to fix" codepaths are just on the backburner 
> > out of lack of a clear blame vector.
> 
> Hmm. -ENOPARSE. Can you please clarify?

which bit was unclear to you? The retry bit i've explained above, lemme 
know if there's any other unclarity.

> > "audit thousands of callsites in 8 million lines of code first" is a 
> > nice euphemism for hiding from the blame forever. We had 10 years 
> > for it
> 
> Ok your approach is then to "let's warn about it and hope it will go 
> away"

s/hope//, but yes. Surprisingly, this works quite well :-) [as long as 
the warnings are not excessively bogus, of course]

and note that this is just a happy side-effect - the primary motivation 
is to get warnings about tasks that are uninterruptible forever. (which 
is a quite common kernel bug pattern.)

> Anyways I think I could live with it a one liner warning (if it's 
> seriously rate limited etc.) and a sysctl to enable the backtraces; 
> off by default. Or if you prefer that record the backtrace always in a 
> buffer and make it available somewhere in /proc or /sys or /debug. 
> Would that work for you?

you are over-designing it way too much - a backtrace is obviously very 
helpful and it must be printed by default. There's enough 
configurability in it already so that you can turn it off if you want. 
(And you said SLES has softlockup turned off already so it shouldnt 
affect you anyway.)

Ingo
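
To see which tasks are currently in the uninterruptible state this
thread is about, a minimal Python sketch follows. It assumes a Linux
/proc filesystem in which the state letter is the first field after the
parenthesised command name in /proc/<pid>/stat, with "D" meaning
uninterruptible sleep. The helper names are hypothetical and are not
part of the patch under discussion.

```python
import os

def parse_stat_state(stat_line):
    """Return (comm, state) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')' characters, so split on the LAST ')' in the line.
    """
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return comm, state

def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process whose state letter is 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable; skip it
        if state == "D":
            found.append((int(entry), comm))
    return found
```

This only takes a snapshot: it says nothing about how long a task has
been blocked, which is the part the proposed detector adds. It also
looks only at the top-level PID directories, not at individual threads.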


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> no. (that's why i added the '(or a kill -9)' qualification above - if 
> NFS is mounted noninterruptible then standard signals (such as Ctrl-C) 
> should not have an interrupting effect.)

NFS is already interruptible with umount -f (I use that all the time...), 
but softlockup won't know that and throw the warning anyways.

> your syslet snide comment aside (which is quite incomprehensible - a 

For the record I have no principle problem with syslets, just I do
consider them roughly equivalent in end result to a explicit retry based
AIO implementation.

> retry based asynchonous IO model is clearly inferior even if it were 
> implemented everywhere), i do think that most if not all of these 
> supposedly "difficult to fix" codepaths are just on the backburner out 
> of lack of a clear blame vector.

Hmm. -ENOPARSE. Can you please clarify?

> 
> "audit thousands of callsites in 8 million lines of code first" is a 
> nice euphemism for hiding from the blame forever. We had 10 years for it 

Ok your approach is then to "let's warn about it and hope
it will go away" 

> and it didnt happen. As we've seen it again and again, getting a 
> non-fatal reminder in the dmesg about the suckage is quite efficient at 

It's not universal suckage I would say, but sometimes unavoidable
conditions. Now it is better of course to have these all TASK_KILLABLE,
but then fixing that all in the kernel will probably be a long-term
project. I'm not arguing against that, just forcing it through
backtraces before even starting all that is probably not the right
strategy to do that.

> getting people to fix crappy solutions, and gives users and exact blame 
> point of where to start. That will create pressure to fix these 
> problems.

After impacting the user base -- many of these conditions are infrequent
enough that we will likely only see them during real production. Throwing
warnings for lots of known cases is probably ok for a -mm kernel
(where users expect things like that), but not a "release" (be it
Linus release or any kind of end user distribution) imho.

I don't think there is a real alternative to code audit first
(and someone doing all the work of fixing all these first) 
 

> 
> > > I think you are somehow confusing two issues: this patch in no way 
> > > declares that "long waits are bad" - if the user _choses_ to wait 
> > > for
> > 
> > Throwing a backtrace is the kernel's way to declare something as bad. 
> > The only more clear ways to that I know of would be BUG or panic().
> 
> there are various levels of declarig something bad, and you are quite 
> wrong to suggest that a BUG() would be the only recourse.

I didn't write that, please reread my sentence.. 

But we seem to agree that a backtrace is something "declared bad" anyways,
which was my point.


> 
> > > way to stop_ are quite likely bad".
> > 
> > The user will just see the backtraces and think the kernel has 
> > crashed.
> 
> i've just changed the message to:
> 
>   INFO: task keventd/5 blocked for more than 120 seconds.
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

That's better, but the backtrace is still there isn't it?

Anyways I think I could live with it as a one-liner warning (if it's
seriously rate limited etc.) and a sysctl to enable the backtraces;
off by default. Or if you prefer, record the backtrace always in a
buffer and make it available somewhere in /proc or /sys or /debug.
Would that work for you? 

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote:
> > 
> > * Andi Kleen <[EMAIL PROTECTED]> wrote:
> > 
> > > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
> > > 
> > > What should it do when the NFS server doesn't answer anymore or when 
> > > the network to the SAN RAID array located a few hundred KM away 
> > > develops some hickup?  [...]
> > 
> > maybe: if the user does a Ctrl-C (or a kill -9), the kernel should 
> > try
> 
> You mean NFS intr should be default? [...]

no. (that's why i added the '(or a kill -9)' qualification above - if 
NFS is mounted noninterruptible then standard signals (such as Ctrl-C) 
should not have an interrupting effect.)

> If you consider any of the arguments in the following paragraph "not 
> rational" please state your objection precisely. Thanks.
> 
> Consider the block case: First a lot of block IO runs over networks 
> too these days (iSCSI, drbd, nbd, SANs etc.) so the same 
> considerations as for other network file systems apply.  Networks can 
> have hickups and might take long to recover. Now implementing 
> TASK_KILLABLE in all block IO paths there properly is equivalent to 
> implementing EIOCBRETRY aio because it has to error out in near the 
> same ways in all the same places.  While I would like to see that (and 
> it would probably make syslets obsolete too ;-) it has been rejected 
> as too difficult in the past.

your syslet snide comment aside (which is quite incomprehensible - a 
retry based asynchronous IO model is clearly inferior even if it were 
implemented everywhere), i do think that most if not all of these 
supposedly "difficult to fix" codepaths are just on the backburner out 
of lack of a clear blame vector.

"audit thousands of callsites in 8 million lines of code first" is a 
nice euphemism for hiding from the blame forever. We had 10 years for it 
and it didn't happen. As we've seen it again and again, getting a 
non-fatal reminder in the dmesg about the suckage is quite efficient at 
getting people to fix crappy solutions, and gives users an exact blame 
point of where to start. That will create pressure to fix these 
problems.

> > I think you are somehow confusing two issues: this patch in no way 
> > declares that "long waits are bad" - if the user _choses_ to wait 
> > for
> 
> Throwing a backtrace is the kernel's way to declare something as bad. 
> The only more clear ways to that I know of would be BUG or panic().

there are various levels of declaring something bad, and you are quite 
wrong to suggest that a BUG() would be the only recourse.

> > way to stop_ are quite likely bad".
> 
> The user will just see the backtraces and think the kernel has 
> crashed.

i've just changed the message to:

  INFO: task keventd/5 blocked for more than 120 seconds.
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

Ingo
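
The message above names a sysctl file and says that writing 0 to it
disables the warning. As a minimal sketch, assuming that file is present
on the running kernel (it may not be, depending on version and
configuration), the current setting could be read like this in Python.
The function names are hypothetical; only the path comes from the
message itself.

```python
def read_hung_task_timeout(path="/proc/sys/kernel/hung_task_timeout_secs"):
    """Return the timeout in seconds, or None if the file is unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def describe(timeout):
    """Turn the raw sysctl value into a one-line description."""
    if timeout is None:
        return "hung-task detection not available on this kernel"
    if timeout == 0:
        # Per the message text, 0 disables the warning.
        return "hung-task warnings disabled"
    return "tasks blocked for more than %d seconds are reported" % timeout

if __name__ == "__main__":
    print(describe(read_hung_task_timeout()))
```

Reading the value needs no special privileges on typical systems;
changing it, as in the quoted "echo 0 > ..." line, normally requires
root.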


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote:
> 
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
> > 
> > What should it do when the NFS server doesn't answer anymore or when 
> > the network to the SAN RAID array located a few hundred KM away 
> > develops some hickup?  [...]
> 
> maybe: if the user does a Ctrl-C (or a kill -9), the kernel should try 

You mean NFS intr should be default? Traditionally that was not done,
although that decision dates back to long before Linux to original SunOS. 

I was not there but I suspect it was because it is hard to distinguish 
between "abort IO" and "abort program". With aborted IO you tend to end up 
with a page in page cache that is marked as IO error and will affect
other programs too.

Perhaps that can be cleanly solved -- personally I'm not sure -- but
it is likely not easy otherwise people would have done that a long
time ago.

> to honor it, instead of staying there stuck for a very long time 
> (possibly forever)?

Sure everybody hates that (it is like trying to argue against
free video games @), but fixing it properly is quite hard. 
I just think it's a bad idea to outlaw it before even attempting
to fix it.

If you consider any of the arguments in the following
paragraph "not rational" please state your objection precisely.
Thanks.

Consider the block case: First a lot of block
IO runs over networks too these days (iSCSI, drbd, nbd, SANs etc.) 
so the same considerations as for other network file systems
apply.  Networks can have hiccups and might take long to recover.
Now implementing TASK_KILLABLE in all block IO paths
there properly is equivalent to implementing EIOCBRETRY aio because
it has to error out in nearly the same ways in all the same
places.  While I would like to see that (and it would probably make syslets
obsolete too ;-) it has been rejected as too difficult in the past.

> I think you are somehow confusing two issues: this patch in no way 
> declares that "long waits are bad" - if the user _choses_ to wait for 

Throwing a backtrace is the kernel's way to declare something as bad.
The only clearer ways to do that I know of would be BUG or panic().

> way to stop_ are quite likely bad".

The user will just see the backtraces and think the kernel
has crashed.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
> 
> What should it do when the NFS server doesn't answer anymore or when 
> the network to the SAN RAID array located a few hundred KM away 
> develops some hickup?  [...]

maybe: if the user does a Ctrl-C (or a kill -9), the kernel should try 
to honor it, instead of staying there stuck for a very long time 
(possibly forever)?

I think you are somehow confusing two issues: this patch in no way 
declares that "long waits are bad" - if the user _chooses_ to wait for 
the NFS server (after phoning IT quickly or whatever), he can wait an 
hour. This patch only declares that "long waits _that the user has no 
way to stop_ are quite likely bad".

Do you see the important distinction between the two cases? Please 
reconsider your position (or re-state it differently), it just makes no 
rational sense to me so far.

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
> Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.

What should it do when the NFS server doesn't answer anymore or 
when the network to the SAN RAID array located a few hundred KM away develops 
some hiccup?  Or just the SCSI driver decides to do lengthy error 
recovery  -- you could argue that is broken if it takes longer 
than 2 minutes, but in practice these things are hard to test
and to fix.

> Yes, that's exactly why the patch is needed - to find the bugs and fix

The way to find that would be to use source auditing, not break
perfectly fine error handling paths. Especially since this at least
traditionally hasn't been considered a bug, but a fundamental design
parameter of network/block/etc. file systems 

> CIFS and similar have to be fixed - it tends to lock the app
> using it, in unkillable state.

Actually that's not true. You can umount -f and then kill for at least
NFS and CIFS. Not sure it is true for the other network file systems
though.

You could in theory do TASK_KILLABLE for all block devices too (not 
a bad thing; I would love to have it). 

But it would be equivalent in work (has to patch all the same places with 
similar code) to Suparna's big old fs AIO retry
patchkit that never went forward because everyone was too worried
about excessive code impact. Maybe that has changed, maybe not ... 

And even then you would need to check all possible error handling
paths (scsi_error and low level drivers at least) that they all 
finish in less than two minutes.

> > > wild guesses. Only one way to get the real false positive percentage.
> > 
> > Yes let's break things first instead of looking at the implications closely.
> 
> Throwing _rare_ stack traces is not breakage. 120s task_uninterruptible

Sorry, that's totally bogus. Throwing a stack trace is the kernel
equivalent of sending S.O.S. and worrying the user significantly,
taxing reporting resources etc.  and in the interest of saving
everybody trouble it should only do that when it is really
sure it is truly broken. 

> in the usual case (no errors) is already broken - there are no sane
> loads that can invoke that IMO.

You are wrong on that.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Radoslaw Szkodzinski <[EMAIL PROTECTED]> wrote:

> > iirc TASK_KILLABLE fixed NFS only. While that's a good thing there 
> > are unfortunately a lot more subsystems that would need the same 
> > treatment.
> 
> Yes, that's exactly why the patch is needed - to find the bugs and fix 
> them. Otherwise you'll have problems finding some places to convert to 
> TASK_KILLABLE.
> 
> CIFS and similar have to be fixed - it tends to lock the app using it, 
> in unkillable state.

Amen. I still have to see a single rational argument against this 
debugging feature - and tons of arguments were listed in favor of it. So 
let's just try and see what happens.

> > Yes let's break things first instead of looking at the implications 
> > closely.
> 
> Throwing _rare_ stack traces is not breakage. 120s 
> task_uninterruptible in the usual case (no errors) is already broken - 
> there are no sane loads that can invoke that IMO.
> 
> A stack trace on x subsystem error is not that bad, especially as 
> these are limited to 10 per session.

we could lower that limit to 1 per bootup - if they become annoying. 
There's lots of flexibility in the code. Really, we should have done 
this 10 years ago - it would have literally saved me many days of 
debugging time combined, and i really have experience in identifying 
such bad tasks. (and it would have sped up debugging in countless number 
of instances when users were met with an uninterruptible task.)

Ingo
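
The policy debated in this message (warn about a task stuck in the
uninterruptible state for longer than a timeout, but only a limited
number of times) can be modelled in a few lines of Python. This is a
userspace toy under stated assumptions, not the kernel patch: the Task
record, the HungTaskChecker class and the blocked_since timestamp are
all hypothetical. The 120 second timeout, the budget of 10 warnings, the
"0 disables" rule and the message text are taken from the thread.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    state: str            # "D" = uninterruptible; anything else is ignored
    blocked_since: float  # time (seconds) at which the task entered its state

class HungTaskChecker:
    def __init__(self, timeout_secs=120, max_warnings=10):
        self.timeout_secs = timeout_secs
        self.warnings_left = max_warnings

    def check(self, tasks, now):
        """Return warning lines for tasks blocked longer than the timeout."""
        messages = []
        if self.timeout_secs == 0:
            return messages  # a timeout of 0 disables the check
        for task in tasks:
            if self.warnings_left <= 0:
                break  # warning budget used up; stay silent from now on
            if task.state != "D":
                continue
            if now - task.blocked_since > self.timeout_secs:
                self.warnings_left -= 1
                messages.append(
                    "INFO: task %s blocked for more than %d seconds."
                    % (task.name, self.timeout_secs)
                )
        return messages
```

Lowering max_warnings to 1 gives the "1 per bootup" variant suggested
above. The model deliberately leaves out the backtrace, which is the
part of the output that the disagreement in this thread is about.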


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread AstralStorm
On Mon, 3 Dec 2007 10:55:01 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Sun, Dec 02, 2007 at 04:59:13PM -0800, Arjan van de Ven wrote:
> > On Mon, 3 Dec 2007 01:07:41 +0100
> > Andi Kleen <[EMAIL PROTECTED]> wrote:
> > 
> > > This patch will likely work against that by breaking error paths.
> > 
> > it won't break error paths, it will at most put a warning in the log.
> > It doesn't kill or otherwise damage the system or process.
> 
> From the user perspective a kernel randomly throwing backtraces is 
> a broken kernel. 

Throwing in my 2c:
Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
I wouldn't wait that long for the system to become responsive, I yanked
the power cord already.

Hm, that's already detected with sleep_uninterruptible logic.

A task that's not killable for more than 2 minutes is broken still, but
less so.

> > > > This patch is a step in the right direction there, by quite a
> > > > lot.
> > > > 
> > > > I really don't understand what your objection is to this patch...
> > > > is it that an enterprise distro can't ship with it on? (Which is
> > > > fine btw)
> > > 
> > > Any distribution aimed at end users cannot ship with it on. 
> > 
> > That's a pretty bold statement; assuming that the TASK_KILLABLE patch
> > is in, I don't see the problem.
> 
> iirc TASK_KILLABLE fixed NFS only. While that's a good thing there are
> unfortunately a lot more subsystems that would need the same treatment.

Yes, that's exactly why the patch is needed - to find the bugs and fix
them. Otherwise you'll have problems finding some places to convert to
TASK_KILLABLE.

CIFS and similar have to be fixed - it tends to lock the app
using it, in unkillable state.

> > > Also in general I have my doubts that the false positive:real bug
> > > ratio of this warning is well balanced.
> > 
> > I'll just have to disagree with you then; but both of us are making
> > wild guesses. Only one way to get the real false positive percentage.
> 
> Yes let's break things first instead of looking at the implications closely.

Throwing _rare_ stack traces is not breakage. 120s task_uninterruptible
in the usual case (no errors) is already broken - there are no sane
loads that can invoke that IMO.

A stack trace on x subsystem error is not that bad, especially as these 
are limited to 10 per session. 

Disclaimer: I am not a kernel developer, just a user.




Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Sun, Dec 02, 2007 at 04:59:13PM -0800, Arjan van de Ven wrote:
> On Mon, 3 Dec 2007 01:07:41 +0100
> Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > > We really need to get better diagnostics for the
> > > bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to
> > > get to the scenario where we have a more or less robust measure of
> > > kernel quality (and we're not all that far off for several cases),
> > > one thing
> > 
> > One measure to kernel quality is to recover well from IO errors
> > (like network problems or broken block devices)
> 
> yes. and this patch will flag cases that don't (yet) work well

If the device/server/... takes more than 2 minutes to recover, how does this
imply the error path "does not work well" ? Or is your goal to handle all 
possible errors in less than two minutes? [That might be a worthy goal,
but is probably far from easy and likely impossible in some cases]

> > This patch will likely work against that by breaking error paths.
> 
> it won't break error paths, it will at most put a warning in the log.
> It doesn't kill or otherwise damage the system or process.

From the user perspective a kernel randomly throwing backtraces is 
a broken kernel. 

> 
> > 
> > > This patch is a step in the right direction there, by quite a
> > > lot.
> > > 
> > > I really don't understand what your objection is to this patch...
> > > is it that an enterprise distro can't ship with it on? (Which is
> > > fine btw)
> > 
> > Any distribution aimed at end users cannot ship with it on. 
> 
> That's a pretty bold statement; assuming that the TASK_KILLABLE patch
> is in, I don't see the problem.

iirc TASK_KILLABLE fixed NFS only. While that's a good thing there are
unfortunately a lot more subsystems that would need the same treatment.

> > Also in general I have my doubts that the false positive:real bug
> > ratio of this warning is well balanced.
> 
> I'll just have to disagree with you then; but both of us are making
> wild guesses. Only one way to get the real false positive percentage.

Yes let's break things first instead of looking at the implications closely.

-Andi







Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Radoslaw Szkodzinski [EMAIL PROTECTED] wrote:

  iirc TASK_KILLABLE fixed NFS only. While that's a good thing there 
  are unfortunately a lot more subsystems that would need the same 
  treatment.
 
 Yes, that's exactly why the patch is needed - to find the bugs and fix 
 them. Otherwise you'll have problems finding some places to convert to 
 TASK_KILLABLE.
 
 CIFS and similar have to be fixed - it tends to lock the app using it, 
 in unkillable state.

Amen. I still have to see a single rational argument against this 
debugging feature - and tons of arguments were listed in favor of it. So 
let's just try and see what happens.

  Yes let's break things first instead of looking at the implications 
  closely.
 
 Throwing _rare_ stack traces is not breakage. 120s 
 task_uninterruptible in the usual case (no errors) is already broken - 
 there are no sane loads that can invoke that IMO.
 
 A stack trace on x subsystem error is not that bad, especially as 
 these are limited to 10 per session.

we could lower that limit to 1 per bootup - if they become annoying. 
There's lots of flexibility in the code. Really, we should have done 
this 10 years ago - it would have literally saved me many days of 
debugging time combined, and i really have experience in identifying 
such bad tasks. (and it would have sped up debugging in countless number 
of instances when users were met with an uninterruptible task.)
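
To make the mechanism under discussion concrete, here is a minimal Python sketch. It is not the kernel code. It assumes a detector that scans tasks periodically, treats a task as hung when it stays in uninterruptible sleep (state "D") with an unchanged context-switch count for 120 seconds, and stops after 10 reports, as mentioned in this thread. The names Task, HungTaskDetector and check are hypothetical, and using the context-switch count as the sign of progress is an assumption.

```python
from dataclasses import dataclass, field

TIMEOUT_SECS = 120   # threshold discussed in this thread
MAX_WARNINGS = 10    # "limited to 10 per session"


@dataclass
class Task:
    comm: str           # task name, e.g. "keventd/5"
    pid: int
    state: str          # "D" = uninterruptible sleep; anything else = not blocked
    switch_count: int   # total context switches so far (assumed progress signal)


@dataclass
class HungTaskDetector:
    timeout: int = TIMEOUT_SECS
    warnings_left: int = MAX_WARNINGS
    # pid -> (switch count last seen, time that count was first observed)
    _seen: dict = field(default_factory=dict)

    def check(self, tasks, now):
        """Return one warning string per task blocked past the timeout."""
        if self.timeout == 0:
            # A timeout of 0 disables the check, mirroring the sysctl.
            return []
        warnings = []
        for t in tasks:
            last = self._seen.get(t.pid)
            if t.state != "D" or last is None or last[0] != t.switch_count:
                # The task made progress or is not blocked: restart its clock.
                self._seen[t.pid] = (t.switch_count, now)
                continue
            if now - last[1] >= self.timeout and self.warnings_left > 0:
                self.warnings_left -= 1
                warnings.append(
                    f"INFO: task {t.comm} blocked for more than "
                    f"{self.timeout} seconds."
                )
                # Restart the clock so the same task is not reported every scan.
                self._seen[t.pid] = (t.switch_count, now)
        return warnings
```

Calling check() with the same blocked task at times 0, 60 and 120 yields a warning only on the third call; lowering warnings_left to 1 models the "1 per bootup" limit suggested above.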

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
 Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.

What should it do when the NFS server doesn't answer anymore or 
when the network to the SAN RAID array located a few hundred KM away develops 
some hiccup?  Or just the SCSI driver decides to do lengthy error 
recovery  -- you could argue that is broken if it takes longer 
than 2 minutes, but in practice these things are hard to test
and to fix.

 Yes, that's exactly why the patch is needed - to find the bugs and fix

The way to find that would be to use source auditing, not break
perfectly fine error handling paths. Especially since this at least
traditionally hasn't been considered a bug, but a fundamental design
parameter of network/block/etc. file systems.

 CIFS and similar have to be fixed - it tends to lock the app
 using it, in unkillable state.

Actually that's not true. You can umount -f and then kill for at least
NFS and CIFS. Not sure it is true for the other network file systems
though.

You could in theory do TASK_KILLABLE for all block devices too (not 
a bad thing; I would love to have it). 

But it would be equivalent in work (has to patch all the same places with 
similar code) to Suparna's big old fs AIO retry
patchkit that never went forward because everyone was too worried
about excessive code impact. Maybe that has changed, maybe not ... 

And even then you would need to check all possible error handling
paths (scsi_error and low level drivers at least) that they all 
finish in less than two minutes.

   wild guesses. Only one way to get the real false positive percentage.
  
  Yes let's break things first instead of looking at the implications closely.
 
 Throwing _rare_ stack traces is not breakage. 120s task_uninterruptible

Sorry, that's totally bogus. Throwing a stack trace is the kernel
equivalent of sending S.O.S. and worrying the user significantly,
taxing reporting resources etc.  and in the interest of saving
everybody trouble it should only do that when it is really
sure it is truly broken. 

 in the usual case (no errors) is already broken - there are no sane
 loads that can invoke that IMO.

You are wrong on that.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote:
 
 * Andi Kleen [EMAIL PROTECTED] wrote:
 
   Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
  
  What should it do when the NFS server doesn't answer anymore or when 
  the network to the SAN RAID array located a few hundred KM away 
  develops some hickup?  [...]
 
 maybe: if the user does a Ctrl-C (or a kill -9), the kernel should try 

You mean NFS intr should be default? Traditionally that was not done,
although that decision dates back to long before Linux to original SunOS. 

I was not there but I suspect it was because it is hard to distinguish 
between abort IO and abort program. With aborted IO you tend to end up 
with a page in page cache that is marked as IO error and will affect
other programs too.

Perhaps that can be cleanly solved -- personally I'm not sure -- but
it is likely not easy otherwise people would have done that a long
time ago.

 to honor it, instead of staying there stuck for a very long time 
 (possibly forever)?

Sure everybody hates that (it is like trying to argue against
free video games @), but fixing it properly is quite hard. 
I just think it's a bad idea to outlaw it before even attempting
to fix it.

If you consider any of the arguments in the following
paragraph not rational please state your objection precisely.
Thanks.

Consider the block case: First a lot of block
IO runs over networks too these days (iSCSI, drbd, nbd, SANs etc.) 
so the same considerations as for other network file systems
apply.  Networks can have hiccups and might take long to recover.
Now implementing TASK_KILLABLE in all block IO paths
there properly is equivalent to implementing EIOCBRETRY aio because
it has to error out in near the same ways in all the same
places.  While I would like to see that (and it would probably make syslets
obsolete too ;-) it has been rejected as too difficult in the past.

 I think you are somehow confusing two issues: this patch in no way 
 declares that long waits are bad - if the user _choses_ to wait for 

Throwing a backtrace is the kernel's way to declare something as bad.
The only more clear ways to that I know of would be BUG or panic().

 way to stop_ are quite likely bad.

The user will just see the backtraces and think the kernel
has crashed.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

 On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote:
  
  * Andi Kleen [EMAIL PROTECTED] wrote:
  
Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
   
   What should it do when the NFS server doesn't answer anymore or when 
   the network to the SAN RAID array located a few hundred KM away 
   develops some hickup?  [...]
  
  maybe: if the user does a Ctrl-C (or a kill -9), the kernel should 
  try
 
 You mean NFS intr should be default? [...]

no. (that's why i added the '(or a kill -9)' qualification above - if 
NFS is mounted noninterruptible then standard signals (such as Ctrl-C) 
should not have an interrupting effect.)

 If you consider any of the arguments in the following paragraph not 
 rational please state your objection precisely. Thanks.
 
 Consider the block case: First a lot of block IO runs over networks 
 too these days (iSCSI, drbd, nbd, SANs etc.) so the same 
 considerations as for other network file systems apply.  Networks can 
 have hickups and might take long to recover. Now implementing 
 TASK_KILLABLE in all block IO paths there properly is equivalent to 
 implementing EIOCBRETRY aio because it has to error out in near the 
 same ways in all the same places.  While I would like to see that (and 
 it would probably make syslets obsolete too ;-) it has been rejected 
 as too difficult in the past.

your syslet snide comment aside (which is quite incomprehensible - a 
retry based asynchronous IO model is clearly inferior even if it were 
implemented everywhere), i do think that most if not all of these 
supposedly difficult to fix codepaths are just on the backburner out 
of lack of a clear blame vector.

"audit thousands of callsites in 8 million lines of code first" is a 
nice euphemism for hiding from the blame forever. We had 10 years for it 
and it didn't happen. As we've seen it again and again, getting a 
non-fatal reminder in the dmesg about the suckage is quite efficient at 
getting people to fix crappy solutions, and gives users an exact blame 
point of where to start. That will create pressure to fix these 
problems.

  I think you are somehow confusing two issues: this patch in no way 
  declares that long waits are bad - if the user _choses_ to wait 
  for
 
 Throwing a backtrace is the kernel's way to declare something as bad. 
 The only more clear ways to that I know of would be BUG or panic().

there are various levels of declaring something bad, and you are quite 
wrong to suggest that a BUG() would be the only recourse.

  way to stop_ are quite likely bad.
 
 The user will just see the backtraces and think the kernel has 
 crashed.

i've just changed the message to:

  INFO: task keventd/5 blocked for more than 120 seconds.
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

 On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
  no. (that's why i added the '(or a kill -9)' qualification above - if 
  NFS is mounted noninterruptible then standard signals (such as Ctrl-C) 
  should not have an interrupting effect.)
 
 NFS is already interruptible with umount -f (I use that all the 
 time...), but softlockup won't know that and throw the warning 
 anyways.

umount -f is a spectacularly unintelligent solution (it requires the 
user to know precisely which path to umount, etc.), TASK_KILLABLE is a 
lot more useful.

  your syslet snide comment aside (which is quite incomprehensible - a
 
 For the record I have no principle problem with syslets, just I do 
 consider them roughly equivalent in end result to a explicit retry 
 based AIO implementation.

which suggests you have not really understood syslets. Syslets have no 
retry component, they just process straight through the workflow. 
Retry based AIO has a retry component, which - as its name suggests 
already - retries operations instead of processing through the workload 
intelligently. Depending on how deep the context of an operation is, the 
retries might or might not make a noticeable difference in performance, 
but it sure is an inferior approach.

  retry based asynchonous IO model is clearly inferior even if it were 
  implemented everywhere), i do think that most if not all of these 
  supposedly difficult to fix codepaths are just on the backburner 
  out of lack of a clear blame vector.
 
 Hmm. -ENOPARSE. Can you please clarify?

which bit was unclear to you? The retry bit i've explained above, lemme 
know if there's any other unclarity.

  audit thousands of callsites in 8 million lines of code first is a 
  nice euphemism for hiding from the blame forever. We had 10 years 
  for it
 
 Ok your approach is then to let's warn about it and hope it will go 
 away

s/hope//, but yes. Surprisingly, this works quite well :-) [as long as 
the warnings are not excessively bogus, of course]

and note that this is just a happy side-effect - the primary motivation 
is to get warnings about tasks that are uninterruptible forever. (which 
is a quite common kernel bug pattern.)

 Anyways I think I could live with it a one liner warning (if it's 
 seriously rate limited etc.) and a sysctl to enable the backtraces; 
 off by default. Or if you prefer that record the backtrace always in a 
 buffer and make it available somewhere in /proc or /sys or /debug. 
 Would that work for you?

you are over-designing it way too much - a backtrace is obviously very 
helpful and it must be printed by default. There's enough 
configurability in it already so that you can turn it off if you want. 
(And you said SLES has softlockup turned off already so it shouldnt 
affect you anyway.)

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
 no. (that's why i added the '(or a kill -9)' qualification above - if 
 NFS is mounted noninterruptible then standard signals (such as Ctrl-C) 
 should not have an interrupting effect.)

NFS is already interruptible with umount -f (I use that all the time...), 
but softlockup won't know that and throw the warning anyways.

 your syslet snide comment aside (which is quite incomprehensible - a 

For the record I have no problem in principle with syslets, just I do
consider them roughly equivalent in end result to an explicit retry based
AIO implementation.

 retry based asynchonous IO model is clearly inferior even if it were 
 implemented everywhere), i do think that most if not all of these 
 supposedly difficult to fix codepaths are just on the backburner out 
 of lack of a clear blame vector.

Hmm. -ENOPARSE. Can you please clarify?

 
 audit thousands of callsites in 8 million lines of code first is a 
 nice euphemism for hiding from the blame forever. We had 10 years for it 

Ok your approach is then to let's warn about it and hope
it will go away 

 and it didnt happen. As we've seen it again and again, getting a 
 non-fatal reminder in the dmesg about the suckage is quite efficient at 

It's not universal suckage I would say, but sometimes unavoidable
conditions. Now it is better of course to have these all TASK_KILLABLE,
but then fixing that all in the kernel will probably be a long term
project. I'm not arguing against that, just forcing it through
backtraces before even starting all that is probably not the right
strategy to do that.

 getting people to fix crappy solutions, and gives users and exact blame 
 point of where to start. That will create pressure to fix these 
 problems.

After impacting the user base -- many of these conditions are infrequent
enough that we will likely only see them during real production. Throwing
warnings for lots of known cases is probably ok for a -mm kernel
(where users expect things like that), but not a release (be it
Linus release or any kind of end user distribution) imho.

I don't think there is a real alternative to code audit first
(and someone doing all the work of fixing all these first) 
 

 
   I think you are somehow confusing two issues: this patch in no way 
   declares that long waits are bad - if the user _choses_ to wait 
   for
  
  Throwing a backtrace is the kernel's way to declare something as bad. 
  The only more clear ways to that I know of would be BUG or panic().
 
 there are various levels of declarig something bad, and you are quite 
 wrong to suggest that a BUG() would be the only recourse.

I didn't write that, please reread my sentence.. 

But we seem to agree that a backtrace is something declared bad anyways,
which was my point.


 
   way to stop_ are quite likely bad.
  
  The user will just see the backtraces and think the kernel has 
  crashed.
 
 i've just changed the message to:
 
   INFO: task keventd/5 blocked for more than 120 seconds.
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

That's better, but the backtrace is still there isn't it?

Anyways I think I could live with a one-line warning (if it's
seriously rate limited etc.) and a sysctl to enable the backtraces;
off by default. Or if you prefer that record 
the backtrace always in a buffer and make it available somewhere in /proc
or /sys or /debug. Would that work for you? 

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 01:28:33PM +0100, Ingo Molnar wrote:
 
  On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
   no. (that's why i added the '(or a kill -9)' qualification above - if 
   NFS is mounted noninterruptible then standard signals (such as Ctrl-C) 
   should not have an interrupting effect.)
  
  NFS is already interruptible with umount -f (I use that all the 
  time...), but softlockup won't know that and throw the warning 
  anyways.
 
 umount -f is a spectacularly unintelligent solution (it requires the 
 user to know precisely which path to umount, etc.),

lsof | grep programname

 TASK_KILLABLE is a lot more useful.

Not sure it is better on all measures.

One problem is how to distinguish again between program abort
(which only affects the program) and IO abort (which leaves
EIO marked pages in the page cache affecting other processes too).
umount -f does this at least.

I didn't think TASK_KILLABLE has solved that cleanly (although
I admit I haven't read the latest patchkit, perhaps that has changed
over the first iteration) 

But it also probably doesn't make things much worse than they were before.

 
   your syslet snide comment aside (which is quite incomprehensible - a
  
  For the record I have no principle problem with syslets, just I do 
  consider them roughly equivalent in end result to a explicit retry 
  based AIO implementation.
 
 which suggests you have not really understood syslets. Syslets have no 

That's possible.

 retry component, they just process straight through the workflow. 
 Retry based AIO has a retry component, which - as its name suggests 
 already - retries operations instead of processing through the workload 
 intelligently. Depending on how deep the context of an operation the 
 retries might or might not make a noticeable difference in performance, 
 but it sure is an inferior approach.

Not sure what is so much less intelligent about retry (are you
referring to more CPU cycles needed?), but I admit I haven't 
thought very deeply about that.

 
   retry based asynchonous IO model is clearly inferior even if it were 
   implemented everywhere), i do think that most if not all of these 
   supposedly difficult to fix codepaths are just on the backburner 
   out of lack of a clear blame vector.
  
  Hmm. -ENOPARSE. Can you please clarify?
 
 which bit was unclear to you? The retry bit i've explained above, lemme 
 know if there's any other unclarity.

The "clear blame vector" bit was unclear.

   nice euphemism for hiding from the blame forever. We had 10 years 
   for it
  
  Ok your approach is then to let's warn about it and hope it will go 
  away
 
 s/hope//, but yes. Surprisingly, this works quite well :-) [as long as 
 the warnings are not excessively bogus, of course]

Well i consider a backtrace excessively bogus.

  Anyways I think I could live with it a one liner warning (if it's 
  seriously rate limited etc.) and a sysctl to enable the backtraces; 
  off by default. Or if you prefer that record the backtrace always in a 
  buffer and make it available somewhere in /proc or /sys or /debug. 
  Would that work for you?
 
 you are over-designing it way too much - a backtrace is obviously very 
 helpful and it must be printed by default. There's enough 
 configurability in it already so that you can turn it off if you want. 

So it will hit everybody first before they can figure out how
to get rid of it? That was the part I was objecting to.

If it is decided to warn about something which is not 100% clearly a bug
(and I think I have established this for now -- at least you didn't 
object to many of my examples...) then the likely
false positives shouldn't be too obnoxious. Backtraces are unfortunately
obnoxious and always come at a high cost (worried user, linux reputation
as a buggy OS, mailing list bandwidth, support load etc.) and having that 
for too many false positives is a bad thing.

 (And you said SLES has softlockup turned off already so it shouldnt 
 affect you anyway.)

My objection was not really for SLES, but for general Linux kernel
quality.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

  you are over-designing it way too much - a backtrace is obviously 
  very helpful and it must be printed by default. There's enough 
  configurability in it already so that you can turn it off if you 
  want.
 
 So it will hit everybody first before they can figure out how to get 
 rid of it? That was the part I was objecting too.

you are apparently arguing in a circle. This is a debug mechanism. It 
goes the normal upstream acceptance process. I have booted this patch a 
few hundred times on a number of boxes and got not a single false 
positive so far. While this is in no way an exhaustive test, only more 
testing (in -mm, etc.) will tell us more, one way or another. Your 
negative feedback about an impending catastrophe has been duly noted 
(which vision of yours has not been shared by anyone else in this thread 
so far), and is given its due weight. Can we now please move on to a 
more productive stage?

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
 negative 

I would consider it positive, but ok. If I was negative I would 
probably not care and just make always sure to disable SOFTLOCKUP
in the kernels I use.

 feedback about an impending catastrophy has been duly noted 

The point was less about an impending catastrophe, but more of a timebomb
ticking until the next widely used release.

 Can we now please move on to a 
 more productive stage?

Sure no more theory.

To what value do you want to set the backtrace sysctl by default?

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Radoslaw Szkodzinski [EMAIL PROTECTED] wrote:

 On Mon, 3 Dec 2007 14:29:56 +0100
  * Andi Kleen [EMAIL PROTECTED] wrote:
  
feedback about an impending catastrophy has been duly noted
   
   The point was less about an impending catastrophe, but more of a 
   timebomb ticking until the next widely used release.
 
 I think I know why Andi is so much against backtraces: he has some log 
 filter listening on netconsole mailing him with suspicious log 
 messages.

Andi, is that true? If yes, why didnt Andi state this concern outright, 
instead of pooh-pooh-ing the patch on various other grounds?

Ingo



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

 I would still appreciate if you could state what default value you 
 plan to set the backtrace sysctl to in the submitted patch.

there's no backtrace sysctl planned for the moment. This hung tasks 
debugging feature can be disabled/enabled on a wide scale already:

 - in the .config

 - runtime, temporarily, via:

 echo 0 > /proc/sys/kernel/hung_task_timeout_secs

 - runtime, permanently, via adding:

 kernel.hung_task_timeout_secs = 0

   to /etc/sysctl.conf.

the backtrace, as other posters have stated it in this thread too, is 
very useful, so i'll not remove it from the printout.
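
For readers who want to script the runtime knob listed above, here is a small Python sketch. It assumes the sysctl file at /proc/sys/kernel/hung_task_timeout_secs holds a single integer and that writing 0 disables the check, as stated in this message. The helper names read_timeout, write_timeout and sysctl_conf_line are hypothetical, and writing to the real file needs root.

```python
from pathlib import Path

# Path taken from the message above; it only exists on kernels that
# carry the hung-task detection patch.
HUNG_TASK_SYSCTL = Path("/proc/sys/kernel/hung_task_timeout_secs")


def read_timeout(path=HUNG_TASK_SYSCTL):
    """Return the current timeout in seconds (0 means the check is off)."""
    return int(Path(path).read_text().strip())


def write_timeout(secs, path=HUNG_TASK_SYSCTL):
    """Set the timeout; the effect lasts until the next reboot."""
    if secs < 0:
        raise ValueError("timeout must be >= 0")
    Path(path).write_text(f"{secs}\n")


def sysctl_conf_line(secs):
    """Build the line to add to /etc/sysctl.conf for a permanent setting."""
    return f"kernel.hung_task_timeout_secs = {secs}"
```

Both file helpers take the path as a parameter so they can be tried against a scratch file first; write_timeout(0) corresponds to the echo command above.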

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Pekka Enberg
Hi,

On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
  audit thousands of callsites in 8 million lines of code first is a
  nice euphemism for hiding from the blame forever. We had 10 years for it

On Dec 3, 2007 2:13 PM, Andi Kleen [EMAIL PROTECTED] wrote:
 Ok your approach is then to let's warn about it and hope
 it will go away

It's more like "let's warn about it and fix the problems when we find
some". Btw, how is this different from how the lockdep patches went
in?

Pekka


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread AstralStorm
On Mon, 3 Dec 2007 14:29:56 +0100
 * Andi Kleen [EMAIL PROTECTED] wrote:
 
   feedback about an impending catastrophy has been duly noted
  
  The point was less about an impending catastrophe, but more of a 
  timebomb ticking until the next widely used release.

I think I know why Andi is so much against backtraces: he has some log
filter listening on netconsole mailing him with suspicious log messages.

Apparently, that log filter is much too stupid to discern an
OOPS or OOM from softlockup and other informative backtraces even when
given a clear hint what it is.

Or maybe he's just underestimating user's ability to read. ;)
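
As a rough illustration of the point about log filters, a filter could use the "INFO: task ... blocked for more than N seconds" wording quoted earlier in this thread to tell the hung-task report apart from fatal events. The Python sketch below is only an assumption about how such a filter might look; classify is a hypothetical name, and the "serious" markers are illustrative strings, not an exhaustive or version-accurate list of kernel messages.

```python
import re

# Wording taken from the message quoted in this thread.
HUNG_TASK = re.compile(r"INFO: task (\S+) blocked for more than (\d+) seconds")

# Illustrative markers only; real kernel messages vary by version.
SERIOUS_MARKERS = ("Oops", "Kernel panic", "Out of memory")


def classify(line):
    """Return (severity, detail) for one line of kernel log output."""
    m = HUNG_TASK.search(line)
    if m:
        return ("info", f"task {m.group(1)} blocked > {m.group(2)}s")
    for marker in SERIOUS_MARKERS:
        if marker in line:
            return ("alert", marker)
    return ("ignore", "")
```

A filter built this way would mail only the lines classified as "alert" and could summarize the "info" ones.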




Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen

I would still appreciate if you could state what default value
you plan to set the backtrace sysctl to in the submitted patch.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Pekka Enberg [EMAIL PROTECTED] wrote:

 Hi,
 
 On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
   audit thousands of callsites in 8 million lines of code first is a
   nice euphemism for hiding from the blame forever. We had 10 years for it
 
 On Dec 3, 2007 2:13 PM, Andi Kleen [EMAIL PROTECTED] wrote:
  Ok your approach is then to let's warn about it and hope
  it will go away
 
 It's more like lets warn about it and fix the problems when we find 
 some. Btw, how is this different from how the lockdep patches went 
 in?

yeah, it's quite similar. (in fact this feature is expected to have a 
false positive rate lower than that of lockdep) The backtraces are 
essential as well, they help kernel developers find the bugs.

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

  debugging feature can be disabled/enabled on a wide scale already:
  
   - in the .config
  
   - runtime, temporarily, via:
  
   echo 0 > /proc/sys/kernel/hung_task_timeout_secs
 
 That won't address my concerns about already breaking (as in 
 frightening the user etc.) common error handling scenarios by default.

wow, now you again use big scary words like "breaking" and 
"frightening", which is a nice addition to the "ticking timebomb" 
metaphor.

Andi, sadly you are not listening AT ALL, you are repeating the same 
argument again and again that has been disputed and countered. Either 
come up with a new argument or please stop wasting everyone's time.

It is indisputable that for most users an unexplained hang is far more 
frightening than a hang followed by an explanation in the syslog. That 
is a _hard fact_, and if you cannot accept that it's really your 
cognitive problem not ours...

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 02:55:47PM +0100, Ingo Molnar wrote:
 
 * Andi Kleen [EMAIL PROTECTED] wrote:
 
  I would still appreciate if you could state what default value you 
  plan to set the backtrace sysctl to in the submitted patch.
 
 there's no backtrace sysctl planned for the moment. This hung tasks 

I hope you'll reconsider that at least before submitting 
this patch to mainline. For -mm it's probably ok.

 debugging feature can be disabled/enabled on a wide scale already:
 
  - in the .config
 
  - runtime, temporarily, via:
 
  echo 0 > /proc/sys/kernel/hung_task_timeout_secs

That won't address my concerns about already breaking (as in 
frightening the user etc.) common error handling scenarios by default.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
 It's more like let's warn about it and fix the problems when we find
 some. 

It is already known there are lots of problems. I won't repeat
them because I already wrote too much about them. Feel free
to read back in the thread.

Now if all the known problems are fixed and only some hard to know
hidden ones remain it might make sense to enable (but even then
a little dubious considering how important error handling robustness
is), but not at the current state (at least with the full default
backtraces) 

 Btw, how is this different from how the lockdep patches went
 in?

lockdep is clearly an "only enable if you're developing the kernel" 
feature (if only because of its overhead and other problems).
So side effects etc. are expected.

Softlockup previously (before this patch) was a "can always be safely 
enabled" feature.  

Now Ingo's latest unreleased version with single
line messages might be actually ok if he turns off the backtraces
by default. Unfortunately I wasn't able to find out so far if he 
has done that or not, he always cuts away these parts of the emails.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

 Now Ingo's latest unreleased version with single line messages might 
 be actually ok if he turns off the backtraces by default. 
 Unfortunately I wasn't able to find out so far if he has done that or 
 not, he always cuts away these parts of the emails.

Andi, what's this crap supposed to mean?? I of course answered your 
question:

   http://lkml.org/lkml/2007/12/3/90

below is the latest patch.

Ingo

--
Subject: softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
From: Ingo Molnar [EMAIL PROTECTED]

this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:

 --
 INFO: task prctl:3042 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
 prctl D fd5e3793 0  3042   2997
f6050f38 0046 0001 fd5e3793 0009 c06d8264 c06dae80 0286
f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 0001 f605 
f7e92d00 0286 f6050f18 c0489d1a f6050f40 6605  c0133a5b
 Call Trace:
  [<c04883a5>] schedule_timeout+0x6d/0x8b
  [<c04883d8>] schedule_timeout_uninterruptible+0x15/0x17
  [<c0133a76>] msleep+0x10/0x16
  [<c0138974>] sys_prctl+0x30/0x1e2
  [<c0104c52>] sysenter_past_esp+0x5f/0xa5
  ===
 2 locks held by prctl/3042:
 #0:  (&sb->s_type->i_mutex_key#5){--..}, at: [<c0197d11>] do_fsync+0x38/0x7a
 #1:  (jbd_handle){--..}, at: [<c01ca3d2>] journal_start+0xc7/0xe9
 --

the current default timeout is 120 seconds. Such messages are printed
up to 10 times per bootup. If the system has crashed already then the
messages are not printed.

if lockdep is enabled then all held locks are printed as well.

this feature is a natural extension to the softlockup-detector (kernel
locked up without scheduling) and to the NMI watchdog (kernel locked up
with IRQs disabled).

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
Signed-off-by: Arjan van de Ven [EMAIL PROTECTED]
Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]
---
 arch/x86/kernel/process_32.c |   15 +-
 include/linux/debug_locks.h  |5 ++
 include/linux/sched.h|   10 
 kernel/fork.c|5 ++
 kernel/lockdep.c |   14 +
 kernel/sched.c   |4 -
 kernel/softlockup.c  |  103 ++-
 kernel/sysctl.c  |   27 +++
 8 files changed, 166 insertions(+), 17 deletions(-)

Index: linux/arch/x86/kernel/process_32.c
===
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -114,10 +114,19 @@ void default_idle(void)
smp_mb();
 
local_irq_disable();
-   if (!need_resched())
+   if (!need_resched()) {
+   ktime_t t0, t1;
+   u64 t0n, t1n;
+
+   t0 = ktime_get();
+   t0n = ktime_to_ns(t0);
safe_halt();/* enables interrupts racelessly */
-   else
-   local_irq_enable();
+   local_irq_disable();
+   t1 = ktime_get();
+   t1n = ktime_to_ns(t1);
+   sched_clock_idle_wakeup_event(t1n - t0n);
+   }
+   local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */
Index: linux/include/linux/debug_locks.h
===
--- linux.orig/include/linux/debug_locks.h
+++ linux/include/linux/debug_locks.h
@@ -47,6 +47,7 @@ struct task_struct;
 
 #ifdef CONFIG_LOCKDEP
 extern void debug_show_all_locks(void);
+extern void __debug_show_held_locks(struct task_struct *task);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
 extern void debug_check_no_locks_held(struct task_struct *task);
@@ -55,6 +56,10 @@ static inline void debug_show_all_locks(
 {
 }
 
+static inline void __debug_show_held_locks(struct task_struct *task)
+{
+}
+
 static inline void debug_show_held_locks(struct task_struct *task)
 {
 }
Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -261,12 +261,17 @@ extern void account_process_tick(struct 
 extern void update_process_times(int user);
 extern void scheduler_tick(void);
 
+extern void sched_show_task(struct task_struct *p);
+
 #ifdef CONFIG_DETECT_SOFTLOCKUP
 extern void softlockup_tick(void);
 extern void spawn_softlockup_task(void);
 extern void touch_softlockup_watchdog(void);
 extern void 

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
On Mon, Dec 03, 2007 at 02:59:16PM +0100, Ingo Molnar wrote:
 Andi, is that true? If yes, why didn't Andi state this concern outright, 
 instead of pooh-pooh-ing the patch on various other grounds?

No of course not. Radoslaw is talking nonsense.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Arjan van de Ven
On Mon, 3 Dec 2007 11:27:15 +0100
Andi Kleen [EMAIL PROTECTED] wrote:

  Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly
  broken.
 
 What should it do when the NFS server doesn't answer anymore or 
 when the network to the SAN RAID array located a few hundred KM away
 develops some hiccup?  Or just the SCSI driver decides to do lengthy
 error recovery  -- you could argue that is broken if it takes longer 
 than 2 minutes, but in practice these things are hard to test
 and to fix.
 

the scsi layer will have the IO totally aborted within that time anyway;
the retry timeout for disks is 30 seconds after all.




-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andi Kleen
 the scsi layer will have the IO totally aborted within that time anyway;
 the retry timeout for disks is 30 seconds after all.

There are blocking waits that wait for multiple IOs. 

Also I think the SCSI driver can tune this anyway and I suspect
iSCSI and friends increase it (?)
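
The arithmetic behind this disagreement can be sketched in a few lines of user-space C. The 30-second disk timeout and the 120-second detector threshold are the figures quoted in the thread; the number of sequential I/Os and the number of timed-out attempts per I/O are purely hypothetical inputs, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Values quoted in the thread. */
#define DISK_TIMEOUT_SECS   30U   /* per-command disk timeout */
#define HUNG_TASK_SECS     120U   /* detector's default threshold */

/*
 * Hypothetical worst case: a task waits for `ios` I/Os issued one after
 * another, and each I/O runs into `attempts` timeouts before it completes
 * or is aborted.
 */
static unsigned int worst_case_wait_secs(unsigned int ios, unsigned int attempts)
{
	/* Keep the product far away from unsigned overflow. */
	assert(ios <= 1000U && attempts <= 1000U);
	return ios * attempts * DISK_TIMEOUT_SECS;
}

/* True if that wait would exceed the detector's threshold. */
static bool would_trigger_warning(unsigned int ios, unsigned int attempts)
{
	return worst_case_wait_secs(ios, attempts) > HUNG_TASK_SECS;
}
```

Under these assumptions a single I/O that times out once stays well below the threshold, while five timed-out attempts on one I/O, or two sequential I/Os with three attempts each, would cross it.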

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ray Lee
On Dec 3, 2007 6:17 AM, Andi Kleen [EMAIL PROTECTED] wrote:
 That won't address my concerns about already breaking (as in
 frightening the user etc.) common error handling scenarios by default.

Andi, may I respectfully submit that you're not understanding real users here?

Real users either:
  -  Never look at their syslog
  -  Or look, and aren't terribly scared by what's in there.

Perhaps I'm wrong, and you can point to a message on lkml where
someone has posted a backtrace with fear in their voice.


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Andrew Morton
On Mon, 3 Dec 2007 15:19:25 +0100
Ingo Molnar [EMAIL PROTECTED] wrote:

 this patch extends the soft-lockup detector to automatically
 detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
 printed the following way:
 
  --
  INFO: task prctl:3042 blocked for more than 120 seconds.
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
  prctl D fd5e3793 0  3042   2997
 f6050f38 0046 0001 fd5e3793 0009 c06d8264 c06dae80 
 0286
 f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 0001 f605 
 
 f7e92d00 0286 f6050f18 c0489d1a f6050f40 6605  
 c0133a5b
  Call Trace:
   [<c04883a5>] schedule_timeout+0x6d/0x8b
   [<c04883d8>] schedule_timeout_uninterruptible+0x15/0x17
   [<c0133a76>] msleep+0x10/0x16
   [<c0138974>] sys_prctl+0x30/0x1e2
   [<c0104c52>] sysenter_past_esp+0x5f/0xa5
   ===
  2 locks held by prctl/3042:
  #0:  (&sb->s_type->i_mutex_key#5){--..}, at: [<c0197d11>] do_fsync+0x38/0x7a
  #1:  (jbd_handle){--..}, at: [<c01ca3d2>] journal_start+0xc7/0xe9
  --
 
 the current default timeout is 120 seconds. Such messages are printed
 up to 10 times per bootup. If the system has crashed already then the
 messages are not printed.
 
 if lockdep is enabled then all held locks are printed as well.
 
 this feature is a natural extension to the softlockup-detector (kernel
 locked up without scheduling) and to the NMI watchdog (kernel locked up
 with IRQs disabled).

This feature will save one full reporter-developer round-trip during
investigation of a significant number of bug reports.

It might be more practical if it were to dump the traces for _all_
D-state processes when it fires - basically an auto-triggered sysrq-W.
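
A rough user-space model of that suggestion, for illustration only: once any one task has crossed the timeout, every D-state task is collected for dumping, not just the one that fired. The `struct dstate_task` type and the `collect_d_state()` helper are hypothetical and do not correspond to anything in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a task; not the kernel's task_struct. */
struct dstate_task {
	int pid;
	bool d_state;               /* models TASK_UNINTERRUPTIBLE */
	unsigned long blocked_secs; /* how long the task has been blocked */
};

/*
 * If at least one D-state task has been blocked longer than timeout_secs,
 * copy the pids of *all* D-state tasks into out[] (at most out_len entries)
 * and return how many were written. Returns 0 if nothing has fired.
 */
static size_t collect_d_state(const struct dstate_task *tasks, size_t n,
			      unsigned long timeout_secs,
			      int *out, size_t out_len)
{
	bool fired = false;
	size_t written = 0;

	assert(tasks != NULL && out != NULL);

	for (size_t i = 0; i < n; i++)
		if (tasks[i].d_state && tasks[i].blocked_secs > timeout_secs)
			fired = true;

	if (!fired)
		return 0;

	for (size_t i = 0; i < n && written < out_len; i++)
		if (tasks[i].d_state)
			out[written++] = tasks[i].pid;

	return written;
}
```

The point of the two passes is that a task blocked for only a few seconds is still included in the dump, since it may be waiting on the same resource as the task that triggered the report.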


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Rafael J. Wysocki
On Monday, 3 of December 2007, Andrew Morton wrote:
 On Mon, 3 Dec 2007 15:19:25 +0100
 Ingo Molnar [EMAIL PROTECTED] wrote:
 
  this patch extends the soft-lockup detector to automatically
  detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
  printed the following way:
  
   --
   INFO: task prctl:3042 blocked for more than 120 seconds.
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
   prctl D fd5e3793 0  3042   2997
  f6050f38 0046 0001 fd5e3793 0009 c06d8264 c06dae80 
  0286
  f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 0001 f605 
  
  f7e92d00 0286 f6050f18 c0489d1a f6050f40 6605  
  c0133a5b
   Call Trace:
[<c04883a5>] schedule_timeout+0x6d/0x8b
[<c04883d8>] schedule_timeout_uninterruptible+0x15/0x17
[<c0133a76>] msleep+0x10/0x16
[<c0138974>] sys_prctl+0x30/0x1e2
[<c0104c52>] sysenter_past_esp+0x5f/0xa5
===
   2 locks held by prctl/3042:
   #0:  (&sb->s_type->i_mutex_key#5){--..}, at: [<c0197d11>] 
  do_fsync+0x38/0x7a
   #1:  (jbd_handle){--..}, at: [<c01ca3d2>] journal_start+0xc7/0xe9
   --
  
  the current default timeout is 120 seconds. Such messages are printed
  up to 10 times per bootup. If the system has crashed already then the
  messages are not printed.
  
  if lockdep is enabled then all held locks are printed as well.
  
  this feature is a natural extension to the softlockup-detector (kernel
  locked up without scheduling) and to the NMI watchdog (kernel locked up
  with IRQs disabled).
 
 This feature will save one full reporter-developer round-trip during
 investigation of a significant number of bug reports.
 
 It might be more practical if it were to dump the traces for _all_
 D-state processes when it fires - basically an auto-triggered sysrq-W.

Er, it won't play well if that happens when tasks are frozen for suspend.


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Rafael J. Wysocki [EMAIL PROTECTED] wrote:

  This feature will save one full reporter-developer round-trip during 
  investigation of a significant number of bug reports.
  
  It might be more practical if it were to dump the traces for _all_ 
  D-state processes when it fires - basically an auto-triggered 
  sysrq-W.
 
 Er, it won't play well if that happens when tasks are frozen for 
 suspend.

right now any suspend attempt times out after 20 seconds:

  $ grep TIMEOUT kernel/power/process.c
  #define TIMEOUT (20 * HZ)
  end_time = jiffies + TIMEOUT;

which should be well before the 120 seconds timeout that the detector 
uses. But indeed you are right in that the refrigerator() works via 
TASK_UNINTERRUPTIBLE too. I've updated the patch to exclude PF_FROZEN - 
attached below. That should solve this particular issue, even if the 
timeout increased to above 20 secs, right?
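
The check described here can be modelled in user-space C roughly as follows. This is only a sketch under stated assumptions: `struct frozen_model_task`, its fields, and `should_report()` are invented for illustration, and the `frozen` flag merely stands in for PF_FROZEN. It is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel task states; not the real definitions. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE, MODEL_UNINTERRUPTIBLE };

struct frozen_model_task {
	enum model_state state;
	bool frozen;                /* models PF_FROZEN being set */
	unsigned long blocked_secs; /* how long the task has been blocked */
};

/* timeout_secs == 0 models the detector being switched off. */
static bool should_report(const struct frozen_model_task *t,
			  unsigned long timeout_secs)
{
	assert(t != NULL);

	if (timeout_secs == 0)
		return false;
	if (t->state != MODEL_UNINTERRUPTIBLE)
		return false;
	if (t->frozen) /* parked in the refrigerator for suspend/hibernation */
		return false;

	return t->blocked_secs > timeout_secs;
}
```

With the frozen test in place, the length of the freeze no longer matters: a frozen task is skipped whether it has been blocked for 20 seconds or for the whole duration of a hibernation.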

Ingo

--
Subject: softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
From: Ingo Molnar [EMAIL PROTECTED]

this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:

 --
 INFO: task prctl:3042 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
 prctl D fd5e3793 0  3042   2997
f6050f38 0046 0001 fd5e3793 0009 c06d8264 c06dae80 0286
f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 0001 f605 
f7e92d00 0286 f6050f18 c0489d1a f6050f40 6605  c0133a5b
 Call Trace:
  [<c04883a5>] schedule_timeout+0x6d/0x8b
  [<c04883d8>] schedule_timeout_uninterruptible+0x15/0x17
  [<c0133a76>] msleep+0x10/0x16
  [<c0138974>] sys_prctl+0x30/0x1e2
  [<c0104c52>] sysenter_past_esp+0x5f/0xa5
  ===
 2 locks held by prctl/3042:
 #0:  (&sb->s_type->i_mutex_key#5){--..}, at: [<c0197d11>] do_fsync+0x38/0x7a
 #1:  (jbd_handle){--..}, at: [<c01ca3d2>] journal_start+0xc7/0xe9
 --

the current default timeout is 120 seconds. Such messages are printed
up to 10 times per bootup. If the system has crashed already then the
messages are not printed.

if lockdep is enabled then all held locks are printed as well.

this feature is a natural extension to the softlockup-detector (kernel
locked up without scheduling) and to the NMI watchdog (kernel locked up
with IRQs disabled).

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
Signed-off-by: Arjan van de Ven [EMAIL PROTECTED]
Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]
---
 arch/x86/kernel/process_32.c |   15 --
 include/linux/debug_locks.h  |5 ++
 include/linux/sched.h|   10 
 kernel/fork.c|5 ++
 kernel/lockdep.c |   14 -
 kernel/sched.c   |4 -
 kernel/softlockup.c  |  106 ++-
 kernel/sysctl.c  |   27 ++
 8 files changed, 169 insertions(+), 17 deletions(-)

Index: linux/arch/x86/kernel/process_32.c
===
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -114,10 +114,19 @@ void default_idle(void)
smp_mb();
 
local_irq_disable();
-   if (!need_resched())
+   if (!need_resched()) {
+   ktime_t t0, t1;
+   u64 t0n, t1n;
+
+   t0 = ktime_get();
+   t0n = ktime_to_ns(t0);
safe_halt();/* enables interrupts racelessly */
-   else
-   local_irq_enable();
+   local_irq_disable();
+   t1 = ktime_get();
+   t1n = ktime_to_ns(t1);
+   sched_clock_idle_wakeup_event(t1n - t0n);
+   }
+   local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */
Index: linux/include/linux/debug_locks.h
===
--- linux.orig/include/linux/debug_locks.h
+++ linux/include/linux/debug_locks.h
@@ -47,6 +47,7 @@ struct task_struct;
 
 #ifdef CONFIG_LOCKDEP
 extern void debug_show_all_locks(void);
+extern void __debug_show_held_locks(struct task_struct *task);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
 extern void debug_check_no_locks_held(struct task_struct *task);
@@ -55,6 +56,10 @@ static inline void debug_show_all_locks(
 {
 }
 
+static inline void __debug_show_held_locks(struct task_struct *task)
+{
+}
+
 static inline void debug_show_held_locks(struct task_struct *task)
 {
 }
Index: linux/include/linux/sched.h
===
--- 

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Rafael J. Wysocki
On Monday, 3 of December 2007, Ingo Molnar wrote:
 
 * Rafael J. Wysocki [EMAIL PROTECTED] wrote:
 
   This feature will save one full reporter-developer round-trip during 
   investigation of a significant number of bug reports.
   
   It might be more practical if it were to dump the traces for _all_ 
   D-state processes when it fires - basically an auto-triggered 
   sysrq-W.
  
  Er, it won't play well if that happens when tasks are frozen for 
  suspend.
 
 right now any suspend attempt times out after 20 seconds:
 
   $ grep TIMEOUT kernel/power/process.c
   #define TIMEOUT (20 * HZ)
   end_time = jiffies + TIMEOUT;

This is the timeout for freezing tasks, but if the freezing succeeds, they
can stay in TASK_UNINTERRUPTIBLE for quite some more time, especially during
a hibernation (the tasks stay frozen until we power off the system after saving
the image).

 which should be well before the 120 seconds timeout that the detector 
 uses. But indeed you are right in that the refrigerator() works via 
 TASK_UNINTERRUPTIBLE too. I've updated the patch to exclude PF_FROZEN - 
 attached below. That should solve this particular issue, even if the 
 timeout increased to above 20 secs, right?

Sure.

Thanks,
Rafael


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar

* Rafael J. Wysocki [EMAIL PROTECTED] wrote:

   Er, it won't play well if that happens when tasks are frozen for 
   suspend.
  
  right now any suspend attempt times out after 20 seconds:
  
$ grep TIMEOUT kernel/power/process.c
#define TIMEOUT (20 * HZ)
end_time = jiffies + TIMEOUT;
 
 This is the timeout for freezing tasks, but if the freezing succeeds, 
 they can stay in TASK_UNINTERRUPTIBLE for quite some more time, 
 especially during a hibernation (the tasks stay frozen until we power 
 off the system after saving the image).

ah, ok. So this was a live bug - thanks for the clarification.

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Mon, 3 Dec 2007 01:07:41 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> > We really need to get better diagnostics for the
> > bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to
> > get to the scenario where we have a more or less robust measure of
> > kernel quality (and we're not all that far off for several cases),
> > one thing
> 
> One measure of kernel quality is to recover well from IO errors
> (like network problems or broken block devices)

yes. and this patch will flag cases that don't (yet) work well

> 
> This patch will likely work against that by breaking error paths.

it won't break error paths, it will at most put a warning in the log.
It doesn't kill or otherwise damage the system or process.

> 
> > This patch is a step in the right direction there, by quite a
> > lot.
> > 
> > I really don't understand what your objection is to this patch...
> > is it that an enterprise distro can't ship with it on? (Which is
> > fine btw)
> 
> Any distribution aimed at end users cannot ship with it on. 

That's a pretty bold statement; assuming that the TASK_KILLABLE patch
is in, I don't see the problem.

And even if a distro doesn't turn it on, I still don't see a problem;
it's a diagnostics patch that people can turn on (even at runtime) if
they see problems.

> Also in general I have my doubts that the false positive:real bug
> ratio of this warning is well balanced.

I'll just have to disagree with you then; but both of us are making
wild guesses. Only one way to get the real false positive percentage.

> Just consider the original
> example of dead network servers. Even in my relatively small
> home network that is quite a common occurrence. This patch
> will break that all by throwing random backtraces when this 
> happens.

1) with TASK_KILLABLE that shouldn't happen
2) how does "throwing a backtrace" "break" things?


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
> We really need to get better diagnostics for the
> bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to get
> to the scenario where we have a more or less robust measure of kernel
> quality (and we're not all that far off for several cases), one thing

One measure of kernel quality is to recover well from IO errors
(like network problems or broken block devices)

This patch will likely work against that by breaking error paths.

> This patch is a step in the right direction there, by quite a
> lot.
> 
> I really don't understand what your objection is to this patch... is it
> that an enterprise distro can't ship with it on? (Which is fine btw)

Any distribution aimed at end users cannot ship with it on. 
Most likely not even a standard Linus kernel should really enable
it without warnings.

Also in general I have my doubts that the false positive:real bug
ratio of this warning is well balanced. Just consider the original
example of dead network servers. Even in my relatively small
home network that is quite a common occurrence. This patch
will break that all by throwing random backtraces when this 
happens.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
> Delay accounting (or the /proc//sched fields that i added recently) 
> only get updated once a task has finished its unreasonably long delay 
> and has scheduled. 

If it is stuck forever then you can just use sysrq-t

If it recovers delay accounting will catch it.

> detected_ this way. This is a debugging facility that clearly belongs 
> into the kernel. 

My worry is that it will flag various legitimate cases. So far 
you seem to try to just hand-wave them away.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Sun, 2 Dec 2007 21:47:25 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> > Out of direct experience, 95% of the "too long delay" cases are
> > plain old bugs. The rest we can (and must!) convert to
> > TASK_KILLABLE or could 
> 
> I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).   
> It would be pretty bad to merge this patch without converting them to 
> TASK_KILLABLE first

"pretty bad" as in "a few people see warnings in their dmesg" ?
And TASK_KILLABLE is hopefully about to get merged anyway.


We really need to get better diagnostics for the
bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to get
to the scenario where we have a more or less robust measure of kernel
quality (and we're not all that far off for several cases), one thing
we need to keep doing is have the kernel detect bad cases as much as
possible. This patch is a step in the right direction there, by quite a
lot.

I really don't understand what your objection is to this patch... is it
that an enterprise distro can't ship with it on? (Which is fine btw)

> 


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ 
> > something that most humans consider as "buggy" in the overwhelming 
> > majority of cases, regardless of the reason? Yes, there are and will 
> > be some exceptions, but not nearly as countless as you try to paint 
> > it. A quick test in the next -mm will give us a good idea about the 
> > ratio of false positives.
> 
> That would assume error paths get regularly exercised in -mm. 
> Doubtful.  Most likely we'll only hear about it after it's out in the 
> wild on some bigger release.

by that argument we could never include _anything_ in -mm because ... 
only some bigger release would exercise error paths?

Your argument makes no objective sense to me - my patch is a 
non-intrusive debugging facility that people clearly find useful and 
that would increase the quality of kernel bugreporting.

If, contrary to expectation, it decreases kernel bugreporting quality 
then we'll disable it quickly - just like we did with other debugging 
facilities that were causing more trouble than good. (such as the stack 
unwinder code)

In fact it can already be disabled easily, from user-space, without any 
kernel change, by doing:

   echo 0 > /proc/sys/kernel/hung_task_timeout_secs

and there you go, no warnings at all. Or you can add this to 
/etc/sysctl.conf to disable it permanently:

   kernel.hung_task_timeout_secs = 0

or you can disable it in the .config. So i don't see your problem. It's 
just like most other debug facilities. (in fact it's more flexible than 
most other debug facilities)
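
For completeness, the same runtime switch can be flipped from a C program instead of the shell. This is a minimal sketch, assuming a kernel that carries this patch and therefore exposes /proc/sys/kernel/hung_task_timeout_secs; the `set_hung_task_timeout()` helper is a made-up name, and writing to the real file normally requires root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a timeout in seconds to the given file. On a kernel with this
 * feature the path would be /proc/sys/kernel/hung_task_timeout_secs;
 * a value of 0 disables the warnings. Returns 0 on success, -1 on failure.
 */
static int set_hung_task_timeout(const char *path, unsigned long secs)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%lu\n", secs) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling `set_hung_task_timeout("/proc/sys/kernel/hung_task_timeout_secs", 0)` would then have the same effect as the echo command above, and the setting would likewise be lost on reboot unless it is also placed in /etc/sysctl.conf.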

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > Until now users had little direct recourse to get such problems 
> > fixed. (we had sysrq-t, but that included no real metric of how long 
> > a task was
> 
> Actually task delay accounting can measure this now.  iirc someone had 
> a latencytop based on it already.

Delay accounting (or the /proc//sched fields that i added recently) 
only get updated once a task has finished its unreasonably long delay 
and has scheduled. So lockups or extremely long delays _won't be 
detected_ this way. This is a debugging facility that clearly belongs 
in the kernel. Your arguments just make no objective sense.
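
The distinction drawn here, between accounting that is updated on wakeup and a watchdog that polls while the task is still blocked, can be illustrated with a small user-space model. Everything in it (`struct accounted_task`, `task_wakes_up()`, `watchdog_sees_hang()`) is hypothetical and only mirrors the argument; it is not how delay accounting or the patch is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct accounted_task {
	bool blocked;                /* currently in an uninterruptible wait */
	unsigned long blocked_since; /* time the current wait began, in seconds */
	unsigned long last_delay;    /* filled in only when a wait ends */
};

/* After-the-fact accounting: the delay becomes visible only on wakeup. */
static void task_wakes_up(struct accounted_task *t, unsigned long now)
{
	assert(t != NULL && t->blocked && now >= t->blocked_since);
	t->last_delay = now - t->blocked_since;
	t->blocked = false;
}

/* Polling watchdog: looks at the task while it is still blocked. */
static bool watchdog_sees_hang(const struct accounted_task *t,
			       unsigned long now, unsigned long timeout_secs)
{
	assert(t != NULL);
	return t->blocked && now - t->blocked_since > timeout_secs;
}
```

In this model a task that never wakes up never gets a `last_delay` value at all, while the polling check flags it as soon as the timeout has passed.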

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Sun, 2 Dec 2007 22:19:25 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> > 
> > Until now users had little direct recourse to get such problems
> > fixed. (we had sysrq-t, but that included no real metric of how
> > long a task was 
> 
> Actually task delay accounting can measure this now.  iirc someone
> had a latencytop based on it already.


I have written a latencytop tool, but it's not quite based on the task
delay accounting (it doesn't provide the right information to make such
a tool). I've not released the tool mostly because I'm not quite happy
about the kernel side yet... but if there's real interest I'll fix it
up soon and release it.

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
Ingo Molnar <[EMAIL PROTECTED]> writes:
>
> do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ 
> something that most humans consider as "buggy" in the overwhelming 
> majority of cases, regardless of the reason? Yes, there are and will be 
> some exceptions, but not nearly as countless as you try to paint it. A 
> quick test in the next -mm will give us a good idea about the ratio of 
> false positives.

That would assume error paths get regularly exercised in -mm. 
Doubtful.  Most likely we'll only hear about it after it's
out in the wild on some bigger release.

The problem I have with your patch is that it will mess up Linux (in
particular block/network file system) error handling even more than it
already is. In error handling cases such "unusual" things unfortunately
happen frequently.

I used to fight with this with the NMI watchdog on x86-64 -- it
tended to trigger regularly, for example on SCSI error handlers
disabling interrupts for too long while handling the error. They
eventually all got fixed, but with that change they will likely
all start throwing nasty messages again.

And usually it is not simply broken code either, but code really
doing something difficult.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote:
> > what if you considered - just for a minute - the possibility of this 
> > debug tool being the thing that actually animates developers to fix such 
> > long delay bugs that have bothered users for almost a decade meanwhile?
> 
> Throwing frequent debugging messages for non buggy cases will just 
> lead to people generally ignore softlockups.

do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ 
something that most humans consider as "buggy" in the overwhelming 
majority of cases, regardless of the reason? Yes, there are and will be 
some exceptions, but not nearly as countless as you try to paint it. A 
quick test in the next -mm will give us a good idea about the ratio of 
false positives.

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote:
> what if you considered - just for a minute - the possibility of this 
> debug tool being the thing that actually animates developers to fix such 
> long delay bugs that have bothered users for almost a decade meanwhile?

Throwing frequent debugging messages for non-buggy cases will
just lead to people generally ignoring softlockups.

I don't think runtime instrumentation is the way to introduce
TASK_KILLABLE in general. The only way to get there is for people to go
through the source and identify places where it makes sense.

> 
> Until now users had little direct recourse to get such problems fixed. 
> (we had sysrq-t, but that included no real metric of how long a task was 

Actually task delay accounting can measure this now.  iirc someone
had a latencytop based on it already.

> blocked, so there was no direct link in the typical case and users had 
> no real reliable tool to express their frustration about unreasonable 
> delays.)
> 
> Now this changes: they get a "smoking gun" backtrace reported by the 
> kernel, and blamed on exactly the place that caused that unreasonable 
> delay. And it's not like the kernel breaks - at most 10 such messages 
> are reported per bootup.
> 
> We increase the delay timeout to say 300 seconds, and if the system is 
> under extremely high IO load then 120+ might be a reasonable delay, so 
> it's all tunable and runtime disable-able anyway. So if you _know_ that 
> you will see and tolerate such long delays, you can tweak it - but i can 

This means the user has to see their kernel log fill with such
messages at least once - do a round trip to some mailing list to
learn that it is expected and not a kernel bug - then tweak
some obscure parameters. Doesn't seem like a particularly fruitful
procedure to me.

> tell you with 100% certainty that 99.9% of the typical Linux users do 
> not characterize such long delays as "correct behavior".

It's about robustness, not the typical case.
A system that throws backtraces whenever something slightly unusual
happens is not a robust system.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > Out of direct experience, 95% of the "too long delay" cases are plain 
> > old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could 
> 
> I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).  It 
> would be pretty bad to merge this patch without converting them to 
> TASK_KILLABLE first

which we want to do in 2.6.25 anyway, so i don't see any big problems
here. Also, it costs nothing to just stick it in and see the results;
worst case we'd have to flip around the default. I think this is much
ado about nothing - so far i don't really see any objective basis for
your negative attitude.

> There's also the additional issue that even block devices are often 
> network or SAN backed these days. Having 120 second delays in there is 
> quite possible.
>
> So most likely adding this patch and still keeping a robust kernel 
> would require converting most of these delays to TASK_KILLABLE first. 
> That would not be a bad thing -- i would often like to kill a process 
> stuck on a bad block device -- but is likely a lot of work.

what if you considered - just for a minute - the possibility of this 
debug tool being the thing that actually animates developers to fix such 
long delay bugs that have bothered users for almost a decade meanwhile?

Until now users had little direct recourse to get such problems fixed. 
(we had sysrq-t, but that included no real metric of how long a task was 
blocked, so there was no direct link in the typical case and users had 
no real reliable tool to express their frustration about unreasonable 
delays.)

Now this changes: they get a "smoking gun" backtrace reported by the 
kernel, and blamed on exactly the place that caused that unreasonable 
delay. And it's not like the kernel breaks - at most 10 such messages 
are reported per bootup.

We increase the delay timeout to say 300 seconds, and if the system is 
under extremely high IO load then 120+ might be a reasonable delay, so 
it's all tunable and runtime disable-able anyway. So if you _know_ that 
you will see and tolerate such long delays, you can tweak it - but i can 
tell you with 100% certainty that 99.9% of the typical Linux users do 
not characterize such long delays as "correct behavior".
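
The policy described above (a timeout, a cap of 10 reports per boot, and
a runtime off switch) can be sketched as follows. This is a user-space
Python model, not the patch itself: HungTaskChecker, timeout_secs,
warnings_left and check() are hypothetical names, and treating a timeout
of 0 as "disabled" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HungTaskChecker:
    timeout_secs: int = 120   # threshold under discussion in this thread
    warnings_left: int = 10   # "at most 10 such messages" per bootup
    reports: List[str] = field(default_factory=list)

    def check(self, name: str, blocked_since: float, now: float) -> bool:
        """Record a report if the task has been blocked longer than the timeout."""
        if self.timeout_secs == 0 or self.warnings_left == 0:
            return False  # disabled at runtime (assumed), or budget used up
        blocked_for = now - blocked_since
        if blocked_for <= self.timeout_secs:
            return False
        self.warnings_left -= 1
        self.reports.append(
            f"task {name} blocked for more than {self.timeout_secs} seconds"
        )
        return True
```

Raising timeout_secs to 300 silences a task that has been blocked for
200 seconds, and once the budget of 10 reports is spent no further
messages are produced until the next boot.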

> > There are no softlockup false positive bugs open at the moment. If 
> > you know about any, then please do not hesitate and report them, 
> > i'll be eager to fix them. The softlockup detector is turned on by 
> > default in Fedora (alongside lockdep in rawhide), and it helped us 
> > find countless
> 
> That just means nobody runs stress tests on those. [...]

that is an all-encompassing blanket assertion that sadly drips with ill
will (which permeates your mails lately). I for example run tons of
stress tests on "those" and of course many others do too. So i don't
really know what to think of your statement :-(

> [...] e.g. lockdep tends to explode even on simple stress tests on 
> larger systems because it tracks all locks in all dynamic objects in 
> memory and towards 6k-10k entries the graph walks tend to take 
> multiple seconds on some NUMA systems.

a bug was fixed in this area - can you still see this with 2.6.24-rc3?

[ But i'd be the first one to point out that lockdep is certainly not
  from the cheap tools department, that's why i said above that lockdep
  is enabled in Fedora rawhide (i.e. development) kernels. Softlockup
  detector is much cheaper and it's default enabled all the time. ]

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
> Out of direct experience, 95% of the "too long delay" cases are plain 
> old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could 

I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).   
It would be pretty bad to merge this patch without converting them to 
TASK_KILLABLE first

There's also the additional issue that even block devices are often
network or SAN backed these days. Having 120 second delays in there
is quite possible.

So most likely adding this patch and still keeping a robust kernel
would require converting most of these delays to TASK_KILLABLE first.
That would not be a bad thing -- i would often like to kill a
process stuck on a bad block device -- but is likely a lot of work.

> There are no softlockup false positive bugs open at the moment. If you 
> know about any, then please do not hesitate and report them, i'll be 
> eager to fix them. The softlockup detector is turned on by default in 
> Fedora (alongside lockdep in rawhide), and it helped us find countless 

That just means nobody runs stress tests on those. For example, lockdep
tends to explode even on simple stress tests on larger systems because it
tracks all locks in all dynamic objects in memory, and towards 6k-10k entries
the graph walks tend to take multiple seconds on some NUMA systems.

-Andi



Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > .. and it's even a tool to show where we missed making something 
> > TASK_KILLABLE... anything that triggers from NFS and the like really 
> > ought to be TASK_KILLABLE after all. This patch will point any 
> > omissions out quite nicely without having to do any kind of 
> > destructive testing.
> 
> It would be better to just audit the source for those. [...]

that was wishful thinking 10 years ago already, when Linux was 10 times 
smaller.

> [...] Outlawing something which was previously legal without auditing 
> the source is bad.

to the contrary, being 120+ seconds uninterruptible without a very good 
reason is certainly something that was unreasonable (and harmful) for a 
long time already - we just never had the mechanism to warn about this 
intelligently without crashing the system.

Out of direct experience, 95% of the "too long delay" cases are plain 
old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could 
annotate if it _really_ needs to be TASK_UNINTERRUPTIBLE.

> Anyways, i suspect it would just lead to more people disabling 
> softlockup. I remember during some older stress testing it also tended 
> to explode regularly, so e.g. SUSE kernel rpms have it disabled. That 
> patch would probably make it worse.

There are no softlockup false positive bugs open at the moment. If you
know about any, then please do not hesitate to report them, i'll be
eager to fix them. The softlockup detector is turned on by default in
Fedora (alongside lockdep in rawhide), and it has helped us find a
countless number of bugs. You are the first person to suggest that it's
somehow harmful.

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> > TASK_KILLABLE should be the right solution i think.
> 
> .. and it's even a tool to show where we missed making something 
> TASK_KILLABLE... anything that triggers from NFS and the like really 
> ought to be TASK_KILLABLE after all. This patch will point any 
> omissions out quite nicely without having to do any kind of 
> destructive testing.

yeah, exactly. Having something "hard blocked" for a long amount of time 
is rarely a good thing.

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
> .. and it's even a tool to show where we missed making something
> TASK_KILLABLE... anything that triggers from NFS and the like really
> ought to be TASK_KILLABLE after all. This patch will point any
> omissions out quite nicely without having to do any kind of destructive
> testing.

It would be better to just audit the source for those. Outlawing
something which was previously legal without auditing the source
is bad.

Anyways, i suspect it would just lead to more people disabling
softlockup. I remember during some older stress testing it also
tended to explode regularly, so e.g. SUSE kernel rpms have it disabled.
That patch would probably make it worse.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Sun, 2 Dec 2007 19:59:45 +0100
Ingo Molnar <[EMAIL PROTECTED]> wrote:

> 
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > Ingo Molnar <[EMAIL PROTECTED]> writes:
> > 
> > > this patch extends the soft-lockup detector to automatically
> > > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> > > printed the following way:
> > 
> > That will likely trigger anytime a hard nfs/cifs mount loses its 
> > server for 120s. To make this work you would need a new 
> > TASK_UNINTERRUPTIBLE_EXTERNAL_EVENT or similar and mark all the
> > places which depend on those.
> 
> TASK_KILLABLE should be the right solution i think.

.. and it's even a tool to show where we missed making something
TASK_KILLABLE... anything that triggers from NFS and the like really
ought to be TASK_KILLABLE after all. This patch will point any
omissions out quite nicely without having to do any kind of destructive
testing.


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> Ingo Molnar <[EMAIL PROTECTED]> writes:
> 
> > this patch extends the soft-lockup detector to automatically
> > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> > printed the following way:
> 
> That will likely trigger anytime a hard nfs/cifs mount loses its 
> server for 120s. To make this work you would need a new 
> TASK_UNINTERRUPTIBLE_EXTERNAL_EVENT or similar and mark all the places 
> which depend on those.

TASK_KILLABLE should be the right solution i think.

> I've also seen kernel modules that use semaphores like wait queues 
> While that gives a little funny results (high load average) it worked 
> so far, until this patch.

it still works with this patch of course. With funny results like a high
load average _AND_ a clear debug message that tells us (and the user)
where that high load average comes from.

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
Ingo Molnar <[EMAIL PROTECTED]> writes:

> this patch extends the soft-lockup detector to automatically
> detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> printed the following way:

That will likely trigger anytime a hard nfs/cifs mount loses its
server for 120s. To make this work you would need a new 
TASK_UNINTERRUPTIBLE_EXTERNAL_EVENT or similar and mark
all the places which depend on those.

I've also seen kernel modules that use semaphores like wait queues.
While that gives slightly funny results (high load average), it worked
so far, until this patch.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread David Rientjes
On Sun, 2 Dec 2007, Ingo Oeser wrote:

> > maybe, but we'd have to see how often this gets triggered. An OOM is 
> > something that could happen in any overloaded system - while a hung task 
> > is likely due to a kernel bug.
> 
> What about a client using hard mounted NFS shares here? That shouldn't be
> killed by the OOM killer in that situation, should it?
> 

That's orthogonal to the point I was making; the problem with the OOM 
killer right now is that it can easily enter an infinite loop in out of 
memory conditions if the task that it has selected to be killed fails to 
exit.  This only happens when the task hangs in TASK_UNINTERRUPTIBLE state 
and doesn't respond to the SIGKILL that the OOM killer has sent it.

That behavior is a consequence of trying to avoid needlessly killing tasks 
by giving already-killed tasks time to exit in subsequent OOM conditions.  
During the tasklist scan of eligible tasks to kill, if any task is found 
to have access to memory reserves that only the OOM killer can provide 
(signified by the TIF_MEMDIE thread flag) and it has not yet died, the OOM 
killer becomes a complete no-op.
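
The scan described above can be modelled in a few lines. This is only a
user-space sketch in Python, not kernel code: the Task class, the memdie
and exited fields, the badness score and select_victim() are
hypothetical names standing in for the tasklist walk and the TIF_MEMDIE
flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    badness: int          # hypothetical score; higher means a better victim
    memdie: bool = False  # models TIF_MEMDIE: already chosen by the OOM killer
    exited: bool = False  # a task stuck in TASK_UNINTERRUPTIBLE never sets this


def select_victim(tasks: List[Task]) -> Optional[Task]:
    """Return the task to kill, or None if the OOM killer should do nothing."""
    live = [t for t in tasks if not t.exited]
    # If an earlier victim has not exited yet, back off and give it time.
    if any(t.memdie for t in live):
        return None
    if not live:
        return None
    victim = max(live, key=lambda t: t.badness)
    victim.memdie = True
    return victim
```

If the first victim hangs and never exits, every later call returns
None, which is the no-op loop in question.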

This happens on occasion and completely deadlocks the system because the
out of memory condition will never be alleviated.  With the hang detection
added to the soft-lockup detector, it would be easy to correct this
situation.  I understand the primary purpose of the patch is to identify
potential kernel bugs that aren't hardware induced, but I think it has
relevance to the OOM killer problem until such time as tasks hanging in
TASK_UNINTERRUPTIBLE state become passé.

David


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Ingo Oeser <[EMAIL PROTECTED]> wrote:

> On Saturday 01 December 2007, Ingo Molnar wrote:
> > maybe, but we'd have to see how often this gets triggered. An OOM is 
> > something that could happen in any overloaded system - while a hung task 
> > is likely due to a kernel bug.
> 
> What about a client using hard mounted NFS shares here? That shouldn't 
> be killed by the OOM killer in that situation, should it?

NFS is a bit weird in this regard - fundamentally everything should be
interruptible (or at least killable). Won't TASK_KILLABLE solve these
problems?
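
The difference between the sleep states can be sketched as follows: a
killable sleep ignores ordinary signals like an uninterruptible one, but
still wakes for a fatal signal. This is a user-space Python model; the
flag values are illustrative only and should_wake() is a hypothetical
helper, not a kernel function. The composition of TASK_KILLABLE from
TASK_WAKEKILL and TASK_UNINTERRUPTIBLE follows the mainline definition.

```python
TASK_INTERRUPTIBLE = 0x01
TASK_UNINTERRUPTIBLE = 0x02
TASK_WAKEKILL = 0x80  # illustrative value, not taken from any kernel release
TASK_KILLABLE = TASK_WAKEKILL | TASK_UNINTERRUPTIBLE


def should_wake(state: int, signal_pending: bool, fatal: bool) -> bool:
    """Decide whether a pending signal wakes a task sleeping in `state`."""
    if not signal_pending:
        return False
    if state & TASK_INTERRUPTIBLE:
        return True       # any signal wakes an interruptible sleeper
    if state & TASK_WAKEKILL:
        return fatal      # only a fatal signal (e.g. SIGKILL) wakes it
    return False          # plain uninterruptible: signals are ignored
```

Under this model a task blocked on a dead NFS server in TASK_KILLABLE
can be removed with SIGKILL, while the same task in
TASK_UNINTERRUPTIBLE cannot.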

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote:
 what if you considered - just for a minute - the possibility of this 
 debug tool being the thing that actually animates developers to fix such 
 long delay bugs that have bothered users for almost a decade meanwhile?

Throwing frequent debugging messages for non-buggy cases will
just lead to people generally ignoring softlockups.

I don't think runtime instrumentation is the way to introduce
TASK_KILLABLE in general. The only way there is for people to go
through the source and identify places where it makes sense.

 
 Until now users had little direct recourse to get such problems fixed. 
 (we had sysrq-t, but that included no real metric of how long a task was 

Actually task delay accounting can measure this now.  iirc someone
had a latencytop based on it already.

 blocked, so there was no direct link in the typical case and users had 
 no real reliable tool to express their frustration about unreasonable 
 delays.)
 
 Now this changes: they get a smoking gun backtrace reported by the 
 kernel, and blamed on exactly the place that caused that unreasonable 
 delay. And it's not like the kernel breaks - at most 10 such messages 
 are reported per bootup.
 
 We increase the delay timeout to say 300 seconds, and if the system is 
 under extremely high IO load then 120+ might be a reasonable delay, so 
 it's all tunable and runtime disable-able anyway. So if you _know_ that 
 you will see and tolerate such long delays, you can tweak it - but i can 

This means the user has to see their kernel log fill with such
messages at least once - do a round trip to some mailing list to 
explain that it is expected and not a kernel bug - then tweak
some obscure parameters. Doesn't seem like a particularly fruitful
procedure to me.

 tell you with 100% certainty that 99.9% of the typical Linux users do 
 not characterize such long delays as correct behavior.

It's about robustness, not the typical case.
A system that throws backtraces when something slightly unusual happens
is not robust.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

 On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote:
  what if you considered - just for a minute - the possibility of this 
  debug tool being the thing that actually animates developers to fix such 
  long delay bugs that have bothered users for almost a decade meanwhile?
 
 Throwing frequent debugging messages for non-buggy cases will just 
 lead to people generally ignoring softlockups.

do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ 
something that most humans consider as buggy in the overwhelming 
majority of cases, regardless of the reason? Yes, there are and will be 
some exceptions, but not nearly as countless as you try to paint it. A 
quick test in the next -mm will give us a good idea about the ratio of 
false positives.

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Andi Kleen
Ingo Molnar [EMAIL PROTECTED] writes:

 do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ 
 something that most humans consider as buggy in the overwhelming 
 majority of cases, regardless of the reason? Yes, there are and will be 
 some exceptions, but not nearly as countless as you try to paint it. A 
 quick test in the next -mm will give us a good idea about the ratio of 
 false positives.

That would assume error paths get regularly exercised in -mm. 
Doubtful.  Most likely we'll only hear about it after it's
out in the wild on some bigger release.

The problem I have with your patch is that it will mess up Linux (in
particular block/network file system) error handling even more than it
already is. In error handling cases such unusual things happen
frequently unfortunately.

I used to fight with this with the NMI watchdog on x86-64 -- it
tended to trigger regularly, for example on SCSI error handlers
disabling interrupts too long while handling the error. They
eventually got all fixed, but with that change they will likely
all start throwing nasty messages again. 

And usually it is not simply broken code either, but code really
doing something difficult.

-Andi


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

  Out of direct experience, 95% of the too long delay cases are plain 
  old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could 
 
 I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).  It 
 would be pretty bad to merge this patch without converting them to 
 TASK_KILLABLE first

which we want to do in 2.6.25 anyway, so i dont see any big problems 
here. Also, it costs nothing to just stick it in and see the results, 
worst case we'd have to flip around the default. I think this is much 
ado about nothing - so far i dont really see any objective basis for 
your negative attitude.

 There's also the additional issue that even block devices are often 
 network or SAN backed these days. Having 120 second delays in there is 
 quite possible.

 So most likely adding this patch and still keeping a robust kernel 
 would require converting most of these delays to TASK_KILLABLE first. 
 That would not be a bad thing -- i would often like to kill a process 
 stuck on a bad block device -- but is likely a lot of work.

what if you considered - just for a minute - the possibility of this 
debug tool being the thing that actually animates developers to fix such 
long delay bugs that have bothered users for almost a decade meanwhile?

Until now users had little direct recourse to get such problems fixed. 
(we had sysrq-t, but that included no real metric of how long a task was 
blocked, so there was no direct link in the typical case and users had 
no real reliable tool to express their frustration about unreasonable 
delays.)

Now this changes: they get a smoking gun backtrace reported by the 
kernel, and blamed on exactly the place that caused that unreasonable 
delay. And it's not like the kernel breaks - at most 10 such messages 
are reported per bootup.

We increase the delay timeout to say 300 seconds, and if the system is 
under extremely high IO load then 120+ might be a reasonable delay, so 
it's all tunable and runtime disable-able anyway. So if you _know_ that 
you will see and tolerate such long delays, you can tweak it - but i can 
tell you with 100% certainty that 99.9% of the typical Linux users do 
not characterize such long delays as correct behavior.

  There are no softlockup false positive bugs open at the moment. If 
  you know about any, then please do not hesitate and report them, 
  i'll be eager to fix them. The softlockup detector is turned on by 
  default in Fedora (alongside lockdep in rawhide), and it helped us 
  find countless
 
 That just means nobody runs stress tests on those. [...]

that is an all-encompassing blanket assertion that sadly drips of ill 
will (which permeates your mails lately). I for example run tons of 
stress tests on those and of course many others do too. So i dont 
really know what to think of your statement :-(

 [...] e.g. lockdep tends to explode even on simple stress tests on 
 larger systems because it tracks all locks in all dynamic objects in 
 memory and towards 6k-10k entries the graph walks tend to take 
 multiple seconds on some NUMA systems.

a bug was fixed in this area - can you still see this with 2.6.24-rc3?

[ But i'd be the first one to point out that lockdep is certainly not
  from the cheap tools department, that's why i said above that lockdep
  is enabled in Fedora rawhide (i.e. development) kernels. Softlockup
  detector is much cheaper and it's default enabled all the time. ]

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

  do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ 
  something that most humans consider as buggy in the overwhelming 
  majority of cases, regardless of the reason? Yes, there are and will 
  be some exceptions, but not nearly as countless as you try to paint 
  it. A quick test in the next -mm will give us a good idea about the 
  ratio of false positives.
 
 That would assume error paths get regularly exercised in -mm. 
 Doubtful.  Most likely we'll only hear about it after it's out in the 
 wild on some bigger release.

by that argument we could never include _anything_ in -mm because ... 
only some bigger release would exercise error paths?

Your argument makes no objective sense to me - my patch is a 
non-intrusive debugging facility that people clearly find useful and 
that would increase the quality of kernel bugreporting.

If, contrary to expectation, it decreases kernel bugreporting quality 
then we'll disable it quickly - just like we did with other debugging 
facilities that were causing more trouble than good. (such as the stack 
unwinder code)

In fact it can already be disabled easily, from user-space, without any 
kernel change, by doing:

   echo 0 > /proc/sys/kernel/hung_task_timeout_secs

and there you go, no warnings at all. Or you can add this to 
/etc/sysctl.conf to disable it permanently:

   kernel.hung_task_timeout_secs = 0

or you can disable it in the .config. So i dont see your problem. It's 
just like most other debug facilities. (in fact it's more flexible than 
most other debug facilities)
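
The runtime knob described above can be wrapped in two small shell
helpers. This is only a sketch: the names HUNG_TASK_FILE, hung_task_get
and hung_task_set are made up for illustration and are not part of the
patch, and the optional file argument exists so the helpers can be tried
against a scratch file. Writing to the real sysctl file needs root, and
the meaning of 0 (no warnings) is taken from the mail above.

```shell
#!/bin/sh
# Hypothetical helpers (not from the patch). The default path is the
# sysctl file named in the mail; override it for a dry run.
HUNG_TASK_FILE=${HUNG_TASK_FILE:-/proc/sys/kernel/hung_task_timeout_secs}

# Print the current timeout in seconds.
hung_task_get() {
    cat "${1:-$HUNG_TASK_FILE}"
}

# Write a new timeout. Anything that is not a non-negative integer is
# rejected before the file is touched.
hung_task_set() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: hung_task_set SECONDS [FILE]" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$1" > "${2:-$HUNG_TASK_FILE}"
}
```

For example, "hung_task_set 300" would raise the threshold to the 300
seconds mentioned earlier in the thread, and "hung_task_set 0" would
switch the warnings off until the next boot.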

Ingo


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Arjan van de Ven
On Sun, 2 Dec 2007 22:19:25 +0100
Andi Kleen [EMAIL PROTECTED] wrote:

  
  Until now users had little direct recourse to get such problems
  fixed. (we had sysrq-t, but that included no real metric of how
  long a task was 
 
 Actually task delay accounting can measure this now.  iirc someone
 had a latencytop based on it already.


I have written a latencytop tool, but it's not based quite on the task
delay accounting (it doesn't provide the right information to make such
a tool). I've not released the tool mostly because I'm not quite happy
about the kernel side yet... but if there's real interest I'll fix it
up soon and release it.

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-02 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

  Until now users had little direct recourse to get such problems 
  fixed. (we had sysrq-t, but that included no real metric of how long 
  a task was
 
 Actually task delay accounting can measure this now.  iirc someone had 
 a latencytop based on it already.

Delay accounting (or the /proc/PID/sched fields that i added recently) 
only gets updated once a task has finished its unreasonably long delay 
and has scheduled. So lockups or extremely long delays _wont be 
detected_ this way. This is a debugging facility that clearly belongs 
in the kernel. Your arguments just make no objective sense.
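
To see the difference from user-space: a task that is still blocked has
to be found by looking at its live state, not at accounting that is
filled in afterwards. The sketch below lists tasks that are in state D
(uninterruptible sleep) right now by reading /proc/PID/status. It is an
illustration only and not how the patch works; the function name
list_d_state and its optional root argument are assumptions made for
this example, and a single snapshot says nothing about how long a task
has been blocked.

```shell
#!/bin/sh
# Hypothetical helper: print "PID NAME" for every task whose status file
# reports state D. The proc root is overridable so the function can be
# tried against a fake directory tree.
list_d_state() {
    root=${1:-/proc}
    for status in "$root"/[0-9]*/status; do
        # Skip an unmatched glob and tasks that exited in the meantime.
        [ -r "$status" ] || continue
        awk -v pid="$(basename "$(dirname "$status")")" '
            /^Name:/  { sub(/^Name:[ \t]*/, ""); name = $0 }
            /^State:/ { state = $2 }
            END       { if (state == "D") print pid, name }
        ' "$status" 2>/dev/null
    done
}
```

Running it repeatedly and comparing the output gives a rough idea of
which tasks stay stuck, but only an in-kernel check can attach a
backtrace to the exact place where the task is waiting.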

Ingo

