Re: [PATCH] softlockup: Make detector be aware of task switch of processes hogging cpu

2014-08-28 Thread Don Zickus
On Thu, Aug 28, 2014 at 04:07:23PM -0700, Andrew Morton wrote:
> On Thu, 28 Aug 2014 00:52:24 -0400 Don Zickus <dzic...@redhat.com> wrote:
> 
> > From: chai wen <chaiw.f...@cn.fujitsu.com>
> > 
> > For now, soft lockup detector warns once for each case of process softlockup.
> > But the thread 'watchdog/n' may not always get the cpu at the time slot between
> > the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
> > 
> > An example would be two processes hogging the cpu.  Process A causes the
> > softlockup warning and is killed manually by a user.  Process B immediately
> > becomes the new process hogging the cpu preventing the softlockup code from
> > resetting the soft_watchdog_warn variable.
> > 
> > This case is a false negative of "warn only once for a process", as there may
> > be a different process that is going to hog the cpu.  Resolve this by
> > saving/checking the task pointer of the hogging process and use that to reset
> > soft_watchdog_warn too.
> > 
> 
> OK, this should address the PID uniqueness issue which Ingo identified.
> 
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -47,6 +47,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
> >  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
> >  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> >  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> > +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
> >  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> >  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> >  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> > @@ -331,8 +332,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >  			return HRTIMER_RESTART;
> >  
> >  		/* only warn once */
> > -		if (__this_cpu_read(soft_watchdog_warn) == true)
> > +		if (__this_cpu_read(soft_watchdog_warn) == true) {
> > +			/*
> > +			 * Handle the case where multiple processes are
> > +			 * causing softlockups but the duration is small
> > +			 * enough, the softlockup detector can not reset
> > +			 * itself in time.  Use task pointers to detect this.
> > +			 */
> 
> This comment is rather hard to follow ("the duration" of what?).  Can
> you think of some words which are a bit more complete/clear?

Agreed.  Does this work better?

"
/*
 * When multiple processes are causing softlockups
 * the softlockup detector only warns on the first
 * one because the code relies on a full quiet cycle
 * to re-arm.  The second process prevents the
 * quiet cycle and never gets reported.  Use task
 * pointers to detect this.
 */

Cheers,
Don

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] softlockup: Make detector be aware of task switch of processes hogging cpu

2014-08-28 Thread Andrew Morton
On Thu, 28 Aug 2014 00:52:24 -0400 Don Zickus <dzic...@redhat.com> wrote:

> From: chai wen <chaiw.f...@cn.fujitsu.com>
> 
> For now, soft lockup detector warns once for each case of process softlockup.
> But the thread 'watchdog/n' may not always get the cpu at the time slot between
> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
> 
> An example would be two processes hogging the cpu.  Process A causes the
> softlockup warning and is killed manually by a user.  Process B immediately
> becomes the new process hogging the cpu preventing the softlockup code from
> resetting the soft_watchdog_warn variable.
> 
> This case is a false negative of "warn only once for a process", as there may
> be a different process that is going to hog the cpu.  Resolve this by
> saving/checking the task pointer of the hogging process and use that to reset
> soft_watchdog_warn too.
> 

OK, this should address the PID uniqueness issue which Ingo identified.

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -47,6 +47,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> @@ -331,8 +332,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  			return HRTIMER_RESTART;
>  
>  		/* only warn once */
> -		if (__this_cpu_read(soft_watchdog_warn) == true)
> +		if (__this_cpu_read(soft_watchdog_warn) == true) {
> +			/*
> +			 * Handle the case where multiple processes are
> +			 * causing softlockups but the duration is small
> +			 * enough, the softlockup detector can not reset
> +			 * itself in time.  Use task pointers to detect this.
> +			 */

This comment is rather hard to follow ("the duration" of what?).  Can
you think of some words which are a bit more complete/clear?




[PATCH] softlockup: Make detector be aware of task switch of processes hogging cpu

2014-08-27 Thread Don Zickus
From: chai wen <chaiw.f...@cn.fujitsu.com>

For now, soft lockup detector warns once for each case of process softlockup.
But the thread 'watchdog/n' may not always get the cpu at the time slot between
the task switch of two processes hogging that cpu to reset soft_watchdog_warn.

An example would be two processes hogging the cpu.  Process A causes the
softlockup warning and is killed manually by a user.  Process B immediately
becomes the new process hogging the cpu preventing the softlockup code from
resetting the soft_watchdog_warn variable.

This case is a false negative of "warn only once for a process", as there may
be a different process that is going to hog the cpu.  Resolve this by
saving/checking the task pointer of the hogging process and use that to reset
soft_watchdog_warn too.

Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
Signed-off-by: Don Zickus <dzic...@redhat.com>
---
 kernel/watchdog.c |   16 +++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index c3319bd..499f65f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -47,6 +47,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
 static DEFINE_PER_CPU(bool, soft_watchdog_warn);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
+static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
@@ -331,8 +332,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			return HRTIMER_RESTART;
 
 		/* only warn once */
-		if (__this_cpu_read(soft_watchdog_warn) == true)
+		if (__this_cpu_read(soft_watchdog_warn) == true) {
+			/*
+			 * Handle the case where multiple processes are
+			 * causing softlockups but the duration is small
+			 * enough, the softlockup detector can not reset
+			 * itself in time.  Use task pointers to detect this.
+			 */
+			if (__this_cpu_read(softlockup_task_ptr_saved) !=
+			    current) {
+				__this_cpu_write(soft_watchdog_warn, false);
+				__touch_watchdog();
+			}
 			return HRTIMER_RESTART;
+		}
 
 		if (softlockup_all_cpu_backtrace) {
 			/* Prevent multiple soft-lockup reports if one cpu is already
@@ -348,6 +361,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
+		__this_cpu_write(softlockup_task_ptr_saved, current);
 		print_modules();
 		print_irqtrace_events(current);
 		if (regs)
-- 
1.7.1
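
The re-arm logic in this patch can be sketched in user space.  This is a
minimal model, not kernel code: struct task, struct cpu_state, lockup_tick()
and quiet_tick() are hypothetical names introduced only for illustration, and
every call to lockup_tick() stands for a timer tick on which the cpu is
already past the softlockup threshold.  The real patch also calls
__touch_watchdog() when it re-arms, so the new hog has to exceed the
threshold again before it is reported; the sketch collapses that wait into a
single tick.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task {
	const char *comm;
};

/* The per-cpu variables from the patch, modelled for a single cpu. */
struct cpu_state {
	bool soft_watchdog_warn;                      /* "already warned" latch */
	const struct task *softlockup_task_ptr_saved; /* task last warned about */
	int warnings;                                 /* reports issued (demo only) */
};

/* A tick on which the watchdog thread ran: the latch is cleared. */
static void quiet_tick(struct cpu_state *cpu)
{
	cpu->soft_watchdog_warn = false;
}

/*
 * A tick on which the cpu is still locked up and `current` is the task
 * hogging it.  Returns true if a warning would be printed on this tick.
 */
static bool lockup_tick(struct cpu_state *cpu, const struct task *current)
{
	if (cpu->soft_watchdog_warn) {
		/* Already warned: re-arm only if a different task is hogging. */
		if (cpu->softlockup_task_ptr_saved != current)
			cpu->soft_watchdog_warn = false;
		return false;
	}

	/* First report for this hog: remember who it was and latch. */
	cpu->softlockup_task_ptr_saved = current;
	cpu->soft_watchdog_warn = true;
	cpu->warnings++;
	return true;
}
```

Feeding the model ticks for A, A, B, B reports A on the first tick and B on
the fourth.  Without the pointer comparison B would never be reported,
because no quiet tick ever clears the latch between the two hogs.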



Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-26 Thread Chai Wen
On 08/26/2014 10:22 PM, Don Zickus wrote:

> On Tue, Aug 26, 2014 at 08:51:30PM +0800, Chai Wen wrote:
>> On 08/22/2014 09:58 AM, Don Zickus wrote:
>>
>>> On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
>>>> For now, soft lockup detector warns once for each case of process softlockup.
>>>> But the thread 'watchdog/n' may not always get the cpu at the time slot between
>>>> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
>>>>
>>>> An example would be two processes hogging the cpu.  Process A causes the
>>>> softlockup warning and is killed manually by a user.  Process B immediately
>>>> becomes the new process hogging the cpu preventing the softlockup code from
>>>> resetting the soft_watchdog_warn variable.
>>>>
>>>> This case is a false negative of "warn only once for a process", as there may
>>>> be a different process that is going to hog the cpu.  Resolve this by
>>>> saving/checking the task pointer of the hogging process and use that to reset
>>>> soft_watchdog_warn too.
>>>>
>>>> Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
>>>> Signed-off-by: Don Zickus <dzic...@redhat.com>
>>>
>>> Acked-by: Don Zickus <dzic...@redhat.com>
>>>
>>
>>
>> Hi Andrew
>>
>> Sorry to disturb you.
>> Could you please check and pick up this small improvement patch?
>>
>> I am not sure which maintainer I should talk to, but the original version of
>> this patch was queued to the -mm tree by you, so I assume it falls under your
>> care.
>>
>>
>> thanks
>> chai wen
> 
> Hi Chai,
> 
> Sorry about that.  Ingo asked me privately to pick this up and re-post
> with my signoff.  I was converting to a new test env and was going to use this
> patch as an excuse to exercise it.  That is the delay.  Let me get this
> out today.
> 


OK, it is kind of you to do that.  Thanks for your work. :)

thanks
chai wen

> Cheers,
> Don
> 
>>
>>>> ---
>>>>  kernel/watchdog.c |   16 +++-
>>>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>>>> index 0037db6..2e55620 100644
>>>> --- a/kernel/watchdog.c
>>>> +++ b/kernel/watchdog.c
>>>> @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
>>>>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
>>>>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>>>>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
>>>> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
>>>>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>>>>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>>>>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>>>> @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>>>  			return HRTIMER_RESTART;
>>>>  
>>>>  		/* only warn once */
>>>> -		if (__this_cpu_read(soft_watchdog_warn) == true)
>>>> +		if (__this_cpu_read(soft_watchdog_warn) == true) {
>>>> +			/*
>>>> +			 * Handle the case where multiple processes are
>>>> +			 * causing softlockups but the duration is small
>>>> +			 * enough, the softlockup detector can not reset
>>>> +			 * itself in time.  Use task pointers to detect this.
>>>> +			 */
>>>> +			if (__this_cpu_read(softlockup_task_ptr_saved) !=
>>>> +			    current) {
>>>> +				__this_cpu_write(soft_watchdog_warn, false);
>>>> +				__touch_watchdog();
>>>> +			}
>>>>  			return HRTIMER_RESTART;
>>>> +		}
>>>>  
>>>>  		if (softlockup_all_cpu_backtrace) {
>>>>  			/* Prevent multiple soft-lockup reports if one cpu is already
>>>> @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>>>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>>>>  			smp_processor_id(), duration,
>>>>  			current->comm, task_pid_nr(current));
>>>> +		__this_cpu_write(softlockup_task_ptr_saved, current);
>>>>  		print_modules();
>>>>  		print_irqtrace_events(current);
>>>>  		if (regs)
>>>> -- 
>>>> 1.7.1
>>>>
>>> .
>>>
>>
>>
>>
>> -- 
>> Regards
>>
>> Chai Wen
> .
> 



-- 
Regards

Chai Wen


Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-26 Thread Don Zickus
On Tue, Aug 26, 2014 at 08:51:30PM +0800, Chai Wen wrote:
> On 08/22/2014 09:58 AM, Don Zickus wrote:
> 
> > On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
> >> For now, soft lockup detector warns once for each case of process softlockup.
> >> But the thread 'watchdog/n' may not always get the cpu at the time slot between
> >> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
> >>
> >> An example would be two processes hogging the cpu.  Process A causes the
> >> softlockup warning and is killed manually by a user.  Process B immediately
> >> becomes the new process hogging the cpu preventing the softlockup code from
> >> resetting the soft_watchdog_warn variable.
> >>
> >> This case is a false negative of "warn only once for a process", as there may
> >> be a different process that is going to hog the cpu.  Resolve this by
> >> saving/checking the task pointer of the hogging process and use that to reset
> >> soft_watchdog_warn too.
> >>
> >> Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
> >> Signed-off-by: Don Zickus <dzic...@redhat.com>
> > 
> > Acked-by: Don Zickus <dzic...@redhat.com>
> > 
> 
> 
> Hi Andrew
> 
> Sorry to disturb you.
> Could you please check and pick up this small improvement patch?
> 
> I am not sure which maintainer I should talk to, but the original version of
> this patch was queued to the -mm tree by you, so I assume it falls under your
> care.
> 
> 
> thanks
> chai wen

Hi Chai,

Sorry about that.  Ingo asked me privately to pick this up and re-post
with my signoff.  I was converting to a new test env and was going to use this
patch as an excuse to exercise it.  That is the delay.  Let me get this
out today.

Cheers,
Don

> 
> >> ---
> >>  kernel/watchdog.c |   16 +++-
> >>  1 files changed, 15 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> >> index 0037db6..2e55620 100644
> >> --- a/kernel/watchdog.c
> >> +++ b/kernel/watchdog.c
> >> @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
> >>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
> >>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> >>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> >> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
> >>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> >>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> >>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> >> @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >>  			return HRTIMER_RESTART;
> >>  
> >>  		/* only warn once */
> >> -		if (__this_cpu_read(soft_watchdog_warn) == true)
> >> +		if (__this_cpu_read(soft_watchdog_warn) == true) {
> >> +			/*
> >> +			 * Handle the case where multiple processes are
> >> +			 * causing softlockups but the duration is small
> >> +			 * enough, the softlockup detector can not reset
> >> +			 * itself in time.  Use task pointers to detect this.
> >> +			 */
> >> +			if (__this_cpu_read(softlockup_task_ptr_saved) !=
> >> +			    current) {
> >> +				__this_cpu_write(soft_watchdog_warn, false);
> >> +				__touch_watchdog();
> >> +			}
> >>  			return HRTIMER_RESTART;
> >> +		}
> >>  
> >>  		if (softlockup_all_cpu_backtrace) {
> >>  			/* Prevent multiple soft-lockup reports if one cpu is already
> >> @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> >>  			smp_processor_id(), duration,
> >>  			current->comm, task_pid_nr(current));
> >> +		__this_cpu_write(softlockup_task_ptr_saved, current);
> >>  		print_modules();
> >>  		print_irqtrace_events(current);
> >>  		if (regs)
> >> -- 
> >> 1.7.1
> >>
> > .
> > 
> 
> 
> 
> -- 
> Regards
> 
> Chai Wen


Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-26 Thread Chai Wen
On 08/22/2014 09:58 AM, Don Zickus wrote:

> On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
>> For now, soft lockup detector warns once for each case of process softlockup.
>> But the thread 'watchdog/n' may not always get the cpu at the time slot between
>> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
>>
>> An example would be two processes hogging the cpu.  Process A causes the
>> softlockup warning and is killed manually by a user.  Process B immediately
>> becomes the new process hogging the cpu preventing the softlockup code from
>> resetting the soft_watchdog_warn variable.
>>
>> This case is a false negative of "warn only once for a process", as there may
>> be a different process that is going to hog the cpu.  Resolve this by
>> saving/checking the task pointer of the hogging process and use that to reset
>> soft_watchdog_warn too.
>>
>> Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
>> Signed-off-by: Don Zickus <dzic...@redhat.com>
> 
> Acked-by: Don Zickus <dzic...@redhat.com>
> 


Hi Andrew

Sorry to disturb you.
Could you please check and pick up this small improvement patch?

I am not sure which maintainer I should talk to, but the original version of
this patch was queued to the -mm tree by you, so I assume it falls under your
care.


thanks
chai wen

>> ---
>>  kernel/watchdog.c |   16 +++-
>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 0037db6..2e55620 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
>>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
>>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
>> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
>>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>> @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>  			return HRTIMER_RESTART;
>>  
>>  		/* only warn once */
>> -		if (__this_cpu_read(soft_watchdog_warn) == true)
>> +		if (__this_cpu_read(soft_watchdog_warn) == true) {
>> +			/*
>> +			 * Handle the case where multiple processes are
>> +			 * causing softlockups but the duration is small
>> +			 * enough, the softlockup detector can not reset
>> +			 * itself in time.  Use task pointers to detect this.
>> +			 */
>> +			if (__this_cpu_read(softlockup_task_ptr_saved) !=
>> +			    current) {
>> +				__this_cpu_write(soft_watchdog_warn, false);
>> +				__touch_watchdog();
>> +			}
>>  			return HRTIMER_RESTART;
>> +		}
>>  
>>  		if (softlockup_all_cpu_backtrace) {
>>  			/* Prevent multiple soft-lockup reports if one cpu is already
>> @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>>  			smp_processor_id(), duration,
>>  			current->comm, task_pid_nr(current));
>> +		__this_cpu_write(softlockup_task_ptr_saved, current);
>>  		print_modules();
>>  		print_irqtrace_events(current);
>>  		if (regs)
>> -- 
>> 1.7.1
>>
> .
> 



-- 
Regards

Chai Wen


Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-26 Thread Chai Wen
On 08/22/2014 09:58 AM, Don Zickus wrote:

 On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
 For now, soft lockup detector warns once for each case of process softlockup.
 But the thread 'watchdog/n' may not always get the cpu at the time slot 
 between
 the task switch of two processes hogging that cpu to reset 
 soft_watchdog_warn.

 An example would be two processes hogging the cpu.  Process A causes the
 softlockup warning and is killed manually by a user.  Process B immediately
 becomes the new process hogging the cpu preventing the softlockup code from
 resetting the soft_watchdog_warn variable.

 This case is a false negative of warn only once for a process, as there may
 be a different process that is going to hog the cpu.  Resolve this by
 saving/checking the task pointer of the hogging process and use that to reset
 soft_watchdog_warn too.

 Signed-off-by: chai wen chaiw.f...@cn.fujitsu.com
 Signed-off-by: Don Zickus dzic...@redhat.com
 
 Acked-by: Don Zickus dzic...@redhat.com
 


Hi Andrew

Sorry for some disturbing.
Could you help to check and pick up this little improvement patch ?

I am not sure which MAINTAINER I should talk to, but the original version of
this patch is queued to -mm tree by you, so I assume that they are in the 
charge of you.


thanks
chai wen

 ---
  kernel/watchdog.c |   16 +++-
  1 files changed, 15 insertions(+), 1 deletions(-)

 diff --git a/kernel/watchdog.c b/kernel/watchdog.c
 index 0037db6..2e55620 100644
 --- a/kernel/watchdog.c
 +++ b/kernel/watchdog.c
 @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
 +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
  #ifdef CONFIG_HARDLOCKUP_DETECTOR
  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
 hrtimer *hrtimer)
  return HRTIMER_RESTART;
  
  /* only warn once */
 -if (__this_cpu_read(soft_watchdog_warn) == true)
 +if (__this_cpu_read(soft_watchdog_warn) == true) {
 +/*
 + * Handle the case where multiple processes are
 + * causing softlockups but the duration is small
 + * enough, the softlockup detector can not reset
 + * itself in time.  Use task pointers to detect this.
 + */
 +if (__this_cpu_read(softlockup_task_ptr_saved) !=
 +current) {
 +__this_cpu_write(soft_watchdog_warn, false);
 +__touch_watchdog();
 +}
  return HRTIMER_RESTART;
 +}
  
  if (softlockup_all_cpu_backtrace) {
  /* Prevent multiple soft-lockup reports if one cpu is 
 already
 @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
 hrtimer *hrtimer)
  pr_emerg(BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n,
  smp_processor_id(), duration,
  current-comm, task_pid_nr(current));
 +__this_cpu_write(softlockup_task_ptr_saved, current);
  print_modules();
  print_irqtrace_events(current);
  if (regs)
 -- 
 1.7.1

 .
 



-- 
Regards

Chai Wen
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-26 Thread Don Zickus
On Tue, Aug 26, 2014 at 08:51:30PM +0800, Chai Wen wrote:
 On 08/22/2014 09:58 AM, Don Zickus wrote:
 
  On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
  For now, soft lockup detector warns once for each case of process 
  softlockup.
  But the thread 'watchdog/n' may not always get the cpu at the time slot 
  between
  the task switch of two processes hogging that cpu to reset 
  soft_watchdog_warn.
 
  An example would be two processes hogging the cpu.  Process A causes the
  softlockup warning and is killed manually by a user.  Process B immediately
  becomes the new process hogging the cpu preventing the softlockup code from
  resetting the soft_watchdog_warn variable.
 
  This case is a false negative of "warn only once for a process", as there 
  may
  be a different process that is going to hog the cpu.  Resolve this by
  saving/checking the task pointer of the hogging process and use that to 
  reset
  soft_watchdog_warn too.
 
  Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
  Signed-off-by: Don Zickus <dzic...@redhat.com>
  
  Acked-by: Don Zickus <dzic...@redhat.com>
  
 
 
 Hi Andrew
 
 Sorry to disturb you.
 Could you please review and pick up this small improvement patch?
 
 I am not sure which maintainer I should talk to, but the original version of
 this patch was queued to the -mm tree by you, so I assume it is in your
 charge.
 
 
 thanks
 chai wen

Hi Chai,

Sorry about that.  Ingo asked me privately to pick this up and re-post
with my signoff.  I was converting to a new test env and was going to use this
patch as an excuse to exercise it.  That is the delay.  Let me get this
out today.

Cheers,
Don

 
  ---
   kernel/watchdog.c |   16 +++-
   1 files changed, 15 insertions(+), 1 deletions(-)
 
  diff --git a/kernel/watchdog.c b/kernel/watchdog.c
  index 0037db6..2e55620 100644
  --- a/kernel/watchdog.c
  +++ b/kernel/watchdog.c
  @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
   static DEFINE_PER_CPU(bool, soft_watchdog_warn);
   static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
   static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
  +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
   #ifdef CONFIG_HARDLOCKUP_DETECTOR
   static DEFINE_PER_CPU(bool, hard_watchdog_warn);
   static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
  @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
  hrtimer *hrtimer)
 return HRTIMER_RESTART;
   
 /* only warn once */
  -  if (__this_cpu_read(soft_watchdog_warn) == true)
  +  if (__this_cpu_read(soft_watchdog_warn) == true) {
  +  /*
  +   * Handle the case where multiple processes are
  +   * causing softlockups but the duration is small
  +   * enough, the softlockup detector can not reset
  +   * itself in time.  Use task pointers to detect this.
  +   */
  +  if (__this_cpu_read(softlockup_task_ptr_saved) !=
  +  current) {
  +  __this_cpu_write(soft_watchdog_warn, false);
  +  __touch_watchdog();
  +  }
 return HRTIMER_RESTART;
  +  }
   
 if (softlockup_all_cpu_backtrace) {
 /* Prevent multiple soft-lockup reports if one cpu is 
  already
  @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
  hrtimer *hrtimer)
 pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 smp_processor_id(), duration,
 current->comm, task_pid_nr(current));
  +  __this_cpu_write(softlockup_task_ptr_saved, current);
 print_modules();
 print_irqtrace_events(current);
 if (regs)
  -- 
  1.7.1
 
  .
  
 
 
 
 -- 
 Regards
 
 Chai Wen


Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-26 Thread Chai Wen
On 08/26/2014 10:22 PM, Don Zickus wrote:

 On Tue, Aug 26, 2014 at 08:51:30PM +0800, Chai Wen wrote:
 On 08/22/2014 09:58 AM, Don Zickus wrote:

 On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
 For now, soft lockup detector warns once for each case of process 
 softlockup.
 But the thread 'watchdog/n' may not always get the cpu at the time slot 
 between
 the task switch of two processes hogging that cpu to reset 
 soft_watchdog_warn.

 An example would be two processes hogging the cpu.  Process A causes the
 softlockup warning and is killed manually by a user.  Process B immediately
 becomes the new process hogging the cpu preventing the softlockup code from
 resetting the soft_watchdog_warn variable.

 This case is a false negative of "warn only once for a process", as there 
 may
 be a different process that is going to hog the cpu.  Resolve this by
 saving/checking the task pointer of the hogging process and use that to 
 reset
 soft_watchdog_warn too.

 Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
 Signed-off-by: Don Zickus <dzic...@redhat.com>

 Acked-by: Don Zickus <dzic...@redhat.com>



 Hi Andrew

 Sorry to disturb you.
 Could you please review and pick up this small improvement patch?

 I am not sure which maintainer I should talk to, but the original version of
 this patch was queued to the -mm tree by you, so I assume it is in your
 charge.


 thanks
 chai wen
 
 Hi Chai,
 
 Sorry about that.  Ingo asked me privately to pick this up and re-post
 with my signoff.  I was converting to a new test env and was going to use this
 patch as an excuse to exercise it.  That is the delay.  Let me get this
 out today.
 


OK, it is kind of you to do that. Thanks for your work. :)

thanks
chai wen

 Cheers,
 Don
 

 ---
  kernel/watchdog.c |   16 +++-
  1 files changed, 15 insertions(+), 1 deletions(-)

 diff --git a/kernel/watchdog.c b/kernel/watchdog.c
 index 0037db6..2e55620 100644
 --- a/kernel/watchdog.c
 +++ b/kernel/watchdog.c
 @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
 +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
  #ifdef CONFIG_HARDLOCKUP_DETECTOR
  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
 hrtimer *hrtimer)
return HRTIMER_RESTART;
  
/* only warn once */
 -  if (__this_cpu_read(soft_watchdog_warn) == true)
 +  if (__this_cpu_read(soft_watchdog_warn) == true) {
 +  /*
 +   * Handle the case where multiple processes are
 +   * causing softlockups but the duration is small
 +   * enough, the softlockup detector can not reset
 +   * itself in time.  Use task pointers to detect this.
 +   */
 +  if (__this_cpu_read(softlockup_task_ptr_saved) !=
 +  current) {
 +  __this_cpu_write(soft_watchdog_warn, false);
 +  __touch_watchdog();
 +  }
return HRTIMER_RESTART;
 +  }
  
if (softlockup_all_cpu_backtrace) {
/* Prevent multiple soft-lockup reports if one cpu is 
 already
 @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
 hrtimer *hrtimer)
pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
smp_processor_id(), duration,
current->comm, task_pid_nr(current));
 +  __this_cpu_write(softlockup_task_ptr_saved, current);
print_modules();
print_irqtrace_events(current);
if (regs)
 -- 
 1.7.1

 .




 -- 
 Regards

 Chai Wen
 .
 



-- 
Regards

Chai Wen


Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-21 Thread Don Zickus
On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
> For now, soft lockup detector warns once for each case of process softlockup.
> But the thread 'watchdog/n' may not always get the cpu at the time slot 
> between
> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
> 
> An example would be two processes hogging the cpu.  Process A causes the
> softlockup warning and is killed manually by a user.  Process B immediately
> becomes the new process hogging the cpu preventing the softlockup code from
> resetting the soft_watchdog_warn variable.
> 
> This case is a false negative of "warn only once for a process", as there may
> be a different process that is going to hog the cpu.  Resolve this by
> saving/checking the task pointer of the hogging process and use that to reset
> soft_watchdog_warn too.
> 
> Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
> Signed-off-by: Don Zickus <dzic...@redhat.com>

Acked-by: Don Zickus <dzic...@redhat.com>

> ---
>  kernel/watchdog.c |   16 +++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 0037db6..2e55620 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
> hrtimer *hrtimer)
>   return HRTIMER_RESTART;
>  
>   /* only warn once */
> - if (__this_cpu_read(soft_watchdog_warn) == true)
> + if (__this_cpu_read(soft_watchdog_warn) == true) {
> + /*
> +  * Handle the case where multiple processes are
> +  * causing softlockups but the duration is small
> +  * enough, the softlockup detector can not reset
> +  * itself in time.  Use task pointers to detect this.
> +  */
> + if (__this_cpu_read(softlockup_task_ptr_saved) !=
> + current) {
> + __this_cpu_write(soft_watchdog_warn, false);
> + __touch_watchdog();
> + }
>   return HRTIMER_RESTART;
> + }
>  
>   if (softlockup_all_cpu_backtrace) {
>   /* Prevent multiple soft-lockup reports if one cpu is 
> already
> @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
> hrtimer *hrtimer)
>   pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>   smp_processor_id(), duration,
>   current->comm, task_pid_nr(current));
> + __this_cpu_write(softlockup_task_ptr_saved, current);
>   print_modules();
>   print_irqtrace_events(current);
>   if (regs)
> -- 
> 1.7.1
> 


Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-21 Thread Chai Wen
On 08/21/2014 01:42 PM, chai wen wrote:

> For now, soft lockup detector warns once for each case of process softlockup.
> But the thread 'watchdog/n' may not always get the cpu at the time slot 
> between
> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
> 
> An example would be two processes hogging the cpu.  Process A causes the
> softlockup warning and is killed manually by a user.  Process B immediately
> becomes the new process hogging the cpu preventing the softlockup code from
> resetting the soft_watchdog_warn variable.
> 
> This case is a false negative of "warn only once for a process", as there may
> be a different process that is going to hog the cpu.  Resolve this by
> saving/checking the task pointer of the hogging process and use that to reset
> soft_watchdog_warn too.
> 
> Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
> Signed-off-by: Don Zickus <dzic...@redhat.com>


Hi Ingo & Don

Ping...

This patch uses the task pointer to detect cases where the softlockup
detector cannot reset itself, and it has been tested.

thanks
chai wen

> ---
>  kernel/watchdog.c |   16 +++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 0037db6..2e55620 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
> hrtimer *hrtimer)
>   return HRTIMER_RESTART;
>  
>   /* only warn once */
> - if (__this_cpu_read(soft_watchdog_warn) == true)
> + if (__this_cpu_read(soft_watchdog_warn) == true) {
> + /*
> +  * Handle the case where multiple processes are
> +  * causing softlockups but the duration is small
> +  * enough, the softlockup detector can not reset
> +  * itself in time.  Use task pointers to detect this.
> +  */
> + if (__this_cpu_read(softlockup_task_ptr_saved) !=
> + current) {
> + __this_cpu_write(soft_watchdog_warn, false);
> + __touch_watchdog();
> + }
>   return HRTIMER_RESTART;
> + }
>  
>   if (softlockup_all_cpu_backtrace) {
>   /* Prevent multiple soft-lockup reports if one cpu is 
> already
> @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
> hrtimer *hrtimer)
>   pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>   smp_processor_id(), duration,
>   current->comm, task_pid_nr(current));
> + __this_cpu_write(softlockup_task_ptr_saved, current);
>   print_modules();
>   print_irqtrace_events(current);
>   if (regs)



-- 
Regards

Chai Wen


[PATCH] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-20 Thread chai wen
Currently, the soft lockup detector warns only once for each case of process
softlockup.  But the 'watchdog/n' thread may not get the cpu in the time slot
between the task switch of two processes hogging that cpu, so it cannot reset
soft_watchdog_warn.

An example would be two processes hogging the cpu.  Process A causes the
softlockup warning and is killed manually by a user.  Process B immediately
becomes the new process hogging the cpu preventing the softlockup code from
resetting the soft_watchdog_warn variable.

This case is a false negative of "warn only once for a process", as there may
be a different process that is going to hog the cpu.  Resolve this by
saving/checking the task pointer of the hogging process and use that to reset
soft_watchdog_warn too.

Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
Signed-off-by: Don Zickus <dzic...@redhat.com>
---
 kernel/watchdog.c |   16 +++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 0037db6..2e55620 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
 static DEFINE_PER_CPU(bool, soft_watchdog_warn);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
+static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
@@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			return HRTIMER_RESTART;
 
 		/* only warn once */
-		if (__this_cpu_read(soft_watchdog_warn) == true)
+		if (__this_cpu_read(soft_watchdog_warn) == true) {
+			/*
+			 * Handle the case where multiple processes are
+			 * causing softlockups but the duration is small
+			 * enough, the softlockup detector can not reset
+			 * itself in time.  Use task pointers to detect this.
+			 */
+			if (__this_cpu_read(softlockup_task_ptr_saved) !=
+			    current) {
+				__this_cpu_write(soft_watchdog_warn, false);
+				__touch_watchdog();
+			}
 			return HRTIMER_RESTART;
+		}
 
 		if (softlockup_all_cpu_backtrace) {
 			/* Prevent multiple soft-lockup reports if one cpu is already
@@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
+		__this_cpu_write(softlockup_task_ptr_saved, current);
 		print_modules();
 		print_irqtrace_events(current);
 		if (regs)
-- 
1.7.1


