Re: [PATCH] softlockup: Make detector be aware of task switch of processes hogging cpu
On Thu, Aug 28, 2014 at 04:07:23PM -0700, Andrew Morton wrote:
> On Thu, 28 Aug 2014 00:52:24 -0400 Don Zickus wrote:
>
> > From: chai wen
> >
> > For now, soft lockup detector warns once for each case of process softlockup.
> > But the thread 'watchdog/n' may not always get the cpu at the time slot between
> > the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
> >
> > An example would be two processes hogging the cpu.  Process A causes the
> > softlockup warning and is killed manually by a user.  Process B immediately
> > becomes the new process hogging the cpu preventing the softlockup code from
> > resetting the soft_watchdog_warn variable.
> >
> > This case is a false negative of "warn only once for a process", as there may
> > be a different process that is going to hog the cpu.  Resolve this by
> > saving/checking the task pointer of the hogging process and use that to reset
> > soft_watchdog_warn too.
> >
>
> OK, this should address the PID uniqueness issue which Ingo identified.
>
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -47,6 +47,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
> >  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
> >  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> >  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> > +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
> >  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> >  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> >  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> > @@ -331,8 +332,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >  			return HRTIMER_RESTART;
> >
> >  		/* only warn once */
> > -		if (__this_cpu_read(soft_watchdog_warn) == true)
> > +		if (__this_cpu_read(soft_watchdog_warn) == true) {
> > +			/*
> > +			 * Handle the case where multiple processes are
> > +			 * causing softlockups but the duration is small
> > +			 * enough, the softlockup detector can not reset
> > +			 * itself in time.  Use task pointers to detect this.
> > +			 */
>
> This comment is rather hard to follow ("the duration" of what?).  Can
> you think of some words which are a bit more complete/clear?

Agreed.  Does this work better?

			/*
			 * When multiple processes are causing softlockups
			 * the softlockup detector only warns on the first
			 * one because the code relies on a full quiet cycle
			 * to re-arm.  The second process prevents the
			 * quiet cycle and never gets reported.  Use task
			 * pointers to detect this.
			 */

Cheers,
Don
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH] softlockup: Make detector be aware of task switch of processes hogging cpu
On Thu, 28 Aug 2014 00:52:24 -0400 Don Zickus wrote:

> From: chai wen
>
> For now, soft lockup detector warns once for each case of process softlockup.
> But the thread 'watchdog/n' may not always get the cpu at the time slot between
> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
>
> An example would be two processes hogging the cpu.  Process A causes the
> softlockup warning and is killed manually by a user.  Process B immediately
> becomes the new process hogging the cpu preventing the softlockup code from
> resetting the soft_watchdog_warn variable.
>
> This case is a false negative of "warn only once for a process", as there may
> be a different process that is going to hog the cpu.  Resolve this by
> saving/checking the task pointer of the hogging process and use that to reset
> soft_watchdog_warn too.
>

OK, this should address the PID uniqueness issue which Ingo identified.

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -47,6 +47,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> @@ -331,8 +332,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  			return HRTIMER_RESTART;
>
>  		/* only warn once */
> -		if (__this_cpu_read(soft_watchdog_warn) == true)
> +		if (__this_cpu_read(soft_watchdog_warn) == true) {
> +			/*
> +			 * Handle the case where multiple processes are
> +			 * causing softlockups but the duration is small
> +			 * enough, the softlockup detector can not reset
> +			 * itself in time.  Use task pointers to detect this.
> +			 */

This comment is rather hard to follow ("the duration" of what?).  Can
you think of some words which are a bit more complete/clear?
[PATCH] softlockup: Make detector be aware of task switch of processes hogging cpu
From: chai wen <chaiw.f...@cn.fujitsu.com>

For now, soft lockup detector warns once for each case of process softlockup.
But the thread 'watchdog/n' may not always get the cpu at the time slot between
the task switch of two processes hogging that cpu to reset soft_watchdog_warn.

An example would be two processes hogging the cpu.  Process A causes the
softlockup warning and is killed manually by a user.  Process B immediately
becomes the new process hogging the cpu preventing the softlockup code from
resetting the soft_watchdog_warn variable.

This case is a false negative of "warn only once for a process", as there may
be a different process that is going to hog the cpu.  Resolve this by
saving/checking the task pointer of the hogging process and use that to reset
soft_watchdog_warn too.

Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
Signed-off-by: Don Zickus <dzic...@redhat.com>
---
 kernel/watchdog.c | 16 +++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index c3319bd..499f65f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -47,6 +47,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
 static DEFINE_PER_CPU(bool, soft_watchdog_warn);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
+static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
@@ -331,8 +332,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			return HRTIMER_RESTART;

 		/* only warn once */
-		if (__this_cpu_read(soft_watchdog_warn) == true)
+		if (__this_cpu_read(soft_watchdog_warn) == true) {
+			/*
+			 * Handle the case where multiple processes are
+			 * causing softlockups but the duration is small
+			 * enough, the softlockup detector can not reset
+			 * itself in time.  Use task pointers to detect this.
+			 */
+			if (__this_cpu_read(softlockup_task_ptr_saved) !=
+			    current) {
+				__this_cpu_write(soft_watchdog_warn, false);
+				__touch_watchdog();
+			}
 			return HRTIMER_RESTART;
+		}

 		if (softlockup_all_cpu_backtrace) {
 			/* Prevent multiple soft-lockup reports if one cpu is already
@@ -348,6 +361,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
+		__this_cpu_write(softlockup_task_ptr_saved, current);
 		print_modules();
 		print_irqtrace_events(current);
 		if (regs)
--
1.7.1
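Annotation: the scenario in the changelog above (hog A is killed, hog B takes over before the watchdog thread gets a quiet cycle) can be modelled in user space. What follows is only a rough sketch and not kernel code: `struct task`, `struct cpu_state`, `timer_tick()`, `run_scenario()`, the 20-second threshold and the one-tick-per-second timing are all assumptions made up for illustration. The `touch_ts` update stands in for `__touch_watchdog()` and the `warnings` counter stands in for the printk.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task {
	int pid;
};

/* State the patch keeps per CPU, modelled here for a single CPU. */
struct cpu_state {
	bool soft_watchdog_warn;           /* a warning has already been printed */
	const struct task *task_ptr_saved; /* task running when it was printed */
	unsigned long touch_ts;            /* last time the watchdog was touched */
	unsigned int warnings;             /* warnings "printed" so far */
};

enum { THRESH_SECS = 20 };                 /* assumed threshold, illustration only */

/* One timer tick at time `now` (seconds) with `current` on the CPU. */
static void timer_tick(struct cpu_state *s, unsigned long now,
		       const struct task *current, bool use_task_ptr)
{
	if (now - s->touch_ts <= THRESH_SECS) {
		/* No lockup: this is the quiet cycle that re-arms the warning. */
		s->soft_watchdog_warn = false;
		return;
	}

	if (s->soft_watchdog_warn) {
		/* Patched behaviour: a different hog re-arms the detector. */
		if (use_task_ptr && s->task_ptr_saved != current) {
			s->soft_watchdog_warn = false;
			s->touch_ts = now; /* stands in for __touch_watchdog() */
		}
		return;
	}

	s->warnings++;                     /* stands in for the printk */
	s->task_ptr_saved = current;
	s->soft_watchdog_warn = true;
}

/* Task A hogs the CPU for seconds 1..49, task B for seconds 50..99. */
unsigned int run_scenario(bool use_task_ptr)
{
	struct task a = { .pid = 100 }, b = { .pid = 200 };
	struct cpu_state s = { 0 };

	for (unsigned long now = 1; now < 100; now++)
		timer_tick(&s, now, now < 50 ? &a : &b, use_task_ptr);

	return s.warnings;
}
```

In this model `run_scenario(false)` reports only task A, because no quiet cycle ever clears `soft_watchdog_warn`, while `run_scenario(true)` reports A and then, one threshold period after the switch, B as well. The delay before B's warning comes from restarting the clock at the switch, so B is only reported once it has itself hogged the CPU for a full threshold.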
Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu
On 08/26/2014 10:22 PM, Don Zickus wrote:
> On Tue, Aug 26, 2014 at 08:51:30PM +0800, Chai Wen wrote:
>> On 08/22/2014 09:58 AM, Don Zickus wrote:
>>
>>> On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
>>>> For now, soft lockup detector warns once for each case of process softlockup.
>>>> But the thread 'watchdog/n' may not always get the cpu at the time slot between
>>>> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
>>>>
>>>> An example would be two processes hogging the cpu.  Process A causes the
>>>> softlockup warning and is killed manually by a user.  Process B immediately
>>>> becomes the new process hogging the cpu preventing the softlockup code from
>>>> resetting the soft_watchdog_warn variable.
>>>>
>>>> This case is a false negative of "warn only once for a process", as there may
>>>> be a different process that is going to hog the cpu.  Resolve this by
>>>> saving/checking the task pointer of the hogging process and use that to reset
>>>> soft_watchdog_warn too.
>>>>
>>>> Signed-off-by: chai wen
>>>> Signed-off-by: Don Zickus
>>>
>>> Acked-by: Don Zickus
>>>
>>
>>
>> Hi Andrew
>>
>> Sorry for some disturbing.
>> Could you help to check and pick up this little improvement patch ?
>>
>> I am not sure which MAINTAINER I should talk to, but the original version of
>> this patch is queued to -mm tree by you, so I assume that they are in the
>> charge of you.
>>
>>
>> thanks
>> chai wen
>
> Hi Chai,
>
> Sorry about that.  Ingo asked me privately to pick this up and re-post
> with my signoff.  I was converting to a new test env and was going to use this
> patch as an excuse to exercise it.  That is the delay.  Let me get this
> out today.
>

OK, It is kind of you to do that, thanks for your work. :)

thanks
chai wen

> Cheers,
> Don
>
>>
>>>> ---
>>>>  kernel/watchdog.c | 16 +++-
>>>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>>>> index 0037db6..2e55620 100644
>>>> --- a/kernel/watchdog.c
>>>> +++ b/kernel/watchdog.c
>>>> @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
>>>>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
>>>>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>>>>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
>>>> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
>>>>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>>>>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>>>>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>>>> @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>>>  			return HRTIMER_RESTART;
>>>>
>>>>  		/* only warn once */
>>>> -		if (__this_cpu_read(soft_watchdog_warn) == true)
>>>> +		if (__this_cpu_read(soft_watchdog_warn) == true) {
>>>> +			/*
>>>> +			 * Handle the case where multiple processes are
>>>> +			 * causing softlockups but the duration is small
>>>> +			 * enough, the softlockup detector can not reset
>>>> +			 * itself in time.  Use task pointers to detect this.
>>>> +			 */
>>>> +			if (__this_cpu_read(softlockup_task_ptr_saved) !=
>>>> +			    current) {
>>>> +				__this_cpu_write(soft_watchdog_warn, false);
>>>> +				__touch_watchdog();
>>>> +			}
>>>>  			return HRTIMER_RESTART;
>>>> +		}
>>>>
>>>>  		if (softlockup_all_cpu_backtrace) {
>>>>  			/* Prevent multiple soft-lockup reports if one cpu is already
>>>> @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>>>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>>>>  			smp_processor_id(), duration,
>>>>  			current->comm, task_pid_nr(current));
>>>> +		__this_cpu_write(softlockup_task_ptr_saved, current);
>>>>  		print_modules();
>>>>  		print_irqtrace_events(current);
>>>>  		if (regs)
>>>> --
>>>> 1.7.1
>>>>
>>> .
>>>
>>
>>
>>
>> --
>> Regards
>>
>> Chai Wen
> .
>

--
Regards

Chai Wen
Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu
On Tue, Aug 26, 2014 at 08:51:30PM +0800, Chai Wen wrote:
> On 08/22/2014 09:58 AM, Don Zickus wrote:
>
> > On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
> >> For now, soft lockup detector warns once for each case of process softlockup.
> >> But the thread 'watchdog/n' may not always get the cpu at the time slot between
> >> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
> >>
> >> An example would be two processes hogging the cpu.  Process A causes the
> >> softlockup warning and is killed manually by a user.  Process B immediately
> >> becomes the new process hogging the cpu preventing the softlockup code from
> >> resetting the soft_watchdog_warn variable.
> >>
> >> This case is a false negative of "warn only once for a process", as there may
> >> be a different process that is going to hog the cpu.  Resolve this by
> >> saving/checking the task pointer of the hogging process and use that to reset
> >> soft_watchdog_warn too.
> >>
> >> Signed-off-by: chai wen
> >> Signed-off-by: Don Zickus
> >
> > Acked-by: Don Zickus
> >
>
>
> Hi Andrew
>
> Sorry for some disturbing.
> Could you help to check and pick up this little improvement patch ?
>
> I am not sure which MAINTAINER I should talk to, but the original version of
> this patch is queued to -mm tree by you, so I assume that they are in the
> charge of you.
>
>
> thanks
> chai wen

Hi Chai,

Sorry about that.  Ingo asked me privately to pick this up and re-post
with my signoff.  I was converting to a new test env and was going to use this
patch as an excuse to exercise it.  That is the delay.  Let me get this
out today.

Cheers,
Don

>
> >> ---
> >>  kernel/watchdog.c | 16 +++-
> >>  1 files changed, 15 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> >> index 0037db6..2e55620 100644
> >> --- a/kernel/watchdog.c
> >> +++ b/kernel/watchdog.c
> >> @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
> >>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
> >>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> >>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> >> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
> >>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> >>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> >>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> >> @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >>  			return HRTIMER_RESTART;
> >>
> >>  		/* only warn once */
> >> -		if (__this_cpu_read(soft_watchdog_warn) == true)
> >> +		if (__this_cpu_read(soft_watchdog_warn) == true) {
> >> +			/*
> >> +			 * Handle the case where multiple processes are
> >> +			 * causing softlockups but the duration is small
> >> +			 * enough, the softlockup detector can not reset
> >> +			 * itself in time.  Use task pointers to detect this.
> >> +			 */
> >> +			if (__this_cpu_read(softlockup_task_ptr_saved) !=
> >> +			    current) {
> >> +				__this_cpu_write(soft_watchdog_warn, false);
> >> +				__touch_watchdog();
> >> +			}
> >>  			return HRTIMER_RESTART;
> >> +		}
> >>
> >>  		if (softlockup_all_cpu_backtrace) {
> >>  			/* Prevent multiple soft-lockup reports if one cpu is already
> >> @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> >>  			smp_processor_id(), duration,
> >>  			current->comm, task_pid_nr(current));
> >> +		__this_cpu_write(softlockup_task_ptr_saved, current);
> >>  		print_modules();
> >>  		print_irqtrace_events(current);
> >>  		if (regs)
> >> --
> >> 1.7.1
> >>
> > .
> >
>
>
>
> --
> Regards
>
> Chai Wen
Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu
On 08/22/2014 09:58 AM, Don Zickus wrote:

> On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
>> For now, soft lockup detector warns once for each case of process softlockup.
>> But the thread 'watchdog/n' may not always get the cpu at the time slot between
>> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
>>
>> An example would be two processes hogging the cpu.  Process A causes the
>> softlockup warning and is killed manually by a user.  Process B immediately
>> becomes the new process hogging the cpu preventing the softlockup code from
>> resetting the soft_watchdog_warn variable.
>>
>> This case is a false negative of "warn only once for a process", as there may
>> be a different process that is going to hog the cpu.  Resolve this by
>> saving/checking the task pointer of the hogging process and use that to reset
>> soft_watchdog_warn too.
>>
>> Signed-off-by: chai wen
>> Signed-off-by: Don Zickus
>
> Acked-by: Don Zickus
>

Hi Andrew

Sorry for some disturbing.
Could you help to check and pick up this little improvement patch ?

I am not sure which MAINTAINER I should talk to, but the original version of
this patch is queued to -mm tree by you, so I assume that they are in the
charge of you.

thanks
chai wen

>> ---
>>  kernel/watchdog.c | 16 +++-
>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 0037db6..2e55620 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
>>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
>>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
>> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
>>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>> @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>  			return HRTIMER_RESTART;
>>
>>  		/* only warn once */
>> -		if (__this_cpu_read(soft_watchdog_warn) == true)
>> +		if (__this_cpu_read(soft_watchdog_warn) == true) {
>> +			/*
>> +			 * Handle the case where multiple processes are
>> +			 * causing softlockups but the duration is small
>> +			 * enough, the softlockup detector can not reset
>> +			 * itself in time.  Use task pointers to detect this.
>> +			 */
>> +			if (__this_cpu_read(softlockup_task_ptr_saved) !=
>> +			    current) {
>> +				__this_cpu_write(soft_watchdog_warn, false);
>> +				__touch_watchdog();
>> +			}
>>  			return HRTIMER_RESTART;
>> +		}
>>
>>  		if (softlockup_all_cpu_backtrace) {
>>  			/* Prevent multiple soft-lockup reports if one cpu is already
>> @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>>  			smp_processor_id(), duration,
>>  			current->comm, task_pid_nr(current));
>> +		__this_cpu_write(softlockup_task_ptr_saved, current);
>>  		print_modules();
>>  		print_irqtrace_events(current);
>>  		if (regs)
>> --
>> 1.7.1
>>
> .
>

--
Regards

Chai Wen
Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu
On 08/26/2014 10:22 PM, Don Zickus wrote:
> On Tue, Aug 26, 2014 at 08:51:30PM +0800, Chai Wen wrote:
> > On 08/22/2014 09:58 AM, Don Zickus wrote:
> > > On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
> > > > For now, soft lockup detector warns once for each case of process softlockup.
> > > > But the thread 'watchdog/n' may not always get the cpu at the time slot between
> > > > the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
> > > >
> > > > An example would be two processes hogging the cpu. Process A causes the
> > > > softlockup warning and is killed manually by a user. Process B immediately
> > > > becomes the new process hogging the cpu preventing the softlockup code from
> > > > resetting the soft_watchdog_warn variable.
> > > >
> > > > This case is a false negative of "warn only once for a process", as there may
> > > > be a different process that is going to hog the cpu. Resolve this by
> > > > saving/checking the task pointer of the hogging process and use that to reset
> > > > soft_watchdog_warn too.
> > > >
> > > > Signed-off-by: chai wen <chaiw.f...@cn.fujitsu.com>
> > > > Signed-off-by: Don Zickus <dzic...@redhat.com>
> > >
> > > Acked-by: Don Zickus <dzic...@redhat.com>
> >
> > Hi Andrew
> >
> > Sorry for some disturbing.
> > Could you help to check and pick up this little improvement patch ?
> > I am not sure which MAINTAINER I should talk to, but the original version
> > of this patch is queued to -mm tree by you, so I assume that they are in
> > the charge of you.
> >
> > thanks
> > chai wen
>
> Hi Chai,
>
> Sorry about that. Ingo asked me privately to pick this up and re-post with
> my signoff. I was converting to a new test env and was going to use this
> patch as an excuse to exercise it. That is the delay.
>
> Let me get this out today.

OK, It is kind of you to do that, thanks for your work. :)

thanks
chai wen

> Cheers,
> Don
>
> > > > ---
> > > >  kernel/watchdog.c | 16 +++-
> > > >  1 files changed, 15 insertions(+), 1 deletions(-)
> > > >
> > > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > > index 0037db6..2e55620 100644
> > > > --- a/kernel/watchdog.c
> > > > +++ b/kernel/watchdog.c
> > > > @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
> > > >  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
> > > >  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> > > >  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> > > > +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
> > > >  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> > > >  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> > > >  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> > > > @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > > >  			return HRTIMER_RESTART;
> > > >
> > > >  		/* only warn once */
> > > > -		if (__this_cpu_read(soft_watchdog_warn) == true)
> > > > +		if (__this_cpu_read(soft_watchdog_warn) == true) {
> > > > +			/*
> > > > +			 * Handle the case where multiple processes are
> > > > +			 * causing softlockups but the duration is small
> > > > +			 * enough, the softlockup detector can not reset
> > > > +			 * itself in time. Use task pointers to detect this.
> > > > +			 */
> > > > +			if (__this_cpu_read(softlockup_task_ptr_saved) !=
> > > > +			    current) {
> > > > +				__this_cpu_write(soft_watchdog_warn, false);
> > > > +				__touch_watchdog();
> > > > +			}
> > > >  			return HRTIMER_RESTART;
> > > > +		}
> > > >
> > > >  		if (softlockup_all_cpu_backtrace) {
> > > >  			/* Prevent multiple soft-lockup reports if one cpu is already
> > > > @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > > >  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> > > >  			smp_processor_id(), duration,
> > > >  			current->comm, task_pid_nr(current));
> > > > +		__this_cpu_write(softlockup_task_ptr_saved, current);
> > > >  		print_modules();
> > > >  		print_irqtrace_events(current);
> > > >  		if (regs)
> > > > --
> > > > 1.7.1
> > > .
> >
> > --
> > Regards
> >
> > Chai Wen
> .

--
Regards

Chai Wen
Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu
On Thu, Aug 21, 2014 at 01:42:22PM +0800, chai wen wrote:
> For now, soft lockup detector warns once for each case of process softlockup.
> But the thread 'watchdog/n' may not always get the cpu at the time slot between
> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
>
> An example would be two processes hogging the cpu. Process A causes the
> softlockup warning and is killed manually by a user. Process B immediately
> becomes the new process hogging the cpu preventing the softlockup code from
> resetting the soft_watchdog_warn variable.
>
> This case is a false negative of "warn only once for a process", as there may
> be a different process that is going to hog the cpu. Resolve this by
> saving/checking the task pointer of the hogging process and use that to reset
> soft_watchdog_warn too.
>
> Signed-off-by: chai wen
> Signed-off-by: Don Zickus

Acked-by: Don Zickus

> ---
>  kernel/watchdog.c | 16 +++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 0037db6..2e55620 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  			return HRTIMER_RESTART;
>
>  		/* only warn once */
> -		if (__this_cpu_read(soft_watchdog_warn) == true)
> +		if (__this_cpu_read(soft_watchdog_warn) == true) {
> +			/*
> +			 * Handle the case where multiple processes are
> +			 * causing softlockups but the duration is small
> +			 * enough, the softlockup detector can not reset
> +			 * itself in time. Use task pointers to detect this.
> +			 */
> +			if (__this_cpu_read(softlockup_task_ptr_saved) !=
> +			    current) {
> +				__this_cpu_write(soft_watchdog_warn, false);
> +				__touch_watchdog();
> +			}
>  			return HRTIMER_RESTART;
> +		}
>
>  		if (softlockup_all_cpu_backtrace) {
>  			/* Prevent multiple soft-lockup reports if one cpu is already
> @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>  			smp_processor_id(), duration,
>  			current->comm, task_pid_nr(current));
> +		__this_cpu_write(softlockup_task_ptr_saved, current);
>  		print_modules();
>  		print_irqtrace_events(current);
>  		if (regs)
> --
> 1.7.1
>
Re: [PATCH] softlockup: make detector be aware of task switch of processes hogging cpu
On 08/21/2014 01:42 PM, chai wen wrote:
> For now, soft lockup detector warns once for each case of process softlockup.
> But the thread 'watchdog/n' may not always get the cpu at the time slot between
> the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
>
> An example would be two processes hogging the cpu. Process A causes the
> softlockup warning and is killed manually by a user. Process B immediately
> becomes the new process hogging the cpu preventing the softlockup code from
> resetting the soft_watchdog_warn variable.
>
> This case is a false negative of "warn only once for a process", as there may
> be a different process that is going to hog the cpu. Resolve this by
> saving/checking the task pointer of the hogging process and use that to reset
> soft_watchdog_warn too.
>
> Signed-off-by: chai wen
> Signed-off-by: Don Zickus

Hi Ingo & Don

Ping...
This patch is using the task pointer to check cases that softlockup can not
reset itself, and has been tested.

thanks
chai wen

> ---
>  kernel/watchdog.c | 16 +++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 0037db6..2e55620 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
>  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
> +static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> @@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  			return HRTIMER_RESTART;
>
>  		/* only warn once */
> -		if (__this_cpu_read(soft_watchdog_warn) == true)
> +		if (__this_cpu_read(soft_watchdog_warn) == true) {
> +			/*
> +			 * Handle the case where multiple processes are
> +			 * causing softlockups but the duration is small
> +			 * enough, the softlockup detector can not reset
> +			 * itself in time. Use task pointers to detect this.
> +			 */
> +			if (__this_cpu_read(softlockup_task_ptr_saved) !=
> +			    current) {
> +				__this_cpu_write(soft_watchdog_warn, false);
> +				__touch_watchdog();
> +			}
>  			return HRTIMER_RESTART;
> +		}
>
>  		if (softlockup_all_cpu_backtrace) {
>  			/* Prevent multiple soft-lockup reports if one cpu is already
> @@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>  			smp_processor_id(), duration,
>  			current->comm, task_pid_nr(current));
> +		__this_cpu_write(softlockup_task_ptr_saved, current);
>  		print_modules();
>  		print_irqtrace_events(current);
>  		if (regs)

--
Regards

Chai Wen
[PATCH] softlockup: make detector be aware of task switch of processes hogging cpu
For now, soft lockup detector warns once for each case of process softlockup.
But the thread 'watchdog/n' may not always get the cpu at the time slot between
the task switch of two processes hogging that cpu to reset soft_watchdog_warn.

An example would be two processes hogging the cpu. Process A causes the
softlockup warning and is killed manually by a user. Process B immediately
becomes the new process hogging the cpu preventing the softlockup code from
resetting the soft_watchdog_warn variable.

This case is a false negative of "warn only once for a process", as there may
be a different process that is going to hog the cpu. Resolve this by
saving/checking the task pointer of the hogging process and use that to reset
soft_watchdog_warn too.

Signed-off-by: chai wen
Signed-off-by: Don Zickus
---
 kernel/watchdog.c | 16 +++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 0037db6..2e55620 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
 static DEFINE_PER_CPU(bool, soft_watchdog_warn);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
+static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
@@ -328,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			return HRTIMER_RESTART;
 
 		/* only warn once */
-		if (__this_cpu_read(soft_watchdog_warn) == true)
+		if (__this_cpu_read(soft_watchdog_warn) == true) {
+			/*
+			 * Handle the case where multiple processes are
+			 * causing softlockups but the duration is small
+			 * enough, the softlockup detector can not reset
+			 * itself in time. Use task pointers to detect this.
+			 */
+			if (__this_cpu_read(softlockup_task_ptr_saved) !=
+			    current) {
+				__this_cpu_write(soft_watchdog_warn, false);
+				__touch_watchdog();
+			}
 			return HRTIMER_RESTART;
+		}
 
 		if (softlockup_all_cpu_backtrace) {
 			/* Prevent multiple soft-lockup reports if one cpu is already
@@ -345,6 +358,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
+		__this_cpu_write(softlockup_task_ptr_saved, current);
 		print_modules();
 		print_irqtrace_events(current);
 		if (regs)
--
1.7.1
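
For anyone following the thread without a kernel tree at hand, the behaviour the patch description talks about can be tried in a small userspace model. This is only a sketch under stated assumptions, not the kernel code: struct task, struct cpu_state, timer_tick(), count_warnings() and THRESHOLD_TICKS are hypothetical names made up for illustration, a tick counter stands in for the watchdog timestamp, and resetting that counter stands in for __touch_watchdog(). The watchdog thread is assumed never to run, which is the situation described above where process B takes over from process A immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task {
	const char *comm;
};

/* Model of the per-CPU variables touched by the patch, for a single CPU. */
struct cpu_state {
	bool soft_watchdog_warn;           /* "already warned" latch */
	const struct task *task_ptr_saved; /* task named in the last warning */
	int ticks_since_touch;             /* stand-in for the watchdog timestamp */
};

/* Assumed threshold: ticks without a touch before we call it a lockup. */
#define THRESHOLD_TICKS 3

/*
 * One timer callback while `cur` is hogging the CPU.
 * Returns true when a warning is printed on this tick.
 */
static bool timer_tick(struct cpu_state *s, const struct task *cur,
		       bool use_task_ptr)
{
	assert(s != NULL && cur != NULL);

	s->ticks_since_touch++;
	if (s->ticks_since_touch <= THRESHOLD_TICKS) {
		/* No lockup right now: a quiet cycle re-arms the detector. */
		s->soft_watchdog_warn = false;
		return false;
	}

	/* only warn once */
	if (s->soft_watchdog_warn) {
		/* Patched behaviour: a different hogging task re-arms us. */
		if (use_task_ptr && s->task_ptr_saved != cur) {
			s->soft_watchdog_warn = false;
			s->ticks_since_touch = 0; /* models __touch_watchdog() */
		}
		return false;
	}

	printf("soft lockup: stuck for %d ticks [%s]\n",
	       s->ticks_since_touch, cur->comm);
	s->task_ptr_saved = cur;
	s->soft_watchdog_warn = true;
	return true;
}

/* Task A hogs the CPU for six ticks, then task B takes over for six more. */
static int count_warnings(bool use_task_ptr)
{
	const struct task a = { "A" }, b = { "B" };
	struct cpu_state s = { false, NULL, 0 };
	int warnings = 0;

	for (int t = 0; t < 6; t++)
		warnings += timer_tick(&s, &a, use_task_ptr);
	for (int t = 0; t < 6; t++)
		warnings += timer_tick(&s, &b, use_task_ptr);
	return warnings;
}
```

In this model count_warnings(false) returns 1, because only A is ever reported, while count_warnings(true) returns 2, because the change of task pointer re-arms the latch and B is reported once it has been stuck past the threshold again.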