Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On Thu, Aug 21, 2014 at 09:37:04AM +0800, Chai Wen wrote: > On 08/19/2014 09:36 AM, Chai Wen wrote: > > > On 08/19/2014 04:38 AM, Don Zickus wrote: > > > >> On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote: > >>> > >>> * Don Zickus wrote: > >>> > >>> So I agree with the motivation of this improvement, but > >>> is this implementation namespace-safe? > >> > >> What namespace are you worried about colliding with? I > >> thought softlockup_ would provide the safety?? Maybe I > >> am missing something obvious. :-( > > > > I meant PID namespaces - a PID in itself isn't guaranteed > > to be unique across the system. > > Ah, I don't think we thought about that. Is there a better > way to do this? Is there a domain id or something that can > be OR'd with the pid? > >>> > >>> What is always unique is the task pointer itself. We use pids > >>> when we interface with user-space - but we don't really do that > >>> here, right? > >> > >> No, I don't believe so. Ok, so saving 'current' and comparing that should > >> be enough, correct? > >> > > > > > > I am not sure of the safety about using pid here with namespace. > > But as to the pointer of process, is there a chance that we got a > > 'historical' > > address saved in the 'softlockup_warn_pid(or address)_saved' and the current > > hogging process happened to get the same task pointer address? > > If it never happens, I think the comparing of address is ok. > > > > > Hi Ingo > > what do you think of Don's solution- 'comparing of task pointer' ? > Anyway this is just an additional check about some very special cases, > so I think the issue that I am concerned above is not a problem at all. > And after learning some concepts about PID namespace, I think comparing > of task pointer is reliable dealing with PID namespace here. > > And Don, If you want me to re-post this patch, please let me know that. Sure, just quickly test with the task pointer to make sure it still works and then re-post. Cheers, Don -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On 08/19/2014 09:36 AM, Chai Wen wrote: > On 08/19/2014 04:38 AM, Don Zickus wrote: > >> On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote: >>> >>> * Don Zickus wrote: >>> >>> So I agree with the motivation of this improvement, but >>> is this implementation namespace-safe? >> >> What namespace are you worried about colliding with? I >> thought softlockup_ would provide the safety?? Maybe I >> am missing something obvious. :-( > > I meant PID namespaces - a PID in itself isn't guaranteed > to be unique across the system. Ah, I don't think we thought about that. Is there a better way to do this? Is there a domain id or something that can be OR'd with the pid? >>> >>> What is always unique is the task pointer itself. We use pids >>> when we interface with user-space - but we don't really do that >>> here, right? >> >> No, I don't believe so. Ok, so saving 'current' and comparing that should >> be enough, correct? >> > > > I am not sure of the safety about using pid here with namespace. > But as to the pointer of process, is there a chance that we got a 'historical' > address saved in the 'softlockup_warn_pid(or address)_saved' and the current > hogging process happened to get the same task pointer address? > If it never happens, I think the comparing of address is ok. > Hi Ingo what do you think of Don's solution- 'comparing of task pointer' ? Anyway this is just an additional check about some very special cases, so I think the issue that I am concerned above is not a problem at all. And after learning some concepts about PID namespace, I think comparing of task pointer is reliable dealing with PID namespace here. And Don, If you want me to re-post this patch, please let me know that. thanks chai wen > thanks > chai wen > >> Cheers, >> Don >> . >> > > > -- Regards Chai Wen -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On 08/19/2014 09:36 AM, Chai Wen wrote: On 08/19/2014 04:38 AM, Don Zickus wrote: On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote: * Don Zickus dzic...@redhat.com wrote: So I agree with the motivation of this improvement, but is this implementation namespace-safe? What namespace are you worried about colliding with? I thought softlockup_ would provide the safety?? Maybe I am missing something obvious. :-( I meant PID namespaces - a PID in itself isn't guaranteed to be unique across the system. Ah, I don't think we thought about that. Is there a better way to do this? Is there a domain id or something that can be OR'd with the pid? What is always unique is the task pointer itself. We use pids when we interface with user-space - but we don't really do that here, right? No, I don't believe so. Ok, so saving 'current' and comparing that should be enough, correct? I am not sure of the safety about using pid here with namespace. But as to the pointer of process, is there a chance that we got a 'historical' address saved in the 'softlockup_warn_pid(or address)_saved' and the current hogging process happened to get the same task pointer address? If it never happens, I think the comparing of address is ok. Hi Ingo what do you think of Don's solution- 'comparing of task pointer' ? Anyway this is just an additional check about some very special cases, so I think the issue that I am concerned above is not a problem at all. And after learning some concepts about PID namespace, I think comparing of task pointer is reliable dealing with PID namespace here. And Don, If you want me to re-post this patch, please let me know that. thanks chai wen thanks chai wen Cheers, Don . -- Regards Chai Wen -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On Thu, Aug 21, 2014 at 09:37:04AM +0800, Chai Wen wrote: On 08/19/2014 09:36 AM, Chai Wen wrote: On 08/19/2014 04:38 AM, Don Zickus wrote: On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote: * Don Zickus dzic...@redhat.com wrote: So I agree with the motivation of this improvement, but is this implementation namespace-safe? What namespace are you worried about colliding with? I thought softlockup_ would provide the safety?? Maybe I am missing something obvious. :-( I meant PID namespaces - a PID in itself isn't guaranteed to be unique across the system. Ah, I don't think we thought about that. Is there a better way to do this? Is there a domain id or something that can be OR'd with the pid? What is always unique is the task pointer itself. We use pids when we interface with user-space - but we don't really do that here, right? No, I don't believe so. Ok, so saving 'current' and comparing that should be enough, correct? I am not sure of the safety about using pid here with namespace. But as to the pointer of process, is there a chance that we got a 'historical' address saved in the 'softlockup_warn_pid(or address)_saved' and the current hogging process happened to get the same task pointer address? If it never happens, I think the comparing of address is ok. Hi Ingo what do you think of Don's solution- 'comparing of task pointer' ? Anyway this is just an additional check about some very special cases, so I think the issue that I am concerned above is not a problem at all. And after learning some concepts about PID namespace, I think comparing of task pointer is reliable dealing with PID namespace here. And Don, If you want me to re-post this patch, please let me know that. Sure, just quickly test with the task pointer to make sure it still works and then re-post. Cheers, Don -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On 08/19/2014 04:38 AM, Don Zickus wrote: > On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote: >> >> * Don Zickus wrote: >> >> So I agree with the motivation of this improvement, but >> is this implementation namespace-safe? > > What namespace are you worried about colliding with? I > thought softlockup_ would provide the safety?? Maybe I > am missing something obvious. :-( I meant PID namespaces - a PID in itself isn't guaranteed to be unique across the system. >>> >>> Ah, I don't think we thought about that. Is there a better >>> way to do this? Is there a domain id or something that can >>> be OR'd with the pid? >> >> What is always unique is the task pointer itself. We use pids >> when we interface with user-space - but we don't really do that >> here, right? > > No, I don't believe so. Ok, so saving 'current' and comparing that should > be enough, correct? > I am not sure of the safety about using pid here with namespace. But as to the pointer of process, is there a chance that we got a 'historical' address saved in the 'softlockup_warn_pid(or address)_saved' and the current hogging process happened to get the same task pointer address? If it never happens, I think the comparing of address is ok. thanks chai wen > Cheers, > Don > . > -- Regards Chai Wen -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote: > > * Don Zickus wrote: > > > > > > So I agree with the motivation of this improvement, but > > > > > is this implementation namespace-safe? > > > > > > > > What namespace are you worried about colliding with? I > > > > thought softlockup_ would provide the safety?? Maybe I > > > > am missing something obvious. :-( > > > > > > I meant PID namespaces - a PID in itself isn't guaranteed > > > to be unique across the system. > > > > Ah, I don't think we thought about that. Is there a better > > way to do this? Is there a domain id or something that can > > be OR'd with the pid? > > What is always unique is the task pointer itself. We use pids > when we interface with user-space - but we don't really do that > here, right? No, I don't believe so. Ok, so saving 'current' and comparing that should be enough, correct? Cheers, Don -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
* Don Zickus wrote: > > > > So I agree with the motivation of this improvement, but > > > > is this implementation namespace-safe? > > > > > > What namespace are you worried about colliding with? I > > > thought softlockup_ would provide the safety?? Maybe I > > > am missing something obvious. :-( > > > > I meant PID namespaces - a PID in itself isn't guaranteed > > to be unique across the system. > > Ah, I don't think we thought about that. Is there a better > way to do this? Is there a domain id or something that can > be OR'd with the pid? What is always unique is the task pointer itself. We use pids when we interface with user-space - but we don't really do that here, right? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On Mon, Aug 18, 2014 at 08:01:58PM +0200, Ingo Molnar wrote: > > > > duration = is_softlockup(touch_ts); > > > > if (unlikely(duration)) { > > > > + pid_t pid = task_pid_nr(current); > > > > + > > > > /* > > > > * If a virtual machine is stopped by the host it can > > > > look to > > > > * the watchdog like a soft lockup, check to see if the > > > > host > > > > @@ -326,8 +329,20 @@ static enum hrtimer_restart > > > > watchdog_timer_fn(struct hrtimer *hrtimer) > > > > return HRTIMER_RESTART; > > > > > > > > /* only warn once */ > > > > - if (__this_cpu_read(soft_watchdog_warn) == true) > > > > + if (__this_cpu_read(soft_watchdog_warn) == true) { > > > > + > > > > + /* > > > > +* Handle the case where multiple processes are > > > > +* causing softlockups but the duration is small > > > > +* enough, the softlockup detector can not reset > > > > +* itself in time. Use pids to detect this. > > > > +*/ > > > > + if (__this_cpu_read(softlockup_warn_pid_saved) > > > > != pid) { > > > > > > So I agree with the motivation of this improvement, but is this > > > implementation namespace-safe? > > > > What namespace are you worried about colliding with? I thought > > softlockup_ would provide the safety?? Maybe I am missing something > > obvious. :-( > > I meant PID namespaces - a PID in itself isn't guaranteed to be > unique across the system. Ah, I don't think we thought about that. Is there a better way to do this? Is there a domain id or something that can be OR'd with the pid? Cheers, Don -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
* Don Zickus wrote: > On Mon, Aug 18, 2014 at 11:03:19AM +0200, Ingo Molnar wrote: > > * Don Zickus wrote: > > > > > From: chai wen > > > > > > For now, soft lockup detector warns once for each case of process > > > softlockup. > > > But the thread 'watchdog/n' may not always get the cpu at the time slot > > > between > > > the task switch of two processes hogging that cpu to reset > > > soft_watchdog_warn. > > > > > > An example would be two processes hogging the cpu. Process A causes the > > > softlockup warning and is killed manually by a user. Process B > > > immediately > > > becomes the new process hogging the cpu preventing the softlockup code > > > from > > > resetting the soft_watchdog_warn variable. > > > > > > This case is a false negative of "warn only once for a process", as there > > > may > > > be a different process that is going to hog the cpu. Resolve this by > > > saving/checking the pid of the hogging process and use that to reset > > > soft_watchdog_warn too. > > > > > > Signed-off-by: chai wen > > > [modified the comment and changelog to be more specific] > > > Signed-off-by: Don Zickus > > > --- > > > kernel/watchdog.c | 20 ++-- > > > 1 files changed, 18 insertions(+), 2 deletions(-) > > > > > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > > > index 4c2e11c..6d0a891 100644 > > > --- a/kernel/watchdog.c > > > +++ b/kernel/watchdog.c > > > @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync); > > > static DEFINE_PER_CPU(bool, soft_watchdog_warn); > > > static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); > > > static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt); > > > +static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved); > > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > > static DEFINE_PER_CPU(bool, hard_watchdog_warn); > > > static DEFINE_PER_CPU(bool, watchdog_nmi_touch); > > > @@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct > > > hrtimer *hrtimer) > > >*/ > > > duration = is_softlockup(touch_ts); > > > if (unlikely(duration)) { > > > + pid_t pid = task_pid_nr(current); > > > + > > > /* > > >* If a virtual machine is stopped by the host it can look to > > >* the watchdog like a soft lockup, check to see if the host > > > @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct > > > hrtimer *hrtimer) > > > return HRTIMER_RESTART; > > > > > > /* only warn once */ > > > - if (__this_cpu_read(soft_watchdog_warn) == true) > > > + if (__this_cpu_read(soft_watchdog_warn) == true) { > > > + > > > + /* > > > + * Handle the case where multiple processes are > > > + * causing softlockups but the duration is small > > > + * enough, the softlockup detector can not reset > > > + * itself in time. Use pids to detect this. > > > + */ > > > + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) { > > > > So I agree with the motivation of this improvement, but is this > > implementation namespace-safe? > > What namespace are you worried about colliding with? I thought > softlockup_ would provide the safety?? Maybe I am missing something > obvious. :-( I meant PID namespaces - a PID in itself isn't guaranteed to be unique across the system. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On Mon, Aug 18, 2014 at 11:03:19AM +0200, Ingo Molnar wrote: > * Don Zickus wrote: > > > From: chai wen > > > > For now, soft lockup detector warns once for each case of process > > softlockup. > > But the thread 'watchdog/n' may not always get the cpu at the time slot > > between > > the task switch of two processes hogging that cpu to reset > > soft_watchdog_warn. > > > > An example would be two processes hogging the cpu. Process A causes the > > softlockup warning and is killed manually by a user. Process B immediately > > becomes the new process hogging the cpu preventing the softlockup code from > > resetting the soft_watchdog_warn variable. > > > > This case is a false negative of "warn only once for a process", as there > > may > > be a different process that is going to hog the cpu. Resolve this by > > saving/checking the pid of the hogging process and use that to reset > > soft_watchdog_warn too. > > > > Signed-off-by: chai wen > > [modified the comment and changelog to be more specific] > > Signed-off-by: Don Zickus > > --- > > kernel/watchdog.c | 20 ++-- > > 1 files changed, 18 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > > index 4c2e11c..6d0a891 100644 > > --- a/kernel/watchdog.c > > +++ b/kernel/watchdog.c > > @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync); > > static DEFINE_PER_CPU(bool, soft_watchdog_warn); > > static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); > > static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt); > > +static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved); > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > static DEFINE_PER_CPU(bool, hard_watchdog_warn); > > static DEFINE_PER_CPU(bool, watchdog_nmi_touch); > > @@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct > > hrtimer *hrtimer) > > */ > > duration = is_softlockup(touch_ts); > > if (unlikely(duration)) { > > + pid_t pid = task_pid_nr(current); > > + > > /* > > * If a virtual machine is stopped by the host it can look to > > * the watchdog like a soft lockup, check to see if the host > > @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct > > hrtimer *hrtimer) > > return HRTIMER_RESTART; > > > > /* only warn once */ > > - if (__this_cpu_read(soft_watchdog_warn) == true) > > + if (__this_cpu_read(soft_watchdog_warn) == true) { > > + > > + /* > > +* Handle the case where multiple processes are > > +* causing softlockups but the duration is small > > +* enough, the softlockup detector can not reset > > +* itself in time. Use pids to detect this. > > +*/ > > + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) { > > So I agree with the motivation of this improvement, but is this > implementation namespace-safe? What namespace are you worried about colliding with? I thought softlockup_ would provide the safety?? Maybe I am missing something obvious. :-( Cheers, Don -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
* Don Zickus wrote: > From: chai wen > > For now, soft lockup detector warns once for each case of process softlockup. > But the thread 'watchdog/n' may not always get the cpu at the time slot > between > the task switch of two processes hogging that cpu to reset soft_watchdog_warn. > > An example would be two processes hogging the cpu. Process A causes the > softlockup warning and is killed manually by a user. Process B immediately > becomes the new process hogging the cpu preventing the softlockup code from > resetting the soft_watchdog_warn variable. > > This case is a false negative of "warn only once for a process", as there may > be a different process that is going to hog the cpu. Resolve this by > saving/checking the pid of the hogging process and use that to reset > soft_watchdog_warn too. > > Signed-off-by: chai wen > [modified the comment and changelog to be more specific] > Signed-off-by: Don Zickus > --- > kernel/watchdog.c | 20 ++-- > 1 files changed, 18 insertions(+), 2 deletions(-) > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index 4c2e11c..6d0a891 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync); > static DEFINE_PER_CPU(bool, soft_watchdog_warn); > static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); > static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt); > +static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved); > #ifdef CONFIG_HARDLOCKUP_DETECTOR > static DEFINE_PER_CPU(bool, hard_watchdog_warn); > static DEFINE_PER_CPU(bool, watchdog_nmi_touch); > @@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct > hrtimer *hrtimer) >*/ > duration = is_softlockup(touch_ts); > if (unlikely(duration)) { > + pid_t pid = task_pid_nr(current); > + > /* >* If a virtual machine is stopped by the host it can look to >* the watchdog like a soft lockup, check to see if the host > @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct > hrtimer *hrtimer) > return HRTIMER_RESTART; > > /* only warn once */ > - if (__this_cpu_read(soft_watchdog_warn) == true) > + if (__this_cpu_read(soft_watchdog_warn) == true) { > + > + /* > + * Handle the case where multiple processes are > + * causing softlockups but the duration is small > + * enough, the softlockup detector can not reset > + * itself in time. Use pids to detect this. > + */ > + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) { So I agree with the motivation of this improvement, but is this implementation namespace-safe? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
* Don Zickus dzic...@redhat.com wrote: From: chai wen chaiw.f...@cn.fujitsu.com For now, soft lockup detector warns once for each case of process softlockup. But the thread 'watchdog/n' may not always get the cpu at the time slot between the task switch of two processes hogging that cpu to reset soft_watchdog_warn. An example would be two processes hogging the cpu. Process A causes the softlockup warning and is killed manually by a user. Process B immediately becomes the new process hogging the cpu preventing the softlockup code from resetting the soft_watchdog_warn variable. This case is a false negative of warn only once for a process, as there may be a different process that is going to hog the cpu. Resolve this by saving/checking the pid of the hogging process and use that to reset soft_watchdog_warn too. Signed-off-by: chai wen chaiw.f...@cn.fujitsu.com [modified the comment and changelog to be more specific] Signed-off-by: Don Zickus dzic...@redhat.com --- kernel/watchdog.c | 20 ++-- 1 files changed, 18 insertions(+), 2 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 4c2e11c..6d0a891 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync); static DEFINE_PER_CPU(bool, soft_watchdog_warn); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt); +static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved); #ifdef CONFIG_HARDLOCKUP_DETECTOR static DEFINE_PER_CPU(bool, hard_watchdog_warn); static DEFINE_PER_CPU(bool, watchdog_nmi_touch); @@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) */ duration = is_softlockup(touch_ts); if (unlikely(duration)) { + pid_t pid = task_pid_nr(current); + /* * If a virtual machine is stopped by the host it can look to * the watchdog like a soft lockup, check to see if the host @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) return HRTIMER_RESTART; /* only warn once */ - if (__this_cpu_read(soft_watchdog_warn) == true) + if (__this_cpu_read(soft_watchdog_warn) == true) { + + /* + * Handle the case where multiple processes are + * causing softlockups but the duration is small + * enough, the softlockup detector can not reset + * itself in time. Use pids to detect this. + */ + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) { So I agree with the motivation of this improvement, but is this implementation namespace-safe? Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On Mon, Aug 18, 2014 at 11:03:19AM +0200, Ingo Molnar wrote: * Don Zickus dzic...@redhat.com wrote: From: chai wen chaiw.f...@cn.fujitsu.com For now, soft lockup detector warns once for each case of process softlockup. But the thread 'watchdog/n' may not always get the cpu at the time slot between the task switch of two processes hogging that cpu to reset soft_watchdog_warn. An example would be two processes hogging the cpu. Process A causes the softlockup warning and is killed manually by a user. Process B immediately becomes the new process hogging the cpu preventing the softlockup code from resetting the soft_watchdog_warn variable. This case is a false negative of warn only once for a process, as there may be a different process that is going to hog the cpu. Resolve this by saving/checking the pid of the hogging process and use that to reset soft_watchdog_warn too. Signed-off-by: chai wen chaiw.f...@cn.fujitsu.com [modified the comment and changelog to be more specific] Signed-off-by: Don Zickus dzic...@redhat.com --- kernel/watchdog.c | 20 ++-- 1 files changed, 18 insertions(+), 2 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 4c2e11c..6d0a891 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync); static DEFINE_PER_CPU(bool, soft_watchdog_warn); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt); +static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved); #ifdef CONFIG_HARDLOCKUP_DETECTOR static DEFINE_PER_CPU(bool, hard_watchdog_warn); static DEFINE_PER_CPU(bool, watchdog_nmi_touch); @@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) */ duration = is_softlockup(touch_ts); if (unlikely(duration)) { + pid_t pid = task_pid_nr(current); + /* * If a virtual machine is stopped by the host it can look to * the watchdog like a soft lockup, check to see if the host @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) return HRTIMER_RESTART; /* only warn once */ - if (__this_cpu_read(soft_watchdog_warn) == true) + if (__this_cpu_read(soft_watchdog_warn) == true) { + + /* +* Handle the case where multiple processes are +* causing softlockups but the duration is small +* enough, the softlockup detector can not reset +* itself in time. Use pids to detect this. +*/ + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) { So I agree with the motivation of this improvement, but is this implementation namespace-safe? What namespace are you worried about colliding with? I thought softlockup_ would provide the safety?? Maybe I am missing something obvious. :-( Cheers, Don -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
* Don Zickus dzic...@redhat.com wrote: On Mon, Aug 18, 2014 at 11:03:19AM +0200, Ingo Molnar wrote: * Don Zickus dzic...@redhat.com wrote: From: chai wen chaiw.f...@cn.fujitsu.com For now, soft lockup detector warns once for each case of process softlockup. But the thread 'watchdog/n' may not always get the cpu at the time slot between the task switch of two processes hogging that cpu to reset soft_watchdog_warn. An example would be two processes hogging the cpu. Process A causes the softlockup warning and is killed manually by a user. Process B immediately becomes the new process hogging the cpu preventing the softlockup code from resetting the soft_watchdog_warn variable. This case is a false negative of warn only once for a process, as there may be a different process that is going to hog the cpu. Resolve this by saving/checking the pid of the hogging process and use that to reset soft_watchdog_warn too. Signed-off-by: chai wen chaiw.f...@cn.fujitsu.com [modified the comment and changelog to be more specific] Signed-off-by: Don Zickus dzic...@redhat.com --- kernel/watchdog.c | 20 ++-- 1 files changed, 18 insertions(+), 2 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 4c2e11c..6d0a891 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync); static DEFINE_PER_CPU(bool, soft_watchdog_warn); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt); +static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved); #ifdef CONFIG_HARDLOCKUP_DETECTOR static DEFINE_PER_CPU(bool, hard_watchdog_warn); static DEFINE_PER_CPU(bool, watchdog_nmi_touch); @@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) */ duration = is_softlockup(touch_ts); if (unlikely(duration)) { + pid_t pid = task_pid_nr(current); + /* * If a virtual machine is stopped by the host it can look to * the watchdog like a soft lockup, check to see if the host @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) return HRTIMER_RESTART; /* only warn once */ - if (__this_cpu_read(soft_watchdog_warn) == true) + if (__this_cpu_read(soft_watchdog_warn) == true) { + + /* + * Handle the case where multiple processes are + * causing softlockups but the duration is small + * enough, the softlockup detector can not reset + * itself in time. Use pids to detect this. + */ + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) { So I agree with the motivation of this improvement, but is this implementation namespace-safe? What namespace are you worried about colliding with? I thought softlockup_ would provide the safety?? Maybe I am missing something obvious. :-( I meant PID namespaces - a PID in itself isn't guaranteed to be unique across the system. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On Mon, Aug 18, 2014 at 08:01:58PM +0200, Ingo Molnar wrote: duration = is_softlockup(touch_ts); if (unlikely(duration)) { + pid_t pid = task_pid_nr(current); + /* * If a virtual machine is stopped by the host it can look to * the watchdog like a soft lockup, check to see if the host @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) return HRTIMER_RESTART; /* only warn once */ - if (__this_cpu_read(soft_watchdog_warn) == true) + if (__this_cpu_read(soft_watchdog_warn) == true) { + + /* +* Handle the case where multiple processes are +* causing softlockups but the duration is small +* enough, the softlockup detector can not reset +* itself in time. Use pids to detect this. +*/ + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) { So I agree with the motivation of this improvement, but is this implementation namespace-safe? What namespace are you worried about colliding with? I thought softlockup_ would provide the safety?? Maybe I am missing something obvious. :-( I meant PID namespaces - a PID in itself isn't guaranteed to be unique across the system. Ah, I don't think we thought about that. Is there a better way to do this? Is there a domain id or something that can be OR'd with the pid? Cheers, Don -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
* Don Zickus dzic...@redhat.com wrote: So I agree with the motivation of this improvement, but is this implementation namespace-safe? What namespace are you worried about colliding with? I thought softlockup_ would provide the safety?? Maybe I am missing something obvious. :-( I meant PID namespaces - a PID in itself isn't guaranteed to be unique across the system. Ah, I don't think we thought about that. Is there a better way to do this? Is there a domain id or something that can be OR'd with the pid? What is always unique is the task pointer itself. We use pids when we interface with user-space - but we don't really do that here, right? Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote: * Don Zickus dzic...@redhat.com wrote: So I agree with the motivation of this improvement, but is this implementation namespace-safe? What namespace are you worried about colliding with? I thought softlockup_ would provide the safety?? Maybe I am missing something obvious. :-( I meant PID namespaces - a PID in itself isn't guaranteed to be unique across the system. Ah, I don't think we thought about that. Is there a better way to do this? Is there a domain id or something that can be OR'd with the pid? What is always unique is the task pointer itself. We use pids when we interface with user-space - but we don't really do that here, right? No, I don't believe so. Ok, so saving 'current' and comparing that should be enough, correct? Cheers, Don -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On 08/19/2014 04:38 AM, Don Zickus wrote: On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote: * Don Zickus dzic...@redhat.com wrote: So I agree with the motivation of this improvement, but is this implementation namespace-safe? What namespace are you worried about colliding with? I thought softlockup_ would provide the safety?? Maybe I am missing something obvious. :-( I meant PID namespaces - a PID in itself isn't guaranteed to be unique across the system. Ah, I don't think we thought about that. Is there a better way to do this? Is there a domain id or something that can be OR'd with the pid? What is always unique is the task pointer itself. We use pids when we interface with user-space - but we don't really do that here, right? No, I don't believe so. Ok, so saving 'current' and comparing that should be enough, correct? I am not sure of the safety about using pid here with namespace. But as to the pointer of process, is there a chance that we got a 'historical' address saved in the 'softlockup_warn_pid(or address)_saved' and the current hogging process happened to get the same task pointer address? If it never happens, I think the comparing of address is ok. thanks chai wen Cheers, Don . -- Regards Chai Wen -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
From: chai wen For now, soft lockup detector warns once for each case of process softlockup. But the thread 'watchdog/n' may not always get the cpu at the time slot between the task switch of two processes hogging that cpu to reset soft_watchdog_warn. An example would be two processes hogging the cpu. Process A causes the softlockup warning and is killed manually by a user. Process B immediately becomes the new process hogging the cpu preventing the softlockup code from resetting the soft_watchdog_warn variable. This case is a false negative of "warn only once for a process", as there may be a different process that is going to hog the cpu. Resolve this by saving/checking the pid of the hogging process and use that to reset soft_watchdog_warn too. Signed-off-by: chai wen [modified the comment and changelog to be more specific] Signed-off-by: Don Zickus --- kernel/watchdog.c | 20 ++-- 1 files changed, 18 insertions(+), 2 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 4c2e11c..6d0a891 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync); static DEFINE_PER_CPU(bool, soft_watchdog_warn); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt); +static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved); #ifdef CONFIG_HARDLOCKUP_DETECTOR static DEFINE_PER_CPU(bool, hard_watchdog_warn); static DEFINE_PER_CPU(bool, watchdog_nmi_touch); @@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) */ duration = is_softlockup(touch_ts); if (unlikely(duration)) { + pid_t pid = task_pid_nr(current); + /* * If a virtual machine is stopped by the host it can look to * the watchdog like a soft lockup, check to see if the host @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) return HRTIMER_RESTART; /* only warn once */ - if (__this_cpu_read(soft_watchdog_warn) == true) + if (__this_cpu_read(soft_watchdog_warn) == true) { + + /* +* Handle the case where multiple processes are +* causing softlockups but the duration is small +* enough, the softlockup detector can not reset +* itself in time. Use pids to detect this. +*/ + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) { + __this_cpu_write(soft_watchdog_warn, false); + __touch_watchdog(); + } return HRTIMER_RESTART; + } if (softlockup_all_cpu_backtrace) { /* Prevent multiple soft-lockup reports if one cpu is already @@ -342,7 +357,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", smp_processor_id(), duration, - current->comm, task_pid_nr(current)); + current->comm, pid); + __this_cpu_write(softlockup_warn_pid_saved, pid); print_modules(); print_irqtrace_events(current); if (regs) -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
From: chai wen chaiw.f...@cn.fujitsu.com For now, soft lockup detector warns once for each case of process softlockup. But the thread 'watchdog/n' may not always get the cpu at the time slot between the task switch of two processes hogging that cpu to reset soft_watchdog_warn. An example would be two processes hogging the cpu. Process A causes the softlockup warning and is killed manually by a user. Process B immediately becomes the new process hogging the cpu preventing the softlockup code from resetting the soft_watchdog_warn variable. This case is a false negative of warn only once for a process, as there may be a different process that is going to hog the cpu. Resolve this by saving/checking the pid of the hogging process and use that to reset soft_watchdog_warn too. Signed-off-by: chai wen chaiw.f...@cn.fujitsu.com [modified the comment and changelog to be more specific] Signed-off-by: Don Zickus dzic...@redhat.com --- kernel/watchdog.c | 20 ++-- 1 files changed, 18 insertions(+), 2 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 4c2e11c..6d0a891 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync); static DEFINE_PER_CPU(bool, soft_watchdog_warn); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt); +static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved); #ifdef CONFIG_HARDLOCKUP_DETECTOR static DEFINE_PER_CPU(bool, hard_watchdog_warn); static DEFINE_PER_CPU(bool, watchdog_nmi_touch); @@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) */ duration = is_softlockup(touch_ts); if (unlikely(duration)) { + pid_t pid = task_pid_nr(current); + /* * If a virtual machine is stopped by the host it can look to * the watchdog like a soft lockup, check to see if the host @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) return HRTIMER_RESTART; /* only warn once */ - if (__this_cpu_read(soft_watchdog_warn) == true) + if (__this_cpu_read(soft_watchdog_warn) == true) { + + /* +* Handle the case where multiple processes are +* causing softlockups but the duration is small +* enough, the softlockup detector can not reset +* itself in time. Use pids to detect this. +*/ + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) { + __this_cpu_write(soft_watchdog_warn, false); + __touch_watchdog(); + } return HRTIMER_RESTART; + } if (softlockup_all_cpu_backtrace) { /* Prevent multiple soft-lockup reports if one cpu is already @@ -342,7 +357,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) printk(KERN_EMERG BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n, smp_processor_id(), duration, - current-comm, task_pid_nr(current)); + current-comm, pid); + __this_cpu_write(softlockup_warn_pid_saved, pid); print_modules(); print_irqtrace_events(current); if (regs) -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/