Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
Jeremy Fitzhardinge wrote: diff -r 4c81d8cafb67 drivers/char/sysrq.c --- a/drivers/char/sysrq.c Tue Mar 27 01:16:07 2007 -0700 +++ b/drivers/char/sysrq.c Tue Mar 27 01:18:05 2007 -0700 @@ -408,6 +408,8 @@ void __handle_sysrq(int key, struct tty_ int i; unsigned long flags; + softlockup_global_disable(); + spin_lock_irqsave(_key_table_lock, flags); orig_log_level = console_loglevel; console_loglevel = 7; @@ -445,6 +447,8 @@ void __handle_sysrq(int key, struct tty_ console_loglevel = orig_log_level; } spin_unlock_irqrestore(_key_table_lock, flags); + + softlockup_global_enable(); } I wonder if that's too strong of a fix. It's only the sysrq-t case that is the problem. AFAIK, none of the other sysrq-t operations hold the tasklist_lock for a long time. I'd move these around show_state. P. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
Jeremy Fitzhardinge wrote: Prarit Bhargava wrote: I think that's a good idea -- I'll propose an add on patch to fix the sysrq-t case ... I'm working on this patch at the moment. I'm just wondering what happens if you do a global re-enable while a CPU is locally disabled. I think it won't matter; it will end up in the "enabled but need to update timestamp" state, and the next time it gets a timer tick, it will simply update the timestamp and carry on. (This is relative to the other two softlockup patches, but modified since I posted them.) J diff -r 4c81d8cafb67 drivers/char/sysrq.c --- a/drivers/char/sysrq.c Tue Mar 27 01:16:07 2007 -0700 +++ b/drivers/char/sysrq.c Tue Mar 27 01:18:05 2007 -0700 @@ -408,6 +408,8 @@ void __handle_sysrq(int key, struct tty_ int i; unsigned long flags; + softlockup_global_disable(); + spin_lock_irqsave(_key_table_lock, flags); orig_log_level = console_loglevel; console_loglevel = 7; @@ -445,6 +447,8 @@ void __handle_sysrq(int key, struct tty_ console_loglevel = orig_log_level; } spin_unlock_irqrestore(_key_table_lock, flags); + + softlockup_global_enable(); I think that works -- I'll test it out on a big honkin' ia64 box ;) Shouldn't you also do softlockup_disable/softlockup_enable instead of touch_softlockup_watchdog? Why do we have both? I can't see why we would have two exported methods to stop/reset the softlockup timer... P. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
Prarit Bhargava wrote: > I think that's a good idea -- I'll propose an add on patch to fix the > sysrq-t case ... I'm working on this patch at the moment. I'm just wondering what happens if you do a global re-enable while a CPU is locally disabled. I think it won't matter; it will end up in the "enabled but need to update timestamp" state, and the next time it gets a timer tick, it will simply update the timestamp and carry on. (This is relative to the other two softlockup patches, but modified since I posted them.) J diff -r 4c81d8cafb67 drivers/char/sysrq.c --- a/drivers/char/sysrq.c Tue Mar 27 01:16:07 2007 -0700 +++ b/drivers/char/sysrq.c Tue Mar 27 01:18:05 2007 -0700 @@ -408,6 +408,8 @@ void __handle_sysrq(int key, struct tty_ int i; unsigned long flags; + softlockup_global_disable(); + spin_lock_irqsave(_key_table_lock, flags); orig_log_level = console_loglevel; console_loglevel = 7; @@ -445,6 +447,8 @@ void __handle_sysrq(int key, struct tty_ console_loglevel = orig_log_level; } spin_unlock_irqrestore(_key_table_lock, flags); + + softlockup_global_enable(); } /* diff -r 4c81d8cafb67 include/linux/sched.h --- a/include/linux/sched.h Tue Mar 27 01:16:07 2007 -0700 +++ b/include/linux/sched.h Tue Mar 27 01:18:05 2007 -0700 @@ -235,6 +235,8 @@ extern void softlockup_tick(void); extern void softlockup_tick(void); extern void softlockup_enable(void); extern void softlockup_disable(void); +extern void softlockup_global_enable(void); +extern void softlockup_global_disable(void); extern void spawn_softlockup_task(void); extern void touch_softlockup_watchdog(void); #else @@ -245,6 +247,12 @@ static inline void softlockup_enable(voi { } static inline void softlockup_disable(void) +{ +} +static inline void softlockup_global_enable(void) +{ +} +static inline void softlockup_global_disable(void) { } static inline void spawn_softlockup_task(void) diff -r 4c81d8cafb67 kernel/softlockup.c --- a/kernel/softlockup.c Tue Mar 27 01:16:07 2007 -0700 +++ b/kernel/softlockup.c Tue Mar 27 01:18:05 2007 -0700 @@ -17,10 +17,16 @@ static DEFINE_SPINLOCK(print_lock); +enum enable { + SL_OFF = 0, /* disabled */ + SL_UPDATE, /* enabled, but timestamp old */ + SL_ON, /* enabled */ +}; + static DEFINE_PER_CPU(unsigned long long, touch_timestamp); static DEFINE_PER_CPU(unsigned long long, print_timestamp); static DEFINE_PER_CPU(struct task_struct *, watchdog_task); -static DEFINE_PER_CPU(int, enabled); +static DEFINE_PER_CPU(enum enable, enabled); static int did_panic = 0; @@ -39,6 +45,8 @@ void touch_softlockup_watchdog(void) void touch_softlockup_watchdog(void) { __raw_get_cpu_var(touch_timestamp) = sched_clock(); + barrier(); /* update timestamp before enable */ + __raw_get_cpu_var(enabled) = SL_ON; } EXPORT_SYMBOL(touch_softlockup_watchdog); @@ -57,13 +65,27 @@ void softlockup_enable(void) void softlockup_enable(void) { touch_softlockup_watchdog(); - barrier(); /* update timestamp before enable */ - __get_cpu_var(enabled) = 1; } void softlockup_disable(void) { - __get_cpu_var(enabled) = 0; + __get_cpu_var(enabled) = SL_OFF; +} + +void softlockup_global_enable() +{ + unsigned cpu; + + for_each_online_cpu(cpu) + per_cpu(enabled, cpu) = SL_UPDATE; +} + +void softlockup_global_disable() +{ + unsigned cpu; + + for_each_online_cpu(cpu) + per_cpu(enabled, cpu) = SL_OFF; } /* @@ -79,9 +101,19 @@ void softlockup_tick(void) touch_timestamp = get_timestamp(&__get_cpu_var(touch_timestamp)); - /* return if not enabled */ - if (!__get_cpu_var(enabled)) - return; + switch(__get_cpu_var(enabled)) { + case SL_OFF: + /* not enabled */ + return; + + case SL_UPDATE: + /* update timestamp */ + touch_softlockup_watchdog(); + break; + + case SL_ON: + break; + } print_timestamp = __get_cpu_var(print_timestamp); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
I have another pair of softlockup patches in which I try to address: * ignoring time stolen by hypervisors * threads going to sleep tickless for long periods of time I'm looking at the code now. Your solution is definately better :) I could easy add a "global disable" function, which would allow long sysrq messages, and it would help Thilo with his long flash update freezes. I think that's a good idea -- I'll propose an add on patch to fix the sysrq-t case ... P. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
I have another pair of softlockup patches in which I try to address: * ignoring time stolen by hypervisors * threads going to sleep tickless for long periods of time I'm looking at the code now. Your solution is definately better :) I could easy add a global disable function, which would allow long sysrq messages, and it would help Thilo with his long flash update freezes. I think that's a good idea -- I'll propose an add on patch to fix the sysrq-t case ... P. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
Prarit Bhargava wrote: I think that's a good idea -- I'll propose an add on patch to fix the sysrq-t case ... I'm working on this patch at the moment. I'm just wondering what happens if you do a global re-enable while a CPU is locally disabled. I think it won't matter; it will end up in the enabled but need to update timestamp state, and the next time it gets a timer tick, it will simply update the timestamp and carry on. (This is relative to the other two softlockup patches, but modified since I posted them.) J diff -r 4c81d8cafb67 drivers/char/sysrq.c --- a/drivers/char/sysrq.c Tue Mar 27 01:16:07 2007 -0700 +++ b/drivers/char/sysrq.c Tue Mar 27 01:18:05 2007 -0700 @@ -408,6 +408,8 @@ void __handle_sysrq(int key, struct tty_ int i; unsigned long flags; + softlockup_global_disable(); + spin_lock_irqsave(sysrq_key_table_lock, flags); orig_log_level = console_loglevel; console_loglevel = 7; @@ -445,6 +447,8 @@ void __handle_sysrq(int key, struct tty_ console_loglevel = orig_log_level; } spin_unlock_irqrestore(sysrq_key_table_lock, flags); + + softlockup_global_enable(); } /* diff -r 4c81d8cafb67 include/linux/sched.h --- a/include/linux/sched.h Tue Mar 27 01:16:07 2007 -0700 +++ b/include/linux/sched.h Tue Mar 27 01:18:05 2007 -0700 @@ -235,6 +235,8 @@ extern void softlockup_tick(void); extern void softlockup_tick(void); extern void softlockup_enable(void); extern void softlockup_disable(void); +extern void softlockup_global_enable(void); +extern void softlockup_global_disable(void); extern void spawn_softlockup_task(void); extern void touch_softlockup_watchdog(void); #else @@ -245,6 +247,12 @@ static inline void softlockup_enable(voi { } static inline void softlockup_disable(void) +{ +} +static inline void softlockup_global_enable(void) +{ +} +static inline void softlockup_global_disable(void) { } static inline void spawn_softlockup_task(void) diff -r 4c81d8cafb67 kernel/softlockup.c --- a/kernel/softlockup.c Tue Mar 27 01:16:07 2007 -0700 +++ b/kernel/softlockup.c Tue Mar 27 01:18:05 2007 -0700 @@ -17,10 +17,16 @@ static DEFINE_SPINLOCK(print_lock); +enum enable { + SL_OFF = 0, /* disabled */ + SL_UPDATE, /* enabled, but timestamp old */ + SL_ON, /* enabled */ +}; + static DEFINE_PER_CPU(unsigned long long, touch_timestamp); static DEFINE_PER_CPU(unsigned long long, print_timestamp); static DEFINE_PER_CPU(struct task_struct *, watchdog_task); -static DEFINE_PER_CPU(int, enabled); +static DEFINE_PER_CPU(enum enable, enabled); static int did_panic = 0; @@ -39,6 +45,8 @@ void touch_softlockup_watchdog(void) void touch_softlockup_watchdog(void) { __raw_get_cpu_var(touch_timestamp) = sched_clock(); + barrier(); /* update timestamp before enable */ + __raw_get_cpu_var(enabled) = SL_ON; } EXPORT_SYMBOL(touch_softlockup_watchdog); @@ -57,13 +65,27 @@ void softlockup_enable(void) void softlockup_enable(void) { touch_softlockup_watchdog(); - barrier(); /* update timestamp before enable */ - __get_cpu_var(enabled) = 1; } void softlockup_disable(void) { - __get_cpu_var(enabled) = 0; + __get_cpu_var(enabled) = SL_OFF; +} + +void softlockup_global_enable() +{ + unsigned cpu; + + for_each_online_cpu(cpu) + per_cpu(enabled, cpu) = SL_UPDATE; +} + +void softlockup_global_disable() +{ + unsigned cpu; + + for_each_online_cpu(cpu) + per_cpu(enabled, cpu) = SL_OFF; } /* @@ -79,9 +101,19 @@ void softlockup_tick(void) touch_timestamp = get_timestamp(__get_cpu_var(touch_timestamp)); - /* return if not enabled */ - if (!__get_cpu_var(enabled)) - return; + switch(__get_cpu_var(enabled)) { + case SL_OFF: + /* not enabled */ + return; + + case SL_UPDATE: + /* update timestamp */ + touch_softlockup_watchdog(); + break; + + case SL_ON: + break; + } print_timestamp = __get_cpu_var(print_timestamp); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
Jeremy Fitzhardinge wrote: Prarit Bhargava wrote: I think that's a good idea -- I'll propose an add on patch to fix the sysrq-t case ... I'm working on this patch at the moment. I'm just wondering what happens if you do a global re-enable while a CPU is locally disabled. I think it won't matter; it will end up in the enabled but need to update timestamp state, and the next time it gets a timer tick, it will simply update the timestamp and carry on. (This is relative to the other two softlockup patches, but modified since I posted them.) J diff -r 4c81d8cafb67 drivers/char/sysrq.c --- a/drivers/char/sysrq.c Tue Mar 27 01:16:07 2007 -0700 +++ b/drivers/char/sysrq.c Tue Mar 27 01:18:05 2007 -0700 @@ -408,6 +408,8 @@ void __handle_sysrq(int key, struct tty_ int i; unsigned long flags; + softlockup_global_disable(); + spin_lock_irqsave(sysrq_key_table_lock, flags); orig_log_level = console_loglevel; console_loglevel = 7; @@ -445,6 +447,8 @@ void __handle_sysrq(int key, struct tty_ console_loglevel = orig_log_level; } spin_unlock_irqrestore(sysrq_key_table_lock, flags); + + softlockup_global_enable(); I think that works -- I'll test it out on a big honkin' ia64 box ;) Shouldn't you also do softlockup_disable/softlockup_enable instead of touch_softlockup_watchdog? Why do we have both? I can't see why we would have two exported methods to stop/reset the softlockup timer... P. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
Jeremy Fitzhardinge wrote: diff -r 4c81d8cafb67 drivers/char/sysrq.c --- a/drivers/char/sysrq.c Tue Mar 27 01:16:07 2007 -0700 +++ b/drivers/char/sysrq.c Tue Mar 27 01:18:05 2007 -0700 @@ -408,6 +408,8 @@ void __handle_sysrq(int key, struct tty_ int i; unsigned long flags; + softlockup_global_disable(); + spin_lock_irqsave(sysrq_key_table_lock, flags); orig_log_level = console_loglevel; console_loglevel = 7; @@ -445,6 +447,8 @@ void __handle_sysrq(int key, struct tty_ console_loglevel = orig_log_level; } spin_unlock_irqrestore(sysrq_key_table_lock, flags); + + softlockup_global_enable(); } I wonder if that's too strong of a fix. It's only the sysrq-t case that is the problem. AFAIK, none of the other sysrq-t operations hold the tasklist_lock for a long time. I'd move these around show_state. P. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH]: Fix bogus softlockup warning with sysrq-t
> I could easy add a "global disable" function, which would allow long > sysrq messages, and it would help Thilo with his long flash update freezes. A "global disable" and "reenable" functions pair which works during irq disabled, would be a perfect solution for me. Thx Jeremy for your effort :) Ciao Thilo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
Prarit Bhargava wrote: > There are some situations when soft lockup warnings are expected in the > kernel. For example, when doing an alt-sysrq-t on a large number of > processes, > the dump to console can take a long time and the tasklist_lock is held over > that period. This results in a bogus soft lockup warning. > Wouldn't it be better to just temporarily disable softlockups for the duration? > This patch reworks touch_softlockup_watchdog to touch ALL cpu's > touch_timestamp. It also introduces touch_cpu_softlockup_watchdog to touch > a single cpu's touch_timestamp. Doesn't this mean that if one CPU gets locked up, it will be undetected so long as some other CPU is making progress? I have another pair of softlockup patches in which I try to address: * ignoring time stolen by hypervisors * threads going to sleep tickless for long periods of time I could easy add a "global disable" function, which would allow long sysrq messages, and it would help Thilo with his long flash update freezes. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
Prarit Bhargava wrote: There are some situations when soft lockup warnings are expected in the kernel. For example, when doing an alt-sysrq-t on a large number of processes, the dump to console can take a long time and the tasklist_lock is held over that period. This results in a bogus soft lockup warning. Wouldn't it be better to just temporarily disable softlockups for the duration? This patch reworks touch_softlockup_watchdog to touch ALL cpu's touch_timestamp. It also introduces touch_cpu_softlockup_watchdog to touch a single cpu's touch_timestamp. Doesn't this mean that if one CPU gets locked up, it will be undetected so long as some other CPU is making progress? I have another pair of softlockup patches in which I try to address: * ignoring time stolen by hypervisors * threads going to sleep tickless for long periods of time I could easy add a global disable function, which would allow long sysrq messages, and it would help Thilo with his long flash update freezes. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH]: Fix bogus softlockup warning with sysrq-t
I could easy add a global disable function, which would allow long sysrq messages, and it would help Thilo with his long flash update freezes. A global disable and reenable functions pair which works during irq disabled, would be a perfect solution for me. Thx Jeremy for your effort :) Ciao Thilo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
We've seen these here and had arrived at a similar patch. Extensive prints on the console can take longer than the watchdog likes. Acked-by: Rick Lindsley <[EMAIL PROTECTED]> Rick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Fix bogus softlockup warning with sysrq-t
We've seen these here and had arrived at a similar patch. Extensive prints on the console can take longer than the watchdog likes. Acked-by: Rick Lindsley [EMAIL PROTECTED] Rick - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/