Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread
Hi! > On (04/07/17 09:21), Pavel Machek wrote: > > > spin_dump() and trigger_all_cpu_backtrace() result in a bunch of > > > additional printk()-s so CPU0 has even more job to do in console_unlock(), > > > while it still holds the contended spin_lock. and so on; there are > > > many other examples. > > > > > > so should we declare a "we can spend only 2 seconds in direct printk() > > > and then must offload printing" rule? I don't think it's much better > > > than a simpler "we always offload, as long as we think it's safe". > > > > I believe we should do the 2 seconds rule. It allows us to print "some > > messages delayed" message, so at least whoever is trying to debug the > > crash will have the hints that he needs to look at the printk system. > > do you mean panic()? in panic() we call console_flush_on_panic(), > which immediately outputs all pending logbuf messages. printk() > offloading does not happen there. Not panic(). I have seen many crashes where we had printk(KERN_ERR) and then hard hang. And the printk() was really important for debugging. > > "we always offload, as long as we think it's safe" rule does not > > really work, as printk() can not detect if it is safe or not. > > but "2 seconds" rule has that "as long as we think it's safe" string > attached as well. just because we do offloading. which is sometimes > un-safe. so regardless the timeout value (0 seconds or 2 seconds) we > still need some sort of a hint from the path that issues printk() > because that path (panic, kexec, sysrq, etc.) knows for sure when > things are abnormal. printk() is pretty clueless in this regard. > /* well, I still think that EMERG loglevel thing is not completely > broken. */ Well, at least with my solution you know there are messages that were not printed. Yes, you'd still want to switch printk_now() for stuff like panic(). But if you get it wrong (and you will), at least you will see the "something is missing here" message in the log. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread
Hello, On (04/07/17 09:21), Pavel Machek wrote: > > spin_dump() and trigger_all_cpu_backtrace() result in a bunch of > > additional printk()-s so CPU0 has even more job to do in console_unlock(), > > while it still holds the contended spin_lock. and so on; there are > > many other examples. > > > > so should we declare a "we can spend only 2 seconds in direct printk() > > and then must offload printing" rule? I don't think it's much better > > than a simpler "we always offload, as long as we think it's safe". > > I believe we should do the 2 seconds rule. It allows us to print "some > messages delayed" message, so at least whoever is trying to debug the > crash will have the hints that he needs to look at the printk system. do you mean panic()? in panic() we call console_flush_on_panic(), which immediately outputs all pending logbuf messages. printk() offloading does not happen there. > "we always offload, as long as we think it's safe" rule does not > really work, as printk() can not detect if it is safe or not. but "2 seconds" rule has that "as long as we think it's safe" string attached as well. just because we do offloading. which is sometimes un-safe. so regardless the timeout value (0 seconds or 2 seconds) we still need some sort of a hint from the path that issues printk() because that path (panic, kexec, sysrq, etc.) knows for sure when things are abnormal. printk() is pretty clueless in this regard. /* well, I still think that EMERG loglevel thing is not completely broken. */ -ss
Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread
Hi! > spin_dump() and trigger_all_cpu_backtrace() result in a bunch of > additional printk()-s so CPU0 has even more job to do in console_unlock(), > while it still holds the contended spin_lock. and so on; there are > many other examples. > > so should we declare a "we can spend only 2 seconds in direct printk() > and then must offload printing" rule? I don't think it's much better > than a simpler "we always offload, as long as we think it's safe". I believe we should do the 2 seconds rule. It allows us to print "some messages delayed" message, so at least whoever is trying to debug the crash will have the hints that he needs to look at the printk system. "we always offload, as long as we think it's safe" rule does not really work, as printk() can not detect if it is safe or not. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread
Hello, On (04/06/17 19:14), Pavel Machek wrote: [..] > > @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level, > > > > printed_len += log_output(facility, level, lflags, dict, dictlen, text, > > text_len); > > > > + /* > > +* Emergency level indicates that the system is unstable and, thus, > > +* we better stop relying on wake_up(printk_kthread) and try to do > > +* a direct printing. > > +*/ > > + if (level == LOGLEVEL_EMERG) > > + printk_kthread_disabled = true; > > + > > + set_bit(PRINTK_PENDING_OUTPUT, &printk_pending); > > Messages lower then _EMERG may be important, too.. and usually are, > for debugging. > > And you keep both code paths, anyway, so they have to work. So you did > not really "fix" issues you are pointing out -- they still remain > there for _EMERG and above. we don't drop messages of lower levels. we just print then from a schedulable context. once the things go off the rails, and EMERG is a good hint, I think, we stop being optimismitcs and switch to a "best effort" mode. that is sort of reasonable. if there is a flood of EMERG messages that are not actually important and, basically, are result of a coding error, then, I think, the we must fix that coding error. I mean, there should be some common sense, and doing while (1) printk(KERN_EMERG "hello\n"); is probably not. > I agree that printing too much is a problem. Could you just print > "(messages delayed)" in that case, then wake a kernel thread to [rint > the rest? sorry, but what difference would it make? it's really unclear at what point we should offload printing if we begin that "we will offload sometimes". for example, I've seen many spin-lock lockups where printk was involved. CPU0CPU1CPU2 CPU3 spin_lock(&lock) spin_lock(&lock)spin_lock(&lock) printk("foo") // grabs the console_sem printk("bar") printk("a") printk("b") printk("c") ... printk("z") spin_dump() spin_dump() call_console_drivers() printk() printk() serial_driver_write() printk() printk() spin_lock_irqsave(port->lock)... ... uart_console_write(...) trigger_all_cpu_backtrace() trigger_all_cpu_backtrace() serial_driver_putchar() while (!txrdy(...)) cpu_relax() spin_dump() and trigger_all_cpu_backtrace() result in a bunch of additional printk()-s so CPU0 has even more job to do in console_unlock(), while it still holds the contended spin_lock. and so on; there are many other examples. so should we declare a "we can spend only 2 seconds in direct printk() and then must offload printing" rule? I don't think it's much better than a simpler "we always offload, as long as we think it's safe". -ss
Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread
Hi! > printk() is quite complex internally and, basically, it does two > slightly independent things: > a) adds a new message to a kernel log buffer (log_store()) > b) prints kernel log messages to serial consoles (console_unlock()) > > while (a) is guaranteed to be executed by printk(), (b) is not, for a > variety of reasons, and, unlike log_store(), it comes at a price: > > 1) console_unlock() attempts to flush all pending kernel log messages > to the console. Thus, it can loop indefinitely. > > 2) while console_unlock() is executed on one particular CPU, printing > pending kernel log messages, other CPUs can simultaneously append new > messages to the kernel log buffer. > > 3) the time it takes console_unlock() to print kernel messages also > depends on the speed of the console -- which may not be fast at all. > > 4) console_unlock() is executed in the same context as printk(), so > it may be non-preemptible/atomic, which makes 1)-3) dangerous. > > As a result, nobody knows how long a printk() call will take, so > it's not really safe to call printk() in a number of situations, > including atomic context, RCU critical sections, interrupt context, > and more. You have made good argumentation for not flushing unlimited ammount of messages from printk() -- okay. But I don't think this is good idea: > @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level, > > printed_len += log_output(facility, level, lflags, dict, dictlen, text, > text_len); > > + /* > + * Emergency level indicates that the system is unstable and, thus, > + * we better stop relying on wake_up(printk_kthread) and try to do > + * a direct printing. > + */ > + if (level == LOGLEVEL_EMERG) > + printk_kthread_disabled = true; > + > + set_bit(PRINTK_PENDING_OUTPUT, &printk_pending); Messages lower then _EMERG may be important, too.. and usually are, for debugging. And you keep both code paths, anyway, so they have to work. So you did not really "fix" issues you are pointing out -- they still remain there for _EMERG and above. I agree that printing too much is a problem. Could you just print "(messages delayed)" in that case, then wake a kernel thread to [rint the rest? Pavel > logbuf_unlock_irqrestore(flags); > > /* If called from the scheduler, we can not call up(). */ > if (!in_sched) { > /* > - * Try to acquire and then immediately release the console > - * semaphore. The release will print out buffers and wake up > - * /dev/kmsg and syslog() users. > + * Under heavy printing load/slow serial console/etc > + * console_unlock() can stall CPUs, which can result in > + * soft/hard-lockups, lost interrupts, RCU stalls, etc. > + * Therefore we attempt to print the messages to console > + * from a dedicated printk_kthread, which always runs in > + * schedulable context. >*/ > - if (console_trylock()) > - console_unlock(); > + if (printk_kthread_enabled()) { > + printk_safe_enter_irqsave(flags); > + wake_up_process(printk_kthread); > + printk_safe_exit_irqrestore(flags); > + } else { > + /* > + * Try to acquire and then immediately release the > + * console semaphore. The release will print out > + * buffers and wake up /dev/kmsg and syslog() users. > + */ > + if (console_trylock()) > + console_unlock(); > + } > } > -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread
On (04/04/17 11:01), Petr Mladek wrote: [..] > > +static atomic_t printk_emergency __read_mostly; > > +/* > > + * Disable printk_kthread permanently. Unlike `oops_in_progress' > > + * it doesn't go back to 0. > > + */ > > The comment is not valid once we allow to modify the variable using > the sysfs knob. it's updated in that patch (sysfs knob introduction). [..] > > @@ -2182,6 +2253,7 @@ void console_unlock(void) > > console_may_schedule = 0; > > > > again: > > + clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending); > > This will not help if new messages appear during > call_console_drivers(). you are right. wouldn't do much harm (an extra console_unlock() from printk_kthread in the worst case), but agree. I added it there because of that "!can_use_console()" branch. not that I expect printk_kthread being executed on !online CPU, but we might have no callable consoles. probably should have that clear_bit() before and after the loop. -ss
Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread
On Wed 2017-03-29 18:25:05, Sergey Senozhatsky wrote: > This patch introduces a dedicated printing kernel thread - printk_kthread. > The main purpose of this kthread is to offload printing to a non-atomic > and always scheduleable context, which eliminates 4) and makes 1)-3) less > critical. printk() now just appends log messages to the kernel log buffer > and wake_up()s printk_kthread instead of locking console_sem and calling > into potentially unsafe console_unlock(). > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 2d07678e9ff9..ab6b3b2a68c6 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -445,6 +447,42 @@ static char __log_buf[__LOG_BUF_LEN] > __aligned(LOG_ALIGN); > static char *log_buf = __log_buf; > static u32 log_buf_len = __LOG_BUF_LEN; > > +static struct task_struct *printk_kthread __read_mostly; > +/* > + * We can't call into the scheduler (wake_up() printk kthread) during > + * suspend/kexec/etc. This temporarily switches printk to old behaviour. > + */ > +static atomic_t printk_emergency __read_mostly; > +/* > + * Disable printk_kthread permanently. Unlike `oops_in_progress' > + * it doesn't go back to 0. > + */ The comment is not valid once we allow to modify the variable using the sysfs knob. > @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level, > > printed_len += log_output(facility, level, lflags, dict, dictlen, text, > text_len); > > + /* > + * Emergency level indicates that the system is unstable and, thus, > + * we better stop relying on wake_up(printk_kthread) and try to do > + * a direct printing. > + */ > + if (level == LOGLEVEL_EMERG) > + printk_kthread_disabled = true; > + > + set_bit(PRINTK_PENDING_OUTPUT, &printk_pending); > logbuf_unlock_irqrestore(flags); > > /* If called from the scheduler, we can not call up(). */ > if (!in_sched) { > /* > - * Try to acquire and then immediately release the console > - * semaphore. The release will print out buffers and wake up > - * /dev/kmsg and syslog() users. > + * Under heavy printing load/slow serial console/etc > + * console_unlock() can stall CPUs, which can result in > + * soft/hard-lockups, lost interrupts, RCU stalls, etc. > + * Therefore we attempt to print the messages to console > + * from a dedicated printk_kthread, which always runs in > + * schedulable context. >*/ > - if (console_trylock()) > - console_unlock(); > + if (printk_kthread_enabled()) { > + printk_safe_enter_irqsave(flags); > + wake_up_process(printk_kthread); > + printk_safe_exit_irqrestore(flags); I am really happy that we have the printk_safe stuff available! > + } else { > + /* > + * Try to acquire and then immediately release the > + * console semaphore. The release will print out > + * buffers and wake up /dev/kmsg and syslog() users. > + */ > + if (console_trylock()) > + console_unlock(); > + } > } > > return printed_len; > @@ -1882,6 +1943,9 @@ static size_t msg_print_text(const struct printk_log > *msg, >bool syslog, char *buf, size_t size) { return 0; } > static bool suppress_message_printing(int level) { return false; } > > +void printk_emergency_begin(void) {} > +void printk_emergency_end(void) {} > + > #endif /* CONFIG_PRINTK */ > > #ifdef CONFIG_EARLY_PRINTK > @@ -2164,6 +2228,13 @@ void console_unlock(void) > bool do_cond_resched, retry; > > if (console_suspended) { > + /* > + * Avoid an infinite loop in printk_kthread function > + * when console_unlock() cannot flush messages because > + * we suspended consoles. Someone else will print the > + * messages from resume_console(). > + */ > + clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending); Great catch! > up_console_sem(); > return; > } > @@ -2182,6 +2253,7 @@ void console_unlock(void) > console_may_schedule = 0; > > again: > + clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending); This will not help if new messages appear during call_console_drivers(). I would move this line after the for(;;) cycle. It will be cleared when all messages are really handled. Otherwise, it looks fine to me. Best Regards, Petr
[RFC][PATCHv2 2/8] printk: introduce printing kernel thread
printk() is quite complex internally and, basically, it does two slightly independent things: a) adds a new message to a kernel log buffer (log_store()) b) prints kernel log messages to serial consoles (console_unlock()) while (a) is guaranteed to be executed by printk(), (b) is not, for a variety of reasons, and, unlike log_store(), it comes at a price: 1) console_unlock() attempts to flush all pending kernel log messages to the console. Thus, it can loop indefinitely. 2) while console_unlock() is executed on one particular CPU, printing pending kernel log messages, other CPUs can simultaneously append new messages to the kernel log buffer. 3) the time it takes console_unlock() to print kernel messages also depends on the speed of the console -- which may not be fast at all. 4) console_unlock() is executed in the same context as printk(), so it may be non-preemptible/atomic, which makes 1)-3) dangerous. As a result, nobody knows how long a printk() call will take, so it's not really safe to call printk() in a number of situations, including atomic context, RCU critical sections, interrupt context, and more. This patch introduces a dedicated printing kernel thread - printk_kthread. The main purpose of this kthread is to offload printing to a non-atomic and always scheduleable context, which eliminates 4) and makes 1)-3) less critical. printk() now just appends log messages to the kernel log buffer and wake_up()s printk_kthread instead of locking console_sem and calling into potentially unsafe console_unlock(). This, however, is not always safe on its own. For example, we can't call into the scheduler from panic(), because this may cause deadlock. That's why we introduce a concept of printk_emergency() mode, when printk() switches back to the old behaviour (console_unlock() from vprintk_emit()) in those cases. In general, this switch happens automatically once a EMERG log level message appears in the log buffer. Another cases when wake_up() might not work as expected are suspend, hibernate, etc. For those situations we provide two new functions: -- printk_emergency_begin() Disables printk offloading. All printk() calls will attempt to lock the console_sem and, if successful, flush kernel log messages. -- printk_emergency_end() Enables printk offloading. Signed-off-by: Sergey Senozhatsky Signed-off-by: Jan Kara --- include/linux/console.h | 3 ++ kernel/printk/printk.c | 107 2 files changed, 102 insertions(+), 8 deletions(-) diff --git a/include/linux/console.h b/include/linux/console.h index 5949d1855589..f1a86944072e 100644 --- a/include/linux/console.h +++ b/include/linux/console.h @@ -187,6 +187,9 @@ extern bool console_suspend_enabled; extern void suspend_console(void); extern void resume_console(void); +extern void printk_emergency_begin(void); +extern void printk_emergency_end(void); + int mda_console_init(void); void prom_con_init(void); diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 2d07678e9ff9..ab6b3b2a68c6 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include @@ -402,7 +403,8 @@ DEFINE_RAW_SPINLOCK(logbuf_lock); } while (0) /* - * Delayed printk version, for scheduler-internal messages: + * Used both for deferred printk version (scheduler-internal messages) + * and printk_kthread control. */ #define PRINTK_PENDING_WAKEUP 0x01 #define PRINTK_PENDING_OUTPUT 0x02 @@ -445,6 +447,42 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); static char *log_buf = __log_buf; static u32 log_buf_len = __LOG_BUF_LEN; +static struct task_struct *printk_kthread __read_mostly; +/* + * We can't call into the scheduler (wake_up() printk kthread) during + * suspend/kexec/etc. This temporarily switches printk to old behaviour. + */ +static atomic_t printk_emergency __read_mostly; +/* + * Disable printk_kthread permanently. Unlike `oops_in_progress' + * it doesn't go back to 0. + */ +static bool printk_kthread_disabled __read_mostly; + +static inline bool printk_kthread_enabled(void) +{ + return !printk_kthread_disabled && + printk_kthread && atomic_read(&printk_emergency) == 0; +} + +/* + * This disables printing offloading and instead attempts + * to do the usual console_trylock()->console_unlock(). + * + * Note, this does not stop the printk_kthread if it's already + * printing logbuf messages. + */ +void printk_emergency_begin(void) +{ + atomic_inc(&printk_emergency); +} + +/* This re-enables printk_kthread offloading. */ +void printk_emergency_end(void) +{ + atomic_dec(&printk_emergency); +} + /* Return log buffer address */ char *log_buf_addr_get(void) { @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level, printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len); + /* +