Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hi, Move printk and (some of) MM people to the recipients list. On (01/10/18 09:02), Tejun Heo wrote: [..] > The particular case that we've been seeing regularly in the fleet was > the following scenario. > > 1. Console is IPMI emulated serial console. Super slow. Also >netconsole is in use. > 2. System runs out of memory, OOM triggers. > 3. OOM handler is printing out OOM debug info. > 4. While trying to emit the messages for netconsole, the network stack >/ driver tries to allocate memory and then fail, which in turn >triggers allocation failure or other warning messages. printk was >already flushing, so the messages are queued on the ring. > 5. OOM handler keeps flushing but 4 repeats and the queue is never >shrinking. Because OOM handler is trapped in printk flushing, it >never manages to free memory and no one else can enter OOM path >either, so the system is trapped in this state. Tejun, we have a theory [since there are no logs available] that what you are looking at is something as follows: console_unlock() { for (;;) { call_console_drivers() kmalloc()/etc/* netconsole, skb kmalloc(), for instance */ __alloc_pages_slowpath() warn_alloc() /* a bunch of printk() -> log_store() */ } } Now, warn_alloc() is rate limited to DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST so net console driver can add 10 warn_alloc() reports every 5 seconds to the logbuf. You have a "super slow" IPMI console and net console. So for every logbuf entry we do: console_unlock() { for (;;) { call_console_drivers(msg) -> IPMI_write() call_console_drivers(msg) -> netconsole_write() -> skb kmalloc() -> warn_alloc() -> ratelimit } } IPMI_write() is very slow, as you have noted, so it consumes time printing messages, simultaneously warn_alloc() rate limit depends on time. *Probably*, slow IPMI_write() is unable to flush 10 warn_alloc() reports under 5 seconds, which gives net console a chance to add another 10 warn_alloc()-s, while the previous 10 warn_alloc()-s have not been flushed yet. It seems that DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST warn_alloc() rate limit is too permissive for your setup. Can you confirm that the theory is actually correct? If it is correct, then can we simply tweak warn_alloc() rate limit? Say, make it x2 / x4 / etc. times less verbose? E.g. "up to 5 warn_alloc()-s every 10 seconds"? What do MM folks think? -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hi, Move printk and (some of) MM people to the recipients list. On (01/10/18 09:02), Tejun Heo wrote: [..] > The particular case that we've been seeing regularly in the fleet was > the following scenario. > > 1. Console is IPMI emulated serial console. Super slow. Also >netconsole is in use. > 2. System runs out of memory, OOM triggers. > 3. OOM handler is printing out OOM debug info. > 4. While trying to emit the messages for netconsole, the network stack >/ driver tries to allocate memory and then fail, which in turn >triggers allocation failure or other warning messages. printk was >already flushing, so the messages are queued on the ring. > 5. OOM handler keeps flushing but 4 repeats and the queue is never >shrinking. Because OOM handler is trapped in printk flushing, it >never manages to free memory and no one else can enter OOM path >either, so the system is trapped in this state. Tejun, we have a theory [since there are no logs available] that what you are looking at is something as follows: console_unlock() { for (;;) { call_console_drivers() kmalloc()/etc/* netconsole, skb kmalloc(), for instance */ __alloc_pages_slowpath() warn_alloc() /* a bunch of printk() -> log_store() */ } } Now, warn_alloc() is rate limited to DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST so net console driver can add 10 warn_alloc() reports every 5 seconds to the logbuf. You have a "super slow" IPMI console and net console. So for every logbuf entry we do: console_unlock() { for (;;) { call_console_drivers(msg) -> IPMI_write() call_console_drivers(msg) -> netconsole_write() -> skb kmalloc() -> warn_alloc() -> ratelimit } } IPMI_write() is very slow, as you have noted, so it consumes time printing messages, simultaneously warn_alloc() rate limit depends on time. *Probably*, slow IPMI_write() is unable to flush 10 warn_alloc() reports under 5 seconds, which gives net console a chance to add another 10 warn_alloc()-s, while the previous 10 warn_alloc()-s have not been flushed yet. It seems that DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST warn_alloc() rate limit is too permissive for your setup. Can you confirm that the theory is actually correct? If it is correct, then can we simply tweak warn_alloc() rate limit? Say, make it x2 / x4 / etc. times less verbose? E.g. "up to 5 warn_alloc()-s every 10 seconds"? What do MM folks think? -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 07:43), Tejun Heo wrote: > > > > We can have more. But if printk is causing printks, that's a major bug. > > And work queues are not going to fix it, it will just spread out the > > pain. Have it be 100 printks, it needs to be fixed if it is happening. > > And having all printks just generate more printks is not helpful. Even > > if we slow them down. They will still never end. > > So, at least in the case that we were seeing, it isn't that black and > white. printk keeps causing printks but only because printk buffer > flushing is preventing the printk'ing context from making forward > progress. The key problem there is that a flushing context may get > pinned flushing indefinitely and using a separate context does solve > the problem. Hello Tejun, I'm willing to take a look at those printk()-s from console drivers. Any chance you can send me some of the backtraces you see [the most common/disturbing]? -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 07:43), Tejun Heo wrote: > > > > We can have more. But if printk is causing printks, that's a major bug. > > And work queues are not going to fix it, it will just spread out the > > pain. Have it be 100 printks, it needs to be fixed if it is happening. > > And having all printks just generate more printks is not helpful. Even > > if we slow them down. They will still never end. > > So, at least in the case that we were seeing, it isn't that black and > white. printk keeps causing printks but only because printk buffer > flushing is preventing the printk'ing context from making forward > progress. The key problem there is that a flushing context may get > pinned flushing indefinitely and using a separate context does solve > the problem. Hello Tejun, I'm willing to take a look at those printk()-s from console drivers. Any chance you can send me some of the backtraces you see [the most common/disturbing]? -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Peter. On Wed, Jan 24, 2018 at 10:36:07AM +0100, Peter Zijlstra wrote: > On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote: > > 1. Console is IPMI emulated serial console. Super slow. Also > >netconsole is in use. > > So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had > trouble like that (granted I don't typically trigger OOM storms, but > they do occasionally happen). > > Is your IPMI much slower and not fixable to be faster? It looks like the latest machines have the baud rate at 57600 and I'm pretty sure we have a lot of slower ones. 57600 isn't 9600 but is still slow enough to get into trouble often enough. There are a huge number of machines running all sorts of things under heavy load and trying to rapidly deploy new kernels / features contributes to encountering bugs and weird corner cases. UART can run a lot faster and I have no idea why IPMI consoles behave as if they were connected over mile-long DB9 cables. Maybe we can convince hardware people to improve it but, even if that happened today, we'd still be looking at years of dealing with slower ones, and IPMI situation here is likely better than what many others are facing. idk, it's not a particularly difficult problem to solve from kernel side. Just need to figure out a better / more robust trade-off. Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Peter. On Wed, Jan 24, 2018 at 10:36:07AM +0100, Peter Zijlstra wrote: > On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote: > > 1. Console is IPMI emulated serial console. Super slow. Also > >netconsole is in use. > > So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had > trouble like that (granted I don't typically trigger OOM storms, but > they do occasionally happen). > > Is your IPMI much slower and not fixable to be faster? It looks like the latest machines have the baud rate at 57600 and I'm pretty sure we have a lot of slower ones. 57600 isn't 9600 but is still slow enough to get into trouble often enough. There are a huge number of machines running all sorts of things under heavy load and trying to rapidly deploy new kernels / features contributes to encountering bugs and weird corner cases. UART can run a lot faster and I have no idea why IPMI consoles behave as if they were connected over mile-long DB9 cables. Maybe we can convince hardware people to improve it but, even if that happened today, we'd still be looking at years of dealing with slower ones, and IPMI situation here is likely better than what many others are facing. idk, it's not a particularly difficult problem to solve from kernel side. Just need to figure out a better / more robust trade-off. Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote: > 1. Console is IPMI emulated serial console. Super slow. Also >netconsole is in use. So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had trouble like that (granted I don't typically trigger OOM storms, but they do occasionally happen). Is your IPMI much slower and not fixable to be faster?
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote: > 1. Console is IPMI emulated serial console. Super slow. Also >netconsole is in use. So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had trouble like that (granted I don't typically trigger OOM storms, but they do occasionally happen). Is your IPMI much slower and not fixable to be faster?
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 21:52), Steven Rostedt wrote: > On Wed, 24 Jan 2018 11:11:33 +0900 > Sergey Senozhatskywrote: > > > Please take a look. > > Was there something specific to look at? Not really. Just my previous email, basically. You said "I have to look at the latest code." so I replied. Well, if the proposed direction does make sense then I'll send out a patch. > I'm doing a hundred different things at once, and my memory cache... Meltdown vulnerable? Suddenly it all makes sense - you talk too fast because of speculative execution... ;) -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 21:52), Steven Rostedt wrote: > On Wed, 24 Jan 2018 11:11:33 +0900 > Sergey Senozhatsky wrote: > > > Please take a look. > > Was there something specific to look at? Not really. Just my previous email, basically. You said "I have to look at the latest code." so I replied. Well, if the proposed direction does make sense then I'll send out a patch. > I'm doing a hundred different things at once, and my memory cache... Meltdown vulnerable? Suddenly it all makes sense - you talk too fast because of speculative execution... ;) -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 24 Jan 2018 11:11:33 +0900 Sergey Senozhatskywrote: > Please take a look. Was there something specific to look at? I'm doing a hundred different things at once, and my memory cache keeps getting flushed. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 24 Jan 2018 11:11:33 +0900 Sergey Senozhatsky wrote: > Please take a look. Was there something specific to look at? I'm doing a hundred different things at once, and my memory cache keeps getting flushed. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, On (01/23/18 11:24), Steven Rostedt wrote: [..] > > With WQ we don't lockup the kernel, because we flush printk_safe in > > preemptible context. And people are very much expected to fix the > > misbehaving consoles. But that should not be printk_safe problem. > > Right, but now you just made printk safe unreliable to get information > out, because you need to wait for a schedule to occur, and if there's > issues, like a deadlock, that thread will never run. And you just lost > you lockdep splat. Yes and No. printk_safe and printk_nmi are unreliable - both need irq_work. That's why we forcibly flush those buffers in panic(). At least for printk_safe case, and I'm pretty sure the same stands for printk_nmi, we never said that we will store all the messages that were printed from unsafe context (recursion or NMI). The only thing we said - we will try not to deadlock the system. Now we are adding one more thing to printk_safe - we will also try not to lockup the system. Default printk_safe buffer size might not be enough to store a very large lockdep splat. And we will report that the buffer is too small and that we lost some of the lines: "here is what we have, we lost N lines, but at least we didn't deadlock the system". See f975237b76827956fe13ecfe993a319158e2c303 for more details, it contains a list of recursive-printk deadlock scenarios that printk_safe was meant to handle. It is possible and OK to lose messages in printk_safe/printk_nmi printk_safe_enter_irqsave() printk printk ... ... printk printk printk_safe_exit_irqrestore() No flush will take place as long as there is no IRQ on that CPU. But printk_safe and printk_nmi are solving different problem in the first place. > > I'll re-read this one tomorrow. Not quite following it. > > I'll add more capitals next time ;-) Ha-ha-ha ;) [..] > > pintk_safe was designed to be recursive. It was never designed to be > > used to troubleshoot or debug consoles. But it was designed to be > > recursive - because that's the sort of the problems it was meant to > > handle: recursive printks that would otherwise deadlock us. That's why > > we have it in the first place. > > So printk safe is only triggered when at the same context? If we can > guarantee that printk safe is triggered only when its because a printk > is happening at the same context (not because of an interrupt, but > really at the same context, using my context check), then I'm fine with > delaying them to a work queue. printk_safe is for printk recursion only. It happens in the same context only. When we switch to printk_safe we disable local IRQs, NMIs have their own printk_nmi thing. And the way we flush printk_safe is mostly recursive. Because we flush when we know that we will not deadlock [as much as we can; we can't control any 3rd party locks which might be involved; thus printk_deferred() usage]. Usually it's something like printk spin_lock_irqsave(logbuf_lock) printk spin_lock_irqsave(logbuf_lock) << deadlock What we have with printk_safe is printk local_irq_save printk_safe_enter spin_lock(logbuf_lock) printk vprintk_safe queue irq work spin_unlock(logbuf_lock) printk_safe_exit local_irq_restore >>> IRQ work printk_safe_flush printk spin_lock_irqsave(logbuf_lock) log_store() spin_unlock_irqrestore(logbuf_lock) So we flush printk_safe ASAP, which usually (unless originally we were not in IRQ) means that the flush is recursive, but safe - we don't deadlock. > That is, if we have this: > > printk() > console_lock() > > printk() > add to log buffer > > console_unlock(); Right. This is what we have right now. Every time we enable local IRQs in the console_unlock() printing loop - we flush printk_safe. And that's the problem. > printk() > console_lock() > > put in printk safe buffer > trigger work queue > console_unlock() > > flush safe buffer > printk() Right. This is what we will have with WQ. We don't flush printk_safe until we return from console_unlock(). Because printk() disables preemption for the duration of console_unlock(), we can't schedule WQ on that CPU. And we schedule flushing work only on the CPU that has triggered the recursion. Another thing: console_lock() blah blah console_unlock() In this case we will flush printk_safe withing the printing loop. Immediately. But we don't care - the CPU is preemptible, we don't lock up the kernel. > Then I'm fine with that. > > I have to look at the latest code. If this is indeed what we have, then > I admit I misunderstood the problem you want to solve. > > I only want recursive printks (those that are
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, On (01/23/18 11:24), Steven Rostedt wrote: [..] > > With WQ we don't lockup the kernel, because we flush printk_safe in > > preemptible context. And people are very much expected to fix the > > misbehaving consoles. But that should not be printk_safe problem. > > Right, but now you just made printk safe unreliable to get information > out, because you need to wait for a schedule to occur, and if there's > issues, like a deadlock, that thread will never run. And you just lost > you lockdep splat. Yes and No. printk_safe and printk_nmi are unreliable - both need irq_work. That's why we forcibly flush those buffers in panic(). At least for printk_safe case, and I'm pretty sure the same stands for printk_nmi, we never said that we will store all the messages that were printed from unsafe context (recursion or NMI). The only thing we said - we will try not to deadlock the system. Now we are adding one more thing to printk_safe - we will also try not to lockup the system. Default printk_safe buffer size might not be enough to store a very large lockdep splat. And we will report that the buffer is too small and that we lost some of the lines: "here is what we have, we lost N lines, but at least we didn't deadlock the system". See f975237b76827956fe13ecfe993a319158e2c303 for more details, it contains a list of recursive-printk deadlock scenarios that printk_safe was meant to handle. It is possible and OK to lose messages in printk_safe/printk_nmi printk_safe_enter_irqsave() printk printk ... ... printk printk printk_safe_exit_irqrestore() No flush will take place as long as there is no IRQ on that CPU. But printk_safe and printk_nmi are solving different problem in the first place. > > I'll re-read this one tomorrow. Not quite following it. > > I'll add more capitals next time ;-) Ha-ha-ha ;) [..] > > pintk_safe was designed to be recursive. It was never designed to be > > used to troubleshoot or debug consoles. But it was designed to be > > recursive - because that's the sort of the problems it was meant to > > handle: recursive printks that would otherwise deadlock us. That's why > > we have it in the first place. > > So printk safe is only triggered when at the same context? If we can > guarantee that printk safe is triggered only when its because a printk > is happening at the same context (not because of an interrupt, but > really at the same context, using my context check), then I'm fine with > delaying them to a work queue. printk_safe is for printk recursion only. It happens in the same context only. When we switch to printk_safe we disable local IRQs, NMIs have their own printk_nmi thing. And the way we flush printk_safe is mostly recursive. Because we flush when we know that we will not deadlock [as much as we can; we can't control any 3rd party locks which might be involved; thus printk_deferred() usage]. Usually it's something like printk spin_lock_irqsave(logbuf_lock) printk spin_lock_irqsave(logbuf_lock) << deadlock What we have with printk_safe is printk local_irq_save printk_safe_enter spin_lock(logbuf_lock) printk vprintk_safe queue irq work spin_unlock(logbuf_lock) printk_safe_exit local_irq_restore >>> IRQ work printk_safe_flush printk spin_lock_irqsave(logbuf_lock) log_store() spin_unlock_irqrestore(logbuf_lock) So we flush printk_safe ASAP, which usually (unless originally we were not in IRQ) means that the flush is recursive, but safe - we don't deadlock. > That is, if we have this: > > printk() > console_lock() > > printk() > add to log buffer > > console_unlock(); Right. This is what we have right now. Every time we enable local IRQs in the console_unlock() printing loop - we flush printk_safe. And that's the problem. > printk() > console_lock() > > put in printk safe buffer > trigger work queue > console_unlock() > > flush safe buffer > printk() Right. This is what we will have with WQ. We don't flush printk_safe until we return from console_unlock(). Because printk() disables preemption for the duration of console_unlock(), we can't schedule WQ on that CPU. And we schedule flushing work only on the CPU that has triggered the recursion. Another thing: console_lock() blah blah console_unlock() In this case we will flush printk_safe withing the printing loop. Immediately. But we don't care - the CPU is preemptible, we don't lock up the kernel. > Then I'm fine with that. > > I have to look at the latest code. If this is indeed what we have, then > I admit I misunderstood the problem you want to solve. > > I only want recursive printks (those that are
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Sergey. On Wed, Jan 24, 2018 at 01:01:53AM +0900, Sergey Senozhatsky wrote: > On (01/23/18 10:41), Steven Rostedt wrote: > [..] > > We can have more. But if printk is causing printks, that's a major bug. > > And work queues are not going to fix it, it will just spread out the > > pain. Have it be 100 printks, it needs to be fixed if it is happening. > > And having all printks just generate more printks is not helpful. Even > > if we slow them down. They will still never end. > > Dropping the messages is not the solution either. The original bug report > report was - this "locks up my kernel". That's it. That's all people asked > us to solve. > > With WQ we don't lockup the kernel, because we flush printk_safe in > preemptible context. And people are very much expected to fix the > misbehaving consoles. But that should not be printk_safe problem. I really don't care as long as it's robust enough. Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Sergey. On Wed, Jan 24, 2018 at 01:01:53AM +0900, Sergey Senozhatsky wrote: > On (01/23/18 10:41), Steven Rostedt wrote: > [..] > > We can have more. But if printk is causing printks, that's a major bug. > > And work queues are not going to fix it, it will just spread out the > > pain. Have it be 100 printks, it needs to be fixed if it is happening. > > And having all printks just generate more printks is not helpful. Even > > if we slow them down. They will still never end. > > Dropping the messages is not the solution either. The original bug report > report was - this "locks up my kernel". That's it. That's all people asked > us to solve. > > With WQ we don't lockup the kernel, because we flush printk_safe in > preemptible context. And people are very much expected to fix the > misbehaving consoles. But that should not be printk_safe problem. I really don't care as long as it's robust enough. Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hey, On Tue, Jan 23, 2018 at 11:13:30AM -0500, Steven Rostedt wrote: > From what I understand is that there's an issue with one of the printk > consoles, due to memory pressure or whatnot. Then a printk happens > within a printk recursively. It gets put into the safe buffer and an > irq is sent to printk this printk. > > The issue you are saying is that when the printk enables interrupts, > the irq work triggers and loads the log buffer with the safe buffer, and > then the printk sees the new data added and continues to print, and > hence never leaves this printk. I'm not sure it's irq or the same calling context, but yeah whatever it may be, it keeps adding new data. > Your solution is to delay the flushing of the safe buffer to another > thread (work queue), which I also have issues with, because you break > the "get printks out ASAP mantra". Then the work queue comes in and > flushes the printks. And since the printks cause printks, we continue > to spam the machine, but hey, we are making forward progress. I'm not sure "get printks out ASAP mantra" is the overriding concern after spending 20s flushing in an unknown context. I'm honestly curious. Would that still matter that much at that point? I went through the recent common crashes in the fleet earlier today and a good number of them are printk taking too long unnecessarily escalating the situation (most commonly triggering NMI watchdog). I'm not saying that this should override other concerns but it seems clear to me that we're pretty badly exposed on this front. > Again, this is treating the symptom and not solving the problem. Or adding a safety net when things go south, but this isn't what I was trying to argue. I mostly thought your understanding of what I reported wasn't accurate and wanted to clear that up. > I really hate delaying printks to another thread, unless we can > guarantee that that thread is ready to go immediately (basically > spinning on a run queue waiting to print). Because if the system is > having issues (which is the main reason for printks to happen), there's > no guarantee that a work queue or another thread will ever schedule, > and the safe printk buffer never gets out to the consoles. > > I much rather have throttling when recursive printks are detected. > Make it a 100 lines to print if you want, but then throttle. Because > once you have 100 lines or so, you will know that printks are causing > printks, and you don't give a crap about the repeated process. Allow > one flushing of the printk safe buffers, and then if it happens again, > throttle it. > > Both methods can lose important data. I believe the throttling of > recursive printks, after 100 prints or whatever, will be the least > likely to lose important data, because printks caused by printks will > just keep repeating the same data, and we don't care about repeats. But > delaying the flushing could very well lose important data that caused > a lockup. Hmmm... what you're suggesting still seems more fragile - ie. when does that 100 count get reset? OOM prints quite a few lines and if we're resetting on each line, that two order explosion of messages can still be really really bad. And issues like that seem to suggest that the root problem to handle here is avoiding locking up a context in flushing for too long. Your approach is trying to avoid causing that but it's a symptom which can be reached in many different ways. Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hey, On Tue, Jan 23, 2018 at 11:13:30AM -0500, Steven Rostedt wrote: > From what I understand is that there's an issue with one of the printk > consoles, due to memory pressure or whatnot. Then a printk happens > within a printk recursively. It gets put into the safe buffer and an > irq is sent to printk this printk. > > The issue you are saying is that when the printk enables interrupts, > the irq work triggers and loads the log buffer with the safe buffer, and > then the printk sees the new data added and continues to print, and > hence never leaves this printk. I'm not sure it's irq or the same calling context, but yeah whatever it may be, it keeps adding new data. > Your solution is to delay the flushing of the safe buffer to another > thread (work queue), which I also have issues with, because you break > the "get printks out ASAP mantra". Then the work queue comes in and > flushes the printks. And since the printks cause printks, we continue > to spam the machine, but hey, we are making forward progress. I'm not sure "get printks out ASAP mantra" is the overriding concern after spending 20s flushing in an unknown context. I'm honestly curious. Would that still matter that much at that point? I went through the recent common crashes in the fleet earlier today and a good number of them are printk taking too long unnecessarily escalating the situation (most commonly triggering NMI watchdog). I'm not saying that this should override other concerns but it seems clear to me that we're pretty badly exposed on this front. > Again, this is treating the symptom and not solving the problem. Or adding a safety net when things go south, but this isn't what I was trying to argue. I mostly thought your understanding of what I reported wasn't accurate and wanted to clear that up. > I really hate delaying printks to another thread, unless we can > guarantee that that thread is ready to go immediately (basically > spinning on a run queue waiting to print). Because if the system is > having issues (which is the main reason for printks to happen), there's > no guarantee that a work queue or another thread will ever schedule, > and the safe printk buffer never gets out to the consoles. > > I much rather have throttling when recursive printks are detected. > Make it a 100 lines to print if you want, but then throttle. Because > once you have 100 lines or so, you will know that printks are causing > printks, and you don't give a crap about the repeated process. Allow > one flushing of the printk safe buffers, and then if it happens again, > throttle it. > > Both methods can lose important data. I believe the throttling of > recursive printks, after 100 prints or whatever, will be the least > likely to lose important data, because printks caused by printks will > just keep repeating the same data, and we don't care about repeats. But > delaying the flushing could very well lose important data that caused > a lockup. Hmmm... what you're suggesting still seems more fragile - ie. when does that 100 count get reset? OOM prints quite a few lines and if we're resetting on each line, that two order explosion of messages can still be really really bad. And issues like that seem to suggest that the root problem to handle here is avoiding locking up a context in flushing for too long. Your approach is trying to avoid causing that but it's a symptom which can be reached in many different ways. Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 24 Jan 2018 01:01:53 +0900 Sergey Senozhatskywrote: > On (01/23/18 10:41), Steven Rostedt wrote: > [..] > > We can have more. But if printk is causing printks, that's a major bug. > > And work queues are not going to fix it, it will just spread out the > > pain. Have it be 100 printks, it needs to be fixed if it is happening. > > And having all printks just generate more printks is not helpful. Even > > if we slow them down. They will still never end. > > Dropping the messages is not the solution either. The original bug report > report was - this "locks up my kernel". That's it. That's all people asked > us to solve. And throttling the printks would stop the lock up too. > > With WQ we don't lockup the kernel, because we flush printk_safe in > preemptible context. And people are very much expected to fix the > misbehaving consoles. But that should not be printk_safe problem. Right, but now you just made printk safe unreliable to get information out, because you need to wait for a schedule to occur, and if there's issues, like a deadlock, that thread will never run. And you just lost you lockdep splat. > > > A printk causing a printk is a special case, and we need to just show > > enough to let the user know that its happening, and why printks are > > being throttled. Yes, we may lose data, but if every printk that goes > > out causes another printk, then there's going to be so much noise that > > we wont know what other things went wrong. Honestly, if someone showed > > me a report where the logs were filled with printks that caused > > printks, I'd stop right there and tell them that needs to be fixed > > before we do anything else. And if that recursion is happening because > > of another problem, I don't want to see the recursion printks. I want > > to see the printks that show what is causing the recursions. > > I'll re-read this one tomorrow. Not quite following it. I'll add more capitals next time ;-) > > > > The problem is - we flush printk_safe too soon and printing CPU ends up > > > in a lockup - it log_store()-s new messages while it's printing the > > > pending > > > > No, the problem is that printks are causing more printks. Yes that will > > make flushing them soon more likely to lock up the system. But that's > > not the problem. The problem is printks causing printks. > > Yes. And ignoring those printk()-s by simply dropping them does not fix > the problem by any means. How so? If we drop them, then the stuck printk has nothing to print and will move forward. I say once you start dropping printks due to recursion, keep dropping them. For at least a second, to allow them to stop killing the machine. > > > > ones. It's fine to do so when CPU is in preemptible context. Really, we > > > should not care in printk_safe as long as we don't lockup the kernel. The > > > misbehaving console must be fixed. If CPU is not in preemptible context > > > then > > > we do lockup the kernel. Because we flush printk_safe regardless of the > > > current CPU context. If we will flush printk_safe via WQ then we > > > automatically > > > > And if we can throttle recursive printks, then we should be able to > > stop that from happening. > > pintk_safe was designed to be recursive. It was never designed to be > used to troubleshoot or debug consoles. But it was designed to be > recursive - because that's the sort of the problems it was meant to > handle: recursive printks that would otherwise deadlock us. That's why > we have it in the first place. So printk safe is only triggered when at the same context? If we can guarantee that printk safe is triggered only when its because a printk is happening at the same context (not because of an interrupt, but really at the same context, using my context check), then I'm fine with delaying them to a work queue. That is, if we have this: printk() console_lock() printk() add to log buffer console_unlock(); printk() console_lock() put in printk safe buffer trigger work queue console_unlock() flush safe buffer printk() Then I'm fine with that. I have to look at the latest code. If this is indeed what we have, then I admit I misunderstood the problem you want to solve. I only want recursive printks (those that are actually triggered by doing a printk) to be allowed to be delayed. Make sense? -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 24 Jan 2018 01:01:53 +0900 Sergey Senozhatsky wrote: > On (01/23/18 10:41), Steven Rostedt wrote: > [..] > > We can have more. But if printk is causing printks, that's a major bug. > > And work queues are not going to fix it, it will just spread out the > > pain. Have it be 100 printks, it needs to be fixed if it is happening. > > And having all printks just generate more printks is not helpful. Even > > if we slow them down. They will still never end. > > Dropping the messages is not the solution either. The original bug report > report was - this "locks up my kernel". That's it. That's all people asked > us to solve. And throttling the printks would stop the lock up too. > > With WQ we don't lockup the kernel, because we flush printk_safe in > preemptible context. And people are very much expected to fix the > misbehaving consoles. But that should not be printk_safe problem. Right, but now you just made printk safe unreliable to get information out, because you need to wait for a schedule to occur, and if there's issues, like a deadlock, that thread will never run. And you just lost you lockdep splat. > > > A printk causing a printk is a special case, and we need to just show > > enough to let the user know that its happening, and why printks are > > being throttled. Yes, we may lose data, but if every printk that goes > > out causes another printk, then there's going to be so much noise that > > we wont know what other things went wrong. Honestly, if someone showed > > me a report where the logs were filled with printks that caused > > printks, I'd stop right there and tell them that needs to be fixed > > before we do anything else. And if that recursion is happening because > > of another problem, I don't want to see the recursion printks. I want > > to see the printks that show what is causing the recursions. > > I'll re-read this one tomorrow. Not quite following it. I'll add more capitals next time ;-) > > > > The problem is - we flush printk_safe too soon and printing CPU ends up > > > in a lockup - it log_store()-s new messages while it's printing the > > > pending > > > > No, the problem is that printks are causing more printks. Yes that will > > make flushing them soon more likely to lock up the system. But that's > > not the problem. The problem is printks causing printks. > > Yes. And ignoring those printk()-s by simply dropping them does not fix > the problem by any means. How so? If we drop them, then the stuck printk has nothing to print and will move forward. I say once you start dropping printks due to recursion, keep dropping them. For at least a second, to allow them to stop killing the machine. > > > > ones. It's fine to do so when CPU is in preemptible context. Really, we > > > should not care in printk_safe as long as we don't lockup the kernel. The > > > misbehaving console must be fixed. If CPU is not in preemptible context > > > then > > > we do lockup the kernel. Because we flush printk_safe regardless of the > > > current CPU context. If we will flush printk_safe via WQ then we > > > automatically > > > > And if we can throttle recursive printks, then we should be able to > > stop that from happening. > > pintk_safe was designed to be recursive. It was never designed to be > used to troubleshoot or debug consoles. But it was designed to be > recursive - because that's the sort of the problems it was meant to > handle: recursive printks that would otherwise deadlock us. That's why > we have it in the first place. So printk safe is only triggered when at the same context? If we can guarantee that printk safe is triggered only when its because a printk is happening at the same context (not because of an interrupt, but really at the same context, using my context check), then I'm fine with delaying them to a work queue. That is, if we have this: printk() console_lock() printk() add to log buffer console_unlock(); printk() console_lock() put in printk safe buffer trigger work queue console_unlock() flush safe buffer printk() Then I'm fine with that. I have to look at the latest code. If this is indeed what we have, then I admit I misunderstood the problem you want to solve. I only want recursive printks (those that are actually triggered by doing a printk) to be allowed to be delayed. Make sense? -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue, 23 Jan 2018 07:43:47 -0800 Tejun Heowrote: > So, at least in the case that we were seeing, it isn't that black and > white. printk keeps causing printks but only because printk buffer > flushing is preventing the printk'ing context from making forward > progress. The key problem there is that a flushing context may get > pinned flushing indefinitely and using a separate context does solve > the problem. > Does it? >From what I understand is that there's an issue with one of the printk consoles, due to memory pressure or whatnot. Then a printk happens within a printk recursively. It gets put into the safe buffer and an irq is sent to printk this printk. The issue you are saying is that when the printk enables interrupts, the irq work triggers and loads the log buffer with the safe buffer, and then the printk sees the new data added and continues to print, and hence never leaves this printk. Your solution is to delay the flushing of the safe buffer to another thread (work queue), which I also have issues with, because you break the "get printks out ASAP mantra". Then the work queue comes in and flushes the printks. And since the printks cause printks, we continue to spam the machine, but hey, we are making forward progress. Again, this is treating the symptom and not solving the problem. I really hate delaying printks to another thread, unless we can guarantee that that thread is ready to go immediately (basically spinning on a run queue waiting to print). Because if the system is having issues (which is the main reason for printks to happen), there's no guarantee that a work queue or another thread will ever schedule, and the safe printk buffer never gets out to the consoles. I much rather have throttling when recursive printks are detected. Make it a 100 lines to print if you want, but then throttle. Because once you have 100 lines or so, you will know that printks are causing printks, and you don't give a crap about the repeated process. Allow one flushing of the printk safe buffers, and then if it happens again, throttle it. Both methods can lose important data. I believe the throttling of recursive printks, after 100 prints or whatever, will be the least likely to lose important data, because printks caused by printks will just keep repeating the same data, and we don't care about repeats. But delaying the flushing could very well lose important data that caused a lockup. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue, 23 Jan 2018 07:43:47 -0800 Tejun Heo wrote: > So, at least in the case that we were seeing, it isn't that black and > white. printk keeps causing printks but only because printk buffer > flushing is preventing the printk'ing context from making forward > progress. The key problem there is that a flushing context may get > pinned flushing indefinitely and using a separate context does solve > the problem. > Does it? >From what I understand is that there's an issue with one of the printk consoles, due to memory pressure or whatnot. Then a printk happens within a printk recursively. It gets put into the safe buffer and an irq is sent to printk this printk. The issue you are saying is that when the printk enables interrupts, the irq work triggers and loads the log buffer with the safe buffer, and then the printk sees the new data added and continues to print, and hence never leaves this printk. Your solution is to delay the flushing of the safe buffer to another thread (work queue), which I also have issues with, because you break the "get printks out ASAP mantra". Then the work queue comes in and flushes the printks. And since the printks cause printks, we continue to spam the machine, but hey, we are making forward progress. Again, this is treating the symptom and not solving the problem. I really hate delaying printks to another thread, unless we can guarantee that that thread is ready to go immediately (basically spinning on a run queue waiting to print). Because if the system is having issues (which is the main reason for printks to happen), there's no guarantee that a work queue or another thread will ever schedule, and the safe printk buffer never gets out to the consoles. I much rather have throttling when recursive printks are detected. Make it a 100 lines to print if you want, but then throttle. Because once you have 100 lines or so, you will know that printks are causing printks, and you don't give a crap about the repeated process. Allow one flushing of the printk safe buffers, and then if it happens again, throttle it. Both methods can lose important data. I believe the throttling of recursive printks, after 100 prints or whatever, will be the least likely to lose important data, because printks caused by printks will just keep repeating the same data, and we don't care about repeats. But delaying the flushing could very well lose important data that caused a lockup. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Tejun On (01/23/18 07:43), Tejun Heo wrote: > Hello, Steven. > > On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote: > > > I don't want to have heuristics in print_safe, I don't want to have a > > > magic > > > number controlled by a user-space visible knob, I don't want to have the > > > first 3 lines of a lockdep splat. > > > > We can have more. But if printk is causing printks, that's a major bug. > > And work queues are not going to fix it, it will just spread out the > > pain. Have it be 100 printks, it needs to be fixed if it is happening. > > And having all printks just generate more printks is not helpful. Even > > if we slow them down. They will still never end. > > So, at least in the case that we were seeing, it isn't that black and > white. printk keeps causing printks but only because printk buffer > flushing is preventing the printk'ing context from making forward > progress. The key problem there is that a flushing context may get > pinned flushing indefinitely and using a separate context does solve > the problem. Would you, as the original bug reporter, be OK if we flush printk_safe (only printk_safe, not printk_nmi for the time being) via WQ? This should move that "uncontrolled" flush to a safe context. I don't think we can easily add kthread offloading to printk at the moment (this will result in a massive gun fight). Just in case, below is something like a patch. I think I worked around the possible wq deadlock scenario. But I haven't tested the patch yet. It's a bit late here and I guess I need some rest. Will try to look more at it tomorrow. From: Sergey SenozhatskySubject: [PATCH] printk/safe: split flush works --- kernel/printk/printk_safe.c | 75 + 1 file changed, 63 insertions(+), 12 deletions(-) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c2004bb23..54bc40ce3c34 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -22,6 +22,7 @@ #include #include #include +#include #include "internal.h" @@ -49,7 +50,10 @@ static int printk_safe_irq_ready __read_mostly; struct printk_safe_seq_buf { atomic_tlen;/* length of written data */ atomic_tmessage_lost; - struct irq_work work; /* IRQ work that flushes the buffer */ + /* IRQ work that flushes NMI buffer */ + struct irq_work irq_flush_work; + /* WQ work that flushes SAFE buffer */ + struct work_struct wq_flush_work; unsigned char buffer[SAFE_LOG_BUF_LEN]; }; @@ -61,10 +65,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq); #endif /* Get flushed in a more safe context. */ -static void queue_flush_work(struct printk_safe_seq_buf *s) +static void queue_irq_flush_work(struct printk_safe_seq_buf *s) { if (printk_safe_irq_ready) - irq_work_queue(>work); + irq_work_queue(>irq_flush_work); +} + +static void queue_wq_flush_work(struct printk_safe_seq_buf *s) +{ + if (printk_safe_irq_ready) + queue_work_on(smp_processor_id(), + system_wq, + >wq_flush_work); } /* @@ -89,7 +101,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, /* The trailing '\0' is not counted into len. */ if (len >= sizeof(s->buffer) - 1) { atomic_inc(>message_lost); - queue_flush_work(s); + queue_irq_flush_work(s); return 0; } @@ -112,7 +124,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, if (atomic_cmpxchg(>len, len, len + add) != len) goto again; - queue_flush_work(s); + queue_irq_flush_work(s); return add; } @@ -186,12 +198,10 @@ static void report_message_lost(struct printk_safe_seq_buf *s) * Flush data from the associated per-CPU buffer. The function * can be called either via IRQ work or independently. */ -static void __printk_safe_flush(struct irq_work *work) +static void __printk_safe_flush(struct printk_safe_seq_buf *s) { static raw_spinlock_t read_lock = __RAW_SPIN_LOCK_INITIALIZER(read_lock); - struct printk_safe_seq_buf *s = - container_of(work, struct printk_safe_seq_buf, work); unsigned long flags; size_t len; int i; @@ -243,6 +253,46 @@ static void __printk_safe_flush(struct irq_work *work) raw_spin_unlock_irqrestore(_lock, flags); } +static void irq_flush_work_fn(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, irq_flush_work); + + __printk_safe_flush(s); +} + +/* + * We can't queue wq work directly from vprintk_safe(), because we can + * deadlock.
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Tejun On (01/23/18 07:43), Tejun Heo wrote: > Hello, Steven. > > On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote: > > > I don't want to have heuristics in print_safe, I don't want to have a > > > magic > > > number controlled by a user-space visible knob, I don't want to have the > > > first 3 lines of a lockdep splat. > > > > We can have more. But if printk is causing printks, that's a major bug. > > And work queues are not going to fix it, it will just spread out the > > pain. Have it be 100 printks, it needs to be fixed if it is happening. > > And having all printks just generate more printks is not helpful. Even > > if we slow them down. They will still never end. > > So, at least in the case that we were seeing, it isn't that black and > white. printk keeps causing printks but only because printk buffer > flushing is preventing the printk'ing context from making forward > progress. The key problem there is that a flushing context may get > pinned flushing indefinitely and using a separate context does solve > the problem. Would you, as the original bug reporter, be OK if we flush printk_safe (only printk_safe, not printk_nmi for the time being) via WQ? This should move that "uncontrolled" flush to a safe context. I don't think we can easily add kthread offloading to printk at the moment (this will result in a massive gun fight). Just in case, below is something like a patch. I think I worked around the possible wq deadlock scenario. But I haven't tested the patch yet. It's a bit late here and I guess I need some rest. Will try to look more at it tomorrow. From: Sergey Senozhatsky Subject: [PATCH] printk/safe: split flush works --- kernel/printk/printk_safe.c | 75 + 1 file changed, 63 insertions(+), 12 deletions(-) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c2004bb23..54bc40ce3c34 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -22,6 +22,7 @@ #include #include #include +#include #include "internal.h" @@ -49,7 +50,10 @@ static int printk_safe_irq_ready __read_mostly; struct printk_safe_seq_buf { atomic_tlen;/* length of written data */ atomic_tmessage_lost; - struct irq_work work; /* IRQ work that flushes the buffer */ + /* IRQ work that flushes NMI buffer */ + struct irq_work irq_flush_work; + /* WQ work that flushes SAFE buffer */ + struct work_struct wq_flush_work; unsigned char buffer[SAFE_LOG_BUF_LEN]; }; @@ -61,10 +65,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq); #endif /* Get flushed in a more safe context. */ -static void queue_flush_work(struct printk_safe_seq_buf *s) +static void queue_irq_flush_work(struct printk_safe_seq_buf *s) { if (printk_safe_irq_ready) - irq_work_queue(>work); + irq_work_queue(>irq_flush_work); +} + +static void queue_wq_flush_work(struct printk_safe_seq_buf *s) +{ + if (printk_safe_irq_ready) + queue_work_on(smp_processor_id(), + system_wq, + >wq_flush_work); } /* @@ -89,7 +101,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, /* The trailing '\0' is not counted into len. */ if (len >= sizeof(s->buffer) - 1) { atomic_inc(>message_lost); - queue_flush_work(s); + queue_irq_flush_work(s); return 0; } @@ -112,7 +124,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, if (atomic_cmpxchg(>len, len, len + add) != len) goto again; - queue_flush_work(s); + queue_irq_flush_work(s); return add; } @@ -186,12 +198,10 @@ static void report_message_lost(struct printk_safe_seq_buf *s) * Flush data from the associated per-CPU buffer. The function * can be called either via IRQ work or independently. */ -static void __printk_safe_flush(struct irq_work *work) +static void __printk_safe_flush(struct printk_safe_seq_buf *s) { static raw_spinlock_t read_lock = __RAW_SPIN_LOCK_INITIALIZER(read_lock); - struct printk_safe_seq_buf *s = - container_of(work, struct printk_safe_seq_buf, work); unsigned long flags; size_t len; int i; @@ -243,6 +253,46 @@ static void __printk_safe_flush(struct irq_work *work) raw_spin_unlock_irqrestore(_lock, flags); } +static void irq_flush_work_fn(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, irq_flush_work); + + __printk_safe_flush(s); +} + +/* + * We can't queue wq work directly from vprintk_safe(), because we can + * deadlock. For instance: + * + *
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 10:41), Steven Rostedt wrote: [..] > We can have more. But if printk is causing printks, that's a major bug. > And work queues are not going to fix it, it will just spread out the > pain. Have it be 100 printks, it needs to be fixed if it is happening. > And having all printks just generate more printks is not helpful. Even > if we slow them down. They will still never end. Dropping the messages is not the solution either. The original bug report report was - this "locks up my kernel". That's it. That's all people asked us to solve. With WQ we don't lockup the kernel, because we flush printk_safe in preemptible context. And people are very much expected to fix the misbehaving consoles. But that should not be printk_safe problem. > A printk causing a printk is a special case, and we need to just show > enough to let the user know that its happening, and why printks are > being throttled. Yes, we may lose data, but if every printk that goes > out causes another printk, then there's going to be so much noise that > we wont know what other things went wrong. Honestly, if someone showed > me a report where the logs were filled with printks that caused > printks, I'd stop right there and tell them that needs to be fixed > before we do anything else. And if that recursion is happening because > of another problem, I don't want to see the recursion printks. I want > to see the printks that show what is causing the recursions. I'll re-read this one tomorrow. Not quite following it. > > The problem is - we flush printk_safe too soon and printing CPU ends up > > in a lockup - it log_store()-s new messages while it's printing the pending > > No, the problem is that printks are causing more printks. Yes that will > make flushing them soon more likely to lock up the system. But that's > not the problem. The problem is printks causing printks. Yes. And ignoring those printk()-s by simply dropping them does not fix the problem by any means. > > ones. It's fine to do so when CPU is in preemptible context. Really, we > > should not care in printk_safe as long as we don't lockup the kernel. The > > misbehaving console must be fixed. If CPU is not in preemptible context then > > we do lockup the kernel. Because we flush printk_safe regardless of the > > current CPU context. If we will flush printk_safe via WQ then we > > automatically > > And if we can throttle recursive printks, then we should be able to > stop that from happening. pintk_safe was designed to be recursive. It was never designed to be used to troubleshoot or debug consoles. But it was designed to be recursive - because that's the sort of the problems it was meant to handle: recursive printks that would otherwise deadlock us. That's why we have it in the first place. > > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, > > we > > will not lockup it up." thing. Yes, we fill up the logbuf with probably > > needed > > and appreciated or unneeded messages. But we should not care in printk_safe. > > We don't lockup the kernel... And the misbehaving console must be fixed. > > I agree. Good. > > I disagree with "If we are having issues with irq_work, we are going to have > > issues with a work queue". There is a tremendous difference between irq_work > > on that CPU and queue_work_on(smp_proessor_id()). One does not care about > > CPU > > context, the other one does. > > But switching to work queue does not address the underlining problem > that printks are causing more printks. The only way to address those problems is to fix the console. That's the only. But that's not what I'm doing with my proposal. I fix the lockup scenario, the only reported problem so far. Whilst also keeping printk_safe around. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 10:41), Steven Rostedt wrote: [..] > We can have more. But if printk is causing printks, that's a major bug. > And work queues are not going to fix it, it will just spread out the > pain. Have it be 100 printks, it needs to be fixed if it is happening. > And having all printks just generate more printks is not helpful. Even > if we slow them down. They will still never end. Dropping the messages is not the solution either. The original bug report report was - this "locks up my kernel". That's it. That's all people asked us to solve. With WQ we don't lockup the kernel, because we flush printk_safe in preemptible context. And people are very much expected to fix the misbehaving consoles. But that should not be printk_safe problem. > A printk causing a printk is a special case, and we need to just show > enough to let the user know that its happening, and why printks are > being throttled. Yes, we may lose data, but if every printk that goes > out causes another printk, then there's going to be so much noise that > we wont know what other things went wrong. Honestly, if someone showed > me a report where the logs were filled with printks that caused > printks, I'd stop right there and tell them that needs to be fixed > before we do anything else. And if that recursion is happening because > of another problem, I don't want to see the recursion printks. I want > to see the printks that show what is causing the recursions. I'll re-read this one tomorrow. Not quite following it. > > The problem is - we flush printk_safe too soon and printing CPU ends up > > in a lockup - it log_store()-s new messages while it's printing the pending > > No, the problem is that printks are causing more printks. Yes that will > make flushing them soon more likely to lock up the system. But that's > not the problem. The problem is printks causing printks. Yes. And ignoring those printk()-s by simply dropping them does not fix the problem by any means. > > ones. It's fine to do so when CPU is in preemptible context. Really, we > > should not care in printk_safe as long as we don't lockup the kernel. The > > misbehaving console must be fixed. If CPU is not in preemptible context then > > we do lockup the kernel. Because we flush printk_safe regardless of the > > current CPU context. If we will flush printk_safe via WQ then we > > automatically > > And if we can throttle recursive printks, then we should be able to > stop that from happening. pintk_safe was designed to be recursive. It was never designed to be used to troubleshoot or debug consoles. But it was designed to be recursive - because that's the sort of the problems it was meant to handle: recursive printks that would otherwise deadlock us. That's why we have it in the first place. > > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, > > we > > will not lockup it up." thing. Yes, we fill up the logbuf with probably > > needed > > and appreciated or unneeded messages. But we should not care in printk_safe. > > We don't lockup the kernel... And the misbehaving console must be fixed. > > I agree. Good. > > I disagree with "If we are having issues with irq_work, we are going to have > > issues with a work queue". There is a tremendous difference between irq_work > > on that CPU and queue_work_on(smp_proessor_id()). One does not care about > > CPU > > context, the other one does. > > But switching to work queue does not address the underlining problem > that printks are causing more printks. The only way to address those problems is to fix the console. That's the only. But that's not what I'm doing with my proposal. I fix the lockup scenario, the only reported problem so far. Whilst also keeping printk_safe around. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Steven. On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote: > > I don't want to have heuristics in print_safe, I don't want to have a magic > > number controlled by a user-space visible knob, I don't want to have the > > first 3 lines of a lockdep splat. > > We can have more. But if printk is causing printks, that's a major bug. > And work queues are not going to fix it, it will just spread out the > pain. Have it be 100 printks, it needs to be fixed if it is happening. > And having all printks just generate more printks is not helpful. Even > if we slow them down. They will still never end. So, at least in the case that we were seeing, it isn't that black and white. printk keeps causing printks but only because printk buffer flushing is preventing the printk'ing context from making forward progress. The key problem there is that a flushing context may get pinned flushing indefinitely and using a separate context does solve the problem. Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Steven. On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote: > > I don't want to have heuristics in print_safe, I don't want to have a magic > > number controlled by a user-space visible knob, I don't want to have the > > first 3 lines of a lockdep splat. > > We can have more. But if printk is causing printks, that's a major bug. > And work queues are not going to fix it, it will just spread out the > pain. Have it be 100 printks, it needs to be fixed if it is happening. > And having all printks just generate more printks is not helpful. Even > if we slow them down. They will still never end. So, at least in the case that we were seeing, it isn't that black and white. printk keeps causing printks but only because printk buffer flushing is preventing the printk'ing context from making forward progress. The key problem there is that a flushing context may get pinned flushing indefinitely and using a separate context does solve the problem. Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 24 Jan 2018 00:21:30 +0900 Sergey Senozhatskywrote: > On (01/23/18 09:56), Steven Rostedt wrote: > [..] > > > Why do we even use irq_work for printk_safe? > > > > Why not? > > > > Really, I think you are trying to solve a symptom and not the problem. > > If we are having issues with irq_work, we are going to have issues with > > a work queue. It's just spreading out the problem instead of fixing it. > > I don't want to have heuristics in print_safe, I don't want to have a magic > number controlled by a user-space visible knob, I don't want to have the > first 3 lines of a lockdep splat. We can have more. But if printk is causing printks, that's a major bug. And work queues are not going to fix it, it will just spread out the pain. Have it be 100 printks, it needs to be fixed if it is happening. And having all printks just generate more printks is not helpful. Even if we slow them down. They will still never end. A printk causing a printk is a special case, and we need to just show enough to let the user know that its happening, and why printks are being throttled. Yes, we may lose data, but if every printk that goes out causes another printk, then there's going to be so much noise that we wont know what other things went wrong. Honestly, if someone showed me a report where the logs were filled with printks that caused printks, I'd stop right there and tell them that needs to be fixed before we do anything else. And if that recursion is happening because of another problem, I don't want to see the recursion printks. I want to see the printks that show what is causing the recursions. > The problem is - we flush printk_safe too soon and printing CPU ends up > in a lockup - it log_store()-s new messages while it's printing the pending No, the problem is that printks are causing more printks. Yes that will make flushing them soon more likely to lock up the system. But that's not the problem. The problem is printks causing printks. > ones. It's fine to do so when CPU is in preemptible context. Really, we > should not care in printk_safe as long as we don't lockup the kernel. The > misbehaving console must be fixed. If CPU is not in preemptible context then > we do lockup the kernel. Because we flush printk_safe regardless of the > current CPU context. If we will flush printk_safe via WQ then we automatically And if we can throttle recursive printks, then we should be able to stop that from happening. > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we > will not lockup it up." thing. Yes, we fill up the logbuf with probably needed > and appreciated or unneeded messages. But we should not care in printk_safe. > We don't lockup the kernel... And the misbehaving console must be fixed. I agree. > > I disagree with "If we are having issues with irq_work, we are going to have > issues with a work queue". There is a tremendous difference between irq_work > on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU > context, the other one does. But switching to work queue does not address the underlining problem that printks are causing more printks. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 24 Jan 2018 00:21:30 +0900 Sergey Senozhatsky wrote: > On (01/23/18 09:56), Steven Rostedt wrote: > [..] > > > Why do we even use irq_work for printk_safe? > > > > Why not? > > > > Really, I think you are trying to solve a symptom and not the problem. > > If we are having issues with irq_work, we are going to have issues with > > a work queue. It's just spreading out the problem instead of fixing it. > > I don't want to have heuristics in print_safe, I don't want to have a magic > number controlled by a user-space visible knob, I don't want to have the > first 3 lines of a lockdep splat. We can have more. But if printk is causing printks, that's a major bug. And work queues are not going to fix it, it will just spread out the pain. Have it be 100 printks, it needs to be fixed if it is happening. And having all printks just generate more printks is not helpful. Even if we slow them down. They will still never end. A printk causing a printk is a special case, and we need to just show enough to let the user know that its happening, and why printks are being throttled. Yes, we may lose data, but if every printk that goes out causes another printk, then there's going to be so much noise that we wont know what other things went wrong. Honestly, if someone showed me a report where the logs were filled with printks that caused printks, I'd stop right there and tell them that needs to be fixed before we do anything else. And if that recursion is happening because of another problem, I don't want to see the recursion printks. I want to see the printks that show what is causing the recursions. > The problem is - we flush printk_safe too soon and printing CPU ends up > in a lockup - it log_store()-s new messages while it's printing the pending No, the problem is that printks are causing more printks. Yes that will make flushing them soon more likely to lock up the system. But that's not the problem. The problem is printks causing printks. > ones. It's fine to do so when CPU is in preemptible context. Really, we > should not care in printk_safe as long as we don't lockup the kernel. The > misbehaving console must be fixed. If CPU is not in preemptible context then > we do lockup the kernel. Because we flush printk_safe regardless of the > current CPU context. If we will flush printk_safe via WQ then we automatically And if we can throttle recursive printks, then we should be able to stop that from happening. > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we > will not lockup it up." thing. Yes, we fill up the logbuf with probably needed > and appreciated or unneeded messages. But we should not care in printk_safe. > We don't lockup the kernel... And the misbehaving console must be fixed. I agree. > > I disagree with "If we are having issues with irq_work, we are going to have > issues with a work queue". There is a tremendous difference between irq_work > on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU > context, the other one does. But switching to work queue does not address the underlining problem that printks are causing more printks. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 09:56), Steven Rostedt wrote: [..] > > Why do we even use irq_work for printk_safe? > > Why not? > > Really, I think you are trying to solve a symptom and not the problem. > If we are having issues with irq_work, we are going to have issues with > a work queue. It's just spreading out the problem instead of fixing it. I don't want to have heuristics in print_safe, I don't want to have a magic number controlled by a user-space visible knob, I don't want to have the first 3 lines of a lockdep splat. The problem is - we flush printk_safe too soon and printing CPU ends up in a lockup - it log_store()-s new messages while it's printing the pending ones. It's fine to do so when CPU is in preemptible context. Really, we should not care in printk_safe as long as we don't lockup the kernel. The misbehaving console must be fixed. If CPU is not in preemptible context then we do lockup the kernel. Because we flush printk_safe regardless of the current CPU context. If we will flush printk_safe via WQ then we automatically add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we will not lockup it up." thing. Yes, we fill up the logbuf with probably needed and appreciated or unneeded messages. But we should not care in printk_safe. We don't lockup the kernel... And the misbehaving console must be fixed. I disagree with "If we are having issues with irq_work, we are going to have issues with a work queue". There is a tremendous difference between irq_work on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU context, the other one does. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 09:56), Steven Rostedt wrote: [..] > > Why do we even use irq_work for printk_safe? > > Why not? > > Really, I think you are trying to solve a symptom and not the problem. > If we are having issues with irq_work, we are going to have issues with > a work queue. It's just spreading out the problem instead of fixing it. I don't want to have heuristics in print_safe, I don't want to have a magic number controlled by a user-space visible knob, I don't want to have the first 3 lines of a lockdep splat. The problem is - we flush printk_safe too soon and printing CPU ends up in a lockup - it log_store()-s new messages while it's printing the pending ones. It's fine to do so when CPU is in preemptible context. Really, we should not care in printk_safe as long as we don't lockup the kernel. The misbehaving console must be fixed. If CPU is not in preemptible context then we do lockup the kernel. Because we flush printk_safe regardless of the current CPU context. If we will flush printk_safe via WQ then we automatically add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we will not lockup it up." thing. Yes, we fill up the logbuf with probably needed and appreciated or unneeded messages. But we should not care in printk_safe. We don't lockup the kernel... And the misbehaving console must be fixed. I disagree with "If we are having issues with irq_work, we are going to have issues with a work queue". There is a tremendous difference between irq_work on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU context, the other one does. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue, 23 Jan 2018 15:40:23 +0900 Sergey Senozhatskywrote: > Why do we even use irq_work for printk_safe? Why not? Really, I think you are trying to solve a symptom and not the problem. If we are having issues with irq_work, we are going to have issues with a work queue. It's just spreading out the problem instead of fixing it. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue, 23 Jan 2018 15:40:23 +0900 Sergey Senozhatsky wrote: > Why do we even use irq_work for printk_safe? Why not? Really, I think you are trying to solve a symptom and not the problem. If we are having issues with irq_work, we are going to have issues with a work queue. It's just spreading out the problem instead of fixing it. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 15:40), Sergey Senozhatsky wrote: > > Why do we even use irq_work for printk_safe? > ... perhaps because of wq: pool->lock -> printk -> call_console_drivers -> printk -> vprintk_safe -> wq: pool->lock Which is a "many things have gone wrong" type of scenario. Maybe we can workaround it somehow, hm. Tejun, can we have lockless WQ? ;) -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 15:40), Sergey Senozhatsky wrote: > > Why do we even use irq_work for printk_safe? > ... perhaps because of wq: pool->lock -> printk -> call_console_drivers -> printk -> vprintk_safe -> wq: pool->lock Which is a "many things have gone wrong" type of scenario. Maybe we can workaround it somehow, hm. Tejun, can we have lockless WQ? ;) -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 15:40), Sergey Senozhatsky wrote: [..] > Why do we even use irq_work for printk_safe? > > Okay... So, how about this. For printk_safe we use system_wq for flushing. > IOW, we flush from a task running exactly on the same CPU which hit printk > recursion, not from IRQ. From vprintk_safe() recursion, we queue work on > *that* CPU. Which gives us the following thing: if CPU stuck in > console_unlock() loop with preemption disabled, then system_wq does not > schedule on that CPU and we, thus, don't flush printk_safe buffer from that > CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe > buffer, printing extra messages from that CPU will not lock it up, because > it's in preemptible context. > > Thoughts? A slightly reworked version: a) Do not check console_locked b) Do not have irq_work fast path for printk_safe buffer c) Which lets to union WQ/IRQ work structs - we use only IRQ work for NMI buffers, and only WQ work for SAFE buffers d) And also to refactor the code From: Sergey SenozhatskySubject: [PATCH] printk/safe: use system_wq to flush printk_safe buffers --- kernel/printk/printk_safe.c | 52 ++--- 1 file changed, 40 insertions(+), 12 deletions(-) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c2004bb23..6c8c82cedccb 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -22,6 +22,7 @@ #include #include #include +#include #include "internal.h" @@ -49,7 +50,12 @@ static int printk_safe_irq_ready __read_mostly; struct printk_safe_seq_buf { atomic_tlen;/* length of written data */ atomic_tmessage_lost; - struct irq_work work; /* IRQ work that flushes the buffer */ + union { + /* IRQ work that flushes NMI buffer */ + struct irq_work irq_flush_work; + /* WQ work that flushes SAFE buffer */ + struct work_struct wq_flush_work; + }; unsigned char buffer[SAFE_LOG_BUF_LEN]; }; @@ -61,10 +67,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq); #endif /* Get flushed in a more safe context. */ -static void queue_flush_work(struct printk_safe_seq_buf *s) +static void queue_irq_flush_work(struct printk_safe_seq_buf *s) { if (printk_safe_irq_ready) - irq_work_queue(>work); + irq_work_queue(>irq_flush_work); +} + +static void queue_wq_flush_work(struct printk_safe_seq_buf *s) +{ + if (printk_safe_irq_ready) + queue_work_on(smp_processor_id(), + system_wq, + >wq_flush_work); } /* @@ -89,7 +103,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, /* The trailing '\0' is not counted into len. */ if (len >= sizeof(s->buffer) - 1) { atomic_inc(>message_lost); - queue_flush_work(s); return 0; } @@ -112,7 +125,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, if (atomic_cmpxchg(>len, len, len + add) != len) goto again; - queue_flush_work(s); return add; } @@ -186,12 +198,10 @@ static void report_message_lost(struct printk_safe_seq_buf *s) * Flush data from the associated per-CPU buffer. The function * can be called either via IRQ work or independently. */ -static void __printk_safe_flush(struct irq_work *work) +static void __printk_safe_flush(struct printk_safe_seq_buf *s) { static raw_spinlock_t read_lock = __RAW_SPIN_LOCK_INITIALIZER(read_lock); - struct printk_safe_seq_buf *s = - container_of(work, struct printk_safe_seq_buf, work); unsigned long flags; size_t len; int i; @@ -243,6 +253,22 @@ static void __printk_safe_flush(struct irq_work *work) raw_spin_unlock_irqrestore(_lock, flags); } +static void irq_flush_work_fn(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, irq_flush_work); + + __printk_safe_flush(s); +} + +static void wq_flush_work_fn(struct work_struct *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, wq_flush_work); + + __printk_safe_flush(s); +} + /** * printk_safe_flush - flush all per-cpu nmi buffers. * @@ -256,9 +282,9 @@ void printk_safe_flush(void) for_each_possible_cpu(cpu) { #ifdef CONFIG_PRINTK_NMI - __printk_safe_flush(_cpu(nmi_print_seq, cpu).work); + __printk_safe_flush(this_cpu_ptr(_print_seq)); #endif - __printk_safe_flush(_cpu(safe_print_seq, cpu).work); + __printk_safe_flush(this_cpu_ptr(_print_seq)); }
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/23/18 15:40), Sergey Senozhatsky wrote: [..] > Why do we even use irq_work for printk_safe? > > Okay... So, how about this. For printk_safe we use system_wq for flushing. > IOW, we flush from a task running exactly on the same CPU which hit printk > recursion, not from IRQ. From vprintk_safe() recursion, we queue work on > *that* CPU. Which gives us the following thing: if CPU stuck in > console_unlock() loop with preemption disabled, then system_wq does not > schedule on that CPU and we, thus, don't flush printk_safe buffer from that > CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe > buffer, printing extra messages from that CPU will not lock it up, because > it's in preemptible context. > > Thoughts? A slightly reworked version: a) Do not check console_locked b) Do not have irq_work fast path for printk_safe buffer c) Which lets to union WQ/IRQ work structs - we use only IRQ work for NMI buffers, and only WQ work for SAFE buffers d) And also to refactor the code From: Sergey Senozhatsky Subject: [PATCH] printk/safe: use system_wq to flush printk_safe buffers --- kernel/printk/printk_safe.c | 52 ++--- 1 file changed, 40 insertions(+), 12 deletions(-) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c2004bb23..6c8c82cedccb 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -22,6 +22,7 @@ #include #include #include +#include #include "internal.h" @@ -49,7 +50,12 @@ static int printk_safe_irq_ready __read_mostly; struct printk_safe_seq_buf { atomic_tlen;/* length of written data */ atomic_tmessage_lost; - struct irq_work work; /* IRQ work that flushes the buffer */ + union { + /* IRQ work that flushes NMI buffer */ + struct irq_work irq_flush_work; + /* WQ work that flushes SAFE buffer */ + struct work_struct wq_flush_work; + }; unsigned char buffer[SAFE_LOG_BUF_LEN]; }; @@ -61,10 +67,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq); #endif /* Get flushed in a more safe context. */ -static void queue_flush_work(struct printk_safe_seq_buf *s) +static void queue_irq_flush_work(struct printk_safe_seq_buf *s) { if (printk_safe_irq_ready) - irq_work_queue(>work); + irq_work_queue(>irq_flush_work); +} + +static void queue_wq_flush_work(struct printk_safe_seq_buf *s) +{ + if (printk_safe_irq_ready) + queue_work_on(smp_processor_id(), + system_wq, + >wq_flush_work); } /* @@ -89,7 +103,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, /* The trailing '\0' is not counted into len. */ if (len >= sizeof(s->buffer) - 1) { atomic_inc(>message_lost); - queue_flush_work(s); return 0; } @@ -112,7 +125,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, if (atomic_cmpxchg(>len, len, len + add) != len) goto again; - queue_flush_work(s); return add; } @@ -186,12 +198,10 @@ static void report_message_lost(struct printk_safe_seq_buf *s) * Flush data from the associated per-CPU buffer. The function * can be called either via IRQ work or independently. */ -static void __printk_safe_flush(struct irq_work *work) +static void __printk_safe_flush(struct printk_safe_seq_buf *s) { static raw_spinlock_t read_lock = __RAW_SPIN_LOCK_INITIALIZER(read_lock); - struct printk_safe_seq_buf *s = - container_of(work, struct printk_safe_seq_buf, work); unsigned long flags; size_t len; int i; @@ -243,6 +253,22 @@ static void __printk_safe_flush(struct irq_work *work) raw_spin_unlock_irqrestore(_lock, flags); } +static void irq_flush_work_fn(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, irq_flush_work); + + __printk_safe_flush(s); +} + +static void wq_flush_work_fn(struct work_struct *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, wq_flush_work); + + __printk_safe_flush(s); +} + /** * printk_safe_flush - flush all per-cpu nmi buffers. * @@ -256,9 +282,9 @@ void printk_safe_flush(void) for_each_possible_cpu(cpu) { #ifdef CONFIG_PRINTK_NMI - __printk_safe_flush(_cpu(nmi_print_seq, cpu).work); + __printk_safe_flush(this_cpu_ptr(_print_seq)); #endif - __printk_safe_flush(_cpu(safe_print_seq, cpu).work); + __printk_safe_flush(this_cpu_ptr(_print_seq)); } } @@ -300,6 +326,7 @@
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, On (01/21/18 23:15), Sergey Senozhatsky wrote: [..] > we have printk recursion from console drivers. it's redirected to > printk_safe and we queue an IRQ work to flush the buffer > > printk > console_unlock >call_console_drivers > net_console > printk > printk_save -> irq_work queue > > now console_unlock() enables local IRQs, we have the printk_safe > flush. but printk_safe flush does not call into the console_unlock(), > it uses printk_deferred() version of printk > > IRQ work > > prink_safe_flush > printk_deferred -> irq_work queue > > > so we schedule another IRQ work (deferred printk work), which eventually > tries to lock console_sem > > IRQ work > wake_up_klogd_work_func() > if (console_trylock()) >console_unlock() Why do we even use irq_work for printk_safe? Okay... So, how about this. For printk_safe we use system_wq for flushing. IOW, we flush from a task running exactly on the same CPU which hit printk recursion, not from IRQ. From vprintk_safe() recursion, we queue work on *that* CPU. Which gives us the following thing: if CPU stuck in console_unlock() loop with preemption disabled, then system_wq does not schedule on that CPU and we, thus, don't flush printk_safe buffer from that CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe buffer, printing extra messages from that CPU will not lock it up, because it's in preemptible context. Thoughts? Something like this: From: Sergey SenozhatskySubject: [PATCH] printk/safe: use slowpath flush for printk_safe --- kernel/printk/printk_safe.c | 53 - 1 file changed, 48 insertions(+), 5 deletions(-) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c2004bb23..c641853a5fa9 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -22,6 +22,8 @@ #include #include #include +#include +#include #include "internal.h" @@ -50,6 +52,7 @@ struct printk_safe_seq_buf { atomic_tlen;/* length of written data */ atomic_tmessage_lost; struct irq_work work; /* IRQ work that flushes the buffer */ + struct work_struct slowpath_flush_work; unsigned char buffer[SAFE_LOG_BUF_LEN]; }; @@ -61,12 +64,20 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq); #endif /* Get flushed in a more safe context. */ -static void queue_flush_work(struct printk_safe_seq_buf *s) +static void queue_irq_flush_work(struct printk_safe_seq_buf *s) { if (printk_safe_irq_ready) irq_work_queue(>work); } +static void queue_slowpath_flush_work(struct printk_safe_seq_buf *s) +{ + if (printk_safe_irq_ready) + queue_work_on(smp_processor_id(), + system_wq, + >slowpath_flush_work); +} + /* * Add a message to per-CPU context-dependent buffer. NMI and printk-safe * have dedicated buffers, because otherwise printk-safe preempted by @@ -89,7 +100,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, /* The trailing '\0' is not counted into len. */ if (len >= sizeof(s->buffer) - 1) { atomic_inc(>message_lost); - queue_flush_work(s); + queue_irq_flush_work(s); return 0; } @@ -112,7 +123,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, if (atomic_cmpxchg(>len, len, len + add) != len) goto again; - queue_flush_work(s); return add; } @@ -243,6 +253,35 @@ static void __printk_safe_flush(struct irq_work *work) raw_spin_unlock_irqrestore(_lock, flags); } +/* NMI buffers are always flushed */ +static void flush_nmi_buffer(struct irq_work *work) +{ + __printk_safe_flush(work); +} + +/* printk_safe buffers flushing, on the contrary, can be postponed */ +static void flush_printk_safe_buffer(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, work); + + if (is_console_locked()) { + queue_slowpath_flush_work(s); + return; + } + + __printk_safe_flush(work); +} + +static void slowpath_flush_work_fn(struct work_struct *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, + slowpath_flush_work); + + __printk_safe_flush(>work); +} + /** * printk_safe_flush - flush all per-cpu nmi buffers. * @@ -300,6 +339,7 @@ static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) { struct printk_safe_seq_buf *s = this_cpu_ptr(_print_seq); + queue_irq_flush_work(s); return printk_safe_log_store(s, fmt, args); } @@ -343,6 +383,7 @@
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, On (01/21/18 23:15), Sergey Senozhatsky wrote: [..] > we have printk recursion from console drivers. it's redirected to > printk_safe and we queue an IRQ work to flush the buffer > > printk > console_unlock >call_console_drivers > net_console > printk > printk_save -> irq_work queue > > now console_unlock() enables local IRQs, we have the printk_safe > flush. but printk_safe flush does not call into the console_unlock(), > it uses printk_deferred() version of printk > > IRQ work > > prink_safe_flush > printk_deferred -> irq_work queue > > > so we schedule another IRQ work (deferred printk work), which eventually > tries to lock console_sem > > IRQ work > wake_up_klogd_work_func() > if (console_trylock()) >console_unlock() Why do we even use irq_work for printk_safe? Okay... So, how about this. For printk_safe we use system_wq for flushing. IOW, we flush from a task running exactly on the same CPU which hit printk recursion, not from IRQ. From vprintk_safe() recursion, we queue work on *that* CPU. Which gives us the following thing: if CPU stuck in console_unlock() loop with preemption disabled, then system_wq does not schedule on that CPU and we, thus, don't flush printk_safe buffer from that CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe buffer, printing extra messages from that CPU will not lock it up, because it's in preemptible context. Thoughts? Something like this: From: Sergey Senozhatsky Subject: [PATCH] printk/safe: use slowpath flush for printk_safe --- kernel/printk/printk_safe.c | 53 - 1 file changed, 48 insertions(+), 5 deletions(-) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c2004bb23..c641853a5fa9 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -22,6 +22,8 @@ #include #include #include +#include +#include #include "internal.h" @@ -50,6 +52,7 @@ struct printk_safe_seq_buf { atomic_tlen;/* length of written data */ atomic_tmessage_lost; struct irq_work work; /* IRQ work that flushes the buffer */ + struct work_struct slowpath_flush_work; unsigned char buffer[SAFE_LOG_BUF_LEN]; }; @@ -61,12 +64,20 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq); #endif /* Get flushed in a more safe context. */ -static void queue_flush_work(struct printk_safe_seq_buf *s) +static void queue_irq_flush_work(struct printk_safe_seq_buf *s) { if (printk_safe_irq_ready) irq_work_queue(>work); } +static void queue_slowpath_flush_work(struct printk_safe_seq_buf *s) +{ + if (printk_safe_irq_ready) + queue_work_on(smp_processor_id(), + system_wq, + >slowpath_flush_work); +} + /* * Add a message to per-CPU context-dependent buffer. NMI and printk-safe * have dedicated buffers, because otherwise printk-safe preempted by @@ -89,7 +100,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, /* The trailing '\0' is not counted into len. */ if (len >= sizeof(s->buffer) - 1) { atomic_inc(>message_lost); - queue_flush_work(s); + queue_irq_flush_work(s); return 0; } @@ -112,7 +123,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, if (atomic_cmpxchg(>len, len, len + add) != len) goto again; - queue_flush_work(s); return add; } @@ -243,6 +253,35 @@ static void __printk_safe_flush(struct irq_work *work) raw_spin_unlock_irqrestore(_lock, flags); } +/* NMI buffers are always flushed */ +static void flush_nmi_buffer(struct irq_work *work) +{ + __printk_safe_flush(work); +} + +/* printk_safe buffers flushing, on the contrary, can be postponed */ +static void flush_printk_safe_buffer(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, work); + + if (is_console_locked()) { + queue_slowpath_flush_work(s); + return; + } + + __printk_safe_flush(work); +} + +static void slowpath_flush_work_fn(struct work_struct *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, + slowpath_flush_work); + + __printk_safe_flush(>work); +} + /** * printk_safe_flush - flush all per-cpu nmi buffers. * @@ -300,6 +339,7 @@ static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) { struct printk_safe_seq_buf *s = this_cpu_ptr(_print_seq); + queue_irq_flush_work(s); return printk_safe_log_store(s, fmt, args); } @@ -343,6 +383,7 @@ static __printf(1, 0) int
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/22/18 19:28), Sergey Senozhatsky wrote: > On (01/22/18 17:56), Sergey Senozhatsky wrote: > [..] > > Assume the following, > > But more importantly we are missing another huge thing - console_unlock(). IOW, not every console_unlock() is from vprintk_emit(). We can have console_trylock() -> console_unlock() being from non-preemptible context, etc. And then irq work to flush printk_safe -> printk_deferred all the time. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/22/18 19:28), Sergey Senozhatsky wrote: > On (01/22/18 17:56), Sergey Senozhatsky wrote: > [..] > > Assume the following, > > But more importantly we are missing another huge thing - console_unlock(). IOW, not every console_unlock() is from vprintk_emit(). We can have console_trylock() -> console_unlock() being from non-preemptible context, etc. And then irq work to flush printk_safe -> printk_deferred all the time. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/22/18 17:56), Sergey Senozhatsky wrote: [..] > Assume the following, But more importantly we are missing another huge thing - console_unlock(). Suppose: console_lock(); << preemption >> printk printk .. printk console_unlock() for (;;) { call_console_drivers() dump_stack queue IRQ work IRQ work >> flush_printk_safe printk_deferred() ... printk_deferred() << iret } This should explode: sleepable console_unlock() may reschedule, printk_safe flush bypasses recursion checks. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/22/18 17:56), Sergey Senozhatsky wrote: [..] > Assume the following, But more importantly we are missing another huge thing - console_unlock(). Suppose: console_lock(); << preemption >> printk printk .. printk console_unlock() for (;;) { call_console_drivers() dump_stack queue IRQ work IRQ work >> flush_printk_safe printk_deferred() ... printk_deferred() << iret } This should explode: sleepable console_unlock() may reschedule, printk_safe flush bypasses recursion checks. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/21/18 16:04), Steven Rostedt wrote: [..] > > The problem is that we flush printk_safe right when console_unlock() > > printing > > loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs > > were enabled in the first place when the CPU went to console_unlock()]. > > This forces that CPU to loop in console_unlock() as long as we have > > printk-s coming from call_console_drivers(). But we probably can postpone > > printk_safe flush. Basically, we can declare a new rule - we don't flush > > printk_safe buffer as long as console_sem is locked. Because this is how > > that printing CPU stuck in the console_unlock() printing loop. printk_safe > > buffer is very important when it comes to storing a non-repetitive stuff, > > like > > a lockdep splat, which is a single shot event. But the more repetitive the > > message is, like millions of similar kmalloc() dump_stack()-s over and over > > again, the less value in it. We should have printk_safe buffer big enough > > for > > important info, like a lockdep splat, but millions of similar kmalloc() > > messages are pretty invaluable - one is already enough, we can drop the > > rest. > > And we should not flush new messages while there is a CPU looping in > > console_unlock(), because it already has messages to print, which were > > log_store()-ed the normal way. > > The above is really hard to read without any capitalization. Everything > seems to be a run-on sentence and gives me a head ache. So you lost me > there. Apologies. Will improve. > > This is where the "postpone thing" jumps in. so how do we postpone > > printk_safe > > flush. > > > > We can't console_trylock()/console_unlock() in printk_safe flush code. > > But there is a `console_locked' flag and is_console_locked() function which > > tell us if the console_sem is locked. As long as we are in console_unlock() > > printing loop that flag is set, even if we enabled local IRQs and > > printk_safe > > flush work arrived. So now printk_safe flush does extra check and does > > not flush printk_safe buffer content as long as someone is currently > > printing or soon will start printing. But we need to take extra step and > > to re-queue flush on CPUs that did postpone it [console_unlock() can > > reschedule]. So now we flush only when printing CPU printed all pending > > logbuf messages, hit the "console_seq == log_next_seq" and up() > > console_sem. This sets a boundary -- no matter how many times during the > > current printing loop we called console drivers and how many times those > > drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars. > > Another big paragraph with no capitals (besides macros and CPU ;-) I walked through it and mostly "fixed" your head ache :) > I guess this is what it is like when people listen to me talk too fast. Absolutely!!! > > IOW, what we have now, looks like this: > > > > a) printk_safe is for important stuff, we don't guarantee that a flood > >of messages will be preserved. > > > > b) we extend the previously existing "will flush messages later on from > >a safer context" and now we also consider console_unlock() printing loop > >as unsafe context. so the unsafe context it's not only the one that can > >deadlock, but also the one that can lockup CPU in a printing loop because > >of recursive printk messages. > > Sure. > > > > > > > so this > > > > printk > > console_unlock > > { > >for (;;) { > > call_console_drivers > > net_console > >printk > > printk_save -> irq_work queue > > > >IRQ work > > prink_safe_flush > >printk_deferred -> log_store() > >iret > > } > > up(); > > } > > > > > >// which can never break out, because we can always append new messages > >// from prink_safe_flush. > > > > becomes this > > > > printk > > console_unlock > > { > >for (;;) { > > call_console_drivers > > net_console > >printk > > printk_save -> irq_work queue > > > > } > > up(); > > > > IRQ work > >prink_safe_flush > > printk_deferred -> log_store() > > iret > > } > > But we do eventually send this data out to the consoles, and if the > consoles cause more printks, wouldn't this still never end? Right. But not immediately. We wait for all pending messages to be evicted first (and up()) and we limit the amount of data that we flush. So at least it's not exponential anymore: every line that we print does not log_store() a whole new dump_stack() of lines. Which is still miles away from "a perfect solution", tho. But limiting the number of lines we print recursive is not much better. First, we don't know how many lines we want to flush from printk_safe. And having a knob indicates that no one ever will do it right. Second, hand off can play games with it. Assume the following, - I set `recursion_max' to 200. Which looks reasonable to me. Then I
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/21/18 16:04), Steven Rostedt wrote: [..] > > The problem is that we flush printk_safe right when console_unlock() > > printing > > loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs > > were enabled in the first place when the CPU went to console_unlock()]. > > This forces that CPU to loop in console_unlock() as long as we have > > printk-s coming from call_console_drivers(). But we probably can postpone > > printk_safe flush. Basically, we can declare a new rule - we don't flush > > printk_safe buffer as long as console_sem is locked. Because this is how > > that printing CPU stuck in the console_unlock() printing loop. printk_safe > > buffer is very important when it comes to storing a non-repetitive stuff, > > like > > a lockdep splat, which is a single shot event. But the more repetitive the > > message is, like millions of similar kmalloc() dump_stack()-s over and over > > again, the less value in it. We should have printk_safe buffer big enough > > for > > important info, like a lockdep splat, but millions of similar kmalloc() > > messages are pretty invaluable - one is already enough, we can drop the > > rest. > > And we should not flush new messages while there is a CPU looping in > > console_unlock(), because it already has messages to print, which were > > log_store()-ed the normal way. > > The above is really hard to read without any capitalization. Everything > seems to be a run-on sentence and gives me a head ache. So you lost me > there. Apologies. Will improve. > > This is where the "postpone thing" jumps in. so how do we postpone > > printk_safe > > flush. > > > > We can't console_trylock()/console_unlock() in printk_safe flush code. > > But there is a `console_locked' flag and is_console_locked() function which > > tell us if the console_sem is locked. As long as we are in console_unlock() > > printing loop that flag is set, even if we enabled local IRQs and > > printk_safe > > flush work arrived. So now printk_safe flush does extra check and does > > not flush printk_safe buffer content as long as someone is currently > > printing or soon will start printing. But we need to take extra step and > > to re-queue flush on CPUs that did postpone it [console_unlock() can > > reschedule]. So now we flush only when printing CPU printed all pending > > logbuf messages, hit the "console_seq == log_next_seq" and up() > > console_sem. This sets a boundary -- no matter how many times during the > > current printing loop we called console drivers and how many times those > > drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars. > > Another big paragraph with no capitals (besides macros and CPU ;-) I walked through it and mostly "fixed" your head ache :) > I guess this is what it is like when people listen to me talk too fast. Absolutely!!! > > IOW, what we have now, looks like this: > > > > a) printk_safe is for important stuff, we don't guarantee that a flood > >of messages will be preserved. > > > > b) we extend the previously existing "will flush messages later on from > >a safer context" and now we also consider console_unlock() printing loop > >as unsafe context. so the unsafe context it's not only the one that can > >deadlock, but also the one that can lockup CPU in a printing loop because > >of recursive printk messages. > > Sure. > > > > > > > so this > > > > printk > > console_unlock > > { > >for (;;) { > > call_console_drivers > > net_console > >printk > > printk_save -> irq_work queue > > > >IRQ work > > prink_safe_flush > >printk_deferred -> log_store() > >iret > > } > > up(); > > } > > > > > >// which can never break out, because we can always append new messages > >// from prink_safe_flush. > > > > becomes this > > > > printk > > console_unlock > > { > >for (;;) { > > call_console_drivers > > net_console > >printk > > printk_save -> irq_work queue > > > > } > > up(); > > > > IRQ work > >prink_safe_flush > > printk_deferred -> log_store() > > iret > > } > > But we do eventually send this data out to the consoles, and if the > consoles cause more printks, wouldn't this still never end? Right. But not immediately. We wait for all pending messages to be evicted first (and up()) and we limit the amount of data that we flush. So at least it's not exponential anymore: every line that we print does not log_store() a whole new dump_stack() of lines. Which is still miles away from "a perfect solution", tho. But limiting the number of lines we print recursive is not much better. First, we don't know how many lines we want to flush from printk_safe. And having a knob indicates that no one ever will do it right. Second, hand off can play games with it. Assume the following, - I set `recursion_max' to 200. Which looks reasonable to me. Then I
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Sun, 21 Jan 2018 23:15:21 +0900 Sergey Senozhatskywrote: > so fix the console drivers ;) Totally agree! > > > > > just kidding. ok... Darn it! ;-) > the problem is that we flush printk_safe right when console_unlock() printing > loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs > were enabled in the first place when the CPU went to console_unlock()]. > this forces that CPU to loop in console_unlock() as long as we have > printk-s coming from call_console_drivers(). but we probably can postpone > printk_safe flush. basically, we can declare a new rule - we don't flush > printk_safe buffer as long as console_sem is locked. because this is how > that printing CPU stuck in the console_unlock() printing loop. printk_safe > buffer is very important when it comes to storing a non-repetitive stuff, like > a lockdep splat, which is a single shot event. but the more repetitive the > message is, like millions of similar kmalloc() dump_stack()-s over and over > again, the less value in it. we should have printk_safe buffer big enough for > important info, like a lockdep splat, but millions of similar kmalloc() > messages are pretty invaluable - one is already enough, we can drop the rest. > and we should not flush new messages while there is a CPU looping in > console_unlock(), because it already has messages to print, which were > log_store()-ed the normal way. The above is really hard to read without any capitalization. Everything seems to be a run-on sentence and gives me a head ache. So you lost me there. > > this is where the "postpone thing" jumps in. so how do we postpone printk_safe > flush. > > we can't console_trylock()/console_unlock() in printk_safe flush code. > but there is a `console_locked' flag and is_console_locked() function which > tell us if the console_sem is locked. as long as we are in console_unlock() > printing loop that flag is set, even if we enabled local IRQs and printk_safe > flush work arrived. so now printk_safe flush does extra check and does > not flush printk_safe buffer content as long as someone is currently > printing or soon will start printing. but we need to take extra step and > to re-queue flush on CPUs that did postpone it [console_unlock() can > reschedule]. so now we flush only when printing CPU printed all pending > logbuf messages, hit the "console_seq == log_next_seq" and up() > console_sem. this sets a boundary -- no matter how many times during the > current printing loop we called console drivers and how many times those > drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars. Another big paragraph with no capitals (besides macros and CPU ;-) I guess this is what it is like when people listen to me talk too fast. > > > IOW, what we have now, looks like this: > > a) printk_safe is for important stuff, we don't guarantee that a flood >of messages will be preserved. > > b) we extend the previously existing "will flush messages later on from >a safer context" and now we also consider console_unlock() printing loop >as unsafe context. so the unsafe context it's not only the one that can >deadlock, but also the one that can lockup CPU in a printing loop because >of recursive printk messages. Sure. > > > so this > > printk > console_unlock > { >for (;;) { > call_console_drivers > net_console >printk > printk_save -> irq_work queue > > IRQ work >prink_safe_flush > printk_deferred -> log_store() >iret > } > up(); > } > > >// which can never break out, because we can always append new messages >// from prink_safe_flush. > > becomes this > > printk > console_unlock > { >for (;;) { > call_console_drivers > net_console >printk > printk_save -> irq_work queue > > } > up(); > > IRQ work >prink_safe_flush > printk_deferred -> log_store() > iret > } But we do eventually send this data out to the consoles, and if the consoles cause more printks, wouldn't this still never end? > > > > something completely untested, sketchy and ugly. > > --- > > kernel/printk/internal.h| 2 ++ > kernel/printk/printk.c | 1 + > kernel/printk/printk_safe.c | 37 +++-- > 3 files changed, 38 insertions(+), 2 deletions(-) > > diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h > index 2a7d04049af4..e85517818a49 100644 > --- a/kernel/printk/internal.h > +++ b/kernel/printk/internal.h > @@ -30,6 +30,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list > args); > void __printk_safe_enter(void); > void __printk_safe_exit(void); > > +void printk_safe_requeue_flushing(void); > + > #define printk_safe_enter_irqsave(flags) \ > do {\ > local_irq_save(flags); \ > diff --git
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Sun, 21 Jan 2018 23:15:21 +0900 Sergey Senozhatsky wrote: > so fix the console drivers ;) Totally agree! > > > > > just kidding. ok... Darn it! ;-) > the problem is that we flush printk_safe right when console_unlock() printing > loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs > were enabled in the first place when the CPU went to console_unlock()]. > this forces that CPU to loop in console_unlock() as long as we have > printk-s coming from call_console_drivers(). but we probably can postpone > printk_safe flush. basically, we can declare a new rule - we don't flush > printk_safe buffer as long as console_sem is locked. because this is how > that printing CPU stuck in the console_unlock() printing loop. printk_safe > buffer is very important when it comes to storing a non-repetitive stuff, like > a lockdep splat, which is a single shot event. but the more repetitive the > message is, like millions of similar kmalloc() dump_stack()-s over and over > again, the less value in it. we should have printk_safe buffer big enough for > important info, like a lockdep splat, but millions of similar kmalloc() > messages are pretty invaluable - one is already enough, we can drop the rest. > and we should not flush new messages while there is a CPU looping in > console_unlock(), because it already has messages to print, which were > log_store()-ed the normal way. The above is really hard to read without any capitalization. Everything seems to be a run-on sentence and gives me a head ache. So you lost me there. > > this is where the "postpone thing" jumps in. so how do we postpone printk_safe > flush. > > we can't console_trylock()/console_unlock() in printk_safe flush code. > but there is a `console_locked' flag and is_console_locked() function which > tell us if the console_sem is locked. as long as we are in console_unlock() > printing loop that flag is set, even if we enabled local IRQs and printk_safe > flush work arrived. so now printk_safe flush does extra check and does > not flush printk_safe buffer content as long as someone is currently > printing or soon will start printing. but we need to take extra step and > to re-queue flush on CPUs that did postpone it [console_unlock() can > reschedule]. so now we flush only when printing CPU printed all pending > logbuf messages, hit the "console_seq == log_next_seq" and up() > console_sem. this sets a boundary -- no matter how many times during the > current printing loop we called console drivers and how many times those > drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars. Another big paragraph with no capitals (besides macros and CPU ;-) I guess this is what it is like when people listen to me talk too fast. > > > IOW, what we have now, looks like this: > > a) printk_safe is for important stuff, we don't guarantee that a flood >of messages will be preserved. > > b) we extend the previously existing "will flush messages later on from >a safer context" and now we also consider console_unlock() printing loop >as unsafe context. so the unsafe context it's not only the one that can >deadlock, but also the one that can lockup CPU in a printing loop because >of recursive printk messages. Sure. > > > so this > > printk > console_unlock > { >for (;;) { > call_console_drivers > net_console >printk > printk_save -> irq_work queue > > IRQ work >prink_safe_flush > printk_deferred -> log_store() >iret > } > up(); > } > > >// which can never break out, because we can always append new messages >// from prink_safe_flush. > > becomes this > > printk > console_unlock > { >for (;;) { > call_console_drivers > net_console >printk > printk_save -> irq_work queue > > } > up(); > > IRQ work >prink_safe_flush > printk_deferred -> log_store() > iret > } But we do eventually send this data out to the consoles, and if the consoles cause more printks, wouldn't this still never end? > > > > something completely untested, sketchy and ugly. > > --- > > kernel/printk/internal.h| 2 ++ > kernel/printk/printk.c | 1 + > kernel/printk/printk_safe.c | 37 +++-- > 3 files changed, 38 insertions(+), 2 deletions(-) > > diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h > index 2a7d04049af4..e85517818a49 100644 > --- a/kernel/printk/internal.h > +++ b/kernel/printk/internal.h > @@ -30,6 +30,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list > args); > void __printk_safe_enter(void); > void __printk_safe_exit(void); > > +void printk_safe_requeue_flushing(void); > + > #define printk_safe_enter_irqsave(flags) \ > do {\ > local_irq_save(flags); \ > diff --git a/kernel/printk/printk.c
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/20/18 10:49), Steven Rostedt wrote: [..] > > printks from console_unlock()->call_console_drivers() are redirected > > to printk_safe buffer. we need irq_work on that CPU to flush its > > printk_safe buffer. > > So is the issue that we keep triggering this irq work then? Then this > solution does seem to be one that would work. Because after x amount of > recursive printks (printk called by printk) it would just stop printing > them, and end the irq work. > > Perhaps what Tejun is seeing is: > > printk() >net_console() > printk() --> redirected to irq work > > > printk > net_console() > printk() --> redirected to another irq work > > and so on and so on. it's a bit trickier than that, I think. we have printk recursion from console drivers. it's redirected to printk_safe and we queue an IRQ work to flush the buffer printk console_unlock call_console_drivers net_console printk printk_save -> irq_work queue now console_unlock() enables local IRQs, we have the printk_safe flush. but printk_safe flush does not call into the console_unlock(), it uses printk_deferred() version of printk IRQ work prink_safe_flush printk_deferred -> irq_work queue so we schedule another IRQ work (deferred printk work), which eventually tries to lock console_sem IRQ work wake_up_klogd_work_func() if (console_trylock()) console_unlock() if it succeeds then it goes to console_unlock(), where console driver can cause another printk recursion. but, once again, it will be redirected to printk_safe buffer first. if it fails then we have either the original CPU to print out those irq_work messages, which is sort of bad, or another CPU which already acquired the console_sem and will print out. > This solution would need to be tweaked to add a timer to allow only so > many nested printks in a given time. Otherwise it too would be an issue: [..] > > how are we going to distinguish between lockdep splats, for instance, > > or WARNs from call_console_drivers() -> foo_write(), which are valuable, > > and kmalloc() print outs, which might be less valuable? are we going to > > The problem is that printk causing more printks is extremely dangerous, > and ANY printk that is caused by a printk is of equal value, whether it > is a console driver running out of memory or a lockdep splat. And > the chances of having two hit at the same time is extremely low. so fix the console drivers ;) just kidding. ok... the problem is that we flush printk_safe right when console_unlock() printing loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs were enabled in the first place when the CPU went to console_unlock()]. this forces that CPU to loop in console_unlock() as long as we have printk-s coming from call_console_drivers(). but we probably can postpone printk_safe flush. basically, we can declare a new rule - we don't flush printk_safe buffer as long as console_sem is locked. because this is how that printing CPU stuck in the console_unlock() printing loop. printk_safe buffer is very important when it comes to storing a non-repetitive stuff, like a lockdep splat, which is a single shot event. but the more repetitive the message is, like millions of similar kmalloc() dump_stack()-s over and over again, the less value in it. we should have printk_safe buffer big enough for important info, like a lockdep splat, but millions of similar kmalloc() messages are pretty invaluable - one is already enough, we can drop the rest. and we should not flush new messages while there is a CPU looping in console_unlock(), because it already has messages to print, which were log_store()-ed the normal way. this is where the "postpone thing" jumps in. so how do we postpone printk_safe flush. we can't console_trylock()/console_unlock() in printk_safe flush code. but there is a `console_locked' flag and is_console_locked() function which tell us if the console_sem is locked. as long as we are in console_unlock() printing loop that flag is set, even if we enabled local IRQs and printk_safe flush work arrived. so now printk_safe flush does extra check and does not flush printk_safe buffer content as long as someone is currently printing or soon will start printing. but we need to take extra step and to re-queue flush on CPUs that did postpone it [console_unlock() can reschedule]. so now we flush only when printing CPU printed all pending logbuf messages, hit the "console_seq == log_next_seq" and up() console_sem. this sets a boundary -- no matter how many times during the current printing loop we called console drivers and how many times those drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars. IOW, what we have now, looks like this: a) printk_safe is for important stuff, we don't guarantee that a flood of messages will be preserved. b) we extend the previously existing "will flush messages later on from a safer context" and now we also
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/20/18 10:49), Steven Rostedt wrote: [..] > > printks from console_unlock()->call_console_drivers() are redirected > > to printk_safe buffer. we need irq_work on that CPU to flush its > > printk_safe buffer. > > So is the issue that we keep triggering this irq work then? Then this > solution does seem to be one that would work. Because after x amount of > recursive printks (printk called by printk) it would just stop printing > them, and end the irq work. > > Perhaps what Tejun is seeing is: > > printk() >net_console() > printk() --> redirected to irq work > > > printk > net_console() > printk() --> redirected to another irq work > > and so on and so on. it's a bit trickier than that, I think. we have printk recursion from console drivers. it's redirected to printk_safe and we queue an IRQ work to flush the buffer printk console_unlock call_console_drivers net_console printk printk_save -> irq_work queue now console_unlock() enables local IRQs, we have the printk_safe flush. but printk_safe flush does not call into the console_unlock(), it uses printk_deferred() version of printk IRQ work prink_safe_flush printk_deferred -> irq_work queue so we schedule another IRQ work (deferred printk work), which eventually tries to lock console_sem IRQ work wake_up_klogd_work_func() if (console_trylock()) console_unlock() if it succeeds then it goes to console_unlock(), where console driver can cause another printk recursion. but, once again, it will be redirected to printk_safe buffer first. if it fails then we have either the original CPU to print out those irq_work messages, which is sort of bad, or another CPU which already acquired the console_sem and will print out. > This solution would need to be tweaked to add a timer to allow only so > many nested printks in a given time. Otherwise it too would be an issue: [..] > > how are we going to distinguish between lockdep splats, for instance, > > or WARNs from call_console_drivers() -> foo_write(), which are valuable, > > and kmalloc() print outs, which might be less valuable? are we going to > > The problem is that printk causing more printks is extremely dangerous, > and ANY printk that is caused by a printk is of equal value, whether it > is a console driver running out of memory or a lockdep splat. And > the chances of having two hit at the same time is extremely low. so fix the console drivers ;) just kidding. ok... the problem is that we flush printk_safe right when console_unlock() printing loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs were enabled in the first place when the CPU went to console_unlock()]. this forces that CPU to loop in console_unlock() as long as we have printk-s coming from call_console_drivers(). but we probably can postpone printk_safe flush. basically, we can declare a new rule - we don't flush printk_safe buffer as long as console_sem is locked. because this is how that printing CPU stuck in the console_unlock() printing loop. printk_safe buffer is very important when it comes to storing a non-repetitive stuff, like a lockdep splat, which is a single shot event. but the more repetitive the message is, like millions of similar kmalloc() dump_stack()-s over and over again, the less value in it. we should have printk_safe buffer big enough for important info, like a lockdep splat, but millions of similar kmalloc() messages are pretty invaluable - one is already enough, we can drop the rest. and we should not flush new messages while there is a CPU looping in console_unlock(), because it already has messages to print, which were log_store()-ed the normal way. this is where the "postpone thing" jumps in. so how do we postpone printk_safe flush. we can't console_trylock()/console_unlock() in printk_safe flush code. but there is a `console_locked' flag and is_console_locked() function which tell us if the console_sem is locked. as long as we are in console_unlock() printing loop that flag is set, even if we enabled local IRQs and printk_safe flush work arrived. so now printk_safe flush does extra check and does not flush printk_safe buffer content as long as someone is currently printing or soon will start printing. but we need to take extra step and to re-queue flush on CPUs that did postpone it [console_unlock() can reschedule]. so now we flush only when printing CPU printed all pending logbuf messages, hit the "console_seq == log_next_seq" and up() console_sem. this sets a boundary -- no matter how many times during the current printing loop we called console drivers and how many times those drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars. IOW, what we have now, looks like this: a) printk_safe is for important stuff, we don't guarantee that a flood of messages will be preserved. b) we extend the previously existing "will flush messages later on from a safer context" and now we also
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Sat, 20 Jan 2018 16:14:02 +0900 Sergey Senozhatskywrote: > [..] > > asmlinkage int vprintk_emit(int facility, int level, > > const char *dict, size_t dictlen, > > @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level, > > > > /* This stops the holder of console_sem just where we want him */ > > logbuf_lock_irqsave(flags); > > + > > + if (recursion_check_test()) { > > + /* A printk happened within a printk at the same context */ > > + if (this_cpu_inc_return(recursion_count) > recursion_max) { > > + atomic_inc(_overflow); > > + logbuf_unlock_irqrestore(flags); > > + printed_len = 0; > > + goto out; > > + } > > + } > > didn't have time to look at this carefully, but is this possible? > > printks from console_unlock()->call_console_drivers() are redirected > to printk_safe buffer. we need irq_work on that CPU to flush its > printk_safe buffer. So is the issue that we keep triggering this irq work then? Then this solution does seem to be one that would work. Because after x amount of recursive printks (printk called by printk) it would just stop printing them, and end the irq work. Perhaps what Tejun is seeing is: printk() net_console() printk() --> redirected to irq work printk net_console() printk() --> redirected to another irq work and so on and so on. This solution would need to be tweaked to add a timer to allow only so many nested printks in a given time. Otherwise it too would be an issue: printk() net_console() printk() -> redirected printk() -> throttled But the first x printk()s would still be redirected. and that x gets reset in this current patch at he end of the outermost printk. Perhaps it shouldn't reset x, or it can flush the printk safe buffer first. Is there a reason that console_unlock() doesn't flush the printk_safe_buffer? With a throttle number and flushing the printk_safe_buffer, that should solve the issue Tejun explained. > > how are we going to distinguish between lockdep splats, for instance, > or WARNs from call_console_drivers() -> foo_write(), which are valuable, > and kmalloc() print outs, which might be less valuable? are we going to The problem is that printk causing more printks is extremely dangerous, and ANY printk that is caused by a printk is of equal value, whether it is a console driver running out of memory or a lockdep splat. And the chances of having two hit at the same time is extremely low. > lose all of them now? then we can do a much simpler thing - steal one > bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which > will be set around call_console_drivers(). vprintk_func() would redirect > printks to vprintk_noop(fmt, args), which will do nothing. Not sure what you mean here. Have some pseudo code to demonstrate with? -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Sat, 20 Jan 2018 16:14:02 +0900 Sergey Senozhatsky wrote: > [..] > > asmlinkage int vprintk_emit(int facility, int level, > > const char *dict, size_t dictlen, > > @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level, > > > > /* This stops the holder of console_sem just where we want him */ > > logbuf_lock_irqsave(flags); > > + > > + if (recursion_check_test()) { > > + /* A printk happened within a printk at the same context */ > > + if (this_cpu_inc_return(recursion_count) > recursion_max) { > > + atomic_inc(_overflow); > > + logbuf_unlock_irqrestore(flags); > > + printed_len = 0; > > + goto out; > > + } > > + } > > didn't have time to look at this carefully, but is this possible? > > printks from console_unlock()->call_console_drivers() are redirected > to printk_safe buffer. we need irq_work on that CPU to flush its > printk_safe buffer. So is the issue that we keep triggering this irq work then? Then this solution does seem to be one that would work. Because after x amount of recursive printks (printk called by printk) it would just stop printing them, and end the irq work. Perhaps what Tejun is seeing is: printk() net_console() printk() --> redirected to irq work printk net_console() printk() --> redirected to another irq work and so on and so on. This solution would need to be tweaked to add a timer to allow only so many nested printks in a given time. Otherwise it too would be an issue: printk() net_console() printk() -> redirected printk() -> throttled But the first x printk()s would still be redirected. and that x gets reset in this current patch at he end of the outermost printk. Perhaps it shouldn't reset x, or it can flush the printk safe buffer first. Is there a reason that console_unlock() doesn't flush the printk_safe_buffer? With a throttle number and flushing the printk_safe_buffer, that should solve the issue Tejun explained. > > how are we going to distinguish between lockdep splats, for instance, > or WARNs from call_console_drivers() -> foo_write(), which are valuable, > and kmalloc() print outs, which might be less valuable? are we going to The problem is that printk causing more printks is extremely dangerous, and ANY printk that is caused by a printk is of equal value, whether it is a console driver running out of memory or a lockdep splat. And the chances of having two hit at the same time is extremely low. > lose all of them now? then we can do a much simpler thing - steal one > bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which > will be set around call_console_drivers(). vprintk_func() would redirect > printks to vprintk_noop(fmt, args), which will do nothing. Not sure what you mean here. Have some pseudo code to demonstrate with? -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Sat, 20 Jan 2018 04:19:53 -0800 Tejun Heowrote: > I'm a bit worried tho because this essentially seems like "detect > recursion, ignore messages" approach. netcons can have a very large > surface for bugs. Suppressing those messages would make them > difficult to debug. For example, all our machines have both serial > console (thus the slowness) and netconsole hooked up and netcons code > has had its fair share of issues. This would likely make tracking > down those problems more challenging. Well, it's not totally ignoring them. There's a variable that tells printk how many to print before it starts ignoring them. I picked 3, but that could very well be 5 or 10. Probably 10 is the best, because then it would give us enough idea why printk is recursing on itself without overloading the buffer. And I made it a variable to easily make it a knob for userspace to tweak if need be. > > Can we discuss pros and cons of this approach against offloading > before committing to this? I'm open. I was just thinking about the scenario that you mentioned and how what the best way to solve it would be. We need to define the exact problem(s) we are dealing with before we offer a solution. The one thing I don't want is a solution looking for a problem. I want a full understanding of what the problem exactly is and then we can discuss various solutions, and how they solve the problem(s). Otherwise we are just doing (to quote Linus) code masturbation. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Sat, 20 Jan 2018 04:19:53 -0800 Tejun Heo wrote: > I'm a bit worried tho because this essentially seems like "detect > recursion, ignore messages" approach. netcons can have a very large > surface for bugs. Suppressing those messages would make them > difficult to debug. For example, all our machines have both serial > console (thus the slowness) and netconsole hooked up and netcons code > has had its fair share of issues. This would likely make tracking > down those problems more challenging. Well, it's not totally ignoring them. There's a variable that tells printk how many to print before it starts ignoring them. I picked 3, but that could very well be 5 or 10. Probably 10 is the best, because then it would give us enough idea why printk is recursing on itself without overloading the buffer. And I made it a variable to easily make it a knob for userspace to tweak if need be. > > Can we discuss pros and cons of this approach against offloading > before committing to this? I'm open. I was just thinking about the scenario that you mentioned and how what the best way to solve it would be. We need to define the exact problem(s) we are dealing with before we offer a solution. The one thing I don't want is a solution looking for a problem. I want a full understanding of what the problem exactly is and then we can discuss various solutions, and how they solve the problem(s). Otherwise we are just doing (to quote Linus) code masturbation. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Steven. On Fri, Jan 19, 2018 at 01:20:52PM -0500, Steven Rostedt wrote: > I was thinking about this a bit more, and instead of offloading a > recursive printk, perhaps its best to simply throttle it. Because the > problem may not go away if a printk thread takes over, because the bug > is really the printk infrastructure filling the printk buffer keeping > printk from ever stopping. > > This patch detects that printk is causing itself to print more and > throttles it after 3 messages have printed due to recursion. Could you > see if this helps your test cases? Sure, if this is the approach we're gonna take, I can try it with the silly test code and also try to reproduce the original problem and see whether this helps. I'm a bit worried tho because this essentially seems like "detect recursion, ignore messages" approach. netcons can have a very large surface for bugs. Suppressing those messages would make them difficult to debug. For example, all our machines have both serial console (thus the slowness) and netconsole hooked up and netcons code has had its fair share of issues. This would likely make tracking down those problems more challenging. Can we discuss pros and cons of this approach against offloading before committing to this? Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Steven. On Fri, Jan 19, 2018 at 01:20:52PM -0500, Steven Rostedt wrote: > I was thinking about this a bit more, and instead of offloading a > recursive printk, perhaps its best to simply throttle it. Because the > problem may not go away if a printk thread takes over, because the bug > is really the printk infrastructure filling the printk buffer keeping > printk from ever stopping. > > This patch detects that printk is causing itself to print more and > throttles it after 3 messages have printed due to recursion. Could you > see if this helps your test cases? Sure, if this is the approach we're gonna take, I can try it with the silly test code and also try to reproduce the original problem and see whether this helps. I'm a bit worried tho because this essentially seems like "detect recursion, ignore messages" approach. netcons can have a very large surface for bugs. Suppressing those messages would make them difficult to debug. For example, all our machines have both serial console (thus the slowness) and netconsole hooked up and netcons code has had its fair share of issues. This would likely make tracking down those problems more challenging. Can we discuss pros and cons of this approach against offloading before committing to this? Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/19/18 13:20), Steven Rostedt wrote: [..] > I was thinking about this a bit more, and instead of offloading a > recursive printk, perhaps its best to simply throttle it. Because the > problem may not go away if a printk thread takes over, because the bug > is really the printk infrastructure filling the printk buffer keeping > printk from ever stopping. right. I didn't quite got it how that would help. if we would kick_offload every time we add new printks after call_console_drivers(), then we can just end up in a kick_offload loop traveling across all CPUs. [..] > asmlinkage int vprintk_emit(int facility, int level, > const char *dict, size_t dictlen, > @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level, > > /* This stops the holder of console_sem just where we want him */ > logbuf_lock_irqsave(flags); > + > + if (recursion_check_test()) { > + /* A printk happened within a printk at the same context */ > + if (this_cpu_inc_return(recursion_count) > recursion_max) { > + atomic_inc(_overflow); > + logbuf_unlock_irqrestore(flags); > + printed_len = 0; > + goto out; > + } > + } didn't have time to look at this carefully, but is this possible? printks from console_unlock()->call_console_drivers() are redirected to printk_safe buffer. we need irq_work on that CPU to flush its printk_safe buffer. > EXPORT_SYMBOL(vprintk_emit); > @@ -2343,9 +2428,14 @@ void console_unlock(void) > seen_seq = log_next_seq; > } > > - if (console_seq < log_first_seq) { > + if (console_seq < log_first_seq || > atomic_read(_overflow)) { > + size_t missed; > + > + missed = atomic_xchg(_overflow, 0); > + missed += log_first_seq - console_seq; > + > len = sprintf(text, "** %u printk messages dropped > **\n", > - (unsigned)(log_first_seq - console_seq)); > + (unsigned)missed); > > /* messages are gone, move to first one */ > console_seq = log_first_seq; how are we going to distinguish between lockdep splats, for instance, or WARNs from call_console_drivers() -> foo_write(), which are valuable, and kmalloc() print outs, which might be less valuable? are we going to lose all of them now? then we can do a much simpler thing - steal one bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which will be set around call_console_drivers(). vprintk_func() would redirect printks to vprintk_noop(fmt, args), which will do nothing. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/19/18 13:20), Steven Rostedt wrote: [..] > I was thinking about this a bit more, and instead of offloading a > recursive printk, perhaps its best to simply throttle it. Because the > problem may not go away if a printk thread takes over, because the bug > is really the printk infrastructure filling the printk buffer keeping > printk from ever stopping. right. I didn't quite got it how that would help. if we would kick_offload every time we add new printks after call_console_drivers(), then we can just end up in a kick_offload loop traveling across all CPUs. [..] > asmlinkage int vprintk_emit(int facility, int level, > const char *dict, size_t dictlen, > @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level, > > /* This stops the holder of console_sem just where we want him */ > logbuf_lock_irqsave(flags); > + > + if (recursion_check_test()) { > + /* A printk happened within a printk at the same context */ > + if (this_cpu_inc_return(recursion_count) > recursion_max) { > + atomic_inc(_overflow); > + logbuf_unlock_irqrestore(flags); > + printed_len = 0; > + goto out; > + } > + } didn't have time to look at this carefully, but is this possible? printks from console_unlock()->call_console_drivers() are redirected to printk_safe buffer. we need irq_work on that CPU to flush its printk_safe buffer. > EXPORT_SYMBOL(vprintk_emit); > @@ -2343,9 +2428,14 @@ void console_unlock(void) > seen_seq = log_next_seq; > } > > - if (console_seq < log_first_seq) { > + if (console_seq < log_first_seq || > atomic_read(_overflow)) { > + size_t missed; > + > + missed = atomic_xchg(_overflow, 0); > + missed += log_first_seq - console_seq; > + > len = sprintf(text, "** %u printk messages dropped > **\n", > - (unsigned)(log_first_seq - console_seq)); > + (unsigned)missed); > > /* messages are gone, move to first one */ > console_seq = log_first_seq; how are we going to distinguish between lockdep splats, for instance, or WARNs from call_console_drivers() -> foo_write(), which are valuable, and kmalloc() print outs, which might be less valuable? are we going to lose all of them now? then we can do a much simpler thing - steal one bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which will be set around call_console_drivers(). vprintk_func() would redirect printks to vprintk_noop(fmt, args), which will do nothing. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Tejun, I was thinking about this a bit more, and instead of offloading a recursive printk, perhaps its best to simply throttle it. Because the problem may not go away if a printk thread takes over, because the bug is really the printk infrastructure filling the printk buffer keeping printk from ever stopping. This patch detects that printk is causing itself to print more and throttles it after 3 messages have printed due to recursion. Could you see if this helps your test cases? I built this on top of linux-next (yesterday's branch). It compiles and boots, but I didn't do any other tests on it. Thanks! -- Steve diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 9cb943c90d98..2c7f18876224 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1826,6 +1826,75 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c /* Store it in the record log */ return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len); } +/* + * Used for which context the printk is in. + * NMI = 0 + * IRQ = 1 + * SOFTIRQ = 2 + * NORMAL = 3 + * + * Stack ordered, where the lower number can preempt + * the higher number: mask &= mask - 1, will only clear + * the lowerest set bit. + */ +enum { + CTX_NMI, + CTX_IRQ, + CTX_SOFTIRQ, + CTX_NORMAL, +}; + +static DEFINE_PER_CPU(int, recursion_bits); +static DEFINE_PER_CPU(int, recursion_count); +static atomic_t recursion_overflow; +static const int recursion_max = 3; + +static int __recursion_check_test(int val) +{ + unsigned long pc = preempt_count(); + int bit; + + if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) + bit = CTX_NORMAL; + else + bit = pc & NMI_MASK ? CTX_NMI : + pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ; + + return val & (1 << bit); +} + +static bool recursion_check_test(void) +{ + int val = this_cpu_read(recursion_bits); + + return __recursion_check_test(val); +} + +static bool recursion_check_start(void) +{ + int val = this_cpu_read(recursion_bits); + int set; + + set = __recursion_check_test(val); + + if (unlikely(set)) + return true; + + val |= set; + this_cpu_write(recursion_bits, val); + return false; +} + +static void recursion_check_finish(bool recursion) +{ + int val = this_cpu_read(recursion_bits); + + if (recursion) + return; + + val &= val - 1; + this_cpu_write(recursion_bits, val); +} asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level, /* This stops the holder of console_sem just where we want him */ logbuf_lock_irqsave(flags); + + if (recursion_check_test()) { + /* A printk happened within a printk at the same context */ + if (this_cpu_inc_return(recursion_count) > recursion_max) { + atomic_inc(_overflow); + logbuf_unlock_irqrestore(flags); + printed_len = 0; + goto out; + } + } + /* * The printf needs to come first; we need the syslog * prefix which might be passed-in as a parameter. @@ -1895,12 +1975,14 @@ asmlinkage int vprintk_emit(int facility, int level, /* If called from the scheduler, we can not call up(). */ if (!in_sched) { + bool recursion; /* * Disable preemption to avoid being preempted while holding * console_sem which would prevent anyone from printing to * console */ preempt_disable(); + recursion = recursion_check_start(); /* * Try to acquire and then immediately release the console * semaphore. The release will print out buffers and wake up @@ -1908,9 +1990,12 @@ asmlinkage int vprintk_emit(int facility, int level, */ if (console_trylock_spinning()) console_unlock(); + + recursion_check_finish(recursion); + this_cpu_write(recursion_count, 0); preempt_enable(); } - +out: return printed_len; } EXPORT_SYMBOL(vprintk_emit); @@ -2343,9 +2428,14 @@ void console_unlock(void) seen_seq = log_next_seq; } - if (console_seq < log_first_seq) { + if (console_seq < log_first_seq || atomic_read(_overflow)) { + size_t missed; + + missed = atomic_xchg(_overflow, 0); + missed += log_first_seq - console_seq; + len = sprintf(text, "** %u printk messages
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Tejun, I was thinking about this a bit more, and instead of offloading a recursive printk, perhaps its best to simply throttle it. Because the problem may not go away if a printk thread takes over, because the bug is really the printk infrastructure filling the printk buffer keeping printk from ever stopping. This patch detects that printk is causing itself to print more and throttles it after 3 messages have printed due to recursion. Could you see if this helps your test cases? I built this on top of linux-next (yesterday's branch). It compiles and boots, but I didn't do any other tests on it. Thanks! -- Steve diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 9cb943c90d98..2c7f18876224 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1826,6 +1826,75 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c /* Store it in the record log */ return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len); } +/* + * Used for which context the printk is in. + * NMI = 0 + * IRQ = 1 + * SOFTIRQ = 2 + * NORMAL = 3 + * + * Stack ordered, where the lower number can preempt + * the higher number: mask &= mask - 1, will only clear + * the lowerest set bit. + */ +enum { + CTX_NMI, + CTX_IRQ, + CTX_SOFTIRQ, + CTX_NORMAL, +}; + +static DEFINE_PER_CPU(int, recursion_bits); +static DEFINE_PER_CPU(int, recursion_count); +static atomic_t recursion_overflow; +static const int recursion_max = 3; + +static int __recursion_check_test(int val) +{ + unsigned long pc = preempt_count(); + int bit; + + if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) + bit = CTX_NORMAL; + else + bit = pc & NMI_MASK ? CTX_NMI : + pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ; + + return val & (1 << bit); +} + +static bool recursion_check_test(void) +{ + int val = this_cpu_read(recursion_bits); + + return __recursion_check_test(val); +} + +static bool recursion_check_start(void) +{ + int val = this_cpu_read(recursion_bits); + int set; + + set = __recursion_check_test(val); + + if (unlikely(set)) + return true; + + val |= set; + this_cpu_write(recursion_bits, val); + return false; +} + +static void recursion_check_finish(bool recursion) +{ + int val = this_cpu_read(recursion_bits); + + if (recursion) + return; + + val &= val - 1; + this_cpu_write(recursion_bits, val); +} asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level, /* This stops the holder of console_sem just where we want him */ logbuf_lock_irqsave(flags); + + if (recursion_check_test()) { + /* A printk happened within a printk at the same context */ + if (this_cpu_inc_return(recursion_count) > recursion_max) { + atomic_inc(_overflow); + logbuf_unlock_irqrestore(flags); + printed_len = 0; + goto out; + } + } + /* * The printf needs to come first; we need the syslog * prefix which might be passed-in as a parameter. @@ -1895,12 +1975,14 @@ asmlinkage int vprintk_emit(int facility, int level, /* If called from the scheduler, we can not call up(). */ if (!in_sched) { + bool recursion; /* * Disable preemption to avoid being preempted while holding * console_sem which would prevent anyone from printing to * console */ preempt_disable(); + recursion = recursion_check_start(); /* * Try to acquire and then immediately release the console * semaphore. The release will print out buffers and wake up @@ -1908,9 +1990,12 @@ asmlinkage int vprintk_emit(int facility, int level, */ if (console_trylock_spinning()) console_unlock(); + + recursion_check_finish(recursion); + this_cpu_write(recursion_count, 0); preempt_enable(); } - +out: return printed_len; } EXPORT_SYMBOL(vprintk_emit); @@ -2343,9 +2428,14 @@ void console_unlock(void) seen_seq = log_next_seq; } - if (console_seq < log_first_seq) { + if (console_seq < log_first_seq || atomic_read(_overflow)) { + size_t missed; + + missed = atomic_xchg(_overflow, 0); + missed += log_first_seq - console_seq; + len = sprintf(text, "** %u printk messages
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Thu, 18 Jan 2018 13:31:16 +0900 Sergey Senozhatskywrote: > d'oh... indeed, I copy-pasted the wrong URL... it should > have been lkml.kernel.org/r/ [and it actually was]. I've learned to do a copy after entering the lkml.kernel.org link into the browser url, and before hitting enter. The redirection kills you. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Thu, 18 Jan 2018 13:31:16 +0900 Sergey Senozhatsky wrote: > d'oh... indeed, I copy-pasted the wrong URL... it should > have been lkml.kernel.org/r/ [and it actually was]. I've learned to do a copy after entering the lkml.kernel.org link into the browser url, and before hitting enter. The redirection kills you. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed 2018-01-17 12:05:51, Tejun Heo wrote: > Hello, Steven. > > On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote: > > From what I gathered, you said an OOM would trigger, and then the > > network console would not be able to allocate memory and it would > > trigger a printk too, and cause an infinite amount of printks. > > Yeah, it falls into back-and-forth loop between the OOM code and > netconsole path. > > > This could very well be a great place to force offloading. If a printk > > is called from within a printk, at the same context (normal, softirq, > > irq or NMI), then we should trigger the offloading. > > I was thinking more of a timeout based approach (ie. if stuck for > longer than X or X messages, offload), but if local feedback loop is > the only thing we're missing after your improvements, detecting that > specific condition definitely works and is likely a better approach in > terms of message delivery guarantee. I think that we could combine both. The recursion can be detected rather easily and immediately so there is no reason to wait. Once we have the code for offloading from recursion then we could kick_offload_thread() also from other reasons, e.g. when console_unlock() takes too long. I think that Sergey is already playing with this. It seems that we all could be happy in the end. Best Regards, Petr PS: I am sorry for the answer yesterday. Tejun's mail did not mention any details about the problem. I evidently forgot them. I have OOM and printk issues associated with Tetsuo. So I messed it. Believe me. It is a big relief to realize that we are not in the cycle again.
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed 2018-01-17 12:05:51, Tejun Heo wrote: > Hello, Steven. > > On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote: > > From what I gathered, you said an OOM would trigger, and then the > > network console would not be able to allocate memory and it would > > trigger a printk too, and cause an infinite amount of printks. > > Yeah, it falls into back-and-forth loop between the OOM code and > netconsole path. > > > This could very well be a great place to force offloading. If a printk > > is called from within a printk, at the same context (normal, softirq, > > irq or NMI), then we should trigger the offloading. > > I was thinking more of a timeout based approach (ie. if stuck for > longer than X or X messages, offload), but if local feedback loop is > the only thing we're missing after your improvements, detecting that > specific condition definitely works and is likely a better approach in > terms of message delivery guarantee. I think that we could combine both. The recursion can be detected rather easily and immediately so there is no reason to wait. Once we have the code for offloading from recursion then we could kick_offload_thread() also from other reasons, e.g. when console_unlock() takes too long. I think that Sergey is already playing with this. It seems that we all could be happy in the end. Best Regards, Petr PS: I am sorry for the answer yesterday. Tejun's mail did not mention any details about the problem. I evidently forgot them. I have OOM and printk issues associated with Tetsuo. So I messed it. Believe me. It is a big relief to realize that we are not in the cycle again.
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/17/18 12:05), Tejun Heo wrote: [..] > > This could very well be a great place to force offloading. If a printk > > is called from within a printk, at the same context (normal, softirq, > > irq or NMI), then we should trigger the offloading. > > I was thinking more of a timeout based approach (ie. if stuck for > longer than X or X messages, offload) yep, that's what I want. for a whole bunch of different reasons. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/17/18 12:05), Tejun Heo wrote: [..] > > This could very well be a great place to force offloading. If a printk > > is called from within a printk, at the same context (normal, softirq, > > irq or NMI), then we should trigger the offloading. > > I was thinking more of a timeout based approach (ie. if stuck for > longer than X or X messages, offload) yep, that's what I want. for a whole bunch of different reasons. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/17/18 12:12), Steven Rostedt wrote: [..] > /* > * Can we actually use the console at this time on this cpu? > @@ -2333,6 +2390,7 @@ void console_unlock(void) > > for (;;) { > struct printk_log *msg; > + bool offload; > size_t ext_len = 0; > size_t len; > > @@ -2393,15 +2451,20 @@ void console_unlock(void) >* waiter waiting to take over. >*/ > console_lock_spinning_enable(); > + offload = recursion_check_start(); > > stop_critical_timings();/* don't trace print latency */ > call_console_drivers(ext_text, ext_len, text, len); > start_critical_timings(); > > + recursion_check_finish(offload); > + > if (console_lock_spinning_disable_and_check()) { > printk_safe_exit_irqrestore(flags); > return; > } > + if (offload) > + kick_offload_thread(); > > printk_safe_exit_irqrestore(flags); but we call console drivers in printk_safe. printk -> console_drivers -> printk will be redirected to this-CPU printk_safe buffer. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/17/18 12:12), Steven Rostedt wrote: [..] > /* > * Can we actually use the console at this time on this cpu? > @@ -2333,6 +2390,7 @@ void console_unlock(void) > > for (;;) { > struct printk_log *msg; > + bool offload; > size_t ext_len = 0; > size_t len; > > @@ -2393,15 +2451,20 @@ void console_unlock(void) >* waiter waiting to take over. >*/ > console_lock_spinning_enable(); > + offload = recursion_check_start(); > > stop_critical_timings();/* don't trace print latency */ > call_console_drivers(ext_text, ext_len, text, len); > start_critical_timings(); > > + recursion_check_finish(offload); > + > if (console_lock_spinning_disable_and_check()) { > printk_safe_exit_irqrestore(flags); > return; > } > + if (offload) > + kick_offload_thread(); > > printk_safe_exit_irqrestore(flags); but we call console drivers in printk_safe. printk -> console_drivers -> printk will be redirected to this-CPU printk_safe buffer. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/17/18 14:04), Petr Mladek wrote: > On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote: > > On (01/16/18 10:45), Steven Rostedt wrote: > > [..] > > > > [1] https://marc.info/?l=linux-mm=145692016122716 > > > > > > Especially since Konstantin is working on pulling in all LKML archives, > > > the above should be denoted as: > > > > > > Link: > > > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp > > > > hm, may I ask why? is there a new rule now to percent-encode commit > > messages? > > IMHO, the most important thing is that Steven's link is based > on the Message-ID and the stable redirector > https://lkml.kernel.org/. It has a better chance to work > even in the future. d'oh... indeed, I copy-pasted the wrong URL... it should have been lkml.kernel.org/r/ [and it actually was]. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/17/18 14:04), Petr Mladek wrote: > On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote: > > On (01/16/18 10:45), Steven Rostedt wrote: > > [..] > > > > [1] https://marc.info/?l=linux-mm=145692016122716 > > > > > > Especially since Konstantin is working on pulling in all LKML archives, > > > the above should be denoted as: > > > > > > Link: > > > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp > > > > hm, may I ask why? is there a new rule now to percent-encode commit > > messages? > > IMHO, the most important thing is that Steven's link is based > on the Message-ID and the stable redirector > https://lkml.kernel.org/. It has a better chance to work > even in the future. d'oh... indeed, I copy-pasted the wrong URL... it should have been lkml.kernel.org/r/ [and it actually was]. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Steven. On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote: > From what I gathered, you said an OOM would trigger, and then the > network console would not be able to allocate memory and it would > trigger a printk too, and cause an infinite amount of printks. Yeah, it falls into back-and-forth loop between the OOM code and netconsole path. > This could very well be a great place to force offloading. If a printk > is called from within a printk, at the same context (normal, softirq, > irq or NMI), then we should trigger the offloading. I was thinking more of a timeout based approach (ie. if stuck for longer than X or X messages, offload), but if local feedback loop is the only thing we're missing after your improvements, detecting that specific condition definitely works and is likely a better approach in terms of message delivery guarantee. > +static void kick_offload_thread(void) > +{ > + /* > + * Consoles are triggering printks, offload the printks > + * to another CPU to hopefully avoid a lockup. > + */ > +} ... > @@ -2333,6 +2390,7 @@ void console_unlock(void) > > for (;;) { > struct printk_log *msg; > + bool offload; > size_t ext_len = 0; > size_t len; > > @@ -2393,15 +2451,20 @@ void console_unlock(void) >* waiter waiting to take over. >*/ > console_lock_spinning_enable(); > + offload = recursion_check_start(); > > stop_critical_timings();/* don't trace print latency */ > call_console_drivers(ext_text, ext_len, text, len); > start_critical_timings(); > > + recursion_check_finish(offload); > + > if (console_lock_spinning_disable_and_check()) { > printk_safe_exit_irqrestore(flags); > return; > } > + if (offload) > + kick_offload_thread(); Yeah, something like this would definitely work. Thanks a lot. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Steven. On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote: > From what I gathered, you said an OOM would trigger, and then the > network console would not be able to allocate memory and it would > trigger a printk too, and cause an infinite amount of printks. Yeah, it falls into back-and-forth loop between the OOM code and netconsole path. > This could very well be a great place to force offloading. If a printk > is called from within a printk, at the same context (normal, softirq, > irq or NMI), then we should trigger the offloading. I was thinking more of a timeout based approach (ie. if stuck for longer than X or X messages, offload), but if local feedback loop is the only thing we're missing after your improvements, detecting that specific condition definitely works and is likely a better approach in terms of message delivery guarantee. > +static void kick_offload_thread(void) > +{ > + /* > + * Consoles are triggering printks, offload the printks > + * to another CPU to hopefully avoid a lockup. > + */ > +} ... > @@ -2333,6 +2390,7 @@ void console_unlock(void) > > for (;;) { > struct printk_log *msg; > + bool offload; > size_t ext_len = 0; > size_t len; > > @@ -2393,15 +2451,20 @@ void console_unlock(void) >* waiter waiting to take over. >*/ > console_lock_spinning_enable(); > + offload = recursion_check_start(); > > stop_critical_timings();/* don't trace print latency */ > call_console_drivers(ext_text, ext_len, text, len); > start_critical_timings(); > > + recursion_check_finish(offload); > + > if (console_lock_spinning_disable_and_check()) { > printk_safe_exit_irqrestore(flags); > return; > } > + if (offload) > + kick_offload_thread(); Yeah, something like this would definitely work. Thanks a lot. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 17 Jan 2018 12:12:51 -0500 Steven Rostedtwrote: > @@ -2393,15 +2451,20 @@ void console_unlock(void) >* waiter waiting to take over. >*/ > console_lock_spinning_enable(); > + offload = recursion_check_start(); > > stop_critical_timings();/* don't trace print latency */ > call_console_drivers(ext_text, ext_len, text, len); > start_critical_timings(); > > + recursion_check_finish(offload); > + > if (console_lock_spinning_disable_and_check()) { > printk_safe_exit_irqrestore(flags); > return; > } > + if (offload) > + kick_offload_thread(); > Ah, major flaw in this code. The recursion check needs to be in printk() itself around the trylock. -- Steve diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 9cb943c90d98..31df145cc4d7 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1826,6 +1826,63 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c /* Store it in the record log */ return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len); } +/* + * Used for which context the printk is in. + * NMI = 0 + * IRQ = 1 + * SOFTIRQ = 2 + * NORMAL = 3 + * + * Stack ordered, where the lower number can preempt + * the higher number: mask &= mask - 1, will only clear + * the lowerest set bit. + */ +enum { + CTX_NMI, + CTX_IRQ, + CTX_SOFTIRQ, + CTX_NORMAL, +}; + +static DEFINE_PER_CPU(int, recursion_bits); + +static bool recursion_check_start(void) +{ + unsigned long pc = preempt_count(); + int val = this_cpu_read(recursion_bits); + + if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) + bit = CTX_NORMAL; + else + bit = pc & NMI_MASK ? CTX_NMI : + pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ; + + if (unlikely(val & (1 << bit))) + return true; + + val |= (1 << bit); + this_cpu_write(recursion_bits, val); + return false; +} + +static void recursion_check_finish(bool offload) +{ + int val = this_cpu_read(recursion_bits); + + if (offload) + return; + + val &= val - 1; + this_cpu_write(recursion_bits, val); +} + +static void kick_offload_thread(void) +{ + /* +* Consoles are triggering printks, offload the printks +* to another CPU to hopefully avoid a lockup. +*/ +} asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, @@ -1895,12 +1952,14 @@ asmlinkage int vprintk_emit(int facility, int level, /* If called from the scheduler, we can not call up(). */ if (!in_sched) { + bool offload; /* * Disable preemption to avoid being preempted while holding * console_sem which would prevent anyone from printing to * console */ preempt_disable(); + offload = recursion_check_start(); /* * Try to acquire and then immediately release the console * semaphore. The release will print out buffers and wake up @@ -1908,7 +1967,12 @@ asmlinkage int vprintk_emit(int facility, int level, */ if (console_trylock_spinning()) console_unlock(); + + recursion_check_finish(offload); preempt_enable(); + + if (offload) + kick_offload_thread(); } return printed_len;
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 17 Jan 2018 12:12:51 -0500 Steven Rostedt wrote: > @@ -2393,15 +2451,20 @@ void console_unlock(void) >* waiter waiting to take over. >*/ > console_lock_spinning_enable(); > + offload = recursion_check_start(); > > stop_critical_timings();/* don't trace print latency */ > call_console_drivers(ext_text, ext_len, text, len); > start_critical_timings(); > > + recursion_check_finish(offload); > + > if (console_lock_spinning_disable_and_check()) { > printk_safe_exit_irqrestore(flags); > return; > } > + if (offload) > + kick_offload_thread(); > Ah, major flaw in this code. The recursion check needs to be in printk() itself around the trylock. -- Steve diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 9cb943c90d98..31df145cc4d7 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1826,6 +1826,63 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c /* Store it in the record log */ return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len); } +/* + * Used for which context the printk is in. + * NMI = 0 + * IRQ = 1 + * SOFTIRQ = 2 + * NORMAL = 3 + * + * Stack ordered, where the lower number can preempt + * the higher number: mask &= mask - 1, will only clear + * the lowerest set bit. + */ +enum { + CTX_NMI, + CTX_IRQ, + CTX_SOFTIRQ, + CTX_NORMAL, +}; + +static DEFINE_PER_CPU(int, recursion_bits); + +static bool recursion_check_start(void) +{ + unsigned long pc = preempt_count(); + int val = this_cpu_read(recursion_bits); + + if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) + bit = CTX_NORMAL; + else + bit = pc & NMI_MASK ? CTX_NMI : + pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ; + + if (unlikely(val & (1 << bit))) + return true; + + val |= (1 << bit); + this_cpu_write(recursion_bits, val); + return false; +} + +static void recursion_check_finish(bool offload) +{ + int val = this_cpu_read(recursion_bits); + + if (offload) + return; + + val &= val - 1; + this_cpu_write(recursion_bits, val); +} + +static void kick_offload_thread(void) +{ + /* +* Consoles are triggering printks, offload the printks +* to another CPU to hopefully avoid a lockup. +*/ +} asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, @@ -1895,12 +1952,14 @@ asmlinkage int vprintk_emit(int facility, int level, /* If called from the scheduler, we can not call up(). */ if (!in_sched) { + bool offload; /* * Disable preemption to avoid being preempted while holding * console_sem which would prevent anyone from printing to * console */ preempt_disable(); + offload = recursion_check_start(); /* * Try to acquire and then immediately release the console * semaphore. The release will print out buffers and wake up @@ -1908,7 +1967,12 @@ asmlinkage int vprintk_emit(int facility, int level, */ if (console_trylock_spinning()) console_unlock(); + + recursion_check_finish(offload); preempt_enable(); + + if (offload) + kick_offload_thread(); } return printed_len;
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 17 Jan 2018 07:15:09 -0800 Tejun Heowrote: > It's great that Steven's patches solve a good number of problems. It > is also true that there's a class of problems that it doesn't solve, > which other approaches do. The productive thing to do here is trying > to solve the unsolved one too, especially given that it doesn't seem > too difficuilt to do so on top of what's proposed. OK, let's talk about the other problems, as this is no longer related to my patch. >From your previous email: > 1. Console is IPMI emulated serial console. Super slow. Also >netconsole is in use. > 2. System runs out of memory, OOM triggers. > 3. OOM handler is printing out OOM debug info. > 4. While trying to emit the messages for netconsole, the network stack >/ driver tries to allocate memory and then fail, which in turn >triggers allocation failure or other warning messages. printk was >already flushing, so the messages are queued on the ring. > 5. OOM handler keeps flushing but 4 repeats and the queue is never >shrinking. Because OOM handler is trapped in printk flushing, it >never manages to free memory and no one else can enter OOM path >either, so the system is trapped in this state. >From what I gathered, you said an OOM would trigger, and then the network console would not be able to allocate memory and it would trigger a printk too, and cause an infinite amount of printks. This could very well be a great place to force offloading. If a printk is called from within a printk, at the same context (normal, softirq, irq or NMI), then we should trigger the offloading. My ftrace ring buffer has a context level recursion check, we could use that, and even tie it into my previous patch: With something like this (not compiled tested or anything, and kick_offload_thread() would need to be implemented). diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 9cb943c90d98..b80b23a0ca13 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2261,6 +2261,63 @@ static int have_callable_console(void) return 0; } +/* + * Used for which context the printk is in. + * NMI = 0 + * IRQ = 1 + * SOFTIRQ = 2 + * NORMAL = 3 + * + * Stack ordered, where the lower number can preempt + * the higher number: mask &= mask - 1, will only clear + * the lowerest set bit. + */ +enum { + CTX_NMI, + CTX_IRQ, + CTX_SOFTIRQ, + CTX_NORMAL, +}; + +static DEFINE_PER_CPU(int, recursion_bits); + +static bool recursion_check_start(void) +{ + unsigned long pc = preempt_count(); + int val = this_cpu_read(recursion_bits); + + if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) + bit = CTX_NORMAL; + else + bit = pc & NMI_MASK ? CTX_NMI : + pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ; + + if (unlikely(val & (1 << bit))) + return true; + + val |= (1 << bit); + this_cpu_write(recursion_bits, val); + return false; +} + +static void recursion_check_finish(bool offload) +{ + int val = this_cpu_read(recursion_bits); + + if (offload) + return; + + val &= val - 1; + this_cpu_write(recursion_bits, val); +} + +static void kick_offload_thread(void) +{ + /* +* Consoles are triggering printks, offload the printks +* to another CPU to hopefully avoid a lockup. +*/ +} /* * Can we actually use the console at this time on this cpu? @@ -2333,6 +2390,7 @@ void console_unlock(void) for (;;) { struct printk_log *msg; + bool offload; size_t ext_len = 0; size_t len; @@ -2393,15 +2451,20 @@ void console_unlock(void) * waiter waiting to take over. */ console_lock_spinning_enable(); + offload = recursion_check_start(); stop_critical_timings();/* don't trace print latency */ call_console_drivers(ext_text, ext_len, text, len); start_critical_timings(); + recursion_check_finish(offload); + if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); return; } + if (offload) + kick_offload_thread(); printk_safe_exit_irqrestore(flags); -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 17 Jan 2018 07:15:09 -0800 Tejun Heo wrote: > It's great that Steven's patches solve a good number of problems. It > is also true that there's a class of problems that it doesn't solve, > which other approaches do. The productive thing to do here is trying > to solve the unsolved one too, especially given that it doesn't seem > too difficuilt to do so on top of what's proposed. OK, let's talk about the other problems, as this is no longer related to my patch. >From your previous email: > 1. Console is IPMI emulated serial console. Super slow. Also >netconsole is in use. > 2. System runs out of memory, OOM triggers. > 3. OOM handler is printing out OOM debug info. > 4. While trying to emit the messages for netconsole, the network stack >/ driver tries to allocate memory and then fail, which in turn >triggers allocation failure or other warning messages. printk was >already flushing, so the messages are queued on the ring. > 5. OOM handler keeps flushing but 4 repeats and the queue is never >shrinking. Because OOM handler is trapped in printk flushing, it >never manages to free memory and no one else can enter OOM path >either, so the system is trapped in this state. >From what I gathered, you said an OOM would trigger, and then the network console would not be able to allocate memory and it would trigger a printk too, and cause an infinite amount of printks. This could very well be a great place to force offloading. If a printk is called from within a printk, at the same context (normal, softirq, irq or NMI), then we should trigger the offloading. My ftrace ring buffer has a context level recursion check, we could use that, and even tie it into my previous patch: With something like this (not compiled tested or anything, and kick_offload_thread() would need to be implemented). diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 9cb943c90d98..b80b23a0ca13 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2261,6 +2261,63 @@ static int have_callable_console(void) return 0; } +/* + * Used for which context the printk is in. + * NMI = 0 + * IRQ = 1 + * SOFTIRQ = 2 + * NORMAL = 3 + * + * Stack ordered, where the lower number can preempt + * the higher number: mask &= mask - 1, will only clear + * the lowerest set bit. + */ +enum { + CTX_NMI, + CTX_IRQ, + CTX_SOFTIRQ, + CTX_NORMAL, +}; + +static DEFINE_PER_CPU(int, recursion_bits); + +static bool recursion_check_start(void) +{ + unsigned long pc = preempt_count(); + int val = this_cpu_read(recursion_bits); + + if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) + bit = CTX_NORMAL; + else + bit = pc & NMI_MASK ? CTX_NMI : + pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ; + + if (unlikely(val & (1 << bit))) + return true; + + val |= (1 << bit); + this_cpu_write(recursion_bits, val); + return false; +} + +static void recursion_check_finish(bool offload) +{ + int val = this_cpu_read(recursion_bits); + + if (offload) + return; + + val &= val - 1; + this_cpu_write(recursion_bits, val); +} + +static void kick_offload_thread(void) +{ + /* +* Consoles are triggering printks, offload the printks +* to another CPU to hopefully avoid a lockup. +*/ +} /* * Can we actually use the console at this time on this cpu? @@ -2333,6 +2390,7 @@ void console_unlock(void) for (;;) { struct printk_log *msg; + bool offload; size_t ext_len = 0; size_t len; @@ -2393,15 +2451,20 @@ void console_unlock(void) * waiter waiting to take over. */ console_lock_spinning_enable(); + offload = recursion_check_start(); stop_critical_timings();/* don't trace print latency */ call_console_drivers(ext_text, ext_len, text, len); start_critical_timings(); + recursion_check_finish(offload); + if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); return; } + if (offload) + kick_offload_thread(); printk_safe_exit_irqrestore(flags); -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 17 Jan 2018 14:04:07 +0100 Petr Mladekwrote: > On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote: > > On (01/16/18 10:45), Steven Rostedt wrote: > > [..] > > > > [1] https://marc.info/?l=linux-mm=145692016122716 > > > > > > Especially since Konstantin is working on pulling in all LKML archives, > > > the above should be denoted as: > > > > > > Link: > > > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp > > > > > > > hm, may I ask why? is there a new rule now to percent-encode commit > > messages? > > IMHO, the most important thing is that Steven's link is based > on the Message-ID and the stable redirector > https://lkml.kernel.org/. It has a better chance to work > even in the future. Exactly. There's an effort to avoid any outside link dependencies in the Linux git history. No one expected gmane to end (although it appears to be making a comeback), but we don't want to get stuck if marc.info disappears one day. > > > > > > Should we Cc sta...@vger.kernel.org? > > > > that's a good question... maybe yes, maybe no... I'd say this > > change is "safer" when we have hand-off. > > I would keep it as is in stable kernels unless there are > many bug reports. Agreed. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed, 17 Jan 2018 14:04:07 +0100 Petr Mladek wrote: > On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote: > > On (01/16/18 10:45), Steven Rostedt wrote: > > [..] > > > > [1] https://marc.info/?l=linux-mm=145692016122716 > > > > > > Especially since Konstantin is working on pulling in all LKML archives, > > > the above should be denoted as: > > > > > > Link: > > > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp > > > > > > > hm, may I ask why? is there a new rule now to percent-encode commit > > messages? > > IMHO, the most important thing is that Steven's link is based > on the Message-ID and the stable redirector > https://lkml.kernel.org/. It has a better chance to work > even in the future. Exactly. There's an effort to avoid any outside link dependencies in the Linux git history. No one expected gmane to end (although it appears to be making a comeback), but we don't want to get stuck if marc.info disappears one day. > > > > > > Should we Cc sta...@vger.kernel.org? > > > > that's a good question... maybe yes, maybe no... I'd say this > > change is "safer" when we have hand-off. > > I would keep it as is in stable kernels unless there are > many bug reports. Agreed. -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, On Wed, Jan 17, 2018 at 10:12:08AM +0100, Petr Mladek wrote: > IMHO, the bad scenario with OOM was that any printk() called in > the OOM report became console_lock owner and was responsible > for pushing all new messages to the console. There was a possible > livelock because OOM Killer was blocked in console_unlock() while > other CPUs repeatedly complained about failed allocations. I don't know why we're constantly back into this same loop on this topic but that's not the problem we've been seeing. There are no other CPUs involved. It's great that Steven's patches solve a good number of problems. It is also true that there's a class of problems that it doesn't solve, which other approaches do. The productive thing to do here is trying to solve the unsolved one too, especially given that it doesn't seem too difficuilt to do so on top of what's proposed. Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, On Wed, Jan 17, 2018 at 10:12:08AM +0100, Petr Mladek wrote: > IMHO, the bad scenario with OOM was that any printk() called in > the OOM report became console_lock owner and was responsible > for pushing all new messages to the console. There was a possible > livelock because OOM Killer was blocked in console_unlock() while > other CPUs repeatedly complained about failed allocations. I don't know why we're constantly back into this same loop on this topic but that's not the problem we've been seeing. There are no other CPUs involved. It's great that Steven's patches solve a good number of problems. It is also true that there's a class of problems that it doesn't solve, which other approaches do. The productive thing to do here is trying to solve the unsolved one too, especially given that it doesn't seem too difficuilt to do so on top of what's proposed. Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote: > On (01/16/18 10:45), Steven Rostedt wrote: > [..] > > > [1] https://marc.info/?l=linux-mm=145692016122716 > > > > Especially since Konstantin is working on pulling in all LKML archives, > > the above should be denoted as: > > > > Link: > > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp > > hm, may I ask why? is there a new rule now to percent-encode commit messages? IMHO, the most important thing is that Steven's link is based on the Message-ID and the stable redirector https://lkml.kernel.org/. It has a better chance to work even in the future. I have been asked by other people to use this type of links as well. > > Although the above is for linux-mm and not LKML (it still works), I > > should ask Konstantin if he will be pulling in any of the other > > archives. Perhaps have both? (in case marc.info goes away). > > > > > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of > > > console_trylock() callers") > > > > Should we Cc sta...@vger.kernel.org? > > that's a good question... maybe yes, maybe no... I'd say this > change is "safer" when we have hand-off. I would keep it as is in stable kernels unless there are many bug reports. Best Regards, Petr
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote: > On (01/16/18 10:45), Steven Rostedt wrote: > [..] > > > [1] https://marc.info/?l=linux-mm=145692016122716 > > > > Especially since Konstantin is working on pulling in all LKML archives, > > the above should be denoted as: > > > > Link: > > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp > > hm, may I ask why? is there a new rule now to percent-encode commit messages? IMHO, the most important thing is that Steven's link is based on the Message-ID and the stable redirector https://lkml.kernel.org/. It has a better chance to work even in the future. I have been asked by other people to use this type of links as well. > > Although the above is for linux-mm and not LKML (it still works), I > > should ask Konstantin if he will be pulling in any of the other > > archives. Perhaps have both? (in case marc.info goes away). > > > > > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of > > > console_trylock() callers") > > > > Should we Cc sta...@vger.kernel.org? > > that's a good question... maybe yes, maybe no... I'd say this > change is "safer" when we have hand-off. I would keep it as is in stable kernels unless there are many bug reports. Best Regards, Petr
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue 2018-01-16 11:44:56, Tejun Heo wrote: > Hello, Steven. > > On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote: > > All I did was start off a work queue on each CPU, and each CPU does one > > printk() followed by a millisecond sleep. No 10,000 printks, nothing > > in an interrupt handler. Preemption is disabled while the printk > > happens, but that's normal. > > > > This is much closer to an OOM happening all over the system, where OOMs > > stack dumps are occurring on different CPUS. > > OOMs can't happen all over the system. It can only happen on a single > CPU at a time. If you're printing from multiple CPUs, your solution > would work great. That is the situation your patches are designed to > address to begin with. That isn't the problem that I reported tho. I > understand that your solution works for that class of problems and > that is great. I really wish that it could address the other class of > problems too tho, and it doesn't seem like it would be that difficult > to cover both cases, right? IMHO, the bad scenario with OOM was that any printk() called in the OOM report became console_lock owner and was responsible for pushing all new messages to the console. There was a possible livelock because OOM Killer was blocked in console_unlock() while other CPUs repeatedly complained about failed allocations. Even the current patch should help. It allows to hand off the console_lock to another CPU and OOM killer could eventually continue. Of course, it is possible that it might not be enough. For example, there might still be too many messages to print when the memory is freed. Therefore there will be no more complains, no more hand offs and the last console_lock owner might still cause softlockup. But it still will be better than the livelockup. Of course, we will need to address the softlockup. But let's see how this works in practice. Best Regards, Petr
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue 2018-01-16 11:44:56, Tejun Heo wrote: > Hello, Steven. > > On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote: > > All I did was start off a work queue on each CPU, and each CPU does one > > printk() followed by a millisecond sleep. No 10,000 printks, nothing > > in an interrupt handler. Preemption is disabled while the printk > > happens, but that's normal. > > > > This is much closer to an OOM happening all over the system, where OOMs > > stack dumps are occurring on different CPUS. > > OOMs can't happen all over the system. It can only happen on a single > CPU at a time. If you're printing from multiple CPUs, your solution > would work great. That is the situation your patches are designed to > address to begin with. That isn't the problem that I reported tho. I > understand that your solution works for that class of problems and > that is great. I really wish that it could address the other class of > problems too tho, and it doesn't seem like it would be that difficult > to cover both cases, right? IMHO, the bad scenario with OOM was that any printk() called in the OOM report became console_lock owner and was responsible for pushing all new messages to the console. There was a possible livelock because OOM Killer was blocked in console_unlock() while other CPUs repeatedly complained about failed allocations. Even the current patch should help. It allows to hand off the console_lock to another CPU and OOM killer could eventually continue. Of course, it is possible that it might not be enough. For example, there might still be too many messages to print when the memory is freed. Therefore there will be no more complains, no more hand offs and the last console_lock owner might still cause softlockup. But it still will be better than the livelockup. Of course, we will need to address the softlockup. But let's see how this works in practice. Best Regards, Petr
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/16/18 11:13), Petr Mladek wrote: [..] > IMHO, it would make sense if flushing the printk buffer behaves > the same when called either from printk() or from any other path. > I mean that it should be aggressive and allow an effective > hand off. > > It should be safe as long as foo_specific_work() does not take > too much time. > > From other side. The cond_resched() in console_unlock() should > be obsoleted by the hand-shake code. hm, let's not have too optimistic expectations. hand off works in very specific conditions. console is not exclusively owned by printk, and console_sem is not printk's own lock. even things like systemd -> n_tty_write -> do_output_char -> con_write involves console_lock() and console_unlock(). IOW user space logging/debugging can cause printk stalls, and vice versa. by the way, do_con_write() explicitly calls console_conditional_schedule() under console_sem, before it goes to console_unlock(). so the scope of "situation normal, console_sem locked, the owner scheduled out" is much bigger than just vprintk_emit() -> console_unlock(). IMHO. and there are even more things there. personally, I don't think that hand off is enough to obsolete anything in that area. [...] > They both were obvious solutions that helped to reduce the risk > of soft-lockups. The first one handled evidently safe scenarios. > The second one was even more aggressive. I would say that > they both were more or less add-hoc solutions that did not > take into account the other side effects (delaying output, > even loosing messages). agreed. > I would not say that one is a diametric difference between them. > Therefore if we remove one for a reason, we should think about > reverting the other as well. But again. I am fine if we remove > only one now. > > Does this make any sense? I see cond_resched() as a mirroring of console_lock()->console_unlock() behaviour on PREEMPT systems, and as such it looks valid to me, so we probably better keep it there. IMHO. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/16/18 11:13), Petr Mladek wrote: [..] > IMHO, it would make sense if flushing the printk buffer behaves > the same when called either from printk() or from any other path. > I mean that it should be aggressive and allow an effective > hand off. > > It should be safe as long as foo_specific_work() does not take > too much time. > > From other side. The cond_resched() in console_unlock() should > be obsoleted by the hand-shake code. hm, let's not have too optimistic expectations. hand off works in very specific conditions. console is not exclusively owned by printk, and console_sem is not printk's own lock. even things like systemd -> n_tty_write -> do_output_char -> con_write involves console_lock() and console_unlock(). IOW user space logging/debugging can cause printk stalls, and vice versa. by the way, do_con_write() explicitly calls console_conditional_schedule() under console_sem, before it goes to console_unlock(). so the scope of "situation normal, console_sem locked, the owner scheduled out" is much bigger than just vprintk_emit() -> console_unlock(). IMHO. and there are even more things there. personally, I don't think that hand off is enough to obsolete anything in that area. [...] > They both were obvious solutions that helped to reduce the risk > of soft-lockups. The first one handled evidently safe scenarios. > The second one was even more aggressive. I would say that > they both were more or less add-hoc solutions that did not > take into account the other side effects (delaying output, > even loosing messages). agreed. > I would not say that one is a diametric difference between them. > Therefore if we remove one for a reason, we should think about > reverting the other as well. But again. I am fine if we remove > only one now. > > Does this make any sense? I see cond_resched() as a mirroring of console_lock()->console_unlock() behaviour on PREEMPT systems, and as such it looks valid to me, so we probably better keep it there. IMHO. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/16/18 11:19), Petr Mladek wrote: [..] > > [1] https://marc.info/?l=linux-mm=145692016122716 > > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of > > console_trylock() callers") > > Signed-off-by: Sergey Senozhatsky> > Reported-by: Tetsuo Handa > > IMHO, this is a step in the right direction. > > Reviewed-by: Petr Mladek > > I'll wait for Steven's review and push this into printk.git. > I'll also add your Acks for the other patches. > > Thanks for the patch and the various observations. thanks! a side note, our console output is still largely preemptible. a typical system acquires console_sem via console_lock() all the time, so we still can have "where is my printk output?" cases. for instance, my IDLE PREEMPT x86 box, has the following stats uptime 15 min # of console_lock() calls: 10981 // can sleep under console_sem # of vprintk_emit() calls: 825// cannot sleep under console_sem -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/16/18 11:19), Petr Mladek wrote: [..] > > [1] https://marc.info/?l=linux-mm=145692016122716 > > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of > > console_trylock() callers") > > Signed-off-by: Sergey Senozhatsky > > Reported-by: Tetsuo Handa > > IMHO, this is a step in the right direction. > > Reviewed-by: Petr Mladek > > I'll wait for Steven's review and push this into printk.git. > I'll also add your Acks for the other patches. > > Thanks for the patch and the various observations. thanks! a side note, our console output is still largely preemptible. a typical system acquires console_sem via console_lock() all the time, so we still can have "where is my printk output?" cases. for instance, my IDLE PREEMPT x86 box, has the following stats uptime 15 min # of console_lock() calls: 10981 // can sleep under console_sem # of vprintk_emit() calls: 825// cannot sleep under console_sem -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/16/18 10:45), Steven Rostedt wrote: [..] > > [1] https://marc.info/?l=linux-mm=145692016122716 > > Especially since Konstantin is working on pulling in all LKML archives, > the above should be denoted as: > > Link: > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp hm, may I ask why? is there a new rule now to percent-encode commit messages? > Although the above is for linux-mm and not LKML (it still works), I > should ask Konstantin if he will be pulling in any of the other > archives. Perhaps have both? (in case marc.info goes away). > > > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of > > console_trylock() callers") > > Should we Cc sta...@vger.kernel.org? that's a good question... maybe yes, maybe no... I'd say this change is "safer" when we have hand-off. > Reviewed-by: Steven Rostedt (VMware)> > Thanks Sergey! thanks. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On (01/16/18 10:45), Steven Rostedt wrote: [..] > > [1] https://marc.info/?l=linux-mm=145692016122716 > > Especially since Konstantin is working on pulling in all LKML archives, > the above should be denoted as: > > Link: > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp hm, may I ask why? is there a new rule now to percent-encode commit messages? > Although the above is for linux-mm and not LKML (it still works), I > should ask Konstantin if he will be pulling in any of the other > archives. Perhaps have both? (in case marc.info goes away). > > > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of > > console_trylock() callers") > > Should we Cc sta...@vger.kernel.org? that's a good question... maybe yes, maybe no... I'd say this change is "safer" when we have hand-off. > Reviewed-by: Steven Rostedt (VMware) > > Thanks Sergey! thanks. -ss
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Steven. On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote: > All I did was start off a work queue on each CPU, and each CPU does one > printk() followed by a millisecond sleep. No 10,000 printks, nothing > in an interrupt handler. Preemption is disabled while the printk > happens, but that's normal. > > This is much closer to an OOM happening all over the system, where OOMs > stack dumps are occurring on different CPUS. OOMs can't happen all over the system. It can only happen on a single CPU at a time. If you're printing from multiple CPUs, your solution would work great. That is the situation your patches are designed to address to begin with. That isn't the problem that I reported tho. I understand that your solution works for that class of problems and that is great. I really wish that it could address the other class of problems too tho, and it doesn't seem like it would be that difficult to cover both cases, right? Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello, Steven. On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote: > All I did was start off a work queue on each CPU, and each CPU does one > printk() followed by a millisecond sleep. No 10,000 printks, nothing > in an interrupt handler. Preemption is disabled while the printk > happens, but that's normal. > > This is much closer to an OOM happening all over the system, where OOMs > stack dumps are occurring on different CPUS. OOMs can't happen all over the system. It can only happen on a single CPU at a time. If you're printing from multiple CPUs, your solution would work great. That is the situation your patches are designed to address to begin with. That isn't the problem that I reported tho. I understand that your solution works for that class of problems and that is great. I really wish that it could address the other class of problems too tho, and it doesn't seem like it would be that difficult to cover both cases, right? Thanks. -- tejun
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue, 16 Jan 2018 15:10:13 +0900 Sergey Senozhatskywrote: > overall that's very close to what I have in one of my private branches. > console_trylock_spinning() for some reason does not perform really > well on my made-up internal printk torture tests. it seems that I One thing I noticed in my test with the module that does printks on all cpus, was that the patch spreads out the processing of the consoles. Before my patch, one printk user would be doing all the work, and all the other printks only had to load their data into the logbuf then exit. The majority of printks took a few microseconds, which looks great if you ignore the one worker that is taking milliseconds to complete. After my patch, since a printk that comes in while another one was running would block, then it would start printing, it did lengthen the time for individual printks to finish. Worst case it would double the time to do printk. But it removed the burden of a single printk doing all the work for all new printks that came in. In other words, I would expect this to make printk on average slower. But no longer unlimited. -- Steve > have a much better stability (no lockups and so on) when I also let > printk_kthread to sleep on console_sem(). but I will look further.
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue, 16 Jan 2018 15:10:13 +0900 Sergey Senozhatsky wrote: > overall that's very close to what I have in one of my private branches. > console_trylock_spinning() for some reason does not perform really > well on my made-up internal printk torture tests. it seems that I One thing I noticed in my test with the module that does printks on all cpus, was that the patch spreads out the processing of the consoles. Before my patch, one printk user would be doing all the work, and all the other printks only had to load their data into the logbuf then exit. The majority of printks took a few microseconds, which looks great if you ignore the one worker that is taking milliseconds to complete. After my patch, since a printk that comes in while another one was running would block, then it would start printing, it did lengthen the time for individual printks to finish. Worst case it would double the time to do printk. But it removed the burden of a single printk doing all the work for all new printks that came in. In other words, I would expect this to make printk on average slower. But no longer unlimited. -- Steve > have a much better stability (no lockups and so on) when I also let > printk_kthread to sleep on console_sem(). but I will look further.
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue, 16 Jan 2018 13:47:16 +0900 Sergey Senozhatskywrote: > From: Sergey Senozhatsky > Subject: [PATCH] printk: never set console_may_schedule in console_trylock() > > This patch, basically, reverts commit 6b97a20d3a79 ("printk: > set may_schedule for some of console_trylock() callers"). > That commit was a mistake, it introduced a big dependency > on the scheduler, by enabling preemption under console_sem > in printk()->console_unlock() path, which is rather too > critical. The patch did not significantly reduce the > possibilities of printk() lockups, but made it possible to > stall printk(), as has been reported by Tetsuo Handa [1]. > > Another issues is that preemption under console_sem also > messes up with Steven Rostedt's hand off scheme, by making > it possible to sleep with console_sem both in console_unlock() > and in vprintk_emit(), after acquiring the console_sem > ownership (anywhere between printk_safe_exit_irqrestore() in > console_trylock_spinning() and printk_safe_enter_irqsave() > in console_unlock()). This makes hand off less likely and, > at the same time, may result in a significant amount of > pending logbuf messages. Preempted console_sem owner makes > it impossible for other CPUs to emit logbuf messages, but > does not make it impossible for other CPUs to append new > messages to the logbuf. > > Reinstate the old behavior and make printk() non-preemptible. > Should any printk() lockup reports arrive they must be handled > in a different way. > > [1] https://marc.info/?l=linux-mm=145692016122716 Especially since Konstantin is working on pulling in all LKML archives, the above should be denoted as: Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp Although the above is for linux-mm and not LKML (it still works), I should ask Konstantin if he will be pulling in any of the other archives. Perhaps have both? (in case marc.info goes away). > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() > callers") Should we Cc sta...@vger.kernel.org? > Signed-off-by: Sergey Senozhatsky > Reported-by: Tetsuo Handa > --- > kernel/printk/printk.c | 22 -- > 1 file changed, 8 insertions(+), 14 deletions(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index ffe05024c622..9cb943c90d98 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1895,6 +1895,12 @@ asmlinkage int vprintk_emit(int facility, int level, > > /* If called from the scheduler, we can not call up(). */ > if (!in_sched) { > + /* > + * Disable preemption to avoid being preempted while holding > + * console_sem which would prevent anyone from printing to > + * console > + */ > + preempt_disable(); > /* >* Try to acquire and then immediately release the console >* semaphore. The release will print out buffers and wake up > @@ -1902,6 +1908,7 @@ asmlinkage int vprintk_emit(int facility, int level, >*/ > if (console_trylock_spinning()) > console_unlock(); > + preempt_enable(); > } > > return printed_len; > @@ -2229,20 +2236,7 @@ int console_trylock(void) > return 0; > } > console_locked = 1; > - /* > - * When PREEMPT_COUNT disabled we can't reliably detect if it's > - * safe to schedule (e.g. calling printk while holding a spin_lock), > - * because preempt_disable()/preempt_enable() are just barriers there > - * and preempt_count() is always 0. > - * > - * RCU read sections have a separate preemption counter when > - * PREEMPT_RCU enabled thus we must take extra care and check > - * rcu_preempt_depth(), otherwise RCU read sections modify > - * preempt_count(). > - */ > - console_may_schedule = !oops_in_progress && > - preemptible() && > - !rcu_preempt_depth(); > + console_may_schedule = 0; > return 1; > } > EXPORT_SYMBOL(console_trylock); Reviewed-by: Steven Rostedt (VMware) Thanks Sergey! -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue, 16 Jan 2018 13:47:16 +0900 Sergey Senozhatsky wrote: > From: Sergey Senozhatsky > Subject: [PATCH] printk: never set console_may_schedule in console_trylock() > > This patch, basically, reverts commit 6b97a20d3a79 ("printk: > set may_schedule for some of console_trylock() callers"). > That commit was a mistake, it introduced a big dependency > on the scheduler, by enabling preemption under console_sem > in printk()->console_unlock() path, which is rather too > critical. The patch did not significantly reduce the > possibilities of printk() lockups, but made it possible to > stall printk(), as has been reported by Tetsuo Handa [1]. > > Another issues is that preemption under console_sem also > messes up with Steven Rostedt's hand off scheme, by making > it possible to sleep with console_sem both in console_unlock() > and in vprintk_emit(), after acquiring the console_sem > ownership (anywhere between printk_safe_exit_irqrestore() in > console_trylock_spinning() and printk_safe_enter_irqsave() > in console_unlock()). This makes hand off less likely and, > at the same time, may result in a significant amount of > pending logbuf messages. Preempted console_sem owner makes > it impossible for other CPUs to emit logbuf messages, but > does not make it impossible for other CPUs to append new > messages to the logbuf. > > Reinstate the old behavior and make printk() non-preemptible. > Should any printk() lockup reports arrive they must be handled > in a different way. > > [1] https://marc.info/?l=linux-mm=145692016122716 Especially since Konstantin is working on pulling in all LKML archives, the above should be denoted as: Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp Although the above is for linux-mm and not LKML (it still works), I should ask Konstantin if he will be pulling in any of the other archives. Perhaps have both? (in case marc.info goes away). > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() > callers") Should we Cc sta...@vger.kernel.org? > Signed-off-by: Sergey Senozhatsky > Reported-by: Tetsuo Handa > --- > kernel/printk/printk.c | 22 -- > 1 file changed, 8 insertions(+), 14 deletions(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index ffe05024c622..9cb943c90d98 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1895,6 +1895,12 @@ asmlinkage int vprintk_emit(int facility, int level, > > /* If called from the scheduler, we can not call up(). */ > if (!in_sched) { > + /* > + * Disable preemption to avoid being preempted while holding > + * console_sem which would prevent anyone from printing to > + * console > + */ > + preempt_disable(); > /* >* Try to acquire and then immediately release the console >* semaphore. The release will print out buffers and wake up > @@ -1902,6 +1908,7 @@ asmlinkage int vprintk_emit(int facility, int level, >*/ > if (console_trylock_spinning()) > console_unlock(); > + preempt_enable(); > } > > return printed_len; > @@ -2229,20 +2236,7 @@ int console_trylock(void) > return 0; > } > console_locked = 1; > - /* > - * When PREEMPT_COUNT disabled we can't reliably detect if it's > - * safe to schedule (e.g. calling printk while holding a spin_lock), > - * because preempt_disable()/preempt_enable() are just barriers there > - * and preempt_count() is always 0. > - * > - * RCU read sections have a separate preemption counter when > - * PREEMPT_RCU enabled thus we must take extra care and check > - * rcu_preempt_depth(), otherwise RCU read sections modify > - * preempt_count(). > - */ > - console_may_schedule = !oops_in_progress && > - preemptible() && > - !rcu_preempt_depth(); > + console_may_schedule = 0; > return 1; > } > EXPORT_SYMBOL(console_trylock); Reviewed-by: Steven Rostedt (VMware) Thanks Sergey! -- Steve
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue 2018-01-16 13:47:16, Sergey Senozhatsky wrote: > if you don't mind, let me fix the thing that I broke. > that would be responsible. I believe I also must say the following: > Tetsuo, many thanks for reporting the issues for song long, and > sorry that it took quite a while to revert that change. > > 8< > > From: Sergey Senozhatsky> Subject: [PATCH] printk: never set console_may_schedule in console_trylock() > > This patch, basically, reverts commit 6b97a20d3a79 ("printk: > set may_schedule for some of console_trylock() callers"). > That commit was a mistake, it introduced a big dependency > on the scheduler, by enabling preemption under console_sem > in printk()->console_unlock() path, which is rather too > critical. The patch did not significantly reduce the > possibilities of printk() lockups, but made it possible to > stall printk(), as has been reported by Tetsuo Handa [1]. > > Another issues is that preemption under console_sem also > messes up with Steven Rostedt's hand off scheme, by making > it possible to sleep with console_sem both in console_unlock() > and in vprintk_emit(), after acquiring the console_sem > ownership (anywhere between printk_safe_exit_irqrestore() in > console_trylock_spinning() and printk_safe_enter_irqsave() > in console_unlock()). This makes hand off less likely and, > at the same time, may result in a significant amount of > pending logbuf messages. Preempted console_sem owner makes > it impossible for other CPUs to emit logbuf messages, but > does not make it impossible for other CPUs to append new > messages to the logbuf. > > Reinstate the old behavior and make printk() non-preemptible. > Should any printk() lockup reports arrive they must be handled > in a different way. > > [1] https://marc.info/?l=linux-mm=145692016122716 > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() > callers") > Signed-off-by: Sergey Senozhatsky > Reported-by: Tetsuo Handa IMHO, this is a step in the right direction. Reviewed-by: Petr Mladek I'll wait for Steven's review and push this into printk.git. I'll also add your Acks for the other patches. Thanks for the patch and the various observations. Best Regards, Petr
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue 2018-01-16 13:47:16, Sergey Senozhatsky wrote: > if you don't mind, let me fix the thing that I broke. > that would be responsible. I believe I also must say the following: > Tetsuo, many thanks for reporting the issues for song long, and > sorry that it took quite a while to revert that change. > > 8< > > From: Sergey Senozhatsky > Subject: [PATCH] printk: never set console_may_schedule in console_trylock() > > This patch, basically, reverts commit 6b97a20d3a79 ("printk: > set may_schedule for some of console_trylock() callers"). > That commit was a mistake, it introduced a big dependency > on the scheduler, by enabling preemption under console_sem > in printk()->console_unlock() path, which is rather too > critical. The patch did not significantly reduce the > possibilities of printk() lockups, but made it possible to > stall printk(), as has been reported by Tetsuo Handa [1]. > > Another issues is that preemption under console_sem also > messes up with Steven Rostedt's hand off scheme, by making > it possible to sleep with console_sem both in console_unlock() > and in vprintk_emit(), after acquiring the console_sem > ownership (anywhere between printk_safe_exit_irqrestore() in > console_trylock_spinning() and printk_safe_enter_irqsave() > in console_unlock()). This makes hand off less likely and, > at the same time, may result in a significant amount of > pending logbuf messages. Preempted console_sem owner makes > it impossible for other CPUs to emit logbuf messages, but > does not make it impossible for other CPUs to append new > messages to the logbuf. > > Reinstate the old behavior and make printk() non-preemptible. > Should any printk() lockup reports arrive they must be handled > in a different way. > > [1] https://marc.info/?l=linux-mm=145692016122716 > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() > callers") > Signed-off-by: Sergey Senozhatsky > Reported-by: Tetsuo Handa IMHO, this is a step in the right direction. Reviewed-by: Petr Mladek I'll wait for Steven's review and push this into printk.git. I'll also add your Acks for the other patches. Thanks for the patch and the various observations. Best Regards, Petr
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue 2018-01-16 11:23:49, Sergey Senozhatsky wrote: > On (01/15/18 15:45), Petr Mladek wrote: > > > I think adding the preempt_disable() would fix printk() but let non > > > printk console_unlock() still preempt. > > > > I would personally remove cond_resched() from console_unlock() > > completely. > > hmm, not so sure. I think it's there for !PREEMPT systems which have > to print a lot of messages. the case I'm speaking about in particular > is when we register a CON_PRINTBUFFER console and need to console_unlock() > (flush) all of the messages we currently have in the logbuf. we better > have that cond_resched() there, I think. Good point. I agree that we should keep the cond_resched() there at least for now. > > Sleeping in console_unlock() increases the chance that more messages > > would need to be handled. And more importantly it reduces the chance > > of a successful handover. > > > > As a result, the caller might spend there very long time, it might > > be getting increasingly far behind. There is higher risk of lost > > messages. Also the eventual taker might have too much to proceed > > in preemption disabled context. > > yes. > > > Removing cond_resched() is in sync with printk() priorities. > > hmm, not sure. we have sleeping console_lock()->console_unlock() path > for PREEMPT kernels, that cond_resched() makes the !PREEMPT kernels to > have the same sleeping console_lock()->console_unlock(). > > printk()->console_unlock() seems to be a pretty independent thing, > unfortunately (!), yet sleeping console_lock()->console_unlock() > messes up with it a lot. IMHO, the problem here is that console_lock is used to synchronize too many things. It would be great to separate printk() duties into a separate lock in the long term. Anyway, I see it the following way. Most console_lock() callers do the following things: void foo() { console_lock() foo_specific_work(); console_unlock(); } where console_unlock() flushes the printk buffer before actually releasing the lock. IMHO, it would make sense if flushing the printk buffer behaves the same when called either from printk() or from any other path. I mean that it should be aggressive and allow an effective hand off. It should be safe as long as foo_specific_work() does not take too much time. >From other side. The cond_resched() in console_unlock() should be obsoleted by the hand-shake code. > > The highest one is to get the messages out. > > > > Finally, removing cond_resched() should make the behavior more > > predictable (never preempted) > > but we are always preempted in PREEMPT kernels when the current > console_sem owner acquired the lock via console_lock(), not via > console_trylock(). cond_resched() does the same, but for !PREEMPT. I agree that the situation is more complicated for cond_resched() called after console_lock(). I do not resist on removing it now. Just one more thing. The time axe looks like: + cond_resched added into console_unlock in v4.5-rc1, Jan 15, 2016 (commit 8d91f8b15361dfb438ab6) + preemtion enabled in printk in, v4.6-rc1, Mar 17, 2016 (commit 6b97a20d3a7909daa0662) They both were obvious solutions that helped to reduce the risk of soft-lockups. The first one handled evidently safe scenarios. The second one was even more aggressive. I would say that they both were more or less add-hoc solutions that did not take into account the other side effects (delaying output, even loosing messages). I would not say that one is a diametric difference between them. Therefore if we remove one for a reason, we should think about reverting the other as well. But again. I am fine if we remove only one now. Does this make any sense? Best Regard, Petr
Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue 2018-01-16 11:23:49, Sergey Senozhatsky wrote: > On (01/15/18 15:45), Petr Mladek wrote: > > > I think adding the preempt_disable() would fix printk() but let non > > > printk console_unlock() still preempt. > > > > I would personally remove cond_resched() from console_unlock() > > completely. > > hmm, not so sure. I think it's there for !PREEMPT systems which have > to print a lot of messages. the case I'm speaking about in particular > is when we register a CON_PRINTBUFFER console and need to console_unlock() > (flush) all of the messages we currently have in the logbuf. we better > have that cond_resched() there, I think. Good point. I agree that we should keep the cond_resched() there at least for now. > > Sleeping in console_unlock() increases the chance that more messages > > would need to be handled. And more importantly it reduces the chance > > of a successful handover. > > > > As a result, the caller might spend there very long time, it might > > be getting increasingly far behind. There is higher risk of lost > > messages. Also the eventual taker might have too much to proceed > > in preemption disabled context. > > yes. > > > Removing cond_resched() is in sync with printk() priorities. > > hmm, not sure. we have sleeping console_lock()->console_unlock() path > for PREEMPT kernels, that cond_resched() makes the !PREEMPT kernels to > have the same sleeping console_lock()->console_unlock(). > > printk()->console_unlock() seems to be a pretty independent thing, > unfortunately (!), yet sleeping console_lock()->console_unlock() > messes up with it a lot. IMHO, the problem here is that console_lock is used to synchronize too many things. It would be great to separate printk() duties into a separate lock in the long term. Anyway, I see it the following way. Most console_lock() callers do the following things: void foo() { console_lock() foo_specific_work(); console_unlock(); } where console_unlock() flushes the printk buffer before actually releasing the lock. IMHO, it would make sense if flushing the printk buffer behaves the same when called either from printk() or from any other path. I mean that it should be aggressive and allow an effective hand off. It should be safe as long as foo_specific_work() does not take too much time. >From other side. The cond_resched() in console_unlock() should be obsoleted by the hand-shake code. > > The highest one is to get the messages out. > > > > Finally, removing cond_resched() should make the behavior more > > predictable (never preempted) > > but we are always preempted in PREEMPT kernels when the current > console_sem owner acquired the lock via console_lock(), not via > console_trylock(). cond_resched() does the same, but for !PREEMPT. I agree that the situation is more complicated for cond_resched() called after console_lock(). I do not resist on removing it now. Just one more thing. The time axe looks like: + cond_resched added into console_unlock in v4.5-rc1, Jan 15, 2016 (commit 8d91f8b15361dfb438ab6) + preemtion enabled in printk in, v4.6-rc1, Mar 17, 2016 (commit 6b97a20d3a7909daa0662) They both were obvious solutions that helped to reduce the risk of soft-lockups. The first one handled evidently safe scenarios. The second one was even more aggressive. I would say that they both were more or less add-hoc solutions that did not take into account the other side effects (delaying output, even loosing messages). I would not say that one is a diametric difference between them. Therefore if we remove one for a reason, we should think about reverting the other as well. But again. I am fine if we remove only one now. Does this make any sense? Best Regard, Petr