Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Fri 2018-02-02 13:17:08, Petr Mladek wrote: > On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote: > > On (01/30/18 13:23), Petr Mladek wrote: > > [..] > > > > If the system is in "big troubles" then what makes irq_work more > > > > possible? Local IRQs can stay disabled, just like preemption. I > > > > guess when the troubles are really big our strategy is the same > > > > for both wq and irq_work solutions - we keep the printk_safe buffer > > > > and wait for panic()->flush. > > > > > > But the patch still uses irq work because queue_work_on() could not > > > be safely called from printk_safe(). By other words, it requires > > > both irq_work and workqueues to be functional. > > > > Right, that's all true. The reason it's done this way is because buffers can > > be big and we still flush under console_sem in console_unlock() loop, which > > can in theory be problematic. In other words, I wanted to remove the root > > cause - irq flush of printk_safe while we are still in printing > > loop. > > Good point! We know that we would eventually push non-trivial amount > of messages and it would make sense to do it from non-atomic context. > > On the other hand, this does not solve the same problem with printk > NMI buffer. And I guess that we do not want to risk offloading to > workqueues for NMI messages. By the above I want to say that I am not sure if it is worth having different solution for flushing printk_safe and nmi buffer. We might be forced to debug why a solution does not work. Shared solution and shared issues might safe us some work. I know that my alternative solution needed some extra code for printk_safe as well. But it is pretty straightforward. It basically just a check of a counter. Note that the WQ-based solution is kind of tricky. There is the journey printk() -> irq_work -> wq. Also it relies on the fact that console_unlock() is called with disabled preemption. > > > > I guess I'm OK with the wq dependency after all, but I may be mistaken. > > > > printk_safe was never about "immediately flush the buffer", it was about > > > > "avoid deadlocks", which was extended to "flush from any context which > > > > will let us to avoid deadlock". It just happened that it inherited > > > > irq_work dependency from printk_nmi. > > > > > > I see the point. But if I remember correctly, it was also designed > > > before we started to be concerned about a sudden death and "get > > > printks out ASAP" mantra. > > > > Can you elaborate a bit? You probably knew the dates. Anyway, I want to say. Even when the messages stored in the printk_safe buffer might be less important, they still might be interesting and we might want to get them out rather sooner than later. All in all, it seems that we are basically down to the dilemma between "get messages out ASAP" vs. "avoid softlockups". We need to take into account: + printk_safe messages are more or less printk-subsystem specific. They might be "less" important for others. But who knows for sure? + the code complexity With the above in mind, I still prefer the "easier" solution based on the counter. If people had problems with softlockups, we would need to solve them. But then they probably will have this problem also with normal printk or in NMI and we would probably need a solution that is not using workqueues. Does this makes any sense? Best Regards, Petr PS: I am sorry for so many mails today. Sometimes (often?), it is so hard to understand and say things clearly. And it is Friday.
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Fri 2018-02-02 13:17:08, Petr Mladek wrote: > On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote: > > On (01/30/18 13:23), Petr Mladek wrote: > > [..] > > > > If the system is in "big troubles" then what makes irq_work more > > > > possible? Local IRQs can stay disabled, just like preemption. I > > > > guess when the troubles are really big our strategy is the same > > > > for both wq and irq_work solutions - we keep the printk_safe buffer > > > > and wait for panic()->flush. > > > > > > But the patch still uses irq work because queue_work_on() could not > > > be safely called from printk_safe(). By other words, it requires > > > both irq_work and workqueues to be functional. > > > > Right, that's all true. The reason it's done this way is because buffers can > > be big and we still flush under console_sem in console_unlock() loop, which > > can in theory be problematic. In other words, I wanted to remove the root > > cause - irq flush of printk_safe while we are still in printing > > loop. > > Good point! We know that we would eventually push non-trivial amount > of messages and it would make sense to do it from non-atomic context. > > On the other hand, this does not solve the same problem with printk > NMI buffer. And I guess that we do not want to risk offloading to > workqueues for NMI messages. By the above I want to say that I am not sure if it is worth having different solution for flushing printk_safe and nmi buffer. We might be forced to debug why a solution does not work. Shared solution and shared issues might safe us some work. I know that my alternative solution needed some extra code for printk_safe as well. But it is pretty straightforward. It basically just a check of a counter. Note that the WQ-based solution is kind of tricky. There is the journey printk() -> irq_work -> wq. Also it relies on the fact that console_unlock() is called with disabled preemption. > > > > I guess I'm OK with the wq dependency after all, but I may be mistaken. > > > > printk_safe was never about "immediately flush the buffer", it was about > > > > "avoid deadlocks", which was extended to "flush from any context which > > > > will let us to avoid deadlock". It just happened that it inherited > > > > irq_work dependency from printk_nmi. > > > > > > I see the point. But if I remember correctly, it was also designed > > > before we started to be concerned about a sudden death and "get > > > printks out ASAP" mantra. > > > > Can you elaborate a bit? You probably knew the dates. Anyway, I want to say. Even when the messages stored in the printk_safe buffer might be less important, they still might be interesting and we might want to get them out rather sooner than later. All in all, it seems that we are basically down to the dilemma between "get messages out ASAP" vs. "avoid softlockups". We need to take into account: + printk_safe messages are more or less printk-subsystem specific. They might be "less" important for others. But who knows for sure? + the code complexity With the above in mind, I still prefer the "easier" solution based on the counter. If people had problems with softlockups, we would need to solve them. But then they probably will have this problem also with normal printk or in NMI and we would probably need a solution that is not using workqueues. Does this makes any sense? Best Regards, Petr PS: I am sorry for so many mails today. Sometimes (often?), it is so hard to understand and say things clearly. And it is Friday.
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote: > On (01/30/18 13:23), Petr Mladek wrote: > [..] > > > If the system is in "big troubles" then what makes irq_work more > > > possible? Local IRQs can stay disabled, just like preemption. I > > > guess when the troubles are really big our strategy is the same > > > for both wq and irq_work solutions - we keep the printk_safe buffer > > > and wait for panic()->flush. > > > > But the patch still uses irq work because queue_work_on() could not > > be safely called from printk_safe(). By other words, it requires > > both irq_work and workqueues to be functional. > > Right, that's all true. The reason it's done this way is because buffers can > be big and we still flush under console_sem in console_unlock() loop, which > can in theory be problematic. In other words, I wanted to remove the root > cause - irq flush of printk_safe while we are still in printing > loop. Good point! We know that we would eventually push non-trivial amount of messages and it would make sense to do it from non-atomic context. On the other hand, this does not solve the same problem with printk NMI buffer. And I guess that we do not want to risk offloading to workqueues for NMI messages. > > > I guess I'm OK with the wq dependency after all, but I may be mistaken. > > > printk_safe was never about "immediately flush the buffer", it was about > > > "avoid deadlocks", which was extended to "flush from any context which > > > will let us to avoid deadlock". It just happened that it inherited > > > irq_work dependency from printk_nmi. > > > > I see the point. But if I remember correctly, it was also designed > > before we started to be concerned about a sudden death and "get > > printks out ASAP" mantra. > > Can you elaborate a bit? The pull request with printk_safe was sent on February 22, 2017, see https://lkml.kernel.org/r/20170222114705.ga30...@linux.suse The printk softlockup was still being solved by an immediate offload from vprintk_emit() on March 29, 2017, see https://lkml.kernel.org/r/20170329092511.3958-3-sergey.senozhat...@gmail.com I believe that it was the mail from Pavel Machek that made us thinking about the sudden death. It was sent on April 7, 2017, see https://lkml.kernel.org/r/20170407120642.GB4756@amd The first version with the offload from console_unlock was sent on May 9, 2017, see https://lkml.kernel.org/r/20170509082859.854-3-sergey.senozhat...@gmail.com I am not exactly sure when the "get printks out ASAP" mantra started but I cannot forget the mail from June 30, 2017, see https://lkml.kernel.org/r/20170630070131.GA474@jagdpanzerIV.localdomain Best Regards, Petr
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote: > On (01/30/18 13:23), Petr Mladek wrote: > [..] > > > If the system is in "big troubles" then what makes irq_work more > > > possible? Local IRQs can stay disabled, just like preemption. I > > > guess when the troubles are really big our strategy is the same > > > for both wq and irq_work solutions - we keep the printk_safe buffer > > > and wait for panic()->flush. > > > > But the patch still uses irq work because queue_work_on() could not > > be safely called from printk_safe(). By other words, it requires > > both irq_work and workqueues to be functional. > > Right, that's all true. The reason it's done this way is because buffers can > be big and we still flush under console_sem in console_unlock() loop, which > can in theory be problematic. In other words, I wanted to remove the root > cause - irq flush of printk_safe while we are still in printing > loop. Good point! We know that we would eventually push non-trivial amount of messages and it would make sense to do it from non-atomic context. On the other hand, this does not solve the same problem with printk NMI buffer. And I guess that we do not want to risk offloading to workqueues for NMI messages. > > > I guess I'm OK with the wq dependency after all, but I may be mistaken. > > > printk_safe was never about "immediately flush the buffer", it was about > > > "avoid deadlocks", which was extended to "flush from any context which > > > will let us to avoid deadlock". It just happened that it inherited > > > irq_work dependency from printk_nmi. > > > > I see the point. But if I remember correctly, it was also designed > > before we started to be concerned about a sudden death and "get > > printks out ASAP" mantra. > > Can you elaborate a bit? The pull request with printk_safe was sent on February 22, 2017, see https://lkml.kernel.org/r/20170222114705.ga30...@linux.suse The printk softlockup was still being solved by an immediate offload from vprintk_emit() on March 29, 2017, see https://lkml.kernel.org/r/20170329092511.3958-3-sergey.senozhat...@gmail.com I believe that it was the mail from Pavel Machek that made us thinking about the sudden death. It was sent on April 7, 2017, see https://lkml.kernel.org/r/20170407120642.GB4756@amd The first version with the offload from console_unlock was sent on May 9, 2017, see https://lkml.kernel.org/r/20170509082859.854-3-sergey.senozhat...@gmail.com I am not exactly sure when the "get printks out ASAP" mantra started but I cannot forget the mail from June 30, 2017, see https://lkml.kernel.org/r/20170630070131.GA474@jagdpanzerIV.localdomain Best Regards, Petr
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Fri 2018-02-02 10:07:20, Sergey Senozhatsky wrote: > On (02/01/18 13:00), Steven Rostedt wrote: > > On Mon, 29 Jan 2018 11:29:18 +0900 > > Sergey Senozhatskywrote: > [..] > > > If the system is in "big troubles" then what makes irq_work more > > > possible? Local IRQs can stay disabled, just like preemption. I > > > guess when the troubles are really big our strategy is the same > > > for both wq and irq_work solutions - we keep the printk_safe buffer > > > and wait for panic()->flush. > > > > Working on the RT kernel, I would tell you there's a huge difference > > getting an irq_work to trigger than to expect something to schedule. And this is not only about scheduling. It is also about having a worker ready to handle the work. So, there is yet another level that can eventually fail. > > But if printk_safe() is just for recursion protection, how important is > > it to get out? Good question! > Well, it depends. printk_safe can protect us against... what should I > call it... let's call it first order, or direct, printk recursion. The > one which involve locks internal to print. For instance, > > vprintk_emit() >spin_lock_irqsave(_lock) > spin_lock_debug(...) > spin_dump() > printk() >vprintk_emit() > spin_lock_irqsave(_lock) << deadlock > > printk_safe will save us by redirecting spin_dump()->printk(). > > So printk_safe output is in general of some interest, but we don't > have guarantees that it will be printed: if it was the direct printk > recursion - then it's all good, if indirect - then it may not be good. IMHO, the question is if the information about printk recursion would help to understand the situation on the system or if it would just add a noise to the original problem. I would personally prefer to know about it. But I do not feel experienced enough to make a generic decision. Best Regards, Petr
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Fri 2018-02-02 10:07:20, Sergey Senozhatsky wrote: > On (02/01/18 13:00), Steven Rostedt wrote: > > On Mon, 29 Jan 2018 11:29:18 +0900 > > Sergey Senozhatsky wrote: > [..] > > > If the system is in "big troubles" then what makes irq_work more > > > possible? Local IRQs can stay disabled, just like preemption. I > > > guess when the troubles are really big our strategy is the same > > > for both wq and irq_work solutions - we keep the printk_safe buffer > > > and wait for panic()->flush. > > > > Working on the RT kernel, I would tell you there's a huge difference > > getting an irq_work to trigger than to expect something to schedule. And this is not only about scheduling. It is also about having a worker ready to handle the work. So, there is yet another level that can eventually fail. > > But if printk_safe() is just for recursion protection, how important is > > it to get out? Good question! > Well, it depends. printk_safe can protect us against... what should I > call it... let's call it first order, or direct, printk recursion. The > one which involve locks internal to print. For instance, > > vprintk_emit() >spin_lock_irqsave(_lock) > spin_lock_debug(...) > spin_dump() > printk() >vprintk_emit() > spin_lock_irqsave(_lock) << deadlock > > printk_safe will save us by redirecting spin_dump()->printk(). > > So printk_safe output is in general of some interest, but we don't > have guarantees that it will be printed: if it was the direct printk > recursion - then it's all good, if indirect - then it may not be good. IMHO, the question is if the information about printk recursion would help to understand the situation on the system or if it would just add a noise to the original problem. I would personally prefer to know about it. But I do not feel experienced enough to make a generic decision. Best Regards, Petr
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On (02/01/18 13:00), Steven Rostedt wrote: > On Mon, 29 Jan 2018 11:29:18 +0900 > Sergey Senozhatskywrote: [..] > > If the system is in "big troubles" then what makes irq_work more > > possible? Local IRQs can stay disabled, just like preemption. I > > guess when the troubles are really big our strategy is the same > > for both wq and irq_work solutions - we keep the printk_safe buffer > > and wait for panic()->flush. > > Working on the RT kernel, I would tell you there's a huge difference > getting an irq_work to trigger than to expect something to schedule. > > It takes much less to disable the systems scheduling than it does to > disable interrupts. If interrupts are disabled, then so is scheduling. > But if scheduling is disabled, interrupts may still trigger. Sure, I understand those points and don't object any of those. Just weighing the risks and benefits. > But if printk_safe() is just for recursion protection, how important is > it to get out? Well, it depends. printk_safe can protect us against... what should I call it... let's call it first order, or direct, printk recursion. The one which involve locks internal to print. For instance, vprintk_emit() spin_lock_irqsave(_lock) spin_lock_debug(...) spin_dump() printk() vprintk_emit() spin_lock_irqsave(_lock) << deadlock printk_safe will save us by redirecting spin_dump()->printk(). printk_safe cannot protect us against second order, or indirect, printk recursion. The one that involves locks which are external to printk. In case of indirect recursion we can store lockdep report in printk_safe, so we will not immediately recurse back to printk, but in general it does not mean that we will be actually safe from any deadlocks. For instance, timekeeping spin_lock_irqsave timer base printk call_console_drivers foo_console_driver mod_timer spin_lock_irqsave timer base << deadlock printk_safe will not save us here, and most likely irq_work will not happen on this CPU, because we deadlock with IRQs disabled. So printk_safe output is in general of some interest, but we don't have guarantees that it will be printed: if it was the direct printk recursion - then it's all good, if indirect - then it may not be good. -ss
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On (02/01/18 13:00), Steven Rostedt wrote: > On Mon, 29 Jan 2018 11:29:18 +0900 > Sergey Senozhatsky wrote: [..] > > If the system is in "big troubles" then what makes irq_work more > > possible? Local IRQs can stay disabled, just like preemption. I > > guess when the troubles are really big our strategy is the same > > for both wq and irq_work solutions - we keep the printk_safe buffer > > and wait for panic()->flush. > > Working on the RT kernel, I would tell you there's a huge difference > getting an irq_work to trigger than to expect something to schedule. > > It takes much less to disable the systems scheduling than it does to > disable interrupts. If interrupts are disabled, then so is scheduling. > But if scheduling is disabled, interrupts may still trigger. Sure, I understand those points and don't object any of those. Just weighing the risks and benefits. > But if printk_safe() is just for recursion protection, how important is > it to get out? Well, it depends. printk_safe can protect us against... what should I call it... let's call it first order, or direct, printk recursion. The one which involve locks internal to print. For instance, vprintk_emit() spin_lock_irqsave(_lock) spin_lock_debug(...) spin_dump() printk() vprintk_emit() spin_lock_irqsave(_lock) << deadlock printk_safe will save us by redirecting spin_dump()->printk(). printk_safe cannot protect us against second order, or indirect, printk recursion. The one that involves locks which are external to printk. In case of indirect recursion we can store lockdep report in printk_safe, so we will not immediately recurse back to printk, but in general it does not mean that we will be actually safe from any deadlocks. For instance, timekeeping spin_lock_irqsave timer base printk call_console_drivers foo_console_driver mod_timer spin_lock_irqsave timer base << deadlock printk_safe will not save us here, and most likely irq_work will not happen on this CPU, because we deadlock with IRQs disabled. So printk_safe output is in general of some interest, but we don't have guarantees that it will be printed: if it was the direct printk recursion - then it's all good, if indirect - then it may not be good. -ss
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Mon, 29 Jan 2018 11:29:18 +0900 Sergey Senozhatskywrote: > On (01/26/18 16:26), Petr Mladek wrote: > [..] > > First, this delays showing eventually valuable information until > > the preemption is enabled. It might never happen if the system > > is in big troubles. In each case, it might be much longer delay > > than it was before. > > If the system is in "big troubles" then what makes irq_work more > possible? Local IRQs can stay disabled, just like preemption. I > guess when the troubles are really big our strategy is the same > for both wq and irq_work solutions - we keep the printk_safe buffer > and wait for panic()->flush. Working on the RT kernel, I would tell you there's a huge difference getting an irq_work to trigger than to expect something to schedule. It takes much less to disable the systems scheduling than it does to disable interrupts. If interrupts are disabled, then so is scheduling. But if scheduling is disabled, interrupts may still trigger. But if printk_safe() is just for recursion protection, how important is it to get out? -- Steve
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Mon, 29 Jan 2018 11:29:18 +0900 Sergey Senozhatsky wrote: > On (01/26/18 16:26), Petr Mladek wrote: > [..] > > First, this delays showing eventually valuable information until > > the preemption is enabled. It might never happen if the system > > is in big troubles. In each case, it might be much longer delay > > than it was before. > > If the system is in "big troubles" then what makes irq_work more > possible? Local IRQs can stay disabled, just like preemption. I > guess when the troubles are really big our strategy is the same > for both wq and irq_work solutions - we keep the printk_safe buffer > and wait for panic()->flush. Working on the RT kernel, I would tell you there's a huge difference getting an irq_work to trigger than to expect something to schedule. It takes much less to disable the systems scheduling than it does to disable interrupts. If interrupts are disabled, then so is scheduling. But if scheduling is disabled, interrupts may still trigger. But if printk_safe() is just for recursion protection, how important is it to get out? -- Steve
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On (01/30/18 13:23), Petr Mladek wrote: [..] > > If the system is in "big troubles" then what makes irq_work more > > possible? Local IRQs can stay disabled, just like preemption. I > > guess when the troubles are really big our strategy is the same > > for both wq and irq_work solutions - we keep the printk_safe buffer > > and wait for panic()->flush. > > But the patch still uses irq work because queue_work_on() could not > be safely called from printk_safe(). By other words, it requires > both irq_work and workqueues to be functional. Right, that's all true. The reason it's done this way is because buffers can be big and we still flush under console_sem in console_unlock() loop, which can in theory be problematic. In other words, I wanted to remove the root cause - irq flush of printk_safe while we are still in printing loop. Technically, we minimize the probability by throttling down printk_safe flush, but we don't eliminate the possibility entirely. Maybe it is good enough, maybe not. Opinions? [..] > > `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT > > a bit useless and hard to understand - despite its value we will > > store only 100 lines. > > > > We probably can replace `console_recursion_limit' with the following: > > - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN > > chars to be stored in printk-safe buffer and, once we reached the limit, > > don't append any new messages until we are out of `console_recursion' > > context. Which is somewhat close to wq solution, the difference is that > > printk_safe can happen earlier if local IRQs are enabled. ^ printk_safe flush > I like this idea. It would actually make perfect sense to use the same > limit for PRINTK_SAFE buffer size and for the printk recursion. Yes, we probably can do it that way, but this thing " They both should be big enough to " is a bit of a concern. The "big enough to" can lead to different things. > > I guess I'm OK with the wq dependency after all, but I may be mistaken. > > printk_safe was never about "immediately flush the buffer", it was about > > "avoid deadlocks", which was extended to "flush from any context which > > will let us to avoid deadlock". It just happened that it inherited > > irq_work dependency from printk_nmi. > > I see the point. But if I remember correctly, it was also designed > before we started to be concerned about a sudden death and "get > printks out ASAP" mantra. Can you elaborate a bit? -ss
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On (01/30/18 13:23), Petr Mladek wrote: [..] > > If the system is in "big troubles" then what makes irq_work more > > possible? Local IRQs can stay disabled, just like preemption. I > > guess when the troubles are really big our strategy is the same > > for both wq and irq_work solutions - we keep the printk_safe buffer > > and wait for panic()->flush. > > But the patch still uses irq work because queue_work_on() could not > be safely called from printk_safe(). By other words, it requires > both irq_work and workqueues to be functional. Right, that's all true. The reason it's done this way is because buffers can be big and we still flush under console_sem in console_unlock() loop, which can in theory be problematic. In other words, I wanted to remove the root cause - irq flush of printk_safe while we are still in printing loop. Technically, we minimize the probability by throttling down printk_safe flush, but we don't eliminate the possibility entirely. Maybe it is good enough, maybe not. Opinions? [..] > > `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT > > a bit useless and hard to understand - despite its value we will > > store only 100 lines. > > > > We probably can replace `console_recursion_limit' with the following: > > - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN > > chars to be stored in printk-safe buffer and, once we reached the limit, > > don't append any new messages until we are out of `console_recursion' > > context. Which is somewhat close to wq solution, the difference is that > > printk_safe can happen earlier if local IRQs are enabled. ^ printk_safe flush > I like this idea. It would actually make perfect sense to use the same > limit for PRINTK_SAFE buffer size and for the printk recursion. Yes, we probably can do it that way, but this thing " They both should be big enough to " is a bit of a concern. The "big enough to" can lead to different things. > > I guess I'm OK with the wq dependency after all, but I may be mistaken. > > printk_safe was never about "immediately flush the buffer", it was about > > "avoid deadlocks", which was extended to "flush from any context which > > will let us to avoid deadlock". It just happened that it inherited > > irq_work dependency from printk_nmi. > > I see the point. But if I remember correctly, it was also designed > before we started to be concerned about a sudden death and "get > printks out ASAP" mantra. Can you elaborate a bit? -ss
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Mon 2018-01-29 11:29:18, Sergey Senozhatsky wrote: > On (01/26/18 16:26), Petr Mladek wrote: > [..] > > First, this delays showing eventually valuable information until > > the preemption is enabled. It might never happen if the system > > is in big troubles. In each case, it might be much longer delay > > than it was before. > > If the system is in "big troubles" then what makes irq_work more > possible? Local IRQs can stay disabled, just like preemption. I > guess when the troubles are really big our strategy is the same > for both wq and irq_work solutions - we keep the printk_safe buffer > and wait for panic()->flush. But the patch still uses irq work because queue_work_on() could not be safely called from printk_safe(). By other words, it requires both irq_work and workqueues to be functional. Note that there might be deadlock or livelock in the workqueues subsystem. It is just another non-trivial thingy that might get wrong. Also interrupts are handled immediately when they are enabled. On the other hand, a workqueue work is proceed only when the worker is scheduled and the work is first in the queue. It might take ages if there is a high load on the CPU or on the given workqueue. > > Second, it makes printk() dependent on another non-trivial subsystem. > > I mean workqueues. > [..] > > The following, a bit ugly, solution has came to my mind. We could > > think about it like extending the printk_context. It counts > > printks called in this context and does nothing when we reach > > the limit. The difference is that the context is task-specific > > instead of CPU-specific. > [..] > > +int console_recursion_count; > > +int console_recursion_limit = 100; > > Hm... I'm not entirely happy with magic constants, to be honest. > Why 100? One of the printk_safe lockdep reports I saw was ~270 > lines long: https://marc.info/?l=linux-kernel=150659041411473=2 I am not happy with this constant either. It was used just for a simplicity of the RFC. > `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT > a bit useless and hard to understand - despite its value we will > store only 100 lines. > > We probably can replace `console_recursion_limit' with the following: > - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN > chars to be stored in printk-safe buffer and, once we reached the limit, > don't append any new messages until we are out of `console_recursion' > context. Which is somewhat close to wq solution, the difference is that > printk_safe can happen earlier if local IRQs are enabled. I like this idea. It would actually make perfect sense to use the same limit for PRINTK_SAFE buffer size and for the printk recursion. They both should be big enough to allow a meaningful report. On the other hand, they both should be relatively small. One because of memory constrains, the other because of reducing redundancy. In each case, there is a direct dependency. The recursive messages are stored into the printk_safe buffer. > same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to > still cause troubles, just because printk-deadlock errors sound scary > and important enough. We could always make it more complicated if people come up with a reasonable use case. IMHO, most people will not care about these limits. > I guess I'm OK with the wq dependency after all, but I may be mistaken. > printk_safe was never about "immediately flush the buffer", it was about > "avoid deadlocks", which was extended to "flush from any context which > will let us to avoid deadlock". It just happened that it inherited > irq_work dependency from printk_nmi. I see the point. But if I remember correctly, it was also designed before we started to be concerned about a sudden death and "get printks out ASAP" mantra. Best Regards, Petr
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Mon 2018-01-29 11:29:18, Sergey Senozhatsky wrote: > On (01/26/18 16:26), Petr Mladek wrote: > [..] > > First, this delays showing eventually valuable information until > > the preemption is enabled. It might never happen if the system > > is in big troubles. In each case, it might be much longer delay > > than it was before. > > If the system is in "big troubles" then what makes irq_work more > possible? Local IRQs can stay disabled, just like preemption. I > guess when the troubles are really big our strategy is the same > for both wq and irq_work solutions - we keep the printk_safe buffer > and wait for panic()->flush. But the patch still uses irq work because queue_work_on() could not be safely called from printk_safe(). By other words, it requires both irq_work and workqueues to be functional. Note that there might be deadlock or livelock in the workqueues subsystem. It is just another non-trivial thingy that might get wrong. Also interrupts are handled immediately when they are enabled. On the other hand, a workqueue work is proceed only when the worker is scheduled and the work is first in the queue. It might take ages if there is a high load on the CPU or on the given workqueue. > > Second, it makes printk() dependent on another non-trivial subsystem. > > I mean workqueues. > [..] > > The following, a bit ugly, solution has came to my mind. We could > > think about it like extending the printk_context. It counts > > printks called in this context and does nothing when we reach > > the limit. The difference is that the context is task-specific > > instead of CPU-specific. > [..] > > +int console_recursion_count; > > +int console_recursion_limit = 100; > > Hm... I'm not entirely happy with magic constants, to be honest. > Why 100? One of the printk_safe lockdep reports I saw was ~270 > lines long: https://marc.info/?l=linux-kernel=150659041411473=2 I am not happy with this constant either. It was used just for a simplicity of the RFC. > `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT > a bit useless and hard to understand - despite its value we will > store only 100 lines. > > We probably can replace `console_recursion_limit' with the following: > - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN > chars to be stored in printk-safe buffer and, once we reached the limit, > don't append any new messages until we are out of `console_recursion' > context. Which is somewhat close to wq solution, the difference is that > printk_safe can happen earlier if local IRQs are enabled. I like this idea. It would actually make perfect sense to use the same limit for PRINTK_SAFE buffer size and for the printk recursion. They both should be big enough to allow a meaningful report. On the other hand, they both should be relatively small. One because of memory constrains, the other because of reducing redundancy. In each case, there is a direct dependency. The recursive messages are stored into the printk_safe buffer. > same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to > still cause troubles, just because printk-deadlock errors sound scary > and important enough. We could always make it more complicated if people come up with a reasonable use case. IMHO, most people will not care about these limits. > I guess I'm OK with the wq dependency after all, but I may be mistaken. > printk_safe was never about "immediately flush the buffer", it was about > "avoid deadlocks", which was extended to "flush from any context which > will let us to avoid deadlock". It just happened that it inherited > irq_work dependency from printk_nmi. I see the point. But if I remember correctly, it was also designed before we started to be concerned about a sudden death and "get printks out ASAP" mantra. Best Regards, Petr
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On (01/26/18 16:26), Petr Mladek wrote: [..] > First, this delays showing eventually valuable information until > the preemption is enabled. It might never happen if the system > is in big troubles. In each case, it might be much longer delay > than it was before. If the system is in "big troubles" then what makes irq_work more possible? Local IRQs can stay disabled, just like preemption. I guess when the troubles are really big our strategy is the same for both wq and irq_work solutions - we keep the printk_safe buffer and wait for panic()->flush. > Second, it makes printk() dependent on another non-trivial subsystem. > I mean workqueues. [..] > The following, a bit ugly, solution has came to my mind. We could > think about it like extending the printk_context. It counts > printks called in this context and does nothing when we reach > the limit. The difference is that the context is task-specific > instead of CPU-specific. [..] > +int console_recursion_count; > +int console_recursion_limit = 100; Hm... I'm not entirely happy with magic constants, to be honest. Why 100? One of the printk_safe lockdep reports I saw was ~270 lines long: https://marc.info/?l=linux-kernel=150659041411473=2 `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT a bit useless and hard to understand - despite its value we will store only 100 lines. We probably can replace `console_recursion_limit' with the following: - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN chars to be stored in printk-safe buffer and, once we reached the limit, don't append any new messages until we are out of `console_recursion' context. Which is somewhat close to wq solution, the difference is that printk_safe can happen earlier if local IRQs are enabled. But at the same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to still cause troubles, just because printk-deadlock errors sound scary and important enough. I guess I'm OK with the wq dependency after all, but I may be mistaken. printk_safe was never about "immediately flush the buffer", it was about "avoid deadlocks", which was extended to "flush from any context which will let us to avoid deadlock". It just happened that it inherited irq_work dependency from printk_nmi. We also probably can add printk_safe flush to some sysrq handler may be. > PS: I answered this mail because the original discussion[1] has > already been way too long and covered many other issues. Yep, that's I started it as a new discussion. -ss
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On (01/26/18 16:26), Petr Mladek wrote: [..] > First, this delays showing eventually valuable information until > the preemption is enabled. It might never happen if the system > is in big troubles. In each case, it might be much longer delay > than it was before. If the system is in "big troubles" then what makes irq_work more possible? Local IRQs can stay disabled, just like preemption. I guess when the troubles are really big our strategy is the same for both wq and irq_work solutions - we keep the printk_safe buffer and wait for panic()->flush. > Second, it makes printk() dependent on another non-trivial subsystem. > I mean workqueues. [..] > The following, a bit ugly, solution has came to my mind. We could > think about it like extending the printk_context. It counts > printks called in this context and does nothing when we reach > the limit. The difference is that the context is task-specific > instead of CPU-specific. [..] > +int console_recursion_count; > +int console_recursion_limit = 100; Hm... I'm not entirely happy with magic constants, to be honest. Why 100? One of the printk_safe lockdep reports I saw was ~270 lines long: https://marc.info/?l=linux-kernel=150659041411473=2 `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT a bit useless and hard to understand - despite its value we will store only 100 lines. We probably can replace `console_recursion_limit' with the following: - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN chars to be stored in printk-safe buffer and, once we reached the limit, don't append any new messages until we are out of `console_recursion' context. Which is somewhat close to wq solution, the difference is that printk_safe can happen earlier if local IRQs are enabled. But at the same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to still cause troubles, just because printk-deadlock errors sound scary and important enough. I guess I'm OK with the wq dependency after all, but I may be mistaken. printk_safe was never about "immediately flush the buffer", it was about "avoid deadlocks", which was extended to "flush from any context which will let us to avoid deadlock". It just happened that it inherited irq_work dependency from printk_nmi. We also probably can add printk_safe flush to some sysrq handler may be. > PS: I answered this mail because the original discussion[1] has > already been way too long and covered many other issues. Yep, that's I started it as a new discussion. -ss
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Wed 2018-01-24 18:37:23, Sergey Senozhatsky wrote: > Tejun Heo reported that net_console can cause recursive printk()-s > from call_console_drivers() (under OOM/etc.) and this can lockup that > CPU, because CPU calls call_console_drivers() to print pending logbufs, > but recursive printk() issued by one of console drivers adds several > new messages to the logbuf via recursive printk(). > Note, this is not a printk_safe problem. In fact printk_safe helps > us to address the issue here and to rate-limit recursive printk()-s. > > Schematically, what we have at the moment is: > > == > printk() > preempt_disable() > console_unlock() > { > for (;;) { > local_irq_save() > printk_safe_enter() > > call_console_drivers() > printk() > vprintk_safe() >irq_work_queue() > > printk_safe_exit() > local_irq_restore() > > << irq work >> >printk_safe_flush() > printk() >} > } > preempt_enable() > == > > So CPU triggers printk() recursion in call_console_drivers(), > immediately after local_irq_restore() it executes printk_safe > flush irq_work, which adds even more pending messages to the > logbuf which it has to print. CPU hits printk() recursion again > while it prints pending messages, which results in even more > pending messages flushed from printk_safe via irq_work. And so > on. > > The idea is to delay printk_safe flush until CPU is in preemptible > context, so we won't lockup it up. > > The new behaviour is: > > == > printk() > preempt_disable() > console_unlock() > { > for (;;) { > local_irq_save() > printk_safe_enter() > > call_console_drivers() > printk() > vprintk_safe() >queue_work_on(smp_processor_id()) > > printk_safe_exit() > local_irq_restore() >} > } > preempt_enable() > > << work queue >> >printk_safe_flush() > printk() > == > > Since console_unlock() is called under preemption_disabled() now > we cannot schedule flush work as long as CPU is printing messages. > We only can flush printk_safe messages after CPU returns from > console_unlock() and enables preemption. > > This might look like with the delayed flush we increase the possibility > of lost printk_safe messages (printk_safe per-CPU buffers are pretty > limited in size). But prinkt_safe was never meant to be used for huge > amounts of data which can generate e.g. debugging of serial consoles, > etc. Its goal is to make recursive printk()-s less deadlock prone. > With this patch we add one more thing: avoid CPU lockup which might > be caused by printk_safe_flush(). I have two concerns about this approach: First, this delays showing eventually valuable information until the preemption is enabled. It might never happen if the system is in big troubles. In each case, it might be much longer delay than it was before. Second, it makes printk() dependent on another non-trivial subsystem. I mean workqueues. If I got it right, we want to limit the number of recursively added messages that might be printed() from inside console_unlock(), especially from console_drivers(). The following, a bit ugly, solution has came to my mind. We could think about it like extending the printk_context. It counts printks called in this context and does nothing when we reach the limit. The difference is that the context is task-specific instead of CPU-specific. I used only one compiler barrier which looks strange. Well, it is late Friday here and it is just a POC. I wonder what you think about this approach. Just a compile tested code: diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 2a7d04049af4..747098310179 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -23,6 +23,9 @@ #define PRINTK_NMI_CONTEXT_MASK 0x8000 extern raw_spinlock_t logbuf_lock; +extern struct task_struct *console_unlock_owner; +extern int console_recursion_count; +extern int console_recursion_limit; __printf(1, 0) int vprintk_default(const char *fmt, va_list args); __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args); diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index db4b9b8929eb..4508bb579c5b 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -81,6 +81,9 @@ EXPORT_SYMBOL(oops_in_progress); static DEFINE_SEMAPHORE(console_sem); struct console *console_drivers; EXPORT_SYMBOL_GPL(console_drivers); +struct task_struct *console_unlock_owner; +int console_recursion_count; +int console_recursion_limit = 100; #ifdef CONFIG_LOCKDEP static struct lockdep_map console_lock_dep_map = { @@ -2293,6 +2296,7 @@ void console_unlock(void) unsigned long flags; bool wake_klogd = false;
Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work
On Wed 2018-01-24 18:37:23, Sergey Senozhatsky wrote: > Tejun Heo reported that net_console can cause recursive printk()-s > from call_console_drivers() (under OOM/etc.) and this can lockup that > CPU, because CPU calls call_console_drivers() to print pending logbufs, > but recursive printk() issued by one of console drivers adds several > new messages to the logbuf via recursive printk(). > Note, this is not a printk_safe problem. In fact printk_safe helps > us to address the issue here and to rate-limit recursive printk()-s. > > Schematically, what we have at the moment is: > > == > printk() > preempt_disable() > console_unlock() > { > for (;;) { > local_irq_save() > printk_safe_enter() > > call_console_drivers() > printk() > vprintk_safe() >irq_work_queue() > > printk_safe_exit() > local_irq_restore() > > << irq work >> >printk_safe_flush() > printk() >} > } > preempt_enable() > == > > So CPU triggers printk() recursion in call_console_drivers(), > immediately after local_irq_restore() it executes printk_safe > flush irq_work, which adds even more pending messages to the > logbuf which it has to print. CPU hits printk() recursion again > while it prints pending messages, which results in even more > pending messages flushed from printk_safe via irq_work. And so > on. > > The idea is to delay printk_safe flush until CPU is in preemptible > context, so we won't lockup it up. > > The new behaviour is: > > == > printk() > preempt_disable() > console_unlock() > { > for (;;) { > local_irq_save() > printk_safe_enter() > > call_console_drivers() > printk() > vprintk_safe() >queue_work_on(smp_processor_id()) > > printk_safe_exit() > local_irq_restore() >} > } > preempt_enable() > > << work queue >> >printk_safe_flush() > printk() > == > > Since console_unlock() is called under preemption_disabled() now > we cannot schedule flush work as long as CPU is printing messages. > We only can flush printk_safe messages after CPU returns from > console_unlock() and enables preemption. > > This might look like with the delayed flush we increase the possibility > of lost printk_safe messages (printk_safe per-CPU buffers are pretty > limited in size). But prinkt_safe was never meant to be used for huge > amounts of data which can generate e.g. debugging of serial consoles, > etc. Its goal is to make recursive printk()-s less deadlock prone. > With this patch we add one more thing: avoid CPU lockup which might > be caused by printk_safe_flush(). I have two concerns about this approach: First, this delays showing eventually valuable information until the preemption is enabled. It might never happen if the system is in big troubles. In each case, it might be much longer delay than it was before. Second, it makes printk() dependent on another non-trivial subsystem. I mean workqueues. If I got it right, we want to limit the number of recursively added messages that might be printed() from inside console_unlock(), especially from console_drivers(). The following, a bit ugly, solution has came to my mind. We could think about it like extending the printk_context. It counts printks called in this context and does nothing when we reach the limit. The difference is that the context is task-specific instead of CPU-specific. I used only one compiler barrier which looks strange. Well, it is late Friday here and it is just a POC. I wonder what you think about this approach. Just a compile tested code: diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 2a7d04049af4..747098310179 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -23,6 +23,9 @@ #define PRINTK_NMI_CONTEXT_MASK 0x8000 extern raw_spinlock_t logbuf_lock; +extern struct task_struct *console_unlock_owner; +extern int console_recursion_count; +extern int console_recursion_limit; __printf(1, 0) int vprintk_default(const char *fmt, va_list args); __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args); diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index db4b9b8929eb..4508bb579c5b 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -81,6 +81,9 @@ EXPORT_SYMBOL(oops_in_progress); static DEFINE_SEMAPHORE(console_sem); struct console *console_drivers; EXPORT_SYMBOL_GPL(console_drivers); +struct task_struct *console_unlock_owner; +int console_recursion_count; +int console_recursion_limit = 100; #ifdef CONFIG_LOCKDEP static struct lockdep_map console_lock_dep_map = { @@ -2293,6 +2296,7 @@ void console_unlock(void) unsigned long flags; bool wake_klogd = false;