Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-02-02 Thread Petr Mladek
On Fri 2018-02-02 13:17:08, Petr Mladek wrote:
> On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote:
> > On (01/30/18 13:23), Petr Mladek wrote:
> > [..]
> > > > If the system is in "big troubles" then what makes irq_work more
> > > > possible? Local IRQs can stay disabled, just like preemption. I
> > > > guess when the troubles are really big our strategy is the same
> > > > for both wq and irq_work solutions - we keep the printk_safe buffer
> > > > and wait for panic()->flush.
> > > 
> > > But the patch still uses irq work because queue_work_on() could not
> > > be safely called from printk_safe(). By other words, it requires
> > > both irq_work and workqueues to be functional.
> > 
> > Right, that's all true. The reason it's done this way is because buffers can
> > be big and we still flush under console_sem in console_unlock() loop, which
> > can in theory be problematic. In other words, I wanted to remove the root
> > cause - irq flush of printk_safe while we are still in printing
> > loop.
> 
> Good point! We know that we would eventually push non-trivial amount
> of messages and it would make sense to do it from non-atomic context.
> 
> On the other hand, this does not solve the same problem with printk
> NMI buffer. And I guess that we do not want to risk offloading to
> workqueues for NMI messages.

By the above I want to say that I am not sure if it is worth having
different solution for flushing printk_safe and nmi buffer. We might
be forced to debug why a solution does not work. Shared solution and
shared issues might safe us some work.

I know that my alternative solution needed some extra code for
printk_safe as well. But it is pretty straightforward. It basically
just a check of a counter.

Note that the WQ-based solution is kind of tricky. There is
the journey printk() -> irq_work -> wq. Also it relies on
the fact that console_unlock() is called with disabled preemption.


> > > > I guess I'm OK with the wq dependency after all, but I may be mistaken.
> > > > printk_safe was never about "immediately flush the buffer", it was about
> > > > "avoid deadlocks", which was extended to "flush from any context which
> > > > will let us to avoid deadlock". It just happened that it inherited
> > > > irq_work dependency from printk_nmi.
> > > 
> > > I see the point. But if I remember correctly, it was also designed
> > > before we started to be concerned about a sudden death and "get
> > > printks out ASAP" mantra.
> > 
> > Can you elaborate a bit?

You probably knew the dates. Anyway, I want to say. Even when
the messages stored in the printk_safe buffer might be less
important, they still might be interesting and we might want
to get them out rather sooner than later.

All in all, it seems that we are basically down to the dilemma
between "get messages out ASAP" vs. "avoid softlockups".
We need to take into account:

   + printk_safe messages are more or less printk-subsystem
 specific. They might be "less" important for others.
 But who knows for sure?

   + the code complexity


With the above in mind, I still prefer the "easier" solution based
on the counter. If people had problems with softlockups,
we would need to solve them. But then they probably will have
this problem also with normal printk or in NMI and we would
probably need a solution that is not using workqueues.

Does this makes any sense?

Best Regards,
Petr


PS: I am sorry for so many mails today. Sometimes (often?), it is so
hard to understand and say things clearly. And it is Friday.


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-02-02 Thread Petr Mladek
On Fri 2018-02-02 13:17:08, Petr Mladek wrote:
> On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote:
> > On (01/30/18 13:23), Petr Mladek wrote:
> > [..]
> > > > If the system is in "big troubles" then what makes irq_work more
> > > > possible? Local IRQs can stay disabled, just like preemption. I
> > > > guess when the troubles are really big our strategy is the same
> > > > for both wq and irq_work solutions - we keep the printk_safe buffer
> > > > and wait for panic()->flush.
> > > 
> > > But the patch still uses irq work because queue_work_on() could not
> > > be safely called from printk_safe(). By other words, it requires
> > > both irq_work and workqueues to be functional.
> > 
> > Right, that's all true. The reason it's done this way is because buffers can
> > be big and we still flush under console_sem in console_unlock() loop, which
> > can in theory be problematic. In other words, I wanted to remove the root
> > cause - irq flush of printk_safe while we are still in printing
> > loop.
> 
> Good point! We know that we would eventually push non-trivial amount
> of messages and it would make sense to do it from non-atomic context.
> 
> On the other hand, this does not solve the same problem with printk
> NMI buffer. And I guess that we do not want to risk offloading to
> workqueues for NMI messages.

By the above I want to say that I am not sure if it is worth having
different solution for flushing printk_safe and nmi buffer. We might
be forced to debug why a solution does not work. Shared solution and
shared issues might safe us some work.

I know that my alternative solution needed some extra code for
printk_safe as well. But it is pretty straightforward. It basically
just a check of a counter.

Note that the WQ-based solution is kind of tricky. There is
the journey printk() -> irq_work -> wq. Also it relies on
the fact that console_unlock() is called with disabled preemption.


> > > > I guess I'm OK with the wq dependency after all, but I may be mistaken.
> > > > printk_safe was never about "immediately flush the buffer", it was about
> > > > "avoid deadlocks", which was extended to "flush from any context which
> > > > will let us to avoid deadlock". It just happened that it inherited
> > > > irq_work dependency from printk_nmi.
> > > 
> > > I see the point. But if I remember correctly, it was also designed
> > > before we started to be concerned about a sudden death and "get
> > > printks out ASAP" mantra.
> > 
> > Can you elaborate a bit?

You probably knew the dates. Anyway, I want to say. Even when
the messages stored in the printk_safe buffer might be less
important, they still might be interesting and we might want
to get them out rather sooner than later.

All in all, it seems that we are basically down to the dilemma
between "get messages out ASAP" vs. "avoid softlockups".
We need to take into account:

   + printk_safe messages are more or less printk-subsystem
 specific. They might be "less" important for others.
 But who knows for sure?

   + the code complexity


With the above in mind, I still prefer the "easier" solution based
on the counter. If people had problems with softlockups,
we would need to solve them. But then they probably will have
this problem also with normal printk or in NMI and we would
probably need a solution that is not using workqueues.

Does this makes any sense?

Best Regards,
Petr


PS: I am sorry for so many mails today. Sometimes (often?), it is so
hard to understand and say things clearly. And it is Friday.


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-02-02 Thread Petr Mladek
On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote:
> On (01/30/18 13:23), Petr Mladek wrote:
> [..]
> > > If the system is in "big troubles" then what makes irq_work more
> > > possible? Local IRQs can stay disabled, just like preemption. I
> > > guess when the troubles are really big our strategy is the same
> > > for both wq and irq_work solutions - we keep the printk_safe buffer
> > > and wait for panic()->flush.
> > 
> > But the patch still uses irq work because queue_work_on() could not
> > be safely called from printk_safe(). By other words, it requires
> > both irq_work and workqueues to be functional.
> 
> Right, that's all true. The reason it's done this way is because buffers can
> be big and we still flush under console_sem in console_unlock() loop, which
> can in theory be problematic. In other words, I wanted to remove the root
> cause - irq flush of printk_safe while we are still in printing
> loop.

Good point! We know that we would eventually push non-trivial amount
of messages and it would make sense to do it from non-atomic context.

On the other hand, this does not solve the same problem with printk
NMI buffer. And I guess that we do not want to risk offloading to
workqueues for NMI messages.


> > > I guess I'm OK with the wq dependency after all, but I may be mistaken.
> > > printk_safe was never about "immediately flush the buffer", it was about
> > > "avoid deadlocks", which was extended to "flush from any context which
> > > will let us to avoid deadlock". It just happened that it inherited
> > > irq_work dependency from printk_nmi.
> > 
> > I see the point. But if I remember correctly, it was also designed
> > before we started to be concerned about a sudden death and "get
> > printks out ASAP" mantra.
> 
> Can you elaborate a bit?

The pull request with printk_safe was sent on February 22, 2017, see
https://lkml.kernel.org/r/20170222114705.ga30...@linux.suse

The printk softlockup was still being solved by an immediate offload
from vprintk_emit() on March 29, 2017, see
https://lkml.kernel.org/r/20170329092511.3958-3-sergey.senozhat...@gmail.com

I believe that it was the mail from Pavel Machek that made us
thinking about the sudden death. It was sent on April 7, 2017,
see https://lkml.kernel.org/r/20170407120642.GB4756@amd

The first version with the offload from console_unlock was
sent on May 9, 2017, see
https://lkml.kernel.org/r/20170509082859.854-3-sergey.senozhat...@gmail.com

I am not exactly sure when the "get printks out ASAP" mantra started
but I cannot forget the mail from June 30, 2017, see
https://lkml.kernel.org/r/20170630070131.GA474@jagdpanzerIV.localdomain

Best Regards,
Petr


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-02-02 Thread Petr Mladek
On Thu 2018-02-01 11:46:47, Sergey Senozhatsky wrote:
> On (01/30/18 13:23), Petr Mladek wrote:
> [..]
> > > If the system is in "big troubles" then what makes irq_work more
> > > possible? Local IRQs can stay disabled, just like preemption. I
> > > guess when the troubles are really big our strategy is the same
> > > for both wq and irq_work solutions - we keep the printk_safe buffer
> > > and wait for panic()->flush.
> > 
> > But the patch still uses irq work because queue_work_on() could not
> > be safely called from printk_safe(). By other words, it requires
> > both irq_work and workqueues to be functional.
> 
> Right, that's all true. The reason it's done this way is because buffers can
> be big and we still flush under console_sem in console_unlock() loop, which
> can in theory be problematic. In other words, I wanted to remove the root
> cause - irq flush of printk_safe while we are still in printing
> loop.

Good point! We know that we would eventually push non-trivial amount
of messages and it would make sense to do it from non-atomic context.

On the other hand, this does not solve the same problem with printk
NMI buffer. And I guess that we do not want to risk offloading to
workqueues for NMI messages.


> > > I guess I'm OK with the wq dependency after all, but I may be mistaken.
> > > printk_safe was never about "immediately flush the buffer", it was about
> > > "avoid deadlocks", which was extended to "flush from any context which
> > > will let us to avoid deadlock". It just happened that it inherited
> > > irq_work dependency from printk_nmi.
> > 
> > I see the point. But if I remember correctly, it was also designed
> > before we started to be concerned about a sudden death and "get
> > printks out ASAP" mantra.
> 
> Can you elaborate a bit?

The pull request with printk_safe was sent on February 22, 2017, see
https://lkml.kernel.org/r/20170222114705.ga30...@linux.suse

The printk softlockup was still being solved by an immediate offload
from vprintk_emit() on March 29, 2017, see
https://lkml.kernel.org/r/20170329092511.3958-3-sergey.senozhat...@gmail.com

I believe that it was the mail from Pavel Machek that made us
thinking about the sudden death. It was sent on April 7, 2017,
see https://lkml.kernel.org/r/20170407120642.GB4756@amd

The first version with the offload from console_unlock was
sent on May 9, 2017, see
https://lkml.kernel.org/r/20170509082859.854-3-sergey.senozhat...@gmail.com

I am not exactly sure when the "get printks out ASAP" mantra started
but I cannot forget the mail from June 30, 2017, see
https://lkml.kernel.org/r/20170630070131.GA474@jagdpanzerIV.localdomain

Best Regards,
Petr


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-02-02 Thread Petr Mladek
On Fri 2018-02-02 10:07:20, Sergey Senozhatsky wrote:
> On (02/01/18 13:00), Steven Rostedt wrote:
> > On Mon, 29 Jan 2018 11:29:18 +0900
> > Sergey Senozhatsky  wrote:
> [..]
> > > If the system is in "big troubles" then what makes irq_work more
> > > possible? Local IRQs can stay disabled, just like preemption. I
> > > guess when the troubles are really big our strategy is the same
> > > for both wq and irq_work solutions - we keep the printk_safe buffer
> > > and wait for panic()->flush.
> > 
> > Working on the RT kernel, I would tell you there's a huge difference
> > getting an irq_work to trigger than to expect something to schedule.

And this is not only about scheduling. It is also about having
a worker ready to handle the work. So, there is yet another
level that can eventually fail.


> > But if printk_safe() is just for recursion protection, how important is
> > it to get out?

Good question!

> Well, it depends. printk_safe can protect us against... what should I
> call it... let's call it first order, or direct, printk recursion. The
> one which involve locks internal to print. For instance,
> 
>   vprintk_emit()
>spin_lock_irqsave(_lock)
> spin_lock_debug(...)
>  spin_dump()
>   printk()
>vprintk_emit()
> spin_lock_irqsave(_lock)  << deadlock
> 
> printk_safe will save us by redirecting spin_dump()->printk().
> 
> So printk_safe output is in general of some interest, but we don't
> have guarantees that it will be printed: if it was the direct printk
> recursion - then it's all good, if indirect - then it may not be good.

IMHO, the question is if the information about printk recursion would
help to understand the situation on the system or if it would just
add a noise to the original problem.

I would personally prefer to know about it. But I do not feel
experienced enough to make a generic decision.

Best Regards,
Petr


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-02-02 Thread Petr Mladek
On Fri 2018-02-02 10:07:20, Sergey Senozhatsky wrote:
> On (02/01/18 13:00), Steven Rostedt wrote:
> > On Mon, 29 Jan 2018 11:29:18 +0900
> > Sergey Senozhatsky  wrote:
> [..]
> > > If the system is in "big troubles" then what makes irq_work more
> > > possible? Local IRQs can stay disabled, just like preemption. I
> > > guess when the troubles are really big our strategy is the same
> > > for both wq and irq_work solutions - we keep the printk_safe buffer
> > > and wait for panic()->flush.
> > 
> > Working on the RT kernel, I would tell you there's a huge difference
> > getting an irq_work to trigger than to expect something to schedule.

And this is not only about scheduling. It is also about having
a worker ready to handle the work. So, there is yet another
level that can eventually fail.


> > But if printk_safe() is just for recursion protection, how important is
> > it to get out?

Good question!

> Well, it depends. printk_safe can protect us against... what should I
> call it... let's call it first order, or direct, printk recursion. The
> one which involve locks internal to print. For instance,
> 
>   vprintk_emit()
>spin_lock_irqsave(_lock)
> spin_lock_debug(...)
>  spin_dump()
>   printk()
>vprintk_emit()
> spin_lock_irqsave(_lock)  << deadlock
> 
> printk_safe will save us by redirecting spin_dump()->printk().
> 
> So printk_safe output is in general of some interest, but we don't
> have guarantees that it will be printed: if it was the direct printk
> recursion - then it's all good, if indirect - then it may not be good.

IMHO, the question is if the information about printk recursion would
help to understand the situation on the system or if it would just
add a noise to the original problem.

I would personally prefer to know about it. But I do not feel
experienced enough to make a generic decision.

Best Regards,
Petr


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-02-01 Thread Sergey Senozhatsky
On (02/01/18 13:00), Steven Rostedt wrote:
> On Mon, 29 Jan 2018 11:29:18 +0900
> Sergey Senozhatsky  wrote:
[..]
> > If the system is in "big troubles" then what makes irq_work more
> > possible? Local IRQs can stay disabled, just like preemption. I
> > guess when the troubles are really big our strategy is the same
> > for both wq and irq_work solutions - we keep the printk_safe buffer
> > and wait for panic()->flush.
> 
> Working on the RT kernel, I would tell you there's a huge difference
> getting an irq_work to trigger than to expect something to schedule.
> 
> It takes much less to disable the systems scheduling than it does to
> disable interrupts. If interrupts are disabled, then so is scheduling.
> But if scheduling is disabled, interrupts may still trigger.

Sure, I understand those points and don't object any of those. Just
weighing the risks and benefits.

> But if printk_safe() is just for recursion protection, how important is
> it to get out?

Well, it depends. printk_safe can protect us against... what should I
call it... let's call it first order, or direct, printk recursion. The
one which involve locks internal to print. For instance,

vprintk_emit()
 spin_lock_irqsave(_lock)
  spin_lock_debug(...)
   spin_dump()
printk()
 vprintk_emit()
  spin_lock_irqsave(_lock)  << deadlock

printk_safe will save us by redirecting spin_dump()->printk().

printk_safe cannot protect us against second order, or indirect,
printk recursion. The one that involves locks which are external
to printk. In case of indirect recursion we can store lockdep report
in printk_safe, so we will not immediately recurse back to printk,
but in general it does not mean that we will be actually safe from
any deadlocks. For instance,

   timekeeping
spin_lock_irqsave timer base
 printk
  call_console_drivers
   foo_console_driver
mod_timer
 spin_lock_irqsave timer base   << deadlock

printk_safe will not save us here, and most likely irq_work will
not happen on this CPU, because we deadlock with IRQs disabled.

So printk_safe output is in general of some interest, but we don't
have guarantees that it will be printed: if it was the direct printk
recursion - then it's all good, if indirect - then it may not be good.

-ss


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-02-01 Thread Sergey Senozhatsky
On (02/01/18 13:00), Steven Rostedt wrote:
> On Mon, 29 Jan 2018 11:29:18 +0900
> Sergey Senozhatsky  wrote:
[..]
> > If the system is in "big troubles" then what makes irq_work more
> > possible? Local IRQs can stay disabled, just like preemption. I
> > guess when the troubles are really big our strategy is the same
> > for both wq and irq_work solutions - we keep the printk_safe buffer
> > and wait for panic()->flush.
> 
> Working on the RT kernel, I would tell you there's a huge difference
> getting an irq_work to trigger than to expect something to schedule.
> 
> It takes much less to disable the systems scheduling than it does to
> disable interrupts. If interrupts are disabled, then so is scheduling.
> But if scheduling is disabled, interrupts may still trigger.

Sure, I understand those points and don't object any of those. Just
weighing the risks and benefits.

> But if printk_safe() is just for recursion protection, how important is
> it to get out?

Well, it depends. printk_safe can protect us against... what should I
call it... let's call it first order, or direct, printk recursion. The
one which involve locks internal to print. For instance,

vprintk_emit()
 spin_lock_irqsave(_lock)
  spin_lock_debug(...)
   spin_dump()
printk()
 vprintk_emit()
  spin_lock_irqsave(_lock)  << deadlock

printk_safe will save us by redirecting spin_dump()->printk().

printk_safe cannot protect us against second order, or indirect,
printk recursion. The one that involves locks which are external
to printk. In case of indirect recursion we can store lockdep report
in printk_safe, so we will not immediately recurse back to printk,
but in general it does not mean that we will be actually safe from
any deadlocks. For instance,

   timekeeping
spin_lock_irqsave timer base
 printk
  call_console_drivers
   foo_console_driver
mod_timer
 spin_lock_irqsave timer base   << deadlock

printk_safe will not save us here, and most likely irq_work will
not happen on this CPU, because we deadlock with IRQs disabled.

So printk_safe output is in general of some interest, but we don't
have guarantees that it will be printed: if it was the direct printk
recursion - then it's all good, if indirect - then it may not be good.

-ss


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-02-01 Thread Steven Rostedt
On Mon, 29 Jan 2018 11:29:18 +0900
Sergey Senozhatsky  wrote:

> On (01/26/18 16:26), Petr Mladek wrote:
> [..]
> > First, this delays showing eventually valuable information until
> > the preemption is enabled. It might never happen if the system
> > is in big troubles. In each case, it might be much longer delay
> > than it was before.  
> 
> If the system is in "big troubles" then what makes irq_work more
> possible? Local IRQs can stay disabled, just like preemption. I
> guess when the troubles are really big our strategy is the same
> for both wq and irq_work solutions - we keep the printk_safe buffer
> and wait for panic()->flush.

Working on the RT kernel, I would tell you there's a huge difference
getting an irq_work to trigger than to expect something to schedule.

It takes much less to disable the systems scheduling than it does to
disable interrupts. If interrupts are disabled, then so is scheduling.
But if scheduling is disabled, interrupts may still trigger.

But if printk_safe() is just for recursion protection, how important is
it to get out?

-- Steve


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-02-01 Thread Steven Rostedt
On Mon, 29 Jan 2018 11:29:18 +0900
Sergey Senozhatsky  wrote:

> On (01/26/18 16:26), Petr Mladek wrote:
> [..]
> > First, this delays showing eventually valuable information until
> > the preemption is enabled. It might never happen if the system
> > is in big troubles. In each case, it might be much longer delay
> > than it was before.  
> 
> If the system is in "big troubles" then what makes irq_work more
> possible? Local IRQs can stay disabled, just like preemption. I
> guess when the troubles are really big our strategy is the same
> for both wq and irq_work solutions - we keep the printk_safe buffer
> and wait for panic()->flush.

Working on the RT kernel, I would tell you there's a huge difference
getting an irq_work to trigger than to expect something to schedule.

It takes much less to disable the systems scheduling than it does to
disable interrupts. If interrupts are disabled, then so is scheduling.
But if scheduling is disabled, interrupts may still trigger.

But if printk_safe() is just for recursion protection, how important is
it to get out?

-- Steve


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-01-31 Thread Sergey Senozhatsky
On (01/30/18 13:23), Petr Mladek wrote:
[..]
> > If the system is in "big troubles" then what makes irq_work more
> > possible? Local IRQs can stay disabled, just like preemption. I
> > guess when the troubles are really big our strategy is the same
> > for both wq and irq_work solutions - we keep the printk_safe buffer
> > and wait for panic()->flush.
> 
> But the patch still uses irq work because queue_work_on() could not
> be safely called from printk_safe(). By other words, it requires
> both irq_work and workqueues to be functional.

Right, that's all true. The reason it's done this way is because buffers can
be big and we still flush under console_sem in console_unlock() loop, which
can in theory be problematic. In other words, I wanted to remove the root
cause - irq flush of printk_safe while we are still in printing loop.
Technically, we minimize the probability by throttling down printk_safe flush,
but we don't eliminate the possibility entirely. Maybe it is good enough,
maybe not. Opinions?

[..]
> > `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT
> > a bit useless and hard to understand - despite its value we will
> > store only 100 lines.
> >
> > We probably can replace `console_recursion_limit' with the following:
> > - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN
> >   chars to be stored in printk-safe buffer and, once we reached the limit,
> >   don't append any new messages until we are out of `console_recursion'
> >   context. Which is somewhat close to wq solution, the difference is that
> >   printk_safe can happen earlier if local IRQs are enabled.

  ^ printk_safe flush

> I like this idea. It would actually make perfect sense to use the same
> limit for PRINTK_SAFE buffer size and for the printk recursion.

Yes, we probably can do it that way, but this thing

" They both should be big enough to "

is a bit of a concern. The "big enough to" can lead to different things.

> > I guess I'm OK with the wq dependency after all, but I may be mistaken.
> > printk_safe was never about "immediately flush the buffer", it was about
> > "avoid deadlocks", which was extended to "flush from any context which
> > will let us to avoid deadlock". It just happened that it inherited
> > irq_work dependency from printk_nmi.
> 
> I see the point. But if I remember correctly, it was also designed
> before we started to be concerned about a sudden death and "get
> printks out ASAP" mantra.

Can you elaborate a bit?

-ss


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-01-31 Thread Sergey Senozhatsky
On (01/30/18 13:23), Petr Mladek wrote:
[..]
> > If the system is in "big troubles" then what makes irq_work more
> > possible? Local IRQs can stay disabled, just like preemption. I
> > guess when the troubles are really big our strategy is the same
> > for both wq and irq_work solutions - we keep the printk_safe buffer
> > and wait for panic()->flush.
> 
> But the patch still uses irq work because queue_work_on() could not
> be safely called from printk_safe(). By other words, it requires
> both irq_work and workqueues to be functional.

Right, that's all true. The reason it's done this way is because buffers can
be big and we still flush under console_sem in console_unlock() loop, which
can in theory be problematic. In other words, I wanted to remove the root
cause - irq flush of printk_safe while we are still in printing loop.
Technically, we minimize the probability by throttling down printk_safe flush,
but we don't eliminate the possibility entirely. Maybe it is good enough,
maybe not. Opinions?

[..]
> > `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT
> > a bit useless and hard to understand - despite its value we will
> > store only 100 lines.
> >
> > We probably can replace `console_recursion_limit' with the following:
> > - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN
> >   chars to be stored in printk-safe buffer and, once we reached the limit,
> >   don't append any new messages until we are out of `console_recursion'
> >   context. Which is somewhat close to wq solution, the difference is that
> >   printk_safe can happen earlier if local IRQs are enabled.

  ^ printk_safe flush

> I like this idea. It would actually make perfect sense to use the same
> limit for PRINTK_SAFE buffer size and for the printk recursion.

Yes, we probably can do it that way, but this thing

" They both should be big enough to "

is a bit of a concern. The "big enough to" can lead to different things.

> > I guess I'm OK with the wq dependency after all, but I may be mistaken.
> > printk_safe was never about "immediately flush the buffer", it was about
> > "avoid deadlocks", which was extended to "flush from any context which
> > will let us to avoid deadlock". It just happened that it inherited
> > irq_work dependency from printk_nmi.
> 
> I see the point. But if I remember correctly, it was also designed
> before we started to be concerned about a sudden death and "get
> printks out ASAP" mantra.

Can you elaborate a bit?

-ss


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-01-30 Thread Petr Mladek
On Mon 2018-01-29 11:29:18, Sergey Senozhatsky wrote:
> On (01/26/18 16:26), Petr Mladek wrote:
> [..]
> > First, this delays showing eventually valuable information until
> > the preemption is enabled. It might never happen if the system
> > is in big troubles. In each case, it might be much longer delay
> > than it was before.
> 
> If the system is in "big troubles" then what makes irq_work more
> possible? Local IRQs can stay disabled, just like preemption. I
> guess when the troubles are really big our strategy is the same
> for both wq and irq_work solutions - we keep the printk_safe buffer
> and wait for panic()->flush.

But the patch still uses irq work because queue_work_on() could not
be safely called from printk_safe(). By other words, it requires
both irq_work and workqueues to be functional. Note that there
might be deadlock or livelock in the workqueues subsystem. It is
just another non-trivial thingy that might get wrong.

Also interrupts are handled immediately when they are enabled.
On the other hand, a workqueue work is proceed only when the worker
is scheduled and the work is first in the queue. It might
take ages if there is a high load on the CPU or on the given
workqueue.


> > Second, it makes printk() dependent on another non-trivial subsystem.
> > I mean workqueues.
> [..]
> > The following, a bit ugly, solution has came to my mind. We could
> > think about it like extending the printk_context. It counts
> > printks called in this context and does nothing when we reach
> > the limit. The difference is that the context is task-specific
> > instead of CPU-specific.
> [..]
> > +int console_recursion_count;
> > +int console_recursion_limit = 100;
> 
> Hm... I'm not entirely happy with magic constants, to be honest.
> Why 100? One of the printk_safe lockdep reports I saw was ~270
> lines long: https://marc.info/?l=linux-kernel=150659041411473=2

I am not happy with this constant either. It was used just for
a simplicity of the RFC.


> `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT
> a bit useless and hard to understand - despite its value we will
> store only 100 lines.
>
> We probably can replace `console_recursion_limit' with the following:
> - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN
>   chars to be stored in printk-safe buffer and, once we reached the limit,
>   don't append any new messages until we are out of `console_recursion'
>   context. Which is somewhat close to wq solution, the difference is that
>   printk_safe can happen earlier if local IRQs are enabled.

I like this idea. It would actually make perfect sense to use the same
limit for PRINTK_SAFE buffer size and for the printk recursion.

They both should be big enough to allow a meaningful report. On
the other hand, they both should be relatively small. One because
of memory constrains, the other because of reducing redundancy.

In each case, there is a direct dependency. The recursive messages
are stored into the printk_safe buffer.


>   same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to
>   still cause troubles, just because printk-deadlock errors sound scary
>   and important enough.

We could always make it more complicated if people come up with
a reasonable use case. IMHO, most people will not care about
these limits.


> I guess I'm OK with the wq dependency after all, but I may be mistaken.
> printk_safe was never about "immediately flush the buffer", it was about
> "avoid deadlocks", which was extended to "flush from any context which
> will let us to avoid deadlock". It just happened that it inherited
> irq_work dependency from printk_nmi.

I see the point. But if I remember correctly, it was also designed
before we started to be concerned about a sudden death and "get
printks out ASAP" mantra.

Best Regards,
Petr


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-01-30 Thread Petr Mladek
On Mon 2018-01-29 11:29:18, Sergey Senozhatsky wrote:
> On (01/26/18 16:26), Petr Mladek wrote:
> [..]
> > First, this delays showing eventually valuable information until
> > the preemption is enabled. It might never happen if the system
> > is in big troubles. In each case, it might be much longer delay
> > than it was before.
> 
> If the system is in "big troubles" then what makes irq_work more
> possible? Local IRQs can stay disabled, just like preemption. I
> guess when the troubles are really big our strategy is the same
> for both wq and irq_work solutions - we keep the printk_safe buffer
> and wait for panic()->flush.

But the patch still uses irq work because queue_work_on() could not
be safely called from printk_safe(). By other words, it requires
both irq_work and workqueues to be functional. Note that there
might be deadlock or livelock in the workqueues subsystem. It is
just another non-trivial thingy that might get wrong.

Also interrupts are handled immediately when they are enabled.
On the other hand, a workqueue work is proceed only when the worker
is scheduled and the work is first in the queue. It might
take ages if there is a high load on the CPU or on the given
workqueue.


> > Second, it makes printk() dependent on another non-trivial subsystem.
> > I mean workqueues.
> [..]
> > The following, a bit ugly, solution has came to my mind. We could
> > think about it like extending the printk_context. It counts
> > printks called in this context and does nothing when we reach
> > the limit. The difference is that the context is task-specific
> > instead of CPU-specific.
> [..]
> > +int console_recursion_count;
> > +int console_recursion_limit = 100;
> 
> Hm... I'm not entirely happy with magic constants, to be honest.
> Why 100? One of the printk_safe lockdep reports I saw was ~270
> lines long: https://marc.info/?l=linux-kernel=150659041411473=2

I am not happy with this constant either. It was used just for
a simplicity of the RFC.


> `console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT
> a bit useless and hard to understand - despite its value we will
> store only 100 lines.
>
> We probably can replace `console_recursion_limit' with the following:
> - in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN
>   chars to be stored in printk-safe buffer and, once we reached the limit,
>   don't append any new messages until we are out of `console_recursion'
>   context. Which is somewhat close to wq solution, the difference is that
>   printk_safe can happen earlier if local IRQs are enabled.

I like this idea. It would actually make perfect sense to use the same
limit for PRINTK_SAFE buffer size and for the printk recursion.

They both should be big enough to allow a meaningful report. On
the other hand, they both should be relatively small. One because
of memory constrains, the other because of reducing redundancy.

In each case, there is a direct dependency. The recursive messages
are stored into the printk_safe buffer.


>   same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to
>   still cause troubles, just because printk-deadlock errors sound scary
>   and important enough.

We could always make it more complicated if people come up with
a reasonable use case. IMHO, most people will not care about
these limits.


> I guess I'm OK with the wq dependency after all, but I may be mistaken.
> printk_safe was never about "immediately flush the buffer", it was about
> "avoid deadlocks", which was extended to "flush from any context which
> will let us to avoid deadlock". It just happened that it inherited
> irq_work dependency from printk_nmi.

I see the point. But if I remember correctly, it was also designed
before we started to be concerned about a sudden death and "get
printks out ASAP" mantra.

Best Regards,
Petr


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-01-28 Thread Sergey Senozhatsky
On (01/26/18 16:26), Petr Mladek wrote:
[..]
> First, this delays showing eventually valuable information until
> the preemption is enabled. It might never happen if the system
> is in big troubles. In each case, it might be much longer delay
> than it was before.

If the system is in "big troubles" then what makes irq_work more
possible? Local IRQs can stay disabled, just like preemption. I
guess when the troubles are really big our strategy is the same
for both wq and irq_work solutions - we keep the printk_safe buffer
and wait for panic()->flush.

> Second, it makes printk() dependent on another non-trivial subsystem.
> I mean workqueues.
[..]
> The following, a bit ugly, solution has came to my mind. We could
> think about it like extending the printk_context. It counts
> printks called in this context and does nothing when we reach
> the limit. The difference is that the context is task-specific
> instead of CPU-specific.
[..]
> +int console_recursion_count;
> +int console_recursion_limit = 100;

Hm... I'm not entirely happy with magic constants, to be honest.
Why 100? One of the printk_safe lockdep reports I saw was ~270
lines long: https://marc.info/?l=linux-kernel=150659041411473=2

`console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT
a bit useless and hard to understand - despite its value we will
store only 100 lines.

We probably can replace `console_recursion_limit' with the following:
- in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN
  chars to be stored in printk-safe buffer and, once we reached the limit,
  don't append any new messages until we are out of `console_recursion'
  context. Which is somewhat close to wq solution, the difference is that
  printk_safe can happen earlier if local IRQs are enabled. But at the
  same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to
  still cause troubles, just because printk-deadlock errors sound scary
  and important enough.

I guess I'm OK with the wq dependency after all, but I may be mistaken.
printk_safe was never about "immediately flush the buffer", it was about
"avoid deadlocks", which was extended to "flush from any context which
will let us to avoid deadlock". It just happened that it inherited
irq_work dependency from printk_nmi.

We also probably can add printk_safe flush to some sysrq handler
may be.

> PS: I answered this mail because the original discussion[1] has
> already been way too long and covered many other issues.

Yep, that's I started it as a new discussion.

-ss


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-01-28 Thread Sergey Senozhatsky
On (01/26/18 16:26), Petr Mladek wrote:
[..]
> First, this delays showing eventually valuable information until
> the preemption is enabled. It might never happen if the system
> is in big troubles. In each case, it might be much longer delay
> than it was before.

If the system is in "big troubles" then what makes irq_work more
possible? Local IRQs can stay disabled, just like preemption. I
guess when the troubles are really big our strategy is the same
for both wq and irq_work solutions - we keep the printk_safe buffer
and wait for panic()->flush.

> Second, it makes printk() dependent on another non-trivial subsystem.
> I mean workqueues.
[..]
> The following, a bit ugly, solution has came to my mind. We could
> think about it like extending the printk_context. It counts
> printks called in this context and does nothing when we reach
> the limit. The difference is that the context is task-specific
> instead of CPU-specific.
[..]
> +int console_recursion_count;
> +int console_recursion_limit = 100;

Hm... I'm not entirely happy with magic constants, to be honest.
Why 100? One of the printk_safe lockdep reports I saw was ~270
lines long: https://marc.info/?l=linux-kernel=150659041411473=2

`console_recursion_limit' also makes PRINTK_SAFE_LOG_BUF_SHIFT
a bit useless and hard to understand - despite its value we will
store only 100 lines.

We probably can replace `console_recursion_limit' with the following:
- in the current `console_recursion' section we let only SAFE_LOG_BUF_LEN
  chars to be stored in printk-safe buffer and, once we reached the limit,
  don't append any new messages until we are out of `console_recursion'
  context. Which is somewhat close to wq solution, the difference is that
  printk_safe can happen earlier if local IRQs are enabled. But at the
  same time someone might set PRINTK_SAFE_LOG_BUF_SHIFT big enough to
  still cause troubles, just because printk-deadlock errors sound scary
  and important enough.

I guess I'm OK with the wq dependency after all, but I may be mistaken.
printk_safe was never about "immediately flush the buffer", it was about
"avoid deadlocks", which was extended to "flush from any context which
will let us to avoid deadlock". It just happened that it inherited
irq_work dependency from printk_nmi.

We also probably can add printk_safe flush to some sysrq handler
may be.

> PS: I answered this mail because the original discussion[1] has
> already been way too long and covered many other issues.

Yep, that's I started it as a new discussion.

-ss


Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-01-27 Thread Petr Mladek
On Wed 2018-01-24 18:37:23, Sergey Senozhatsky wrote:
> Tejun Heo reported that net_console can cause recursive printk()-s
> from call_console_drivers() (under OOM/etc.) and this can lockup that
> CPU, because CPU calls call_console_drivers() to print pending logbufs,
> but recursive printk() issued by one of console drivers adds several
> new messages to the logbuf via recursive printk().

> Note, this is not a printk_safe problem. In fact printk_safe helps
> us to address the issue here and to rate-limit recursive printk()-s.
> 
> Schematically, what we have at the moment is:
> 
> ==
> printk()
>  preempt_disable()
>   console_unlock()
>   {
> for (;;) {
>  local_irq_save()
>  printk_safe_enter()
> 
>  call_console_drivers()
>  printk()
>   vprintk_safe()
>irq_work_queue()
> 
>  printk_safe_exit()
>  local_irq_restore()
> 
>  << irq work >>
>printk_safe_flush()
>  printk()
>}
>   }
>  preempt_enable()
> ==
> 
> So CPU triggers printk() recursion in call_console_drivers(),
> immediately after local_irq_restore() it executes printk_safe
> flush irq_work, which adds even more pending messages to the
> logbuf which it has to print. CPU hits printk() recursion again
> while it prints pending messages, which results in even more
> pending messages flushed from printk_safe via irq_work. And so
> on.
> 
> The idea is to delay printk_safe flush until CPU is in preemptible
> context, so we won't lockup it up.
> 
> The new behaviour is:
> 
> ==
> printk()
>  preempt_disable()
>   console_unlock()
>   {
> for (;;) {
>  local_irq_save()
>  printk_safe_enter()
> 
>  call_console_drivers()
>  printk()
>   vprintk_safe()
>queue_work_on(smp_processor_id())
> 
>  printk_safe_exit()
>  local_irq_restore()
>}
>   }
>  preempt_enable()
> 
>  << work queue >>
>printk_safe_flush()
>  printk()
> ==
> 
> Since console_unlock() is called under preemption_disabled() now
> we cannot schedule flush work as long as CPU is printing messages.
> We only can flush printk_safe messages after CPU returns from
> console_unlock() and enables preemption.
> 
> This might look like with the delayed flush we increase the possibility
> of lost printk_safe messages (printk_safe per-CPU buffers are pretty
> limited in size). But prinkt_safe was never meant to be used for huge
> amounts of data which can generate e.g. debugging of serial consoles,
> etc. Its goal is to make recursive printk()-s less deadlock prone.
> With this patch we add one more thing: avoid CPU lockup which might
> be caused by printk_safe_flush().

I have two concerns about this approach:

First, this delays showing eventually valuable information until
the preemption is enabled. It might never happen if the system
is in big troubles. In each case, it might be much longer delay
than it was before.

Second, it makes printk() dependent on another non-trivial subsystem.
I mean workqueues.

If I got it right, we want to limit the number of recursively added
messages that might be printed() from inside console_unlock(),
especially from console_drivers().


The following, a bit ugly, solution has came to my mind. We could
think about it like extending the printk_context. It counts
printks called in this context and does nothing when we reach
the limit. The difference is that the context is task-specific
instead of CPU-specific.

I used only one compiler barrier which looks strange. Well, it
is late Friday here and it is just a POC. I wonder what you
think about this approach.


Just a compile tested code:

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2a7d04049af4..747098310179 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -23,6 +23,9 @@
 #define PRINTK_NMI_CONTEXT_MASK 0x8000
 
 extern raw_spinlock_t logbuf_lock;
+extern struct task_struct *console_unlock_owner;
+extern int console_recursion_count;
+extern int console_recursion_limit;
 
 __printf(1, 0) int vprintk_default(const char *fmt, va_list args);
 __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index db4b9b8929eb..4508bb579c5b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -81,6 +81,9 @@ EXPORT_SYMBOL(oops_in_progress);
 static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
+struct task_struct *console_unlock_owner;
+int console_recursion_count;
+int console_recursion_limit = 100;
 
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
@@ -2293,6 +2296,7 @@ void console_unlock(void)
unsigned long flags;
bool wake_klogd = false;

Re: [RFC][PATCH] printk: do not flush printk_safe from irq_work

2018-01-27 Thread Petr Mladek
On Wed 2018-01-24 18:37:23, Sergey Senozhatsky wrote:
> Tejun Heo reported that net_console can cause recursive printk()-s
> from call_console_drivers() (under OOM/etc.) and this can lockup that
> CPU, because CPU calls call_console_drivers() to print pending logbufs,
> but recursive printk() issued by one of console drivers adds several
> new messages to the logbuf via recursive printk().

> Note, this is not a printk_safe problem. In fact printk_safe helps
> us to address the issue here and to rate-limit recursive printk()-s.
> 
> Schematically, what we have at the moment is:
> 
> ==
> printk()
>  preempt_disable()
>   console_unlock()
>   {
> for (;;) {
>  local_irq_save()
>  printk_safe_enter()
> 
>  call_console_drivers()
>  printk()
>   vprintk_safe()
>irq_work_queue()
> 
>  printk_safe_exit()
>  local_irq_restore()
> 
>  << irq work >>
>printk_safe_flush()
>  printk()
>}
>   }
>  preempt_enable()
> ==
> 
> So CPU triggers printk() recursion in call_console_drivers(),
> immediately after local_irq_restore() it executes printk_safe
> flush irq_work, which adds even more pending messages to the
> logbuf which it has to print. CPU hits printk() recursion again
> while it prints pending messages, which results in even more
> pending messages flushed from printk_safe via irq_work. And so
> on.
> 
> The idea is to delay printk_safe flush until CPU is in preemptible
> context, so we won't lockup it up.
> 
> The new behaviour is:
> 
> ==
> printk()
>  preempt_disable()
>   console_unlock()
>   {
> for (;;) {
>  local_irq_save()
>  printk_safe_enter()
> 
>  call_console_drivers()
>  printk()
>   vprintk_safe()
>queue_work_on(smp_processor_id())
> 
>  printk_safe_exit()
>  local_irq_restore()
>}
>   }
>  preempt_enable()
> 
>  << work queue >>
>printk_safe_flush()
>  printk()
> ==
> 
> Since console_unlock() is called under preemption_disabled() now
> we cannot schedule flush work as long as CPU is printing messages.
> We only can flush printk_safe messages after CPU returns from
> console_unlock() and enables preemption.
> 
> This might look like with the delayed flush we increase the possibility
> of lost printk_safe messages (printk_safe per-CPU buffers are pretty
> limited in size). But prinkt_safe was never meant to be used for huge
> amounts of data which can generate e.g. debugging of serial consoles,
> etc. Its goal is to make recursive printk()-s less deadlock prone.
> With this patch we add one more thing: avoid CPU lockup which might
> be caused by printk_safe_flush().

I have two concerns about this approach:

First, this delays showing eventually valuable information until
the preemption is enabled. It might never happen if the system
is in big troubles. In each case, it might be much longer delay
than it was before.

Second, it makes printk() dependent on another non-trivial subsystem.
I mean workqueues.

If I got it right, we want to limit the number of recursively added
messages that might be printed() from inside console_unlock(),
especially from console_drivers().


The following, a bit ugly, solution has came to my mind. We could
think about it like extending the printk_context. It counts
printks called in this context and does nothing when we reach
the limit. The difference is that the context is task-specific
instead of CPU-specific.

I used only one compiler barrier which looks strange. Well, it
is late Friday here and it is just a POC. I wonder what you
think about this approach.


Just a compile tested code:

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2a7d04049af4..747098310179 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -23,6 +23,9 @@
 #define PRINTK_NMI_CONTEXT_MASK 0x8000
 
 extern raw_spinlock_t logbuf_lock;
+extern struct task_struct *console_unlock_owner;
+extern int console_recursion_count;
+extern int console_recursion_limit;
 
 __printf(1, 0) int vprintk_default(const char *fmt, va_list args);
 __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index db4b9b8929eb..4508bb579c5b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -81,6 +81,9 @@ EXPORT_SYMBOL(oops_in_progress);
 static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
+struct task_struct *console_unlock_owner;
+int console_recursion_count;
+int console_recursion_limit = 100;
 
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
@@ -2293,6 +2296,7 @@ void console_unlock(void)
unsigned long flags;
bool wake_klogd = false;