Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-05-09 Thread Sergey Senozhatsky
Hi,

Move printk and (some of) MM people to the recipients list.

On (01/10/18 09:02), Tejun Heo wrote:
[..]
> The particular case that we've been seeing regularly in the fleet was
> the following scenario.
> 
> 1. Console is IPMI emulated serial console.  Super slow.  Also
>netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>/ driver tries to allocate memory and then fail, which in turn
>triggers allocation failure or other warning messages.  printk was
>already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>shrinking.  Because OOM handler is trapped in printk flushing, it
>never manages to free memory and no one else can enter OOM path
>either, so the system is trapped in this state.

Tejun, we have a theory [since there are no logs available] that what
you are looking at is something as follows:

console_unlock()
{
  for (;;) {
   call_console_drivers()
 kmalloc()/etc/* netconsole, skb kmalloc(), for instance */
  __alloc_pages_slowpath()
warn_alloc()  /* a bunch of printk() -> log_store() */
  }
}

Now, warn_alloc() is rate limited to
DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST

so net console driver can add 10 warn_alloc() reports every 5 seconds to
the logbuf.

You have a "super slow" IPMI console and net console. So for every
logbuf entry we do:

console_unlock()
{
  for (;;) {
call_console_drivers(msg) -> IPMI_write()
call_console_drivers(msg) -> netconsole_write() -> skb kmalloc() -> 
warn_alloc() -> ratelimit
  }
}

IPMI_write() is very slow, as you have noted, so it consumes time
printing messages, simultaneously warn_alloc() rate limit depends on
time. *Probably*, slow IPMI_write() is unable to flush 10 warn_alloc()
reports under 5 seconds, which gives net console a chance to add another
10 warn_alloc()-s, while the previous 10 warn_alloc()-s have not been
flushed yet.

It seems that DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST
warn_alloc() rate limit is too permissive for your setup.

Can you confirm that the theory is actually correct?

If it is correct, then can we simply tweak warn_alloc() rate limit?
Say, make it x2 / x4 / etc. times less verbose? E.g. "up to 5 warn_alloc()-s
every 10 seconds"? What do MM folks think?

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-05-09 Thread Sergey Senozhatsky
Hi,

Move printk and (some of) MM people to the recipients list.

On (01/10/18 09:02), Tejun Heo wrote:
[..]
> The particular case that we've been seeing regularly in the fleet was
> the following scenario.
> 
> 1. Console is IPMI emulated serial console.  Super slow.  Also
>netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>/ driver tries to allocate memory and then fail, which in turn
>triggers allocation failure or other warning messages.  printk was
>already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>shrinking.  Because OOM handler is trapped in printk flushing, it
>never manages to free memory and no one else can enter OOM path
>either, so the system is trapped in this state.

Tejun, we have a theory [since there are no logs available] that what
you are looking at is something as follows:

console_unlock()
{
  for (;;) {
   call_console_drivers()
 kmalloc()/etc/* netconsole, skb kmalloc(), for instance */
  __alloc_pages_slowpath()
warn_alloc()  /* a bunch of printk() -> log_store() */
  }
}

Now, warn_alloc() is rate limited to
DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST

so net console driver can add 10 warn_alloc() reports every 5 seconds to
the logbuf.

You have a "super slow" IPMI console and net console. So for every
logbuf entry we do:

console_unlock()
{
  for (;;) {
call_console_drivers(msg) -> IPMI_write()
call_console_drivers(msg) -> netconsole_write() -> skb kmalloc() -> 
warn_alloc() -> ratelimit
  }
}

IPMI_write() is very slow, as you have noted, so it consumes time
printing messages, simultaneously warn_alloc() rate limit depends on
time. *Probably*, slow IPMI_write() is unable to flush 10 warn_alloc()
reports under 5 seconds, which gives net console a chance to add another
10 warn_alloc()-s, while the previous 10 warn_alloc()-s have not been
flushed yet.

It seems that DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST
warn_alloc() rate limit is too permissive for your setup.

Can you confirm that the theory is actually correct?

If it is correct, then can we simply tweak warn_alloc() rate limit?
Say, make it x2 / x4 / etc. times less verbose? E.g. "up to 5 warn_alloc()-s
every 10 seconds"? What do MM folks think?

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-04-22 Thread Sergey Senozhatsky
On (01/23/18 07:43), Tejun Heo wrote:
> > 
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
> 
> So, at least in the case that we were seeing, it isn't that black and
> white.  printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress.  The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.

Hello Tejun,

I'm willing to take a look at those printk()-s from console drivers.
Any chance you can send me some of the backtraces you see [the most
common/disturbing]?

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-04-22 Thread Sergey Senozhatsky
On (01/23/18 07:43), Tejun Heo wrote:
> > 
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
> 
> So, at least in the case that we were seeing, it isn't that black and
> white.  printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress.  The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.

Hello Tejun,

I'm willing to take a look at those printk()-s from console drivers.
Any chance you can send me some of the backtraces you see [the most
common/disturbing]?

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-24 Thread Tejun Heo
Hello, Peter.

On Wed, Jan 24, 2018 at 10:36:07AM +0100, Peter Zijlstra wrote:
> On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> > 1. Console is IPMI emulated serial console.  Super slow.  Also
> >netconsole is in use.
> 
> So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had
> trouble like that (granted I don't typically trigger OOM storms, but
> they do occasionally happen).
> 
> Is your IPMI much slower and not fixable to be faster?

It looks like the latest machines have the baud rate at 57600 and I'm
pretty sure we have a lot of slower ones.  57600 isn't 9600 but is
still slow enough to get into trouble often enough.  There are a huge
number of machines running all sorts of things under heavy load and
trying to rapidly deploy new kernels / features contributes to
encountering bugs and weird corner cases.

UART can run a lot faster and I have no idea why IPMI consoles behave
as if they were connected over mile-long DB9 cables.  Maybe we can
convince hardware people to improve it but, even if that happened
today, we'd still be looking at years of dealing with slower ones, and
IPMI situation here is likely better than what many others are facing.

idk, it's not a particularly difficult problem to solve from kernel
side.  Just need to figure out a better / more robust trade-off.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-24 Thread Tejun Heo
Hello, Peter.

On Wed, Jan 24, 2018 at 10:36:07AM +0100, Peter Zijlstra wrote:
> On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> > 1. Console is IPMI emulated serial console.  Super slow.  Also
> >netconsole is in use.
> 
> So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had
> trouble like that (granted I don't typically trigger OOM storms, but
> they do occasionally happen).
> 
> Is your IPMI much slower and not fixable to be faster?

It looks like the latest machines have the baud rate at 57600 and I'm
pretty sure we have a lot of slower ones.  57600 isn't 9600 but is
still slow enough to get into trouble often enough.  There are a huge
number of machines running all sorts of things under heavy load and
trying to rapidly deploy new kernels / features contributes to
encountering bugs and weird corner cases.

UART can run a lot faster and I have no idea why IPMI consoles behave
as if they were connected over mile-long DB9 cables.  Maybe we can
convince hardware people to improve it but, even if that happened
today, we'd still be looking at years of dealing with slower ones, and
IPMI situation here is likely better than what many others are facing.

idk, it's not a particularly difficult problem to solve from kernel
side.  Just need to figure out a better / more robust trade-off.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-24 Thread Peter Zijlstra
On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> 1. Console is IPMI emulated serial console.  Super slow.  Also
>netconsole is in use.

So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had
trouble like that (granted I don't typically trigger OOM storms, but
they do occasionally happen).

Is your IPMI much slower and not fixable to be faster?


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-24 Thread Peter Zijlstra
On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> 1. Console is IPMI emulated serial console.  Super slow.  Also
>netconsole is in use.

So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had
trouble like that (granted I don't typically trigger OOM storms, but
they do occasionally happen).

Is your IPMI much slower and not fixable to be faster?


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Sergey Senozhatsky
On (01/23/18 21:52), Steven Rostedt wrote:
> On Wed, 24 Jan 2018 11:11:33 +0900
> Sergey Senozhatsky  wrote:
> 
> > Please take a look.
> 
> Was there something specific to look at?

Not really. Just my previous email, basically.
You said "I have to look at the latest code." so I replied.

Well, if the proposed direction does make sense then I'll send
out a patch.


> I'm doing a hundred different things at once, and my memory cache...

Meltdown vulnerable? Suddenly it all makes sense - you talk too fast
because of speculative execution... ;)

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Sergey Senozhatsky
On (01/23/18 21:52), Steven Rostedt wrote:
> On Wed, 24 Jan 2018 11:11:33 +0900
> Sergey Senozhatsky  wrote:
> 
> > Please take a look.
> 
> Was there something specific to look at?

Not really. Just my previous email, basically.
You said "I have to look at the latest code." so I replied.

Well, if the proposed direction does make sense then I'll send
out a patch.


> I'm doing a hundred different things at once, and my memory cache...

Meltdown vulnerable? Suddenly it all makes sense - you talk too fast
because of speculative execution... ;)

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Steven Rostedt
On Wed, 24 Jan 2018 11:11:33 +0900
Sergey Senozhatsky  wrote:

> Please take a look.

Was there something specific to look at?

I'm doing a hundred different things at once, and my memory cache keeps
getting flushed.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Steven Rostedt
On Wed, 24 Jan 2018 11:11:33 +0900
Sergey Senozhatsky  wrote:

> Please take a look.

Was there something specific to look at?

I'm doing a hundred different things at once, and my memory cache keeps
getting flushed.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Sergey Senozhatsky
Hello,

On (01/23/18 11:24), Steven Rostedt wrote:
[..]
> > With WQ we don't lockup the kernel, because we flush printk_safe in
> > preemptible context. And people are very much expected to fix the
> > misbehaving consoles. But that should not be printk_safe problem.
> 
> Right, but now you just made printk safe unreliable to get information
> out, because you need to wait for a schedule to occur, and if there's
> issues, like a deadlock, that thread will never run. And you just lost
> you lockdep splat.

Yes and No.

printk_safe and printk_nmi are unreliable - both need irq_work. That's
why we forcibly flush those buffers in panic(). At least for printk_safe
case, and I'm pretty sure the same stands for printk_nmi, we never said
that we will store all the messages that were printed from unsafe context
(recursion or NMI). The only thing we said - we will try not to deadlock
the system.

Now we are adding one more thing to printk_safe - we will also try not to
lockup the system.

Default printk_safe buffer size might not be enough to store a very large
lockdep splat. And we will report that the buffer is too small and that we
lost some of the lines: "here is what we have, we lost N lines, but at least
we didn't deadlock the system". See f975237b76827956fe13ecfe993a319158e2c303
for more details, it contains a list of recursive-printk deadlock scenarios
that printk_safe was meant to handle.

It is possible and OK to lose messages in printk_safe/printk_nmi

printk_safe_enter_irqsave()
  printk
  printk
  ...
  ...
  printk
  printk
printk_safe_exit_irqrestore()

No flush will take place as long as there is no IRQ on that CPU.
But printk_safe and printk_nmi are solving different problem in
the first place.

> > I'll re-read this one tomorrow. Not quite following it.
> 
> I'll add more capitals next time ;-)

Ha-ha-ha ;)

[..]
> > pintk_safe was designed to be recursive. It was never designed to be
> > used to troubleshoot or debug consoles. But it was designed to be
> > recursive - because that's the sort of the problems it was meant to
> > handle: recursive printks that would otherwise deadlock us. That's why
> > we have it in the first place.
> 
> So printk safe is only triggered when at the same context? If we can
> guarantee that printk safe is triggered only when its because a printk
> is happening at the same context (not because of an interrupt, but
> really at the same context, using my context check), then I'm fine with
> delaying them to a work queue.

printk_safe is for printk recursion only. It happens in the same context
only. When we switch to printk_safe we disable local IRQs, NMIs have their
own printk_nmi thing. And the way we flush printk_safe is mostly recursive.
Because we flush when we know that we will not deadlock [as much as we can;
we can't control any 3rd party locks which might be involved; thus
printk_deferred() usage].

Usually it's something like

   printk
spin_lock_irqsave(logbuf_lock)
 printk
  spin_lock_irqsave(logbuf_lock) << deadlock

What we have with printk_safe is

  printk
   local_irq_save
   printk_safe_enter
   spin_lock(logbuf_lock)
printk
 vprintk_safe
  queue irq work
   spin_unlock(logbuf_lock)
   printk_safe_exit
   local_irq_restore
   >>> IRQ work
   printk_safe_flush
printk
 spin_lock_irqsave(logbuf_lock)
 log_store()
 spin_unlock_irqrestore(logbuf_lock)

So we flush printk_safe ASAP, which usually (unless originally we were
not in IRQ) means that the flush is recursive, but safe - we don't
deadlock.

> That is, if we have this:
> 
>   printk()
>   console_lock()
>   
>   printk()
>   add to log buffer
>   
>   console_unlock();

Right. This is what we have right now. Every time we enable local IRQs in
the console_unlock() printing loop - we flush printk_safe. And that's the
problem.

>   printk()
>   console_lock()
>   
>   put in printk safe buffer
>   trigger work queue
>   console_unlock()
>   
>   flush safe buffer
>   printk()

Right. This is what we will have with WQ. We don't flush printk_safe until
we return from console_unlock(). Because printk() disables preemption for
the duration of console_unlock(), we can't schedule WQ on that CPU. And we
schedule flushing work only on the CPU that has triggered the recursion.

Another thing:

console_lock()
 blah blah
console_unlock()

In this case we will flush printk_safe withing the printing loop.
Immediately. But we don't care - the CPU is preemptible, we don't
lock up the kernel.


> Then I'm fine with that.
> 
> I have to look at the latest code. If this is indeed what we have, then
> I admit I misunderstood the problem you want to solve.
> 
> I only want recursive printks (those that are 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Sergey Senozhatsky
Hello,

On (01/23/18 11:24), Steven Rostedt wrote:
[..]
> > With WQ we don't lockup the kernel, because we flush printk_safe in
> > preemptible context. And people are very much expected to fix the
> > misbehaving consoles. But that should not be printk_safe problem.
> 
> Right, but now you just made printk safe unreliable to get information
> out, because you need to wait for a schedule to occur, and if there's
> issues, like a deadlock, that thread will never run. And you just lost
> you lockdep splat.

Yes and No.

printk_safe and printk_nmi are unreliable - both need irq_work. That's
why we forcibly flush those buffers in panic(). At least for printk_safe
case, and I'm pretty sure the same stands for printk_nmi, we never said
that we will store all the messages that were printed from unsafe context
(recursion or NMI). The only thing we said - we will try not to deadlock
the system.

Now we are adding one more thing to printk_safe - we will also try not to
lockup the system.

Default printk_safe buffer size might not be enough to store a very large
lockdep splat. And we will report that the buffer is too small and that we
lost some of the lines: "here is what we have, we lost N lines, but at least
we didn't deadlock the system". See f975237b76827956fe13ecfe993a319158e2c303
for more details, it contains a list of recursive-printk deadlock scenarios
that printk_safe was meant to handle.

It is possible and OK to lose messages in printk_safe/printk_nmi

printk_safe_enter_irqsave()
  printk
  printk
  ...
  ...
  printk
  printk
printk_safe_exit_irqrestore()

No flush will take place as long as there is no IRQ on that CPU.
But printk_safe and printk_nmi are solving different problem in
the first place.

> > I'll re-read this one tomorrow. Not quite following it.
> 
> I'll add more capitals next time ;-)

Ha-ha-ha ;)

[..]
> > pintk_safe was designed to be recursive. It was never designed to be
> > used to troubleshoot or debug consoles. But it was designed to be
> > recursive - because that's the sort of the problems it was meant to
> > handle: recursive printks that would otherwise deadlock us. That's why
> > we have it in the first place.
> 
> So printk safe is only triggered when at the same context? If we can
> guarantee that printk safe is triggered only when its because a printk
> is happening at the same context (not because of an interrupt, but
> really at the same context, using my context check), then I'm fine with
> delaying them to a work queue.

printk_safe is for printk recursion only. It happens in the same context
only. When we switch to printk_safe we disable local IRQs, NMIs have their
own printk_nmi thing. And the way we flush printk_safe is mostly recursive.
Because we flush when we know that we will not deadlock [as much as we can;
we can't control any 3rd party locks which might be involved; thus
printk_deferred() usage].

Usually it's something like

   printk
spin_lock_irqsave(logbuf_lock)
 printk
  spin_lock_irqsave(logbuf_lock) << deadlock

What we have with printk_safe is

  printk
   local_irq_save
   printk_safe_enter
   spin_lock(logbuf_lock)
printk
 vprintk_safe
  queue irq work
   spin_unlock(logbuf_lock)
   printk_safe_exit
   local_irq_restore
   >>> IRQ work
   printk_safe_flush
printk
 spin_lock_irqsave(logbuf_lock)
 log_store()
 spin_unlock_irqrestore(logbuf_lock)

So we flush printk_safe ASAP, which usually (unless originally we were
not in IRQ) means that the flush is recursive, but safe - we don't
deadlock.

> That is, if we have this:
> 
>   printk()
>   console_lock()
>   
>   printk()
>   add to log buffer
>   
>   console_unlock();

Right. This is what we have right now. Every time we enable local IRQs in
the console_unlock() printing loop - we flush printk_safe. And that's the
problem.

>   printk()
>   console_lock()
>   
>   put in printk safe buffer
>   trigger work queue
>   console_unlock()
>   
>   flush safe buffer
>   printk()

Right. This is what we will have with WQ. We don't flush printk_safe until
we return from console_unlock(). Because printk() disables preemption for
the duration of console_unlock(), we can't schedule WQ on that CPU. And we
schedule flushing work only on the CPU that has triggered the recursion.

Another thing:

console_lock()
 blah blah
console_unlock()

In this case we will flush printk_safe withing the printing loop.
Immediately. But we don't care - the CPU is preemptible, we don't
lock up the kernel.


> Then I'm fine with that.
> 
> I have to look at the latest code. If this is indeed what we have, then
> I admit I misunderstood the problem you want to solve.
> 
> I only want recursive printks (those that are 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Tejun Heo
Hello, Sergey.

On Wed, Jan 24, 2018 at 01:01:53AM +0900, Sergey Senozhatsky wrote:
> On (01/23/18 10:41), Steven Rostedt wrote:
> [..]
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
> 
> Dropping the messages is not the solution either. The original bug report
> report was - this "locks up my kernel". That's it. That's all people asked
> us to solve.
> 
> With WQ we don't lockup the kernel, because we flush printk_safe in
> preemptible context. And people are very much expected to fix the
> misbehaving consoles. But that should not be printk_safe problem.

I really don't care as long as it's robust enough.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Tejun Heo
Hello, Sergey.

On Wed, Jan 24, 2018 at 01:01:53AM +0900, Sergey Senozhatsky wrote:
> On (01/23/18 10:41), Steven Rostedt wrote:
> [..]
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
> 
> Dropping the messages is not the solution either. The original bug report
> report was - this "locks up my kernel". That's it. That's all people asked
> us to solve.
> 
> With WQ we don't lockup the kernel, because we flush printk_safe in
> preemptible context. And people are very much expected to fix the
> misbehaving consoles. But that should not be printk_safe problem.

I really don't care as long as it's robust enough.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Tejun Heo
Hey,

On Tue, Jan 23, 2018 at 11:13:30AM -0500, Steven Rostedt wrote:
> From what I understand is that there's an issue with one of the printk
> consoles, due to memory pressure or whatnot. Then a printk happens
> within a printk recursively. It gets put into the safe buffer and an
> irq is sent to printk this printk.
> 
> The issue you are saying is that when the printk enables interrupts,
> the irq work triggers and loads the log buffer with the safe buffer, and
> then the printk sees the new data added and continues to print, and
> hence never leaves this printk.

I'm not sure it's irq or the same calling context, but yeah whatever
it may be, it keeps adding new data.

> Your solution is to delay the flushing of the safe buffer to another
> thread (work queue), which I also have issues with, because you break
> the "get printks out ASAP mantra". Then the work queue comes in and
> flushes the printks. And since the printks cause printks, we continue
> to spam the machine, but hey, we are making forward progress.

I'm not sure "get printks out ASAP mantra" is the overriding concern
after spending 20s flushing in an unknown context.  I'm honestly
curious.  Would that still matter that much at that point?  I went
through the recent common crashes in the fleet earlier today and a
good number of them are printk taking too long unnecessarily
escalating the situation (most commonly triggering NMI watchdog).  I'm
not saying that this should override other concerns but it seems clear
to me that we're pretty badly exposed on this front.

> Again, this is treating the symptom and not solving the problem.

Or adding a safety net when things go south, but this isn't what I was
trying to argue.  I mostly thought your understanding of what I
reported wasn't accurate and wanted to clear that up.

> I really hate delaying printks to another thread, unless we can
> guarantee that that thread is ready to go immediately (basically
> spinning on a run queue waiting to print). Because if the system is
> having issues (which is the main reason for printks to happen), there's
> no guarantee that a work queue or another thread will ever schedule,
> and the safe printk buffer never gets out to the consoles.
>
> I much rather have throttling when recursive printks are detected.
> Make it a 100 lines to print if you want, but then throttle. Because
> once you have 100 lines or so, you will know that printks are causing
> printks, and you don't give a crap about the repeated process. Allow
> one flushing of the printk safe buffers, and then if it happens again,
> throttle it.
> 
> Both methods can lose important data. I believe the throttling of
> recursive printks, after 100 prints or whatever, will be the least
> likely to lose important data, because printks caused by printks will
> just keep repeating the same data, and we don't care about repeats. But
> delaying the flushing could very well lose important data that caused
> a lockup.

Hmmm... what you're suggesting still seems more fragile - ie. when
does that 100 count get reset?  OOM prints quite a few lines and if
we're resetting on each line, that two order explosion of messages can
still be really really bad.  And issues like that seem to suggest that
the root problem to handle here is avoiding locking up a context in
flushing for too long.  Your approach is trying to avoid causing that
but it's a symptom which can be reached in many different ways.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Tejun Heo
Hey,

On Tue, Jan 23, 2018 at 11:13:30AM -0500, Steven Rostedt wrote:
> From what I understand is that there's an issue with one of the printk
> consoles, due to memory pressure or whatnot. Then a printk happens
> within a printk recursively. It gets put into the safe buffer and an
> irq is sent to printk this printk.
> 
> The issue you are saying is that when the printk enables interrupts,
> the irq work triggers and loads the log buffer with the safe buffer, and
> then the printk sees the new data added and continues to print, and
> hence never leaves this printk.

I'm not sure it's irq or the same calling context, but yeah whatever
it may be, it keeps adding new data.

> Your solution is to delay the flushing of the safe buffer to another
> thread (work queue), which I also have issues with, because you break
> the "get printks out ASAP mantra". Then the work queue comes in and
> flushes the printks. And since the printks cause printks, we continue
> to spam the machine, but hey, we are making forward progress.

I'm not sure "get printks out ASAP mantra" is the overriding concern
after spending 20s flushing in an unknown context.  I'm honestly
curious.  Would that still matter that much at that point?  I went
through the recent common crashes in the fleet earlier today and a
good number of them are printk taking too long unnecessarily
escalating the situation (most commonly triggering NMI watchdog).  I'm
not saying that this should override other concerns but it seems clear
to me that we're pretty badly exposed on this front.

> Again, this is treating the symptom and not solving the problem.

Or adding a safety net when things go south, but this isn't what I was
trying to argue.  I mostly thought your understanding of what I
reported wasn't accurate and wanted to clear that up.

> I really hate delaying printks to another thread, unless we can
> guarantee that that thread is ready to go immediately (basically
> spinning on a run queue waiting to print). Because if the system is
> having issues (which is the main reason for printks to happen), there's
> no guarantee that a work queue or another thread will ever schedule,
> and the safe printk buffer never gets out to the consoles.
>
> I much rather have throttling when recursive printks are detected.
> Make it a 100 lines to print if you want, but then throttle. Because
> once you have 100 lines or so, you will know that printks are causing
> printks, and you don't give a crap about the repeated process. Allow
> one flushing of the printk safe buffers, and then if it happens again,
> throttle it.
> 
> Both methods can lose important data. I believe the throttling of
> recursive printks, after 100 prints or whatever, will be the least
> likely to lose important data, because printks caused by printks will
> just keep repeating the same data, and we don't care about repeats. But
> delaying the flushing could very well lose important data that caused
> a lockup.

Hmmm... what you're suggesting still seems more fragile - ie. when
does that 100 count get reset?  OOM prints quite a few lines and if
we're resetting on each line, that two order explosion of messages can
still be really really bad.  And issues like that seem to suggest that
the root problem to handle here is avoiding locking up a context in
flushing for too long.  Your approach is trying to avoid causing that
but it's a symptom which can be reached in many different ways.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Steven Rostedt
On Wed, 24 Jan 2018 01:01:53 +0900
Sergey Senozhatsky  wrote:

> On (01/23/18 10:41), Steven Rostedt wrote:
> [..]
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.  
> 
> Dropping the messages is not the solution either. The original bug report
> report was - this "locks up my kernel". That's it. That's all people asked
> us to solve.

And throttling the printks would stop the lock up too.

> 
> With WQ we don't lockup the kernel, because we flush printk_safe in
> preemptible context. And people are very much expected to fix the
> misbehaving consoles. But that should not be printk_safe problem.

Right, but now you just made printk safe unreliable to get information
out, because you need to wait for a schedule to occur, and if there's
issues, like a deadlock, that thread will never run. And you just lost
you lockdep splat.

> 
> > A printk causing a printk is a special case, and we need to just show
> > enough to let the user know that its happening, and why printks are
> > being throttled. Yes, we may lose data, but if every printk that goes
> > out causes another printk, then there's going to be so much noise that
> > we wont know what other things went wrong. Honestly, if someone showed
> > me a report where the logs were filled with printks that caused
> > printks, I'd stop right there and tell them that needs to be fixed
> > before we do anything else. And if that recursion is happening because
> > of another problem, I don't want to see the recursion printks. I want
> > to see the printks that show what is causing the recursions.  
> 
> I'll re-read this one tomorrow. Not quite following it.

I'll add more capitals next time ;-)

> 
> > > The problem is - we flush printk_safe too soon and printing CPU ends up
> > > in a lockup - it log_store()-s new messages while it's printing the 
> > > pending  
> > 
> > No, the problem is that printks are causing more printks. Yes that will
> > make flushing them soon more likely to lock up the system. But that's
> > not the problem. The problem is printks causing printks.  
> 
> Yes. And ignoring those printk()-s by simply dropping them does not fix
> the problem by any means.

How so? If we drop them, then the stuck printk has nothing to print and
will move forward.

I say once you start dropping printks due to recursion, keep dropping
them. For at least a second, to allow them to stop killing the machine.

> 
> > > ones. It's fine to do so when CPU is in preemptible context. Really, we
> > > should not care in printk_safe as long as we don't lockup the kernel. The
> > > misbehaving console must be fixed. If CPU is not in preemptible context 
> > > then
> > > we do lockup the kernel. Because we flush printk_safe regardless of the
> > > current CPU context. If we will flush printk_safe via WQ then we 
> > > automatically  
> > 
> > And if we can throttle recursive printks, then we should be able to
> > stop that from happening.  
> 
> pintk_safe was designed to be recursive. It was never designed to be
> used to troubleshoot or debug consoles. But it was designed to be
> recursive - because that's the sort of the problems it was meant to
> handle: recursive printks that would otherwise deadlock us. That's why
> we have it in the first place.

So printk safe is only triggered when at the same context? If we can
guarantee that printk safe is triggered only when its because a printk
is happening at the same context (not because of an interrupt, but
really at the same context, using my context check), then I'm fine with
delaying them to a work queue.

That is, if we have this:

printk()
console_lock()

printk()
add to log buffer

console_unlock();


printk()
console_lock()

put in printk safe buffer
trigger work queue
console_unlock()

flush safe buffer
printk()

Then I'm fine with that.

I have to look at the latest code. If this is indeed what we have, then
I admit I misunderstood the problem you want to solve.

I only want recursive printks (those that are actually triggered by
doing a printk) to be allowed to be delayed.

Make sense?

-- Steve



Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Steven Rostedt
On Wed, 24 Jan 2018 01:01:53 +0900
Sergey Senozhatsky  wrote:

> On (01/23/18 10:41), Steven Rostedt wrote:
> [..]
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.  
> 
> Dropping the messages is not the solution either. The original bug report
> report was - this "locks up my kernel". That's it. That's all people asked
> us to solve.

And throttling the printks would stop the lock up too.

> 
> With WQ we don't lockup the kernel, because we flush printk_safe in
> preemptible context. And people are very much expected to fix the
> misbehaving consoles. But that should not be printk_safe problem.

Right, but now you just made printk safe unreliable to get information
out, because you need to wait for a schedule to occur, and if there's
issues, like a deadlock, that thread will never run. And you just lost
you lockdep splat.

> 
> > A printk causing a printk is a special case, and we need to just show
> > enough to let the user know that its happening, and why printks are
> > being throttled. Yes, we may lose data, but if every printk that goes
> > out causes another printk, then there's going to be so much noise that
> > we wont know what other things went wrong. Honestly, if someone showed
> > me a report where the logs were filled with printks that caused
> > printks, I'd stop right there and tell them that needs to be fixed
> > before we do anything else. And if that recursion is happening because
> > of another problem, I don't want to see the recursion printks. I want
> > to see the printks that show what is causing the recursions.  
> 
> I'll re-read this one tomorrow. Not quite following it.

I'll add more capitals next time ;-)

> 
> > > The problem is - we flush printk_safe too soon and printing CPU ends up
> > > in a lockup - it log_store()-s new messages while it's printing the 
> > > pending  
> > 
> > No, the problem is that printks are causing more printks. Yes that will
> > make flushing them soon more likely to lock up the system. But that's
> > not the problem. The problem is printks causing printks.  
> 
> Yes. And ignoring those printk()-s by simply dropping them does not fix
> the problem by any means.

How so? If we drop them, then the stuck printk has nothing to print and
will move forward.

I say once you start dropping printks due to recursion, keep dropping
them. For at least a second, to allow them to stop killing the machine.

> 
> > > ones. It's fine to do so when CPU is in preemptible context. Really, we
> > > should not care in printk_safe as long as we don't lockup the kernel. The
> > > misbehaving console must be fixed. If CPU is not in preemptible context 
> > > then
> > > we do lockup the kernel. Because we flush printk_safe regardless of the
> > > current CPU context. If we will flush printk_safe via WQ then we 
> > > automatically  
> > 
> > And if we can throttle recursive printks, then we should be able to
> > stop that from happening.  
> 
> pintk_safe was designed to be recursive. It was never designed to be
> used to troubleshoot or debug consoles. But it was designed to be
> recursive - because that's the sort of the problems it was meant to
> handle: recursive printks that would otherwise deadlock us. That's why
> we have it in the first place.

So printk safe is only triggered when at the same context? If we can
guarantee that printk safe is triggered only when its because a printk
is happening at the same context (not because of an interrupt, but
really at the same context, using my context check), then I'm fine with
delaying them to a work queue.

That is, if we have this:

printk()
console_lock()

printk()
add to log buffer

console_unlock();


printk()
console_lock()

put in printk safe buffer
trigger work queue
console_unlock()

flush safe buffer
printk()

Then I'm fine with that.

I have to look at the latest code. If this is indeed what we have, then
I admit I misunderstood the problem you want to solve.

I only want recursive printks (those that are actually triggered by
doing a printk) to be allowed to be delayed.

Make sense?

-- Steve



Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Steven Rostedt
On Tue, 23 Jan 2018 07:43:47 -0800
Tejun Heo  wrote:

> So, at least in the case that we were seeing, it isn't that black and
> white.  printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress.  The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.
>

Does it?

>From what I understand is that there's an issue with one of the printk
consoles, due to memory pressure or whatnot. Then a printk happens
within a printk recursively. It gets put into the safe buffer and an
irq is sent to printk this printk.

The issue you are saying is that when the printk enables interrupts,
the irq work triggers and loads the log buffer with the safe buffer, and
then the printk sees the new data added and continues to print, and
hence never leaves this printk.

Your solution is to delay the flushing of the safe buffer to another
thread (work queue), which I also have issues with, because you break
the "get printks out ASAP mantra". Then the work queue comes in and
flushes the printks. And since the printks cause printks, we continue
to spam the machine, but hey, we are making forward progress.

Again, this is treating the symptom and not solving the problem.

I really hate delaying printks to another thread, unless we can
guarantee that that thread is ready to go immediately (basically
spinning on a run queue waiting to print). Because if the system is
having issues (which is the main reason for printks to happen), there's
no guarantee that a work queue or another thread will ever schedule,
and the safe printk buffer never gets out to the consoles.

I much rather have throttling when recursive printks are detected.
Make it a 100 lines to print if you want, but then throttle. Because
once you have 100 lines or so, you will know that printks are causing
printks, and you don't give a crap about the repeated process. Allow
one flushing of the printk safe buffers, and then if it happens again,
throttle it.

Both methods can lose important data. I believe the throttling of
recursive printks, after 100 prints or whatever, will be the least
likely to lose important data, because printks caused by printks will
just keep repeating the same data, and we don't care about repeats. But
delaying the flushing could very well lose important data that caused
a lockup.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Steven Rostedt
On Tue, 23 Jan 2018 07:43:47 -0800
Tejun Heo  wrote:

> So, at least in the case that we were seeing, it isn't that black and
> white.  printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress.  The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.
>

Does it?

>From what I understand is that there's an issue with one of the printk
consoles, due to memory pressure or whatnot. Then a printk happens
within a printk recursively. It gets put into the safe buffer and an
irq is sent to printk this printk.

The issue you are saying is that when the printk enables interrupts,
the irq work triggers and loads the log buffer with the safe buffer, and
then the printk sees the new data added and continues to print, and
hence never leaves this printk.

Your solution is to delay the flushing of the safe buffer to another
thread (work queue), which I also have issues with, because you break
the "get printks out ASAP mantra". Then the work queue comes in and
flushes the printks. And since the printks cause printks, we continue
to spam the machine, but hey, we are making forward progress.

Again, this is treating the symptom and not solving the problem.

I really hate delaying printks to another thread, unless we can
guarantee that that thread is ready to go immediately (basically
spinning on a run queue waiting to print). Because if the system is
having issues (which is the main reason for printks to happen), there's
no guarantee that a work queue or another thread will ever schedule,
and the safe printk buffer never gets out to the consoles.

I much rather have throttling when recursive printks are detected.
Make it a 100 lines to print if you want, but then throttle. Because
once you have 100 lines or so, you will know that printks are causing
printks, and you don't give a crap about the repeated process. Allow
one flushing of the printk safe buffers, and then if it happens again,
throttle it.

Both methods can lose important data. I believe the throttling of
recursive printks, after 100 prints or whatever, will be the least
likely to lose important data, because printks caused by printks will
just keep repeating the same data, and we don't care about repeats. But
delaying the flushing could very well lose important data that caused
a lockup.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Sergey Senozhatsky
Hello, Tejun

On (01/23/18 07:43), Tejun Heo wrote:
> Hello, Steven.
> 
> On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote:
> > > I don't want to have heuristics in print_safe, I don't want to have a 
> > > magic
> > > number controlled by a user-space visible knob, I don't want to have the
> > > first 3 lines of a lockdep splat.
> > 
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
> 
> So, at least in the case that we were seeing, it isn't that black and
> white.  printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress.  The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.

Would you, as the original bug reporter, be OK if we flush printk_safe (only
printk_safe, not printk_nmi for the time being) via WQ? This should move that
"uncontrolled" flush to a safe context. I don't think we can easily add
kthread offloading to printk at the moment (this will result in a massive gun
fight).

Just in case, below is something like a patch. I think I worked around the
possible wq deadlock scenario. But I haven't tested the patch yet. It's
a bit late here and I guess I need some rest. Will try to look more at
it tomorrow.

From: Sergey Senozhatsky 
Subject: [PATCH] printk/safe: split flush works

---
 kernel/printk/printk_safe.c | 75 +
 1 file changed, 63 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..54bc40ce3c34 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 
@@ -49,7 +50,10 @@ static int printk_safe_irq_ready __read_mostly;
 struct printk_safe_seq_buf {
atomic_tlen;/* length of written data */
atomic_tmessage_lost;
-   struct irq_work work;   /* IRQ work that flushes the buffer */
+   /* IRQ work that flushes NMI buffer */
+   struct irq_work irq_flush_work;
+   /* WQ work that flushes SAFE buffer */
+   struct work_struct  wq_flush_work;
unsigned char   buffer[SAFE_LOG_BUF_LEN];
 };
 
@@ -61,10 +65,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, 
nmi_print_seq);
 #endif
 
 /* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
+static void queue_irq_flush_work(struct printk_safe_seq_buf *s)
 {
if (printk_safe_irq_ready)
-   irq_work_queue(>work);
+   irq_work_queue(>irq_flush_work);
+}
+
+static void queue_wq_flush_work(struct printk_safe_seq_buf *s)
+{
+   if (printk_safe_irq_ready)
+   queue_work_on(smp_processor_id(),
+   system_wq,
+   >wq_flush_work);
 }
 
 /*
@@ -89,7 +101,7 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
/* The trailing '\0' is not counted into len. */
if (len >= sizeof(s->buffer) - 1) {
atomic_inc(>message_lost);
-   queue_flush_work(s);
+   queue_irq_flush_work(s);
return 0;
}
 
@@ -112,7 +124,7 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
if (atomic_cmpxchg(>len, len, len + add) != len)
goto again;
 
-   queue_flush_work(s);
+   queue_irq_flush_work(s);
return add;
 }
 
@@ -186,12 +198,10 @@ static void report_message_lost(struct 
printk_safe_seq_buf *s)
  * Flush data from the associated per-CPU buffer. The function
  * can be called either via IRQ work or independently.
  */
-static void __printk_safe_flush(struct irq_work *work)
+static void __printk_safe_flush(struct printk_safe_seq_buf *s)
 {
static raw_spinlock_t read_lock =
__RAW_SPIN_LOCK_INITIALIZER(read_lock);
-   struct printk_safe_seq_buf *s =
-   container_of(work, struct printk_safe_seq_buf, work);
unsigned long flags;
size_t len;
int i;
@@ -243,6 +253,46 @@ static void __printk_safe_flush(struct irq_work *work)
raw_spin_unlock_irqrestore(_lock, flags);
 }
 
+static void irq_flush_work_fn(struct irq_work *work)
+{
+   struct printk_safe_seq_buf *s =
+   container_of(work, struct printk_safe_seq_buf, irq_flush_work);
+
+   __printk_safe_flush(s);
+}
+
+/*
+ * We can't queue wq work directly from vprintk_safe(), because we can
+ * deadlock. 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Sergey Senozhatsky
Hello, Tejun

On (01/23/18 07:43), Tejun Heo wrote:
> Hello, Steven.
> 
> On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote:
> > > I don't want to have heuristics in print_safe, I don't want to have a 
> > > magic
> > > number controlled by a user-space visible knob, I don't want to have the
> > > first 3 lines of a lockdep splat.
> > 
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
> 
> So, at least in the case that we were seeing, it isn't that black and
> white.  printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress.  The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.

Would you, as the original bug reporter, be OK if we flush printk_safe (only
printk_safe, not printk_nmi for the time being) via WQ? This should move that
"uncontrolled" flush to a safe context. I don't think we can easily add
kthread offloading to printk at the moment (this will result in a massive gun
fight).

Just in case, below is something like a patch. I think I worked around the
possible wq deadlock scenario. But I haven't tested the patch yet. It's
a bit late here and I guess I need some rest. Will try to look more at
it tomorrow.

From: Sergey Senozhatsky 
Subject: [PATCH] printk/safe: split flush works

---
 kernel/printk/printk_safe.c | 75 +
 1 file changed, 63 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..54bc40ce3c34 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 
@@ -49,7 +50,10 @@ static int printk_safe_irq_ready __read_mostly;
 struct printk_safe_seq_buf {
atomic_tlen;/* length of written data */
atomic_tmessage_lost;
-   struct irq_work work;   /* IRQ work that flushes the buffer */
+   /* IRQ work that flushes NMI buffer */
+   struct irq_work irq_flush_work;
+   /* WQ work that flushes SAFE buffer */
+   struct work_struct  wq_flush_work;
unsigned char   buffer[SAFE_LOG_BUF_LEN];
 };
 
@@ -61,10 +65,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, 
nmi_print_seq);
 #endif
 
 /* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
+static void queue_irq_flush_work(struct printk_safe_seq_buf *s)
 {
if (printk_safe_irq_ready)
-   irq_work_queue(>work);
+   irq_work_queue(>irq_flush_work);
+}
+
+static void queue_wq_flush_work(struct printk_safe_seq_buf *s)
+{
+   if (printk_safe_irq_ready)
+   queue_work_on(smp_processor_id(),
+   system_wq,
+   >wq_flush_work);
 }
 
 /*
@@ -89,7 +101,7 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
/* The trailing '\0' is not counted into len. */
if (len >= sizeof(s->buffer) - 1) {
atomic_inc(>message_lost);
-   queue_flush_work(s);
+   queue_irq_flush_work(s);
return 0;
}
 
@@ -112,7 +124,7 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
if (atomic_cmpxchg(>len, len, len + add) != len)
goto again;
 
-   queue_flush_work(s);
+   queue_irq_flush_work(s);
return add;
 }
 
@@ -186,12 +198,10 @@ static void report_message_lost(struct 
printk_safe_seq_buf *s)
  * Flush data from the associated per-CPU buffer. The function
  * can be called either via IRQ work or independently.
  */
-static void __printk_safe_flush(struct irq_work *work)
+static void __printk_safe_flush(struct printk_safe_seq_buf *s)
 {
static raw_spinlock_t read_lock =
__RAW_SPIN_LOCK_INITIALIZER(read_lock);
-   struct printk_safe_seq_buf *s =
-   container_of(work, struct printk_safe_seq_buf, work);
unsigned long flags;
size_t len;
int i;
@@ -243,6 +253,46 @@ static void __printk_safe_flush(struct irq_work *work)
raw_spin_unlock_irqrestore(_lock, flags);
 }
 
+static void irq_flush_work_fn(struct irq_work *work)
+{
+   struct printk_safe_seq_buf *s =
+   container_of(work, struct printk_safe_seq_buf, irq_flush_work);
+
+   __printk_safe_flush(s);
+}
+
+/*
+ * We can't queue wq work directly from vprintk_safe(), because we can
+ * deadlock. For instance:
+ *
+ * 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Sergey Senozhatsky
On (01/23/18 10:41), Steven Rostedt wrote:
[..]
> We can have more. But if printk is causing printks, that's a major bug.
> And work queues are not going to fix it, it will just spread out the
> pain. Have it be 100 printks, it needs to be fixed if it is happening.
> And having all printks just generate more printks is not helpful. Even
> if we slow them down. They will still never end.

Dropping the messages is not the solution either. The original bug report
report was - this "locks up my kernel". That's it. That's all people asked
us to solve.

With WQ we don't lockup the kernel, because we flush printk_safe in
preemptible context. And people are very much expected to fix the
misbehaving consoles. But that should not be printk_safe problem.

> A printk causing a printk is a special case, and we need to just show
> enough to let the user know that its happening, and why printks are
> being throttled. Yes, we may lose data, but if every printk that goes
> out causes another printk, then there's going to be so much noise that
> we wont know what other things went wrong. Honestly, if someone showed
> me a report where the logs were filled with printks that caused
> printks, I'd stop right there and tell them that needs to be fixed
> before we do anything else. And if that recursion is happening because
> of another problem, I don't want to see the recursion printks. I want
> to see the printks that show what is causing the recursions.

I'll re-read this one tomorrow. Not quite following it.

> > The problem is - we flush printk_safe too soon and printing CPU ends up
> > in a lockup - it log_store()-s new messages while it's printing the pending
> 
> No, the problem is that printks are causing more printks. Yes that will
> make flushing them soon more likely to lock up the system. But that's
> not the problem. The problem is printks causing printks.

Yes. And ignoring those printk()-s by simply dropping them does not fix
the problem by any means.

> > ones. It's fine to do so when CPU is in preemptible context. Really, we
> > should not care in printk_safe as long as we don't lockup the kernel. The
> > misbehaving console must be fixed. If CPU is not in preemptible context then
> > we do lockup the kernel. Because we flush printk_safe regardless of the
> > current CPU context. If we will flush printk_safe via WQ then we 
> > automatically
> 
> And if we can throttle recursive printks, then we should be able to
> stop that from happening.

pintk_safe was designed to be recursive. It was never designed to be
used to troubleshoot or debug consoles. But it was designed to be
recursive - because that's the sort of the problems it was meant to
handle: recursive printks that would otherwise deadlock us. That's why
we have it in the first place.

> > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, 
> > we
> > will not lockup it up." thing. Yes, we fill up the logbuf with probably 
> > needed
> > and appreciated or unneeded messages. But we should not care in printk_safe.
> > We don't lockup the kernel... And the misbehaving console must be fixed.
> 
> I agree.

Good.

> > I disagree with "If we are having issues with irq_work, we are going to have
> > issues with a work queue". There is a tremendous difference between irq_work
> > on that CPU and queue_work_on(smp_proessor_id()). One does not care about 
> > CPU
> > context, the other one does.
> 
> But switching to work queue does not address the underlining problem
> that printks are causing more printks.

The only way to address those problems is to fix the console. That's the only.

But that's not what I'm doing with my proposal. I fix the lockup scenario, the
only reported problem so far. Whilst also keeping printk_safe around.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Sergey Senozhatsky
On (01/23/18 10:41), Steven Rostedt wrote:
[..]
> We can have more. But if printk is causing printks, that's a major bug.
> And work queues are not going to fix it, it will just spread out the
> pain. Have it be 100 printks, it needs to be fixed if it is happening.
> And having all printks just generate more printks is not helpful. Even
> if we slow them down. They will still never end.

Dropping the messages is not the solution either. The original bug report
report was - this "locks up my kernel". That's it. That's all people asked
us to solve.

With WQ we don't lockup the kernel, because we flush printk_safe in
preemptible context. And people are very much expected to fix the
misbehaving consoles. But that should not be printk_safe problem.

> A printk causing a printk is a special case, and we need to just show
> enough to let the user know that its happening, and why printks are
> being throttled. Yes, we may lose data, but if every printk that goes
> out causes another printk, then there's going to be so much noise that
> we wont know what other things went wrong. Honestly, if someone showed
> me a report where the logs were filled with printks that caused
> printks, I'd stop right there and tell them that needs to be fixed
> before we do anything else. And if that recursion is happening because
> of another problem, I don't want to see the recursion printks. I want
> to see the printks that show what is causing the recursions.

I'll re-read this one tomorrow. Not quite following it.

> > The problem is - we flush printk_safe too soon and printing CPU ends up
> > in a lockup - it log_store()-s new messages while it's printing the pending
> 
> No, the problem is that printks are causing more printks. Yes that will
> make flushing them soon more likely to lock up the system. But that's
> not the problem. The problem is printks causing printks.

Yes. And ignoring those printk()-s by simply dropping them does not fix
the problem by any means.

> > ones. It's fine to do so when CPU is in preemptible context. Really, we
> > should not care in printk_safe as long as we don't lockup the kernel. The
> > misbehaving console must be fixed. If CPU is not in preemptible context then
> > we do lockup the kernel. Because we flush printk_safe regardless of the
> > current CPU context. If we will flush printk_safe via WQ then we 
> > automatically
> 
> And if we can throttle recursive printks, then we should be able to
> stop that from happening.

pintk_safe was designed to be recursive. It was never designed to be
used to troubleshoot or debug consoles. But it was designed to be
recursive - because that's the sort of the problems it was meant to
handle: recursive printks that would otherwise deadlock us. That's why
we have it in the first place.

> > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, 
> > we
> > will not lockup it up." thing. Yes, we fill up the logbuf with probably 
> > needed
> > and appreciated or unneeded messages. But we should not care in printk_safe.
> > We don't lockup the kernel... And the misbehaving console must be fixed.
> 
> I agree.

Good.

> > I disagree with "If we are having issues with irq_work, we are going to have
> > issues with a work queue". There is a tremendous difference between irq_work
> > on that CPU and queue_work_on(smp_proessor_id()). One does not care about 
> > CPU
> > context, the other one does.
> 
> But switching to work queue does not address the underlining problem
> that printks are causing more printks.

The only way to address those problems is to fix the console. That's the only.

But that's not what I'm doing with my proposal. I fix the lockup scenario, the
only reported problem so far. Whilst also keeping printk_safe around.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Tejun Heo
Hello, Steven.

On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote:
> > I don't want to have heuristics in print_safe, I don't want to have a magic
> > number controlled by a user-space visible knob, I don't want to have the
> > first 3 lines of a lockdep splat.
> 
> We can have more. But if printk is causing printks, that's a major bug.
> And work queues are not going to fix it, it will just spread out the
> pain. Have it be 100 printks, it needs to be fixed if it is happening.
> And having all printks just generate more printks is not helpful. Even
> if we slow them down. They will still never end.

So, at least in the case that we were seeing, it isn't that black and
white.  printk keeps causing printks but only because printk buffer
flushing is preventing the printk'ing context from making forward
progress.  The key problem there is that a flushing context may get
pinned flushing indefinitely and using a separate context does solve
the problem.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Tejun Heo
Hello, Steven.

On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote:
> > I don't want to have heuristics in print_safe, I don't want to have a magic
> > number controlled by a user-space visible knob, I don't want to have the
> > first 3 lines of a lockdep splat.
> 
> We can have more. But if printk is causing printks, that's a major bug.
> And work queues are not going to fix it, it will just spread out the
> pain. Have it be 100 printks, it needs to be fixed if it is happening.
> And having all printks just generate more printks is not helpful. Even
> if we slow them down. They will still never end.

So, at least in the case that we were seeing, it isn't that black and
white.  printk keeps causing printks but only because printk buffer
flushing is preventing the printk'ing context from making forward
progress.  The key problem there is that a flushing context may get
pinned flushing indefinitely and using a separate context does solve
the problem.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Steven Rostedt
On Wed, 24 Jan 2018 00:21:30 +0900
Sergey Senozhatsky  wrote:

> On (01/23/18 09:56), Steven Rostedt wrote:
> [..]
> > > Why do we even use irq_work for printk_safe?  
> > 
> > Why not?
> > 
> > Really, I think you are trying to solve a symptom and not the problem.
> > If we are having issues with irq_work, we are going to have issues with
> > a work queue. It's just spreading out the problem instead of fixing it.  
> 
> I don't want to have heuristics in print_safe, I don't want to have a magic
> number controlled by a user-space visible knob, I don't want to have the
> first 3 lines of a lockdep splat.

We can have more. But if printk is causing printks, that's a major bug.
And work queues are not going to fix it, it will just spread out the
pain. Have it be 100 printks, it needs to be fixed if it is happening.
And having all printks just generate more printks is not helpful. Even
if we slow them down. They will still never end.

A printk causing a printk is a special case, and we need to just show
enough to let the user know that its happening, and why printks are
being throttled. Yes, we may lose data, but if every printk that goes
out causes another printk, then there's going to be so much noise that
we wont know what other things went wrong. Honestly, if someone showed
me a report where the logs were filled with printks that caused
printks, I'd stop right there and tell them that needs to be fixed
before we do anything else. And if that recursion is happening because
of another problem, I don't want to see the recursion printks. I want
to see the printks that show what is causing the recursions.



> The problem is - we flush printk_safe too soon and printing CPU ends up
> in a lockup - it log_store()-s new messages while it's printing the pending

No, the problem is that printks are causing more printks. Yes that will
make flushing them soon more likely to lock up the system. But that's
not the problem. The problem is printks causing printks.

> ones. It's fine to do so when CPU is in preemptible context. Really, we
> should not care in printk_safe as long as we don't lockup the kernel. The
> misbehaving console must be fixed. If CPU is not in preemptible context then
> we do lockup the kernel. Because we flush printk_safe regardless of the
> current CPU context. If we will flush printk_safe via WQ then we automatically

And if we can throttle recursive printks, then we should be able to
stop that from happening.

> add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we
> will not lockup it up." thing. Yes, we fill up the logbuf with probably needed
> and appreciated or unneeded messages. But we should not care in printk_safe.
> We don't lockup the kernel... And the misbehaving console must be fixed.

I agree.

> 
> I disagree with "If we are having issues with irq_work, we are going to have
> issues with a work queue". There is a tremendous difference between irq_work
> on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU
> context, the other one does.

But switching to work queue does not address the underlining problem
that printks are causing more printks.

-- Steve



Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Steven Rostedt
On Wed, 24 Jan 2018 00:21:30 +0900
Sergey Senozhatsky  wrote:

> On (01/23/18 09:56), Steven Rostedt wrote:
> [..]
> > > Why do we even use irq_work for printk_safe?  
> > 
> > Why not?
> > 
> > Really, I think you are trying to solve a symptom and not the problem.
> > If we are having issues with irq_work, we are going to have issues with
> > a work queue. It's just spreading out the problem instead of fixing it.  
> 
> I don't want to have heuristics in print_safe, I don't want to have a magic
> number controlled by a user-space visible knob, I don't want to have the
> first 3 lines of a lockdep splat.

We can have more. But if printk is causing printks, that's a major bug.
And work queues are not going to fix it, it will just spread out the
pain. Have it be 100 printks, it needs to be fixed if it is happening.
And having all printks just generate more printks is not helpful. Even
if we slow them down. They will still never end.

A printk causing a printk is a special case, and we need to just show
enough to let the user know that its happening, and why printks are
being throttled. Yes, we may lose data, but if every printk that goes
out causes another printk, then there's going to be so much noise that
we wont know what other things went wrong. Honestly, if someone showed
me a report where the logs were filled with printks that caused
printks, I'd stop right there and tell them that needs to be fixed
before we do anything else. And if that recursion is happening because
of another problem, I don't want to see the recursion printks. I want
to see the printks that show what is causing the recursions.



> The problem is - we flush printk_safe too soon and printing CPU ends up
> in a lockup - it log_store()-s new messages while it's printing the pending

No, the problem is that printks are causing more printks. Yes that will
make flushing them soon more likely to lock up the system. But that's
not the problem. The problem is printks causing printks.

> ones. It's fine to do so when CPU is in preemptible context. Really, we
> should not care in printk_safe as long as we don't lockup the kernel. The
> misbehaving console must be fixed. If CPU is not in preemptible context then
> we do lockup the kernel. Because we flush printk_safe regardless of the
> current CPU context. If we will flush printk_safe via WQ then we automatically

And if we can throttle recursive printks, then we should be able to
stop that from happening.

> add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we
> will not lockup it up." thing. Yes, we fill up the logbuf with probably needed
> and appreciated or unneeded messages. But we should not care in printk_safe.
> We don't lockup the kernel... And the misbehaving console must be fixed.

I agree.

> 
> I disagree with "If we are having issues with irq_work, we are going to have
> issues with a work queue". There is a tremendous difference between irq_work
> on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU
> context, the other one does.

But switching to work queue does not address the underlining problem
that printks are causing more printks.

-- Steve



Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Sergey Senozhatsky
On (01/23/18 09:56), Steven Rostedt wrote:
[..]
> > Why do we even use irq_work for printk_safe?
> 
> Why not?
> 
> Really, I think you are trying to solve a symptom and not the problem.
> If we are having issues with irq_work, we are going to have issues with
> a work queue. It's just spreading out the problem instead of fixing it.

I don't want to have heuristics in print_safe, I don't want to have a magic
number controlled by a user-space visible knob, I don't want to have the
first 3 lines of a lockdep splat.


The problem is - we flush printk_safe too soon and printing CPU ends up
in a lockup - it log_store()-s new messages while it's printing the pending
ones. It's fine to do so when CPU is in preemptible context. Really, we
should not care in printk_safe as long as we don't lockup the kernel. The
misbehaving console must be fixed. If CPU is not in preemptible context then
we do lockup the kernel. Because we flush printk_safe regardless of the
current CPU context. If we will flush printk_safe via WQ then we automatically
add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we
will not lockup it up." thing. Yes, we fill up the logbuf with probably needed
and appreciated or unneeded messages. But we should not care in printk_safe.
We don't lockup the kernel... And the misbehaving console must be fixed.

I disagree with "If we are having issues with irq_work, we are going to have
issues with a work queue". There is a tremendous difference between irq_work
on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU
context, the other one does.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Sergey Senozhatsky
On (01/23/18 09:56), Steven Rostedt wrote:
[..]
> > Why do we even use irq_work for printk_safe?
> 
> Why not?
> 
> Really, I think you are trying to solve a symptom and not the problem.
> If we are having issues with irq_work, we are going to have issues with
> a work queue. It's just spreading out the problem instead of fixing it.

I don't want to have heuristics in print_safe, I don't want to have a magic
number controlled by a user-space visible knob, I don't want to have the
first 3 lines of a lockdep splat.


The problem is - we flush printk_safe too soon and printing CPU ends up
in a lockup - it log_store()-s new messages while it's printing the pending
ones. It's fine to do so when CPU is in preemptible context. Really, we
should not care in printk_safe as long as we don't lockup the kernel. The
misbehaving console must be fixed. If CPU is not in preemptible context then
we do lockup the kernel. Because we flush printk_safe regardless of the
current CPU context. If we will flush printk_safe via WQ then we automatically
add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we
will not lockup it up." thing. Yes, we fill up the logbuf with probably needed
and appreciated or unneeded messages. But we should not care in printk_safe.
We don't lockup the kernel... And the misbehaving console must be fixed.

I disagree with "If we are having issues with irq_work, we are going to have
issues with a work queue". There is a tremendous difference between irq_work
on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU
context, the other one does.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Steven Rostedt
On Tue, 23 Jan 2018 15:40:23 +0900
Sergey Senozhatsky  wrote:

> Why do we even use irq_work for printk_safe?

Why not?

Really, I think you are trying to solve a symptom and not the problem.
If we are having issues with irq_work, we are going to have issues with
a work queue. It's just spreading out the problem instead of fixing it.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-23 Thread Steven Rostedt
On Tue, 23 Jan 2018 15:40:23 +0900
Sergey Senozhatsky  wrote:

> Why do we even use irq_work for printk_safe?

Why not?

Really, I think you are trying to solve a symptom and not the problem.
If we are having issues with irq_work, we are going to have issues with
a work queue. It's just spreading out the problem instead of fixing it.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
On (01/23/18 15:40), Sergey Senozhatsky wrote:
> 
> Why do we even use irq_work for printk_safe?
> 

... perhaps because of

wq: pool->lock -> printk -> call_console_drivers -> printk -> vprintk_safe -> 
wq: pool->lock

Which is a "many things have gone wrong" type of scenario. Maybe we
can workaround it somehow, hm. Tejun, can we have lockless WQ? ;)

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
On (01/23/18 15:40), Sergey Senozhatsky wrote:
> 
> Why do we even use irq_work for printk_safe?
> 

... perhaps because of

wq: pool->lock -> printk -> call_console_drivers -> printk -> vprintk_safe -> 
wq: pool->lock

Which is a "many things have gone wrong" type of scenario. Maybe we
can workaround it somehow, hm. Tejun, can we have lockless WQ? ;)

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
On (01/23/18 15:40), Sergey Senozhatsky wrote:
[..]
> Why do we even use irq_work for printk_safe?
> 
> Okay... So, how about this. For printk_safe we use system_wq for flushing.
> IOW, we flush from a task running exactly on the same CPU which hit printk
> recursion, not from IRQ. From vprintk_safe() recursion, we queue work on
> *that* CPU. Which gives us the following thing: if CPU stuck in
> console_unlock() loop with preemption disabled, then system_wq does not
> schedule on that CPU and we, thus, don't flush printk_safe buffer from that
> CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe
> buffer, printing extra messages from that CPU will not lock it up, because
> it's in preemptible context.
> 
> Thoughts?

A slightly reworked version:
a) Do not check console_locked
b) Do not have irq_work fast path for printk_safe buffer
 c) Which lets to union WQ/IRQ work structs - we use only IRQ work for
NMI buffers, and only WQ work for SAFE buffers
 d) And also to refactor the code

From: Sergey Senozhatsky 
Subject: [PATCH] printk/safe: use system_wq to flush printk_safe buffers

---
 kernel/printk/printk_safe.c | 52 ++---
 1 file changed, 40 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..6c8c82cedccb 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 
@@ -49,7 +50,12 @@ static int printk_safe_irq_ready __read_mostly;
 struct printk_safe_seq_buf {
atomic_tlen;/* length of written data */
atomic_tmessage_lost;
-   struct irq_work work;   /* IRQ work that flushes the buffer */
+   union {
+   /* IRQ work that flushes NMI buffer */
+   struct irq_work irq_flush_work;
+   /* WQ work that flushes SAFE buffer */
+   struct work_struct  wq_flush_work;
+   };
unsigned char   buffer[SAFE_LOG_BUF_LEN];
 };
 
@@ -61,10 +67,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, 
nmi_print_seq);
 #endif
 
 /* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
+static void queue_irq_flush_work(struct printk_safe_seq_buf *s)
 {
if (printk_safe_irq_ready)
-   irq_work_queue(>work);
+   irq_work_queue(>irq_flush_work);
+}
+
+static void queue_wq_flush_work(struct printk_safe_seq_buf *s)
+{
+   if (printk_safe_irq_ready)
+   queue_work_on(smp_processor_id(),
+   system_wq,
+   >wq_flush_work);
 }
 
 /*
@@ -89,7 +103,6 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
/* The trailing '\0' is not counted into len. */
if (len >= sizeof(s->buffer) - 1) {
atomic_inc(>message_lost);
-   queue_flush_work(s);
return 0;
}
 
@@ -112,7 +125,6 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
if (atomic_cmpxchg(>len, len, len + add) != len)
goto again;
 
-   queue_flush_work(s);
return add;
 }
 
@@ -186,12 +198,10 @@ static void report_message_lost(struct 
printk_safe_seq_buf *s)
  * Flush data from the associated per-CPU buffer. The function
  * can be called either via IRQ work or independently.
  */
-static void __printk_safe_flush(struct irq_work *work)
+static void __printk_safe_flush(struct printk_safe_seq_buf *s)
 {
static raw_spinlock_t read_lock =
__RAW_SPIN_LOCK_INITIALIZER(read_lock);
-   struct printk_safe_seq_buf *s =
-   container_of(work, struct printk_safe_seq_buf, work);
unsigned long flags;
size_t len;
int i;
@@ -243,6 +253,22 @@ static void __printk_safe_flush(struct irq_work *work)
raw_spin_unlock_irqrestore(_lock, flags);
 }
 
+static void irq_flush_work_fn(struct irq_work *work)
+{
+   struct printk_safe_seq_buf *s =
+   container_of(work, struct printk_safe_seq_buf, irq_flush_work);
+
+   __printk_safe_flush(s);
+}
+
+static void wq_flush_work_fn(struct work_struct *work)
+{
+   struct printk_safe_seq_buf *s =
+   container_of(work, struct printk_safe_seq_buf, wq_flush_work);
+
+   __printk_safe_flush(s);
+}
+
 /**
  * printk_safe_flush - flush all per-cpu nmi buffers.
  *
@@ -256,9 +282,9 @@ void printk_safe_flush(void)
 
for_each_possible_cpu(cpu) {
 #ifdef CONFIG_PRINTK_NMI
-   __printk_safe_flush(_cpu(nmi_print_seq, cpu).work);
+   __printk_safe_flush(this_cpu_ptr(_print_seq));
 #endif
-   __printk_safe_flush(_cpu(safe_print_seq, cpu).work);
+   __printk_safe_flush(this_cpu_ptr(_print_seq));
}
 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
On (01/23/18 15:40), Sergey Senozhatsky wrote:
[..]
> Why do we even use irq_work for printk_safe?
> 
> Okay... So, how about this. For printk_safe we use system_wq for flushing.
> IOW, we flush from a task running exactly on the same CPU which hit printk
> recursion, not from IRQ. From vprintk_safe() recursion, we queue work on
> *that* CPU. Which gives us the following thing: if CPU stuck in
> console_unlock() loop with preemption disabled, then system_wq does not
> schedule on that CPU and we, thus, don't flush printk_safe buffer from that
> CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe
> buffer, printing extra messages from that CPU will not lock it up, because
> it's in preemptible context.
> 
> Thoughts?

A slightly reworked version:
a) Do not check console_locked
b) Do not have irq_work fast path for printk_safe buffer
 c) Which lets to union WQ/IRQ work structs - we use only IRQ work for
NMI buffers, and only WQ work for SAFE buffers
 d) And also to refactor the code

From: Sergey Senozhatsky 
Subject: [PATCH] printk/safe: use system_wq to flush printk_safe buffers

---
 kernel/printk/printk_safe.c | 52 ++---
 1 file changed, 40 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..6c8c82cedccb 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 
@@ -49,7 +50,12 @@ static int printk_safe_irq_ready __read_mostly;
 struct printk_safe_seq_buf {
atomic_tlen;/* length of written data */
atomic_tmessage_lost;
-   struct irq_work work;   /* IRQ work that flushes the buffer */
+   union {
+   /* IRQ work that flushes NMI buffer */
+   struct irq_work irq_flush_work;
+   /* WQ work that flushes SAFE buffer */
+   struct work_struct  wq_flush_work;
+   };
unsigned char   buffer[SAFE_LOG_BUF_LEN];
 };
 
@@ -61,10 +67,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, 
nmi_print_seq);
 #endif
 
 /* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
+static void queue_irq_flush_work(struct printk_safe_seq_buf *s)
 {
if (printk_safe_irq_ready)
-   irq_work_queue(>work);
+   irq_work_queue(>irq_flush_work);
+}
+
+static void queue_wq_flush_work(struct printk_safe_seq_buf *s)
+{
+   if (printk_safe_irq_ready)
+   queue_work_on(smp_processor_id(),
+   system_wq,
+   >wq_flush_work);
 }
 
 /*
@@ -89,7 +103,6 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
/* The trailing '\0' is not counted into len. */
if (len >= sizeof(s->buffer) - 1) {
atomic_inc(>message_lost);
-   queue_flush_work(s);
return 0;
}
 
@@ -112,7 +125,6 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
if (atomic_cmpxchg(>len, len, len + add) != len)
goto again;
 
-   queue_flush_work(s);
return add;
 }
 
@@ -186,12 +198,10 @@ static void report_message_lost(struct 
printk_safe_seq_buf *s)
  * Flush data from the associated per-CPU buffer. The function
  * can be called either via IRQ work or independently.
  */
-static void __printk_safe_flush(struct irq_work *work)
+static void __printk_safe_flush(struct printk_safe_seq_buf *s)
 {
static raw_spinlock_t read_lock =
__RAW_SPIN_LOCK_INITIALIZER(read_lock);
-   struct printk_safe_seq_buf *s =
-   container_of(work, struct printk_safe_seq_buf, work);
unsigned long flags;
size_t len;
int i;
@@ -243,6 +253,22 @@ static void __printk_safe_flush(struct irq_work *work)
raw_spin_unlock_irqrestore(_lock, flags);
 }
 
+static void irq_flush_work_fn(struct irq_work *work)
+{
+   struct printk_safe_seq_buf *s =
+   container_of(work, struct printk_safe_seq_buf, irq_flush_work);
+
+   __printk_safe_flush(s);
+}
+
+static void wq_flush_work_fn(struct work_struct *work)
+{
+   struct printk_safe_seq_buf *s =
+   container_of(work, struct printk_safe_seq_buf, wq_flush_work);
+
+   __printk_safe_flush(s);
+}
+
 /**
  * printk_safe_flush - flush all per-cpu nmi buffers.
  *
@@ -256,9 +282,9 @@ void printk_safe_flush(void)
 
for_each_possible_cpu(cpu) {
 #ifdef CONFIG_PRINTK_NMI
-   __printk_safe_flush(_cpu(nmi_print_seq, cpu).work);
+   __printk_safe_flush(this_cpu_ptr(_print_seq));
 #endif
-   __printk_safe_flush(_cpu(safe_print_seq, cpu).work);
+   __printk_safe_flush(this_cpu_ptr(_print_seq));
}
 }
 
@@ -300,6 +326,7 @@ 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
Hello,

On (01/21/18 23:15), Sergey Senozhatsky wrote:
[..]
> we have printk recursion from console drivers. it's redirected to
> printk_safe and we queue an IRQ work to flush the buffer
> 
>  printk
>   console_unlock
>call_console_drivers
> net_console
>  printk
>   printk_save -> irq_work queue
> 
> now console_unlock() enables local IRQs, we have the printk_safe
> flush. but printk_safe flush does not call into the console_unlock(),
> it uses printk_deferred() version of printk
> 
> IRQ work
> 
>  prink_safe_flush
>   printk_deferred -> irq_work queue
> 
> 
> so we schedule another IRQ work (deferred printk work), which eventually
> tries to lock console_sem
> 
> IRQ work
>  wake_up_klogd_work_func()
>   if (console_trylock())
>console_unlock()

Why do we even use irq_work for printk_safe?

Okay... So, how about this. For printk_safe we use system_wq for flushing.
IOW, we flush from a task running exactly on the same CPU which hit printk
recursion, not from IRQ. From vprintk_safe() recursion, we queue work on
*that* CPU. Which gives us the following thing: if CPU stuck in
console_unlock() loop with preemption disabled, then system_wq does not
schedule on that CPU and we, thus, don't flush printk_safe buffer from that
CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe
buffer, printing extra messages from that CPU will not lock it up, because
it's in preemptible context.

Thoughts?


Something like this:

From: Sergey Senozhatsky 
Subject: [PATCH] printk/safe: use slowpath flush for printk_safe

---
 kernel/printk/printk_safe.c | 53 -
 1 file changed, 48 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..c641853a5fa9 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "internal.h"
 
@@ -50,6 +52,7 @@ struct printk_safe_seq_buf {
atomic_tlen;/* length of written data */
atomic_tmessage_lost;
struct irq_work work;   /* IRQ work that flushes the buffer */
+   struct work_struct  slowpath_flush_work;
unsigned char   buffer[SAFE_LOG_BUF_LEN];
 };
 
@@ -61,12 +64,20 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, 
nmi_print_seq);
 #endif
 
 /* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
+static void queue_irq_flush_work(struct printk_safe_seq_buf *s)
 {
if (printk_safe_irq_ready)
irq_work_queue(>work);
 }
 
+static void queue_slowpath_flush_work(struct printk_safe_seq_buf *s)
+{
+   if (printk_safe_irq_ready)
+   queue_work_on(smp_processor_id(),
+   system_wq,
+   >slowpath_flush_work);
+}
+
 /*
  * Add a message to per-CPU context-dependent buffer. NMI and printk-safe
  * have dedicated buffers, because otherwise printk-safe preempted by
@@ -89,7 +100,7 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
/* The trailing '\0' is not counted into len. */
if (len >= sizeof(s->buffer) - 1) {
atomic_inc(>message_lost);
-   queue_flush_work(s);
+   queue_irq_flush_work(s);
return 0;
}
 
@@ -112,7 +123,6 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
if (atomic_cmpxchg(>len, len, len + add) != len)
goto again;
 
-   queue_flush_work(s);
return add;
 }
 
@@ -243,6 +253,35 @@ static void __printk_safe_flush(struct irq_work *work)
raw_spin_unlock_irqrestore(_lock, flags);
 }
 
+/* NMI buffers are always flushed */
+static void flush_nmi_buffer(struct irq_work *work)
+{
+   __printk_safe_flush(work);
+}
+
+/* printk_safe buffers flushing, on the contrary, can be postponed */
+static void flush_printk_safe_buffer(struct irq_work *work)
+{
+   struct printk_safe_seq_buf *s =
+   container_of(work, struct printk_safe_seq_buf, work);
+
+   if (is_console_locked()) {
+   queue_slowpath_flush_work(s);
+   return;
+   }
+
+   __printk_safe_flush(work);
+}
+
+static void slowpath_flush_work_fn(struct work_struct *work)
+{
+   struct printk_safe_seq_buf *s =
+   container_of(work, struct printk_safe_seq_buf,
+   slowpath_flush_work);
+
+   __printk_safe_flush(>work);
+}
+
 /**
  * printk_safe_flush - flush all per-cpu nmi buffers.
  *
@@ -300,6 +339,7 @@ static __printf(1, 0) int vprintk_nmi(const char *fmt, 
va_list args)
 {
struct printk_safe_seq_buf *s = this_cpu_ptr(_print_seq);
 
+   queue_irq_flush_work(s);
return printk_safe_log_store(s, fmt, args);
 }
 
@@ -343,6 +383,7 @@ 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
Hello,

On (01/21/18 23:15), Sergey Senozhatsky wrote:
[..]
> we have printk recursion from console drivers. it's redirected to
> printk_safe and we queue an IRQ work to flush the buffer
> 
>  printk
>   console_unlock
>call_console_drivers
> net_console
>  printk
>   printk_save -> irq_work queue
> 
> now console_unlock() enables local IRQs, we have the printk_safe
> flush. but printk_safe flush does not call into the console_unlock(),
> it uses printk_deferred() version of printk
> 
> IRQ work
> 
>  prink_safe_flush
>   printk_deferred -> irq_work queue
> 
> 
> so we schedule another IRQ work (deferred printk work), which eventually
> tries to lock console_sem
> 
> IRQ work
>  wake_up_klogd_work_func()
>   if (console_trylock())
>console_unlock()

Why do we even use irq_work for printk_safe?

Okay... So, how about this. For printk_safe we use system_wq for flushing.
IOW, we flush from a task running exactly on the same CPU which hit printk
recursion, not from IRQ. From vprintk_safe() recursion, we queue work on
*that* CPU. Which gives us the following thing: if CPU stuck in
console_unlock() loop with preemption disabled, then system_wq does not
schedule on that CPU and we, thus, don't flush printk_safe buffer from that
CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe
buffer, printing extra messages from that CPU will not lock it up, because
it's in preemptible context.

Thoughts?


Something like this:

From: Sergey Senozhatsky 
Subject: [PATCH] printk/safe: use slowpath flush for printk_safe

---
 kernel/printk/printk_safe.c | 53 -
 1 file changed, 48 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..c641853a5fa9 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "internal.h"
 
@@ -50,6 +52,7 @@ struct printk_safe_seq_buf {
atomic_tlen;/* length of written data */
atomic_tmessage_lost;
struct irq_work work;   /* IRQ work that flushes the buffer */
+   struct work_struct  slowpath_flush_work;
unsigned char   buffer[SAFE_LOG_BUF_LEN];
 };
 
@@ -61,12 +64,20 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, 
nmi_print_seq);
 #endif
 
 /* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
+static void queue_irq_flush_work(struct printk_safe_seq_buf *s)
 {
if (printk_safe_irq_ready)
irq_work_queue(>work);
 }
 
+static void queue_slowpath_flush_work(struct printk_safe_seq_buf *s)
+{
+   if (printk_safe_irq_ready)
+   queue_work_on(smp_processor_id(),
+   system_wq,
+   >slowpath_flush_work);
+}
+
 /*
  * Add a message to per-CPU context-dependent buffer. NMI and printk-safe
  * have dedicated buffers, because otherwise printk-safe preempted by
@@ -89,7 +100,7 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
/* The trailing '\0' is not counted into len. */
if (len >= sizeof(s->buffer) - 1) {
atomic_inc(>message_lost);
-   queue_flush_work(s);
+   queue_irq_flush_work(s);
return 0;
}
 
@@ -112,7 +123,6 @@ static __printf(2, 0) int printk_safe_log_store(struct 
printk_safe_seq_buf *s,
if (atomic_cmpxchg(>len, len, len + add) != len)
goto again;
 
-   queue_flush_work(s);
return add;
 }
 
@@ -243,6 +253,35 @@ static void __printk_safe_flush(struct irq_work *work)
raw_spin_unlock_irqrestore(_lock, flags);
 }
 
+/* NMI buffers are always flushed */
+static void flush_nmi_buffer(struct irq_work *work)
+{
+   __printk_safe_flush(work);
+}
+
+/* printk_safe buffers flushing, on the contrary, can be postponed */
+static void flush_printk_safe_buffer(struct irq_work *work)
+{
+   struct printk_safe_seq_buf *s =
+   container_of(work, struct printk_safe_seq_buf, work);
+
+   if (is_console_locked()) {
+   queue_slowpath_flush_work(s);
+   return;
+   }
+
+   __printk_safe_flush(work);
+}
+
+static void slowpath_flush_work_fn(struct work_struct *work)
+{
+   struct printk_safe_seq_buf *s =
+   container_of(work, struct printk_safe_seq_buf,
+   slowpath_flush_work);
+
+   __printk_safe_flush(>work);
+}
+
 /**
  * printk_safe_flush - flush all per-cpu nmi buffers.
  *
@@ -300,6 +339,7 @@ static __printf(1, 0) int vprintk_nmi(const char *fmt, 
va_list args)
 {
struct printk_safe_seq_buf *s = this_cpu_ptr(_print_seq);
 
+   queue_irq_flush_work(s);
return printk_safe_log_store(s, fmt, args);
 }
 
@@ -343,6 +383,7 @@ static __printf(1, 0) int 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
On (01/22/18 19:28), Sergey Senozhatsky wrote:
> On (01/22/18 17:56), Sergey Senozhatsky wrote:
> [..]
> > Assume the following,
> 
> But more importantly we are missing another huge thing - console_unlock().

IOW, not every console_unlock() is from vprintk_emit(). We can have
console_trylock() -> console_unlock() being from non-preemptible context,
etc. And then irq work to flush printk_safe -> printk_deferred all the time.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
On (01/22/18 19:28), Sergey Senozhatsky wrote:
> On (01/22/18 17:56), Sergey Senozhatsky wrote:
> [..]
> > Assume the following,
> 
> But more importantly we are missing another huge thing - console_unlock().

IOW, not every console_unlock() is from vprintk_emit(). We can have
console_trylock() -> console_unlock() being from non-preemptible context,
etc. And then irq work to flush printk_safe -> printk_deferred all the time.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
On (01/22/18 17:56), Sergey Senozhatsky wrote:
[..]
> Assume the following,

But more importantly we are missing another huge thing - console_unlock().

Suppose:

console_lock();
<< preemption >>
printk
printk
..
printk
console_unlock()
 for (;;) {
call_console_drivers()
   dump_stack
   queue IRQ work

IRQ work >>
   flush_printk_safe
   printk_deferred()
   ...
   printk_deferred()
<< iret
 }

This should explode: sleepable console_unlock() may reschedule,
printk_safe flush bypasses recursion checks.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
On (01/22/18 17:56), Sergey Senozhatsky wrote:
[..]
> Assume the following,

But more importantly we are missing another huge thing - console_unlock().

Suppose:

console_lock();
<< preemption >>
printk
printk
..
printk
console_unlock()
 for (;;) {
call_console_drivers()
   dump_stack
   queue IRQ work

IRQ work >>
   flush_printk_safe
   printk_deferred()
   ...
   printk_deferred()
<< iret
 }

This should explode: sleepable console_unlock() may reschedule,
printk_safe flush bypasses recursion checks.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
On (01/21/18 16:04), Steven Rostedt wrote:
[..]
> > The problem is that we flush printk_safe right when console_unlock() 
> > printing
> > loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
> > were enabled in the first place when the CPU went to console_unlock()].
> > This forces that CPU to loop in console_unlock() as long as we have
> > printk-s coming from call_console_drivers(). But we probably can postpone
> > printk_safe flush. Basically, we can declare a new rule - we don't flush
> > printk_safe buffer as long as console_sem is locked. Because this is how
> > that printing CPU stuck in the console_unlock() printing loop. printk_safe
> > buffer is very important when it comes to storing a non-repetitive stuff, 
> > like
> > a lockdep splat, which is a single shot event. But the more repetitive the
> > message is, like millions of similar kmalloc() dump_stack()-s over and over
> > again, the less value in it. We should have printk_safe buffer big enough 
> > for
> > important info, like a lockdep splat, but millions of similar kmalloc()
> > messages are pretty invaluable - one is already enough, we can drop the 
> > rest.
> > And we should not flush new messages while there is a CPU looping in
> > console_unlock(), because it already has messages to print, which were
> > log_store()-ed the normal way.
> 
> The above is really hard to read without any capitalization. Everything
> seems to be a run-on sentence and gives me a head ache. So you lost me
> there.

Apologies. Will improve.

> > This is where the "postpone thing" jumps in. so how do we postpone 
> > printk_safe
> > flush.
> > 
> > We can't console_trylock()/console_unlock() in printk_safe flush code.
> > But there is a `console_locked' flag and is_console_locked() function which
> > tell us if the console_sem is locked. As long as we are in console_unlock()
> > printing loop that flag is set, even if we enabled local IRQs and 
> > printk_safe
> > flush work arrived. So now printk_safe flush does extra check and does
> > not flush printk_safe buffer content as long as someone is currently
> > printing or soon will start printing. But we need to take extra step and
> > to re-queue flush on CPUs that did postpone it [console_unlock() can
> > reschedule]. So now we flush only when printing CPU printed all pending
> > logbuf messages, hit the "console_seq == log_next_seq" and up()
> > console_sem. This sets a boundary -- no matter how many times during the
> > current printing loop we called console drivers and how many times those
> > drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.
> 
> Another big paragraph with no capitals (besides macros and CPU ;-)

I walked through it and mostly "fixed" your head ache :)

> I guess this is what it is like when people listen to me talk too fast.

Absolutely!!!

> > IOW, what we have now, looks like this:
> > 
> > a) printk_safe is for important stuff, we don't guarantee that a flood
> >of messages will be preserved.
> > 
> > b) we extend the previously existing "will flush messages later on from
> >a safer context" and now we also consider console_unlock() printing loop
> >as unsafe context. so the unsafe context it's not only the one that can
> >deadlock, but also the one that can lockup CPU in a printing loop because
> >of recursive printk messages.
> 
> Sure.
> 
> > 
> > 
> > so this
> > 
> >  printk
> >   console_unlock
> >   {
> >for (;;) {
> >  call_console_drivers
> >   net_console
> >printk
> > printk_save -> irq_work queue
> > 
> >IRQ work
> >  prink_safe_flush
> >printk_deferred -> log_store()
> >iret
> > }
> > up();
> >   }
> > 
> > 
> >// which can never break out, because we can always append new messages
> >// from prink_safe_flush.
> > 
> > becomes this
> > 
> > printk
> >   console_unlock
> >   {
> >for (;;) {
> >  call_console_drivers
> >   net_console
> >printk
> > printk_save -> irq_work queue
> > 
> > }
> > up();
> > 
> >   IRQ work
> >prink_safe_flush
> > printk_deferred -> log_store()
> >   iret
> > }
> 
> But we do eventually send this data out to the consoles, and if the
> consoles cause more printks, wouldn't this still never end?

Right. But not immediately. We wait for all pending messages to be evicted
first (and up()) and we limit the amount of data that we flush. So at least
it's not exponential anymore: every line that we print does not log_store()
a whole new dump_stack() of lines. Which is still miles away from "a perfect
solution", tho. But limiting the number of lines we print recursive is not
much better.

First, we don't know how many lines we want to flush from printk_safe.
And having a knob indicates that no one ever will do it right.

Second, hand off can play games with it.

Assume the following,

- I set `recursion_max' to 200. Which looks reasonable to me.
  Then I 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-22 Thread Sergey Senozhatsky
On (01/21/18 16:04), Steven Rostedt wrote:
[..]
> > The problem is that we flush printk_safe right when console_unlock() 
> > printing
> > loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
> > were enabled in the first place when the CPU went to console_unlock()].
> > This forces that CPU to loop in console_unlock() as long as we have
> > printk-s coming from call_console_drivers(). But we probably can postpone
> > printk_safe flush. Basically, we can declare a new rule - we don't flush
> > printk_safe buffer as long as console_sem is locked. Because this is how
> > that printing CPU stuck in the console_unlock() printing loop. printk_safe
> > buffer is very important when it comes to storing a non-repetitive stuff, 
> > like
> > a lockdep splat, which is a single shot event. But the more repetitive the
> > message is, like millions of similar kmalloc() dump_stack()-s over and over
> > again, the less value in it. We should have printk_safe buffer big enough 
> > for
> > important info, like a lockdep splat, but millions of similar kmalloc()
> > messages are pretty invaluable - one is already enough, we can drop the 
> > rest.
> > And we should not flush new messages while there is a CPU looping in
> > console_unlock(), because it already has messages to print, which were
> > log_store()-ed the normal way.
> 
> The above is really hard to read without any capitalization. Everything
> seems to be a run-on sentence and gives me a head ache. So you lost me
> there.

Apologies. Will improve.

> > This is where the "postpone thing" jumps in. so how do we postpone 
> > printk_safe
> > flush.
> > 
> > We can't console_trylock()/console_unlock() in printk_safe flush code.
> > But there is a `console_locked' flag and is_console_locked() function which
> > tell us if the console_sem is locked. As long as we are in console_unlock()
> > printing loop that flag is set, even if we enabled local IRQs and 
> > printk_safe
> > flush work arrived. So now printk_safe flush does extra check and does
> > not flush printk_safe buffer content as long as someone is currently
> > printing or soon will start printing. But we need to take extra step and
> > to re-queue flush on CPUs that did postpone it [console_unlock() can
> > reschedule]. So now we flush only when printing CPU printed all pending
> > logbuf messages, hit the "console_seq == log_next_seq" and up()
> > console_sem. This sets a boundary -- no matter how many times during the
> > current printing loop we called console drivers and how many times those
> > drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.
> 
> Another big paragraph with no capitals (besides macros and CPU ;-)

I walked through it and mostly "fixed" your head ache :)

> I guess this is what it is like when people listen to me talk too fast.

Absolutely!!!

> > IOW, what we have now, looks like this:
> > 
> > a) printk_safe is for important stuff, we don't guarantee that a flood
> >of messages will be preserved.
> > 
> > b) we extend the previously existing "will flush messages later on from
> >a safer context" and now we also consider console_unlock() printing loop
> >as unsafe context. so the unsafe context it's not only the one that can
> >deadlock, but also the one that can lockup CPU in a printing loop because
> >of recursive printk messages.
> 
> Sure.
> 
> > 
> > 
> > so this
> > 
> >  printk
> >   console_unlock
> >   {
> >for (;;) {
> >  call_console_drivers
> >   net_console
> >printk
> > printk_save -> irq_work queue
> > 
> >IRQ work
> >  prink_safe_flush
> >printk_deferred -> log_store()
> >iret
> > }
> > up();
> >   }
> > 
> > 
> >// which can never break out, because we can always append new messages
> >// from prink_safe_flush.
> > 
> > becomes this
> > 
> > printk
> >   console_unlock
> >   {
> >for (;;) {
> >  call_console_drivers
> >   net_console
> >printk
> > printk_save -> irq_work queue
> > 
> > }
> > up();
> > 
> >   IRQ work
> >prink_safe_flush
> > printk_deferred -> log_store()
> >   iret
> > }
> 
> But we do eventually send this data out to the consoles, and if the
> consoles cause more printks, wouldn't this still never end?

Right. But not immediately. We wait for all pending messages to be evicted
first (and up()) and we limit the amount of data that we flush. So at least
it's not exponential anymore: every line that we print does not log_store()
a whole new dump_stack() of lines. Which is still miles away from "a perfect
solution", tho. But limiting the number of lines we print recursive is not
much better.

First, we don't know how many lines we want to flush from printk_safe.
And having a knob indicates that no one ever will do it right.

Second, hand off can play games with it.

Assume the following,

- I set `recursion_max' to 200. Which looks reasonable to me.
  Then I 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-21 Thread Steven Rostedt
On Sun, 21 Jan 2018 23:15:21 +0900
Sergey Senozhatsky  wrote:

> so fix the console drivers ;)

Totally agree!

> 
> 
> 
> 
> just kidding. ok...

Darn it! ;-)

> the problem is that we flush printk_safe right when console_unlock() printing
> loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
> were enabled in the first place when the CPU went to console_unlock()].
> this forces that CPU to loop in console_unlock() as long as we have
> printk-s coming from call_console_drivers(). but we probably can postpone
> printk_safe flush. basically, we can declare a new rule - we don't flush
> printk_safe buffer as long as console_sem is locked. because this is how
> that printing CPU stuck in the console_unlock() printing loop. printk_safe
> buffer is very important when it comes to storing a non-repetitive stuff, like
> a lockdep splat, which is a single shot event. but the more repetitive the
> message is, like millions of similar kmalloc() dump_stack()-s over and over
> again, the less value in it. we should have printk_safe buffer big enough for
> important info, like a lockdep splat, but millions of similar kmalloc()
> messages are pretty invaluable - one is already enough, we can drop the rest.
> and we should not flush new messages while there is a CPU looping in
> console_unlock(), because it already has messages to print, which were
> log_store()-ed the normal way.

The above is really hard to read without any capitalization. Everything
seems to be a run-on sentence and gives me a head ache. So you lost me
there.

> 
> this is where the "postpone thing" jumps in. so how do we postpone printk_safe
> flush.
> 
> we can't console_trylock()/console_unlock() in printk_safe flush code.
> but there is a `console_locked' flag and is_console_locked() function which
> tell us if the console_sem is locked. as long as we are in console_unlock()
> printing loop that flag is set, even if we enabled local IRQs and printk_safe
> flush work arrived. so now printk_safe flush does extra check and does
> not flush printk_safe buffer content as long as someone is currently
> printing or soon will start printing. but we need to take extra step and
> to re-queue flush on CPUs that did postpone it [console_unlock() can
> reschedule]. so now we flush only when printing CPU printed all pending
> logbuf messages, hit the "console_seq == log_next_seq" and up()
> console_sem. this sets a boundary -- no matter how many times during the
> current printing loop we called console drivers and how many times those
> drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.

Another big paragraph with no capitals (besides macros and CPU ;-)

I guess this is what it is like when people listen to me talk too fast.


> 
> 
> IOW, what we have now, looks like this:
> 
> a) printk_safe is for important stuff, we don't guarantee that a flood
>of messages will be preserved.
> 
> b) we extend the previously existing "will flush messages later on from
>a safer context" and now we also consider console_unlock() printing loop
>as unsafe context. so the unsafe context it's not only the one that can
>deadlock, but also the one that can lockup CPU in a printing loop because
>of recursive printk messages.

Sure.

> 
> 
> so this
> 
>  printk
>   console_unlock
>   {
>for (;;) {
>  call_console_drivers
>   net_console
>printk
> printk_save -> irq_work queue
> 
>  IRQ work
>prink_safe_flush
>  printk_deferred -> log_store()
>iret
> }
> up();
>   }
> 
> 
>// which can never break out, because we can always append new messages
>// from prink_safe_flush.
> 
> becomes this
> 
> printk
>   console_unlock
>   {
>for (;;) {
>  call_console_drivers
>   net_console
>printk
> printk_save -> irq_work queue
> 
> }
> up();
> 
>   IRQ work
>prink_safe_flush
> printk_deferred -> log_store()
>   iret
> }

But we do eventually send this data out to the consoles, and if the
consoles cause more printks, wouldn't this still never end?

> 
> 
> 
> something completely untested, sketchy and ugly.
> 
> ---
> 
>  kernel/printk/internal.h|  2 ++
>  kernel/printk/printk.c  |  1 +
>  kernel/printk/printk_safe.c | 37 +++--
>  3 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
> index 2a7d04049af4..e85517818a49 100644
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -30,6 +30,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list 
> args);
>  void __printk_safe_enter(void);
>  void __printk_safe_exit(void);
>  
> +void printk_safe_requeue_flushing(void);
> +
>  #define printk_safe_enter_irqsave(flags) \
>   do {\
>   local_irq_save(flags);  \
> diff --git 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-21 Thread Steven Rostedt
On Sun, 21 Jan 2018 23:15:21 +0900
Sergey Senozhatsky  wrote:

> so fix the console drivers ;)

Totally agree!

> 
> 
> 
> 
> just kidding. ok...

Darn it! ;-)

> the problem is that we flush printk_safe right when console_unlock() printing
> loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
> were enabled in the first place when the CPU went to console_unlock()].
> this forces that CPU to loop in console_unlock() as long as we have
> printk-s coming from call_console_drivers(). but we probably can postpone
> printk_safe flush. basically, we can declare a new rule - we don't flush
> printk_safe buffer as long as console_sem is locked. because this is how
> that printing CPU stuck in the console_unlock() printing loop. printk_safe
> buffer is very important when it comes to storing a non-repetitive stuff, like
> a lockdep splat, which is a single shot event. but the more repetitive the
> message is, like millions of similar kmalloc() dump_stack()-s over and over
> again, the less value in it. we should have printk_safe buffer big enough for
> important info, like a lockdep splat, but millions of similar kmalloc()
> messages are pretty invaluable - one is already enough, we can drop the rest.
> and we should not flush new messages while there is a CPU looping in
> console_unlock(), because it already has messages to print, which were
> log_store()-ed the normal way.

The above is really hard to read without any capitalization. Everything
seems to be a run-on sentence and gives me a head ache. So you lost me
there.

> 
> this is where the "postpone thing" jumps in. so how do we postpone printk_safe
> flush.
> 
> we can't console_trylock()/console_unlock() in printk_safe flush code.
> but there is a `console_locked' flag and is_console_locked() function which
> tell us if the console_sem is locked. as long as we are in console_unlock()
> printing loop that flag is set, even if we enabled local IRQs and printk_safe
> flush work arrived. so now printk_safe flush does extra check and does
> not flush printk_safe buffer content as long as someone is currently
> printing or soon will start printing. but we need to take extra step and
> to re-queue flush on CPUs that did postpone it [console_unlock() can
> reschedule]. so now we flush only when printing CPU printed all pending
> logbuf messages, hit the "console_seq == log_next_seq" and up()
> console_sem. this sets a boundary -- no matter how many times during the
> current printing loop we called console drivers and how many times those
> drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.

Another big paragraph with no capitals (besides macros and CPU ;-)

I guess this is what it is like when people listen to me talk too fast.


> 
> 
> IOW, what we have now, looks like this:
> 
> a) printk_safe is for important stuff, we don't guarantee that a flood
>of messages will be preserved.
> 
> b) we extend the previously existing "will flush messages later on from
>a safer context" and now we also consider console_unlock() printing loop
>as unsafe context. so the unsafe context it's not only the one that can
>deadlock, but also the one that can lockup CPU in a printing loop because
>of recursive printk messages.

Sure.

> 
> 
> so this
> 
>  printk
>   console_unlock
>   {
>for (;;) {
>  call_console_drivers
>   net_console
>printk
> printk_save -> irq_work queue
> 
>  IRQ work
>prink_safe_flush
>  printk_deferred -> log_store()
>iret
> }
> up();
>   }
> 
> 
>// which can never break out, because we can always append new messages
>// from prink_safe_flush.
> 
> becomes this
> 
> printk
>   console_unlock
>   {
>for (;;) {
>  call_console_drivers
>   net_console
>printk
> printk_save -> irq_work queue
> 
> }
> up();
> 
>   IRQ work
>prink_safe_flush
> printk_deferred -> log_store()
>   iret
> }

But we do eventually send this data out to the consoles, and if the
consoles cause more printks, wouldn't this still never end?

> 
> 
> 
> something completely untested, sketchy and ugly.
> 
> ---
> 
>  kernel/printk/internal.h|  2 ++
>  kernel/printk/printk.c  |  1 +
>  kernel/printk/printk_safe.c | 37 +++--
>  3 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
> index 2a7d04049af4..e85517818a49 100644
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -30,6 +30,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list 
> args);
>  void __printk_safe_enter(void);
>  void __printk_safe_exit(void);
>  
> +void printk_safe_requeue_flushing(void);
> +
>  #define printk_safe_enter_irqsave(flags) \
>   do {\
>   local_irq_save(flags);  \
> diff --git a/kernel/printk/printk.c 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-21 Thread Sergey Senozhatsky
On (01/20/18 10:49), Steven Rostedt wrote:
[..]
> > printks from console_unlock()->call_console_drivers() are redirected
> > to printk_safe buffer. we need irq_work on that CPU to flush its
> > printk_safe buffer.
> 
> So is the issue that we keep triggering this irq work then? Then this
> solution does seem to be one that would work. Because after x amount of
> recursive printks (printk called by printk) it would just stop printing
> them, and end the irq work.
> 
> Perhaps what Tejun is seeing is:
> 
>  printk()
>net_console()
>  printk() --> redirected to irq work
> 
>  
>   printk
> net_console()
>   printk() --> redirected to another irq work
> 
> and so on and so on.

it's a bit trickier than that, I think.

we have printk recursion from console drivers. it's redirected to
printk_safe and we queue an IRQ work to flush the buffer

 printk
  console_unlock
   call_console_drivers
net_console
 printk
  printk_save -> irq_work queue

now console_unlock() enables local IRQs, we have the printk_safe
flush. but printk_safe flush does not call into the console_unlock(),
it uses printk_deferred() version of printk

IRQ work

 prink_safe_flush
  printk_deferred -> irq_work queue


so we schedule another IRQ work (deferred printk work), which eventually
tries to lock console_sem

IRQ work
 wake_up_klogd_work_func()
  if (console_trylock())
   console_unlock()

if it succeeds then it goes to console_unlock(), where console driver
can cause another printk recursion. but, once again, it will be
redirected to printk_safe buffer first. if it fails then we have either
the original CPU to print out those irq_work messages, which is sort
of bad, or another CPU which already acquired the console_sem and will
print out.

> This solution would need to be tweaked to add a timer to allow only so
> many nested printks in a given time. Otherwise it too would be an issue:
[..]
> > how are we going to distinguish between lockdep splats, for instance,
> > or WARNs from call_console_drivers() -> foo_write(), which are valuable,
> > and kmalloc() print outs, which might be less valuable? are we going to
> 
> The problem is that printk causing more printks is extremely dangerous,
> and ANY printk that is caused by a printk is of equal value, whether it
> is a console driver running out of memory or a lockdep splat. And
> the chances of having two hit at the same time is extremely low.

so fix the console drivers ;)




just kidding. ok...
the problem is that we flush printk_safe right when console_unlock() printing
loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
were enabled in the first place when the CPU went to console_unlock()].
this forces that CPU to loop in console_unlock() as long as we have
printk-s coming from call_console_drivers(). but we probably can postpone
printk_safe flush. basically, we can declare a new rule - we don't flush
printk_safe buffer as long as console_sem is locked. because this is how
that printing CPU stuck in the console_unlock() printing loop. printk_safe
buffer is very important when it comes to storing a non-repetitive stuff, like
a lockdep splat, which is a single shot event. but the more repetitive the
message is, like millions of similar kmalloc() dump_stack()-s over and over
again, the less value in it. we should have printk_safe buffer big enough for
important info, like a lockdep splat, but millions of similar kmalloc()
messages are pretty invaluable - one is already enough, we can drop the rest.
and we should not flush new messages while there is a CPU looping in
console_unlock(), because it already has messages to print, which were
log_store()-ed the normal way.

this is where the "postpone thing" jumps in. so how do we postpone printk_safe
flush.

we can't console_trylock()/console_unlock() in printk_safe flush code.
but there is a `console_locked' flag and is_console_locked() function which
tell us if the console_sem is locked. as long as we are in console_unlock()
printing loop that flag is set, even if we enabled local IRQs and printk_safe
flush work arrived. so now printk_safe flush does extra check and does
not flush printk_safe buffer content as long as someone is currently
printing or soon will start printing. but we need to take extra step and
to re-queue flush on CPUs that did postpone it [console_unlock() can
reschedule]. so now we flush only when printing CPU printed all pending
logbuf messages, hit the "console_seq == log_next_seq" and up()
console_sem. this sets a boundary -- no matter how many times during the
current printing loop we called console drivers and how many times those
drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.


IOW, what we have now, looks like this:

a) printk_safe is for important stuff, we don't guarantee that a flood
   of messages will be preserved.

b) we extend the previously existing "will flush messages later on from
   a safer context" and now we also 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-21 Thread Sergey Senozhatsky
On (01/20/18 10:49), Steven Rostedt wrote:
[..]
> > printks from console_unlock()->call_console_drivers() are redirected
> > to printk_safe buffer. we need irq_work on that CPU to flush its
> > printk_safe buffer.
> 
> So is the issue that we keep triggering this irq work then? Then this
> solution does seem to be one that would work. Because after x amount of
> recursive printks (printk called by printk) it would just stop printing
> them, and end the irq work.
> 
> Perhaps what Tejun is seeing is:
> 
>  printk()
>net_console()
>  printk() --> redirected to irq work
> 
>  
>   printk
> net_console()
>   printk() --> redirected to another irq work
> 
> and so on and so on.

it's a bit trickier than that, I think.

we have printk recursion from console drivers. it's redirected to
printk_safe and we queue an IRQ work to flush the buffer

 printk
  console_unlock
   call_console_drivers
net_console
 printk
  printk_save -> irq_work queue

now console_unlock() enables local IRQs, we have the printk_safe
flush. but printk_safe flush does not call into the console_unlock(),
it uses printk_deferred() version of printk

IRQ work

 prink_safe_flush
  printk_deferred -> irq_work queue


so we schedule another IRQ work (deferred printk work), which eventually
tries to lock console_sem

IRQ work
 wake_up_klogd_work_func()
  if (console_trylock())
   console_unlock()

if it succeeds then it goes to console_unlock(), where console driver
can cause another printk recursion. but, once again, it will be
redirected to printk_safe buffer first. if it fails then we have either
the original CPU to print out those irq_work messages, which is sort
of bad, or another CPU which already acquired the console_sem and will
print out.

> This solution would need to be tweaked to add a timer to allow only so
> many nested printks in a given time. Otherwise it too would be an issue:
[..]
> > how are we going to distinguish between lockdep splats, for instance,
> > or WARNs from call_console_drivers() -> foo_write(), which are valuable,
> > and kmalloc() print outs, which might be less valuable? are we going to
> 
> The problem is that printk causing more printks is extremely dangerous,
> and ANY printk that is caused by a printk is of equal value, whether it
> is a console driver running out of memory or a lockdep splat. And
> the chances of having two hit at the same time is extremely low.

so fix the console drivers ;)




just kidding. ok...
the problem is that we flush printk_safe right when console_unlock() printing
loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
were enabled in the first place when the CPU went to console_unlock()].
this forces that CPU to loop in console_unlock() as long as we have
printk-s coming from call_console_drivers(). but we probably can postpone
printk_safe flush. basically, we can declare a new rule - we don't flush
printk_safe buffer as long as console_sem is locked. because this is how
that printing CPU stuck in the console_unlock() printing loop. printk_safe
buffer is very important when it comes to storing a non-repetitive stuff, like
a lockdep splat, which is a single shot event. but the more repetitive the
message is, like millions of similar kmalloc() dump_stack()-s over and over
again, the less value in it. we should have printk_safe buffer big enough for
important info, like a lockdep splat, but millions of similar kmalloc()
messages are pretty invaluable - one is already enough, we can drop the rest.
and we should not flush new messages while there is a CPU looping in
console_unlock(), because it already has messages to print, which were
log_store()-ed the normal way.

this is where the "postpone thing" jumps in. so how do we postpone printk_safe
flush.

we can't console_trylock()/console_unlock() in printk_safe flush code.
but there is a `console_locked' flag and is_console_locked() function which
tell us if the console_sem is locked. as long as we are in console_unlock()
printing loop that flag is set, even if we enabled local IRQs and printk_safe
flush work arrived. so now printk_safe flush does extra check and does
not flush printk_safe buffer content as long as someone is currently
printing or soon will start printing. but we need to take extra step and
to re-queue flush on CPUs that did postpone it [console_unlock() can
reschedule]. so now we flush only when printing CPU printed all pending
logbuf messages, hit the "console_seq == log_next_seq" and up()
console_sem. this sets a boundary -- no matter how many times during the
current printing loop we called console drivers and how many times those
drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.


IOW, what we have now, looks like this:

a) printk_safe is for important stuff, we don't guarantee that a flood
   of messages will be preserved.

b) we extend the previously existing "will flush messages later on from
   a safer context" and now we also 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-20 Thread Steven Rostedt
On Sat, 20 Jan 2018 16:14:02 +0900
Sergey Senozhatsky  wrote:

> [..]
> >  asmlinkage int vprintk_emit(int facility, int level,
> > const char *dict, size_t dictlen,
> > @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  
> > /* This stops the holder of console_sem just where we want him */
> > logbuf_lock_irqsave(flags);
> > +
> > +   if (recursion_check_test()) {
> > +   /* A printk happened within a printk at the same context */
> > +   if (this_cpu_inc_return(recursion_count) > recursion_max) {
> > +   atomic_inc(_overflow);
> > +   logbuf_unlock_irqrestore(flags);
> > +   printed_len = 0;
> > +   goto out;
> > +   }
> > +   }  
> 
> didn't have time to look at this carefully, but is this possible?
> 
> printks from console_unlock()->call_console_drivers() are redirected
> to printk_safe buffer. we need irq_work on that CPU to flush its
> printk_safe buffer.

So is the issue that we keep triggering this irq work then? Then this
solution does seem to be one that would work. Because after x amount of
recursive printks (printk called by printk) it would just stop printing
them, and end the irq work.

Perhaps what Tejun is seeing is:

 printk()
   net_console()
 printk() --> redirected to irq work

 
  printk
net_console()
  printk() --> redirected to another irq work

and so on and so on.

This solution would need to be tweaked to add a timer to allow only so
many nested printks in a given time. Otherwise it too would be an issue:

 printk()
   net_console()
 printk() -> redirected
 printk() -> throttled

But the first x printk()s would still be redirected. and that x gets
reset in this current patch at he end of the outermost printk. Perhaps
it shouldn't reset x, or it can flush the printk safe buffer first. Is
there a reason that console_unlock() doesn't flush the
printk_safe_buffer? With a throttle number and flushing the
printk_safe_buffer, that should solve the issue Tejun explained.


> 
> how are we going to distinguish between lockdep splats, for instance,
> or WARNs from call_console_drivers() -> foo_write(), which are valuable,
> and kmalloc() print outs, which might be less valuable? are we going to

The problem is that printk causing more printks is extremely dangerous,
and ANY printk that is caused by a printk is of equal value, whether it
is a console driver running out of memory or a lockdep splat. And
the chances of having two hit at the same time is extremely low.

> lose all of them now? then we can do a much simpler thing - steal one
> bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which
> will be set around call_console_drivers(). vprintk_func() would redirect
> printks to vprintk_noop(fmt, args), which will do nothing.

Not sure what you mean here. Have some pseudo code to demonstrate with?

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-20 Thread Steven Rostedt
On Sat, 20 Jan 2018 16:14:02 +0900
Sergey Senozhatsky  wrote:

> [..]
> >  asmlinkage int vprintk_emit(int facility, int level,
> > const char *dict, size_t dictlen,
> > @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  
> > /* This stops the holder of console_sem just where we want him */
> > logbuf_lock_irqsave(flags);
> > +
> > +   if (recursion_check_test()) {
> > +   /* A printk happened within a printk at the same context */
> > +   if (this_cpu_inc_return(recursion_count) > recursion_max) {
> > +   atomic_inc(_overflow);
> > +   logbuf_unlock_irqrestore(flags);
> > +   printed_len = 0;
> > +   goto out;
> > +   }
> > +   }  
> 
> didn't have time to look at this carefully, but is this possible?
> 
> printks from console_unlock()->call_console_drivers() are redirected
> to printk_safe buffer. we need irq_work on that CPU to flush its
> printk_safe buffer.

So is the issue that we keep triggering this irq work then? Then this
solution does seem to be one that would work. Because after x amount of
recursive printks (printk called by printk) it would just stop printing
them, and end the irq work.

Perhaps what Tejun is seeing is:

 printk()
   net_console()
 printk() --> redirected to irq work

 
  printk
net_console()
  printk() --> redirected to another irq work

and so on and so on.

This solution would need to be tweaked to add a timer to allow only so
many nested printks in a given time. Otherwise it too would be an issue:

 printk()
   net_console()
 printk() -> redirected
 printk() -> throttled

But the first x printk()s would still be redirected. and that x gets
reset in this current patch at he end of the outermost printk. Perhaps
it shouldn't reset x, or it can flush the printk safe buffer first. Is
there a reason that console_unlock() doesn't flush the
printk_safe_buffer? With a throttle number and flushing the
printk_safe_buffer, that should solve the issue Tejun explained.


> 
> how are we going to distinguish between lockdep splats, for instance,
> or WARNs from call_console_drivers() -> foo_write(), which are valuable,
> and kmalloc() print outs, which might be less valuable? are we going to

The problem is that printk causing more printks is extremely dangerous,
and ANY printk that is caused by a printk is of equal value, whether it
is a console driver running out of memory or a lockdep splat. And
the chances of having two hit at the same time is extremely low.

> lose all of them now? then we can do a much simpler thing - steal one
> bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which
> will be set around call_console_drivers(). vprintk_func() would redirect
> printks to vprintk_noop(fmt, args), which will do nothing.

Not sure what you mean here. Have some pseudo code to demonstrate with?

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-20 Thread Steven Rostedt
On Sat, 20 Jan 2018 04:19:53 -0800
Tejun Heo  wrote:

> I'm a bit worried tho because this essentially seems like "detect
> recursion, ignore messages" approach.  netcons can have a very large
> surface for bugs.  Suppressing those messages would make them
> difficult to debug.  For example, all our machines have both serial
> console (thus the slowness) and netconsole hooked up and netcons code
> has had its fair share of issues.  This would likely make tracking
> down those problems more challenging.

Well, it's not totally ignoring them. There's a variable that tells
printk how many to print before it starts ignoring them. I picked 3,
but that could very well be 5 or 10. Probably 10 is the best, because
then it would give us enough idea why printk is recursing on itself
without overloading the buffer. And I made it a variable to easily make
it a knob for userspace to tweak if need be.

> 
> Can we discuss pros and cons of this approach against offloading
> before committing to this?

I'm open. I was just thinking about the scenario that you mentioned and
how what the best way to solve it would be.

We need to define the exact problem(s) we are dealing with before we
offer a solution. The one thing I don't want is a solution looking for
a problem. I want a full understanding of what the problem exactly is
and then we can discuss various solutions, and how they solve the
problem(s). Otherwise we are just doing (to quote Linus) code masturbation.

-- Steve



Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-20 Thread Steven Rostedt
On Sat, 20 Jan 2018 04:19:53 -0800
Tejun Heo  wrote:

> I'm a bit worried tho because this essentially seems like "detect
> recursion, ignore messages" approach.  netcons can have a very large
> surface for bugs.  Suppressing those messages would make them
> difficult to debug.  For example, all our machines have both serial
> console (thus the slowness) and netconsole hooked up and netcons code
> has had its fair share of issues.  This would likely make tracking
> down those problems more challenging.

Well, it's not totally ignoring them. There's a variable that tells
printk how many to print before it starts ignoring them. I picked 3,
but that could very well be 5 or 10. Probably 10 is the best, because
then it would give us enough idea why printk is recursing on itself
without overloading the buffer. And I made it a variable to easily make
it a knob for userspace to tweak if need be.

> 
> Can we discuss pros and cons of this approach against offloading
> before committing to this?

I'm open. I was just thinking about the scenario that you mentioned and
how what the best way to solve it would be.

We need to define the exact problem(s) we are dealing with before we
offer a solution. The one thing I don't want is a solution looking for
a problem. I want a full understanding of what the problem exactly is
and then we can discuss various solutions, and how they solve the
problem(s). Otherwise we are just doing (to quote Linus) code masturbation.

-- Steve



Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-20 Thread Tejun Heo
Hello, Steven.

On Fri, Jan 19, 2018 at 01:20:52PM -0500, Steven Rostedt wrote:
> I was thinking about this a bit more, and instead of offloading a
> recursive printk, perhaps its best to simply throttle it. Because the
> problem may not go away if a printk thread takes over, because the bug
> is really the printk infrastructure filling the printk buffer keeping
> printk from ever stopping.
> 
> This patch detects that printk is causing itself to print more and
> throttles it after 3 messages have printed due to recursion. Could you
> see if this helps your test cases?

Sure, if this is the approach we're gonna take, I can try it with the
silly test code and also try to reproduce the original problem and see
whether this helps.

I'm a bit worried tho because this essentially seems like "detect
recursion, ignore messages" approach.  netcons can have a very large
surface for bugs.  Suppressing those messages would make them
difficult to debug.  For example, all our machines have both serial
console (thus the slowness) and netconsole hooked up and netcons code
has had its fair share of issues.  This would likely make tracking
down those problems more challenging.

Can we discuss pros and cons of this approach against offloading
before committing to this?

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-20 Thread Tejun Heo
Hello, Steven.

On Fri, Jan 19, 2018 at 01:20:52PM -0500, Steven Rostedt wrote:
> I was thinking about this a bit more, and instead of offloading a
> recursive printk, perhaps its best to simply throttle it. Because the
> problem may not go away if a printk thread takes over, because the bug
> is really the printk infrastructure filling the printk buffer keeping
> printk from ever stopping.
> 
> This patch detects that printk is causing itself to print more and
> throttles it after 3 messages have printed due to recursion. Could you
> see if this helps your test cases?

Sure, if this is the approach we're gonna take, I can try it with the
silly test code and also try to reproduce the original problem and see
whether this helps.

I'm a bit worried tho because this essentially seems like "detect
recursion, ignore messages" approach.  netcons can have a very large
surface for bugs.  Suppressing those messages would make them
difficult to debug.  For example, all our machines have both serial
console (thus the slowness) and netconsole hooked up and netcons code
has had its fair share of issues.  This would likely make tracking
down those problems more challenging.

Can we discuss pros and cons of this approach against offloading
before committing to this?

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-19 Thread Sergey Senozhatsky

On (01/19/18 13:20), Steven Rostedt wrote:
[..]
> I was thinking about this a bit more, and instead of offloading a
> recursive printk, perhaps its best to simply throttle it. Because the
> problem may not go away if a printk thread takes over, because the bug
> is really the printk infrastructure filling the printk buffer keeping
> printk from ever stopping.

right. I didn't quite got it how that would help. if we would
kick_offload every time we add new printks after call_console_drivers(),
then we can just end up in a kick_offload loop traveling across all CPUs.

[..]
>  asmlinkage int vprintk_emit(int facility, int level,
>   const char *dict, size_t dictlen,
> @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>   /* This stops the holder of console_sem just where we want him */
>   logbuf_lock_irqsave(flags);
> +
> + if (recursion_check_test()) {
> + /* A printk happened within a printk at the same context */
> + if (this_cpu_inc_return(recursion_count) > recursion_max) {
> + atomic_inc(_overflow);
> + logbuf_unlock_irqrestore(flags);
> + printed_len = 0;
> + goto out;
> + }
> + }

didn't have time to look at this carefully, but is this possible?

printks from console_unlock()->call_console_drivers() are redirected
to printk_safe buffer. we need irq_work on that CPU to flush its
printk_safe buffer.

>  EXPORT_SYMBOL(vprintk_emit);
> @@ -2343,9 +2428,14 @@ void console_unlock(void)
>   seen_seq = log_next_seq;
>   }
>  
> - if (console_seq < log_first_seq) {
> + if (console_seq < log_first_seq || 
> atomic_read(_overflow)) {
> + size_t missed;
> +
> + missed = atomic_xchg(_overflow, 0);
> + missed += log_first_seq - console_seq;
> +
>   len = sprintf(text, "** %u printk messages dropped 
> **\n",
> -   (unsigned)(log_first_seq - console_seq));
> +   (unsigned)missed);
>  
>   /* messages are gone, move to first one */
>   console_seq = log_first_seq;

how are we going to distinguish between lockdep splats, for instance,
or WARNs from call_console_drivers() -> foo_write(), which are valuable,
and kmalloc() print outs, which might be less valuable? are we going to
lose all of them now? then we can do a much simpler thing - steal one
bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which
will be set around call_console_drivers(). vprintk_func() would redirect
printks to vprintk_noop(fmt, args), which will do nothing.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-19 Thread Sergey Senozhatsky

On (01/19/18 13:20), Steven Rostedt wrote:
[..]
> I was thinking about this a bit more, and instead of offloading a
> recursive printk, perhaps its best to simply throttle it. Because the
> problem may not go away if a printk thread takes over, because the bug
> is really the printk infrastructure filling the printk buffer keeping
> printk from ever stopping.

right. I didn't quite got it how that would help. if we would
kick_offload every time we add new printks after call_console_drivers(),
then we can just end up in a kick_offload loop traveling across all CPUs.

[..]
>  asmlinkage int vprintk_emit(int facility, int level,
>   const char *dict, size_t dictlen,
> @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>   /* This stops the holder of console_sem just where we want him */
>   logbuf_lock_irqsave(flags);
> +
> + if (recursion_check_test()) {
> + /* A printk happened within a printk at the same context */
> + if (this_cpu_inc_return(recursion_count) > recursion_max) {
> + atomic_inc(_overflow);
> + logbuf_unlock_irqrestore(flags);
> + printed_len = 0;
> + goto out;
> + }
> + }

didn't have time to look at this carefully, but is this possible?

printks from console_unlock()->call_console_drivers() are redirected
to printk_safe buffer. we need irq_work on that CPU to flush its
printk_safe buffer.

>  EXPORT_SYMBOL(vprintk_emit);
> @@ -2343,9 +2428,14 @@ void console_unlock(void)
>   seen_seq = log_next_seq;
>   }
>  
> - if (console_seq < log_first_seq) {
> + if (console_seq < log_first_seq || 
> atomic_read(_overflow)) {
> + size_t missed;
> +
> + missed = atomic_xchg(_overflow, 0);
> + missed += log_first_seq - console_seq;
> +
>   len = sprintf(text, "** %u printk messages dropped 
> **\n",
> -   (unsigned)(log_first_seq - console_seq));
> +   (unsigned)missed);
>  
>   /* messages are gone, move to first one */
>   console_seq = log_first_seq;

how are we going to distinguish between lockdep splats, for instance,
or WARNs from call_console_drivers() -> foo_write(), which are valuable,
and kmalloc() print outs, which might be less valuable? are we going to
lose all of them now? then we can do a much simpler thing - steal one
bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which
will be set around call_console_drivers(). vprintk_func() would redirect
printks to vprintk_noop(fmt, args), which will do nothing.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-19 Thread Steven Rostedt
Tejun,

I was thinking about this a bit more, and instead of offloading a
recursive printk, perhaps its best to simply throttle it. Because the
problem may not go away if a printk thread takes over, because the bug
is really the printk infrastructure filling the printk buffer keeping
printk from ever stopping.

This patch detects that printk is causing itself to print more and
throttles it after 3 messages have printed due to recursion. Could you
see if this helps your test cases?

I built this on top of linux-next (yesterday's branch).

It compiles and boots, but I didn't do any other tests on it.

Thanks!

-- Steve

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..2c7f18876224 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1826,6 +1826,75 @@ static size_t log_output(int facility, int level, enum 
log_flags lflags, const c
/* Store it in the record log */
return log_store(facility, level, lflags, 0, dict, dictlen, text, 
text_len);
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI = 0
+ *  IRQ = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+   CTX_NMI,
+   CTX_IRQ,
+   CTX_SOFTIRQ,
+   CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+static DEFINE_PER_CPU(int, recursion_count);
+static atomic_t recursion_overflow;
+static const int recursion_max = 3;
+
+static int __recursion_check_test(int val)
+{
+   unsigned long pc = preempt_count();
+   int bit;
+
+   if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+   bit = CTX_NORMAL;
+   else
+   bit = pc & NMI_MASK ? CTX_NMI :
+   pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+   return val & (1 << bit);
+}
+
+static bool recursion_check_test(void)
+{
+   int val = this_cpu_read(recursion_bits);
+
+   return __recursion_check_test(val);
+}
+
+static bool recursion_check_start(void)
+{
+   int val = this_cpu_read(recursion_bits);
+   int set;
+
+   set = __recursion_check_test(val);
+
+   if (unlikely(set))
+   return true;
+
+   val |= set;
+   this_cpu_write(recursion_bits, val);
+   return false;
+}
+
+static void recursion_check_finish(bool recursion)
+{
+   int val = this_cpu_read(recursion_bits);
+
+   if (recursion)
+   return;
+
+   val &= val - 1;
+   this_cpu_write(recursion_bits, val);
+}
 
 asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
@@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 
/* This stops the holder of console_sem just where we want him */
logbuf_lock_irqsave(flags);
+
+   if (recursion_check_test()) {
+   /* A printk happened within a printk at the same context */
+   if (this_cpu_inc_return(recursion_count) > recursion_max) {
+   atomic_inc(_overflow);
+   logbuf_unlock_irqrestore(flags);
+   printed_len = 0;
+   goto out;
+   }
+   }
+
/*
 * The printf needs to come first; we need the syslog
 * prefix which might be passed-in as a parameter.
@@ -1895,12 +1975,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 
/* If called from the scheduler, we can not call up(). */
if (!in_sched) {
+   bool recursion;
/*
 * Disable preemption to avoid being preempted while holding
 * console_sem which would prevent anyone from printing to
 * console
 */
preempt_disable();
+   recursion = recursion_check_start();
/*
 * Try to acquire and then immediately release the console
 * semaphore.  The release will print out buffers and wake up
@@ -1908,9 +1990,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 */
if (console_trylock_spinning())
console_unlock();
+
+   recursion_check_finish(recursion);
+   this_cpu_write(recursion_count, 0);
preempt_enable();
}
-
+out:
return printed_len;
 }
 EXPORT_SYMBOL(vprintk_emit);
@@ -2343,9 +2428,14 @@ void console_unlock(void)
seen_seq = log_next_seq;
}
 
-   if (console_seq < log_first_seq) {
+   if (console_seq < log_first_seq || 
atomic_read(_overflow)) {
+   size_t missed;
+
+   missed = atomic_xchg(_overflow, 0);
+   missed += log_first_seq - console_seq;
+
len = sprintf(text, "** %u printk messages 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-19 Thread Steven Rostedt
Tejun,

I was thinking about this a bit more, and instead of offloading a
recursive printk, perhaps its best to simply throttle it. Because the
problem may not go away if a printk thread takes over, because the bug
is really the printk infrastructure filling the printk buffer keeping
printk from ever stopping.

This patch detects that printk is causing itself to print more and
throttles it after 3 messages have printed due to recursion. Could you
see if this helps your test cases?

I built this on top of linux-next (yesterday's branch).

It compiles and boots, but I didn't do any other tests on it.

Thanks!

-- Steve

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..2c7f18876224 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1826,6 +1826,75 @@ static size_t log_output(int facility, int level, enum 
log_flags lflags, const c
/* Store it in the record log */
return log_store(facility, level, lflags, 0, dict, dictlen, text, 
text_len);
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI = 0
+ *  IRQ = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+   CTX_NMI,
+   CTX_IRQ,
+   CTX_SOFTIRQ,
+   CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+static DEFINE_PER_CPU(int, recursion_count);
+static atomic_t recursion_overflow;
+static const int recursion_max = 3;
+
+static int __recursion_check_test(int val)
+{
+   unsigned long pc = preempt_count();
+   int bit;
+
+   if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+   bit = CTX_NORMAL;
+   else
+   bit = pc & NMI_MASK ? CTX_NMI :
+   pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+   return val & (1 << bit);
+}
+
+static bool recursion_check_test(void)
+{
+   int val = this_cpu_read(recursion_bits);
+
+   return __recursion_check_test(val);
+}
+
+static bool recursion_check_start(void)
+{
+   int val = this_cpu_read(recursion_bits);
+   int set;
+
+   set = __recursion_check_test(val);
+
+   if (unlikely(set))
+   return true;
+
+   val |= set;
+   this_cpu_write(recursion_bits, val);
+   return false;
+}
+
+static void recursion_check_finish(bool recursion)
+{
+   int val = this_cpu_read(recursion_bits);
+
+   if (recursion)
+   return;
+
+   val &= val - 1;
+   this_cpu_write(recursion_bits, val);
+}
 
 asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
@@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 
/* This stops the holder of console_sem just where we want him */
logbuf_lock_irqsave(flags);
+
+   if (recursion_check_test()) {
+   /* A printk happened within a printk at the same context */
+   if (this_cpu_inc_return(recursion_count) > recursion_max) {
+   atomic_inc(_overflow);
+   logbuf_unlock_irqrestore(flags);
+   printed_len = 0;
+   goto out;
+   }
+   }
+
/*
 * The printf needs to come first; we need the syslog
 * prefix which might be passed-in as a parameter.
@@ -1895,12 +1975,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 
/* If called from the scheduler, we can not call up(). */
if (!in_sched) {
+   bool recursion;
/*
 * Disable preemption to avoid being preempted while holding
 * console_sem which would prevent anyone from printing to
 * console
 */
preempt_disable();
+   recursion = recursion_check_start();
/*
 * Try to acquire and then immediately release the console
 * semaphore.  The release will print out buffers and wake up
@@ -1908,9 +1990,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 */
if (console_trylock_spinning())
console_unlock();
+
+   recursion_check_finish(recursion);
+   this_cpu_write(recursion_count, 0);
preempt_enable();
}
-
+out:
return printed_len;
 }
 EXPORT_SYMBOL(vprintk_emit);
@@ -2343,9 +2428,14 @@ void console_unlock(void)
seen_seq = log_next_seq;
}
 
-   if (console_seq < log_first_seq) {
+   if (console_seq < log_first_seq || 
atomic_read(_overflow)) {
+   size_t missed;
+
+   missed = atomic_xchg(_overflow, 0);
+   missed += log_first_seq - console_seq;
+
len = sprintf(text, "** %u printk messages 

Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-18 Thread Steven Rostedt
On Thu, 18 Jan 2018 13:31:16 +0900
Sergey Senozhatsky  wrote:

> d'oh... indeed, I copy-pasted the wrong URL... it should
> have been lkml.kernel.org/r/ [and it actually was].

I've learned to do a copy after entering the lkml.kernel.org link into
the browser url, and before hitting enter. The redirection kills you.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-18 Thread Steven Rostedt
On Thu, 18 Jan 2018 13:31:16 +0900
Sergey Senozhatsky  wrote:

> d'oh... indeed, I copy-pasted the wrong URL... it should
> have been lkml.kernel.org/r/ [and it actually was].

I've learned to do a copy after entering the lkml.kernel.org link into
the browser url, and before hitting enter. The redirection kills you.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-18 Thread Petr Mladek
On Wed 2018-01-17 12:05:51, Tejun Heo wrote:
> Hello, Steven.
> 
> On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote:
> > From what I gathered, you said an OOM would trigger, and then the
> > network console would not be able to allocate memory and it would
> > trigger a printk too, and cause an infinite amount of printks.
> 
> Yeah, it falls into back-and-forth loop between the OOM code and
> netconsole path.
> 
> > This could very well be a great place to force offloading. If a printk
> > is called from within a printk, at the same context (normal, softirq,
> > irq or NMI), then we should trigger the offloading.
> 
> I was thinking more of a timeout based approach (ie. if stuck for
> longer than X or X messages, offload), but if local feedback loop is
> the only thing we're missing after your improvements, detecting that
> specific condition definitely works and is likely a better approach in
> terms of message delivery guarantee.

I think that we could combine both. The recursion can be detected
rather easily and immediately so there is no reason to wait.

Once we have the code for offloading from recursion then we could
kick_offload_thread() also from other reasons, e.g. when
console_unlock() takes too long.

I think that Sergey is already playing with this. It seems
that we all could be happy in the end.


Best Regards,
Petr

PS: I am sorry for the answer yesterday. Tejun's mail did not mention
any details about the problem. I evidently forgot them. I have OOM
and printk issues associated with Tetsuo. So I messed it. Believe
me. It is a big relief to realize that we are not in the cycle
again.


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-18 Thread Petr Mladek
On Wed 2018-01-17 12:05:51, Tejun Heo wrote:
> Hello, Steven.
> 
> On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote:
> > From what I gathered, you said an OOM would trigger, and then the
> > network console would not be able to allocate memory and it would
> > trigger a printk too, and cause an infinite amount of printks.
> 
> Yeah, it falls into back-and-forth loop between the OOM code and
> netconsole path.
> 
> > This could very well be a great place to force offloading. If a printk
> > is called from within a printk, at the same context (normal, softirq,
> > irq or NMI), then we should trigger the offloading.
> 
> I was thinking more of a timeout based approach (ie. if stuck for
> longer than X or X messages, offload), but if local feedback loop is
> the only thing we're missing after your improvements, detecting that
> specific condition definitely works and is likely a better approach in
> terms of message delivery guarantee.

I think that we could combine both. The recursion can be detected
rather easily and immediately so there is no reason to wait.

Once we have the code for offloading from recursion then we could
kick_offload_thread() also from other reasons, e.g. when
console_unlock() takes too long.

I think that Sergey is already playing with this. It seems
that we all could be happy in the end.


Best Regards,
Petr

PS: I am sorry for the answer yesterday. Tejun's mail did not mention
any details about the problem. I evidently forgot them. I have OOM
and printk issues associated with Tetsuo. So I messed it. Believe
me. It is a big relief to realize that we are not in the cycle
again.


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Sergey Senozhatsky
On (01/17/18 12:05), Tejun Heo wrote:
[..]
> > This could very well be a great place to force offloading. If a printk
> > is called from within a printk, at the same context (normal, softirq,
> > irq or NMI), then we should trigger the offloading.
> 
> I was thinking more of a timeout based approach (ie. if stuck for
> longer than X or X messages, offload)

yep, that's what I want. for a whole bunch of different reasons.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Sergey Senozhatsky
On (01/17/18 12:05), Tejun Heo wrote:
[..]
> > This could very well be a great place to force offloading. If a printk
> > is called from within a printk, at the same context (normal, softirq,
> > irq or NMI), then we should trigger the offloading.
> 
> I was thinking more of a timeout based approach (ie. if stuck for
> longer than X or X messages, offload)

yep, that's what I want. for a whole bunch of different reasons.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Sergey Senozhatsky
On (01/17/18 12:12), Steven Rostedt wrote:
[..]
>  /*
>   * Can we actually use the console at this time on this cpu?
> @@ -2333,6 +2390,7 @@ void console_unlock(void)
>  
>   for (;;) {
>   struct printk_log *msg;
> + bool offload;
>   size_t ext_len = 0;
>   size_t len;
>  
> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>* waiter waiting to take over.
>*/
>   console_lock_spinning_enable();
> + offload = recursion_check_start();
>  
>   stop_critical_timings();/* don't trace print latency */
>   call_console_drivers(ext_text, ext_len, text, len);
>   start_critical_timings();
>  
> + recursion_check_finish(offload);
> +
>   if (console_lock_spinning_disable_and_check()) {
>   printk_safe_exit_irqrestore(flags);
>   return;
>   }
> + if (offload)
> + kick_offload_thread();
>  
>   printk_safe_exit_irqrestore(flags);


but we call console drivers in printk_safe.
printk -> console_drivers -> printk will be
redirected to this-CPU printk_safe buffer.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Sergey Senozhatsky
On (01/17/18 12:12), Steven Rostedt wrote:
[..]
>  /*
>   * Can we actually use the console at this time on this cpu?
> @@ -2333,6 +2390,7 @@ void console_unlock(void)
>  
>   for (;;) {
>   struct printk_log *msg;
> + bool offload;
>   size_t ext_len = 0;
>   size_t len;
>  
> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>* waiter waiting to take over.
>*/
>   console_lock_spinning_enable();
> + offload = recursion_check_start();
>  
>   stop_critical_timings();/* don't trace print latency */
>   call_console_drivers(ext_text, ext_len, text, len);
>   start_critical_timings();
>  
> + recursion_check_finish(offload);
> +
>   if (console_lock_spinning_disable_and_check()) {
>   printk_safe_exit_irqrestore(flags);
>   return;
>   }
> + if (offload)
> + kick_offload_thread();
>  
>   printk_safe_exit_irqrestore(flags);


but we call console drivers in printk_safe.
printk -> console_drivers -> printk will be
redirected to this-CPU printk_safe buffer.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Sergey Senozhatsky
On (01/17/18 14:04), Petr Mladek wrote:
> On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> > On (01/16/18 10:45), Steven Rostedt wrote:
> > [..]
> > > > [1] https://marc.info/?l=linux-mm=145692016122716
> > > 
> > > Especially since Konstantin is working on pulling in all LKML archives,
> > > the above should be denoted as:
> > > 
> > >  Link: 
> > > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
> > 
> > hm, may I ask why? is there a new rule now to percent-encode commit 
> > messages?
> 
> IMHO, the most important thing is that Steven's link is based
> on the Message-ID and the stable redirector
> https://lkml.kernel.org/. It has a better chance to work
> even in the future.

d'oh... indeed, I copy-pasted the wrong URL... it should
have been lkml.kernel.org/r/ [and it actually was].

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Sergey Senozhatsky
On (01/17/18 14:04), Petr Mladek wrote:
> On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> > On (01/16/18 10:45), Steven Rostedt wrote:
> > [..]
> > > > [1] https://marc.info/?l=linux-mm=145692016122716
> > > 
> > > Especially since Konstantin is working on pulling in all LKML archives,
> > > the above should be denoted as:
> > > 
> > >  Link: 
> > > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
> > 
> > hm, may I ask why? is there a new rule now to percent-encode commit 
> > messages?
> 
> IMHO, the most important thing is that Steven's link is based
> on the Message-ID and the stable redirector
> https://lkml.kernel.org/. It has a better chance to work
> even in the future.

d'oh... indeed, I copy-pasted the wrong URL... it should
have been lkml.kernel.org/r/ [and it actually was].

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Tejun Heo
Hello, Steven.

On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote:
> From what I gathered, you said an OOM would trigger, and then the
> network console would not be able to allocate memory and it would
> trigger a printk too, and cause an infinite amount of printks.

Yeah, it falls into back-and-forth loop between the OOM code and
netconsole path.

> This could very well be a great place to force offloading. If a printk
> is called from within a printk, at the same context (normal, softirq,
> irq or NMI), then we should trigger the offloading.

I was thinking more of a timeout based approach (ie. if stuck for
longer than X or X messages, offload), but if local feedback loop is
the only thing we're missing after your improvements, detecting that
specific condition definitely works and is likely a better approach in
terms of message delivery guarantee.

> +static void kick_offload_thread(void)
> +{
> + /*
> +  * Consoles are triggering printks, offload the printks
> +  * to another CPU to hopefully avoid a lockup.
> +  */
> +}
...
> @@ -2333,6 +2390,7 @@ void console_unlock(void)
>  
>   for (;;) {
>   struct printk_log *msg;
> + bool offload;
>   size_t ext_len = 0;
>   size_t len;
>  
> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>* waiter waiting to take over.
>*/
>   console_lock_spinning_enable();
> + offload = recursion_check_start();
>  
>   stop_critical_timings();/* don't trace print latency */
>   call_console_drivers(ext_text, ext_len, text, len);
>   start_critical_timings();
>  
> + recursion_check_finish(offload);
> +
>   if (console_lock_spinning_disable_and_check()) {
>   printk_safe_exit_irqrestore(flags);
>   return;
>   }
> + if (offload)
> + kick_offload_thread();

Yeah, something like this would definitely work.

Thanks a lot.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Tejun Heo
Hello, Steven.

On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote:
> From what I gathered, you said an OOM would trigger, and then the
> network console would not be able to allocate memory and it would
> trigger a printk too, and cause an infinite amount of printks.

Yeah, it falls into back-and-forth loop between the OOM code and
netconsole path.

> This could very well be a great place to force offloading. If a printk
> is called from within a printk, at the same context (normal, softirq,
> irq or NMI), then we should trigger the offloading.

I was thinking more of a timeout based approach (ie. if stuck for
longer than X or X messages, offload), but if local feedback loop is
the only thing we're missing after your improvements, detecting that
specific condition definitely works and is likely a better approach in
terms of message delivery guarantee.

> +static void kick_offload_thread(void)
> +{
> + /*
> +  * Consoles are triggering printks, offload the printks
> +  * to another CPU to hopefully avoid a lockup.
> +  */
> +}
...
> @@ -2333,6 +2390,7 @@ void console_unlock(void)
>  
>   for (;;) {
>   struct printk_log *msg;
> + bool offload;
>   size_t ext_len = 0;
>   size_t len;
>  
> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>* waiter waiting to take over.
>*/
>   console_lock_spinning_enable();
> + offload = recursion_check_start();
>  
>   stop_critical_timings();/* don't trace print latency */
>   call_console_drivers(ext_text, ext_len, text, len);
>   start_critical_timings();
>  
> + recursion_check_finish(offload);
> +
>   if (console_lock_spinning_disable_and_check()) {
>   printk_safe_exit_irqrestore(flags);
>   return;
>   }
> + if (offload)
> + kick_offload_thread();

Yeah, something like this would definitely work.

Thanks a lot.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Steven Rostedt
On Wed, 17 Jan 2018 12:12:51 -0500
Steven Rostedt  wrote:

> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>* waiter waiting to take over.
>*/
>   console_lock_spinning_enable();
> + offload = recursion_check_start();
>  
>   stop_critical_timings();/* don't trace print latency */
>   call_console_drivers(ext_text, ext_len, text, len);
>   start_critical_timings();
>  
> + recursion_check_finish(offload);
> +
>   if (console_lock_spinning_disable_and_check()) {
>   printk_safe_exit_irqrestore(flags);
>   return;
>   }
> + if (offload)
> + kick_offload_thread();
>  

Ah, major flaw in this code. The recursion check needs to be in
printk() itself around the trylock.

-- Steve

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..31df145cc4d7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1826,6 +1826,63 @@ static size_t log_output(int facility, int level, enum 
log_flags lflags, const c
/* Store it in the record log */
return log_store(facility, level, lflags, 0, dict, dictlen, text, 
text_len);
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI = 0
+ *  IRQ = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+   CTX_NMI,
+   CTX_IRQ,
+   CTX_SOFTIRQ,
+   CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+
+static bool recursion_check_start(void)
+{
+   unsigned long pc = preempt_count();
+   int val = this_cpu_read(recursion_bits);
+
+   if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+   bit = CTX_NORMAL;
+   else
+   bit = pc & NMI_MASK ? CTX_NMI :
+   pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+   if (unlikely(val & (1 << bit)))
+   return true;
+
+   val |= (1 << bit);
+   this_cpu_write(recursion_bits, val);
+   return false;
+}
+
+static void recursion_check_finish(bool offload)
+{
+   int val = this_cpu_read(recursion_bits);
+
+   if (offload)
+   return;
+
+   val &= val - 1;
+   this_cpu_write(recursion_bits, val);
+}
+
+static void kick_offload_thread(void)
+{
+   /*
+* Consoles are triggering printks, offload the printks
+* to another CPU to hopefully avoid a lockup.
+*/
+}
 
 asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
@@ -1895,12 +1952,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 
/* If called from the scheduler, we can not call up(). */
if (!in_sched) {
+   bool offload;
/*
 * Disable preemption to avoid being preempted while holding
 * console_sem which would prevent anyone from printing to
 * console
 */
preempt_disable();
+   offload = recursion_check_start();
/*
 * Try to acquire and then immediately release the console
 * semaphore.  The release will print out buffers and wake up
@@ -1908,7 +1967,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 */
if (console_trylock_spinning())
console_unlock();
+
+   recursion_check_finish(offload);
preempt_enable();
+
+   if (offload)
+   kick_offload_thread();
}
 
return printed_len;


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Steven Rostedt
On Wed, 17 Jan 2018 12:12:51 -0500
Steven Rostedt  wrote:

> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>* waiter waiting to take over.
>*/
>   console_lock_spinning_enable();
> + offload = recursion_check_start();
>  
>   stop_critical_timings();/* don't trace print latency */
>   call_console_drivers(ext_text, ext_len, text, len);
>   start_critical_timings();
>  
> + recursion_check_finish(offload);
> +
>   if (console_lock_spinning_disable_and_check()) {
>   printk_safe_exit_irqrestore(flags);
>   return;
>   }
> + if (offload)
> + kick_offload_thread();
>  

Ah, major flaw in this code. The recursion check needs to be in
printk() itself around the trylock.

-- Steve

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..31df145cc4d7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1826,6 +1826,63 @@ static size_t log_output(int facility, int level, enum 
log_flags lflags, const c
/* Store it in the record log */
return log_store(facility, level, lflags, 0, dict, dictlen, text, 
text_len);
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI = 0
+ *  IRQ = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+   CTX_NMI,
+   CTX_IRQ,
+   CTX_SOFTIRQ,
+   CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+
+static bool recursion_check_start(void)
+{
+   unsigned long pc = preempt_count();
+   int val = this_cpu_read(recursion_bits);
+
+   if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+   bit = CTX_NORMAL;
+   else
+   bit = pc & NMI_MASK ? CTX_NMI :
+   pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+   if (unlikely(val & (1 << bit)))
+   return true;
+
+   val |= (1 << bit);
+   this_cpu_write(recursion_bits, val);
+   return false;
+}
+
+static void recursion_check_finish(bool offload)
+{
+   int val = this_cpu_read(recursion_bits);
+
+   if (offload)
+   return;
+
+   val &= val - 1;
+   this_cpu_write(recursion_bits, val);
+}
+
+static void kick_offload_thread(void)
+{
+   /*
+* Consoles are triggering printks, offload the printks
+* to another CPU to hopefully avoid a lockup.
+*/
+}
 
 asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
@@ -1895,12 +1952,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 
/* If called from the scheduler, we can not call up(). */
if (!in_sched) {
+   bool offload;
/*
 * Disable preemption to avoid being preempted while holding
 * console_sem which would prevent anyone from printing to
 * console
 */
preempt_disable();
+   offload = recursion_check_start();
/*
 * Try to acquire and then immediately release the console
 * semaphore.  The release will print out buffers and wake up
@@ -1908,7 +1967,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 */
if (console_trylock_spinning())
console_unlock();
+
+   recursion_check_finish(offload);
preempt_enable();
+
+   if (offload)
+   kick_offload_thread();
}
 
return printed_len;


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Steven Rostedt
On Wed, 17 Jan 2018 07:15:09 -0800
Tejun Heo  wrote:

> It's great that Steven's patches solve a good number of problems.  It
> is also true that there's a class of problems that it doesn't solve,
> which other approaches do.  The productive thing to do here is trying
> to solve the unsolved one too, especially given that it doesn't seem
> too difficuilt to do so on top of what's proposed.

OK, let's talk about the other problems, as this is no longer related
to my patch.

>From your previous email:

> 1. Console is IPMI emulated serial console.  Super slow.  Also
>netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>/ driver tries to allocate memory and then fail, which in turn
>triggers allocation failure or other warning messages.  printk was
>already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>shrinking.  Because OOM handler is trapped in printk flushing, it
>never manages to free memory and no one else can enter OOM path
>either, so the system is trapped in this state.

>From what I gathered, you said an OOM would trigger, and then the
network console would not be able to allocate memory and it would
trigger a printk too, and cause an infinite amount of printks.

This could very well be a great place to force offloading. If a printk
is called from within a printk, at the same context (normal, softirq,
irq or NMI), then we should trigger the offloading.

My ftrace ring buffer has a context level recursion check, we could use
that, and even tie it into my previous patch:

With something like this (not compiled tested or anything, and
kick_offload_thread() would need to be implemented).

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..b80b23a0ca13 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2261,6 +2261,63 @@ static int have_callable_console(void)
 
return 0;
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI = 0
+ *  IRQ = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+   CTX_NMI,
+   CTX_IRQ,
+   CTX_SOFTIRQ,
+   CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+
+static bool recursion_check_start(void)
+{
+   unsigned long pc = preempt_count();
+   int val = this_cpu_read(recursion_bits);
+
+   if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+   bit = CTX_NORMAL;
+   else
+   bit = pc & NMI_MASK ? CTX_NMI :
+   pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+   if (unlikely(val & (1 << bit)))
+   return true;
+
+   val |= (1 << bit);
+   this_cpu_write(recursion_bits, val);
+   return false;
+}
+
+static void recursion_check_finish(bool offload)
+{
+   int val = this_cpu_read(recursion_bits);
+
+   if (offload)
+   return;
+
+   val &= val - 1;
+   this_cpu_write(recursion_bits, val);
+}
+
+static void kick_offload_thread(void)
+{
+   /*
+* Consoles are triggering printks, offload the printks
+* to another CPU to hopefully avoid a lockup.
+*/
+}
 
 /*
  * Can we actually use the console at this time on this cpu?
@@ -2333,6 +2390,7 @@ void console_unlock(void)
 
for (;;) {
struct printk_log *msg;
+   bool offload;
size_t ext_len = 0;
size_t len;
 
@@ -2393,15 +2451,20 @@ void console_unlock(void)
 * waiter waiting to take over.
 */
console_lock_spinning_enable();
+   offload = recursion_check_start();
 
stop_critical_timings();/* don't trace print latency */
call_console_drivers(ext_text, ext_len, text, len);
start_critical_timings();
 
+   recursion_check_finish(offload);
+
if (console_lock_spinning_disable_and_check()) {
printk_safe_exit_irqrestore(flags);
return;
}
+   if (offload)
+   kick_offload_thread();
 
printk_safe_exit_irqrestore(flags);
 

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Steven Rostedt
On Wed, 17 Jan 2018 07:15:09 -0800
Tejun Heo  wrote:

> It's great that Steven's patches solve a good number of problems.  It
> is also true that there's a class of problems that it doesn't solve,
> which other approaches do.  The productive thing to do here is trying
> to solve the unsolved one too, especially given that it doesn't seem
> too difficuilt to do so on top of what's proposed.

OK, let's talk about the other problems, as this is no longer related
to my patch.

>From your previous email:

> 1. Console is IPMI emulated serial console.  Super slow.  Also
>netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>/ driver tries to allocate memory and then fail, which in turn
>triggers allocation failure or other warning messages.  printk was
>already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>shrinking.  Because OOM handler is trapped in printk flushing, it
>never manages to free memory and no one else can enter OOM path
>either, so the system is trapped in this state.

>From what I gathered, you said an OOM would trigger, and then the
network console would not be able to allocate memory and it would
trigger a printk too, and cause an infinite amount of printks.

This could very well be a great place to force offloading. If a printk
is called from within a printk, at the same context (normal, softirq,
irq or NMI), then we should trigger the offloading.

My ftrace ring buffer has a context level recursion check, we could use
that, and even tie it into my previous patch:

With something like this (not compiled tested or anything, and
kick_offload_thread() would need to be implemented).

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..b80b23a0ca13 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2261,6 +2261,63 @@ static int have_callable_console(void)
 
return 0;
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI = 0
+ *  IRQ = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+   CTX_NMI,
+   CTX_IRQ,
+   CTX_SOFTIRQ,
+   CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+
+static bool recursion_check_start(void)
+{
+   unsigned long pc = preempt_count();
+   int val = this_cpu_read(recursion_bits);
+
+   if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+   bit = CTX_NORMAL;
+   else
+   bit = pc & NMI_MASK ? CTX_NMI :
+   pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+   if (unlikely(val & (1 << bit)))
+   return true;
+
+   val |= (1 << bit);
+   this_cpu_write(recursion_bits, val);
+   return false;
+}
+
+static void recursion_check_finish(bool offload)
+{
+   int val = this_cpu_read(recursion_bits);
+
+   if (offload)
+   return;
+
+   val &= val - 1;
+   this_cpu_write(recursion_bits, val);
+}
+
+static void kick_offload_thread(void)
+{
+   /*
+* Consoles are triggering printks, offload the printks
+* to another CPU to hopefully avoid a lockup.
+*/
+}
 
 /*
  * Can we actually use the console at this time on this cpu?
@@ -2333,6 +2390,7 @@ void console_unlock(void)
 
for (;;) {
struct printk_log *msg;
+   bool offload;
size_t ext_len = 0;
size_t len;
 
@@ -2393,15 +2451,20 @@ void console_unlock(void)
 * waiter waiting to take over.
 */
console_lock_spinning_enable();
+   offload = recursion_check_start();
 
stop_critical_timings();/* don't trace print latency */
call_console_drivers(ext_text, ext_len, text, len);
start_critical_timings();
 
+   recursion_check_finish(offload);
+
if (console_lock_spinning_disable_and_check()) {
printk_safe_exit_irqrestore(flags);
return;
}
+   if (offload)
+   kick_offload_thread();
 
printk_safe_exit_irqrestore(flags);
 

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Steven Rostedt
On Wed, 17 Jan 2018 14:04:07 +0100
Petr Mladek  wrote:

> On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> > On (01/16/18 10:45), Steven Rostedt wrote:
> > [..]  
> > > > [1] https://marc.info/?l=linux-mm=145692016122716  
> > > 
> > > Especially since Konstantin is working on pulling in all LKML archives,
> > > the above should be denoted as:
> > > 
> > >  Link: 
> > > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
> > >   
> > 
> > hm, may I ask why? is there a new rule now to percent-encode commit 
> > messages?  
> 
> IMHO, the most important thing is that Steven's link is based
> on the Message-ID and the stable redirector
> https://lkml.kernel.org/. It has a better chance to work
> even in the future.

Exactly. There's an effort to avoid any outside link dependencies in
the Linux git history. No one expected gmane to end (although it
appears to be making a comeback), but we don't want to get stuck if
marc.info disappears one day.

> > > 
> > > Should we Cc sta...@vger.kernel.org?  
> > 
> > that's a good question... maybe yes, maybe no... I'd say this
> > change is "safer" when we have hand-off.  
> 
> I would keep it as is in stable kernels unless there are
> many bug reports.

Agreed.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Steven Rostedt
On Wed, 17 Jan 2018 14:04:07 +0100
Petr Mladek  wrote:

> On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> > On (01/16/18 10:45), Steven Rostedt wrote:
> > [..]  
> > > > [1] https://marc.info/?l=linux-mm=145692016122716  
> > > 
> > > Especially since Konstantin is working on pulling in all LKML archives,
> > > the above should be denoted as:
> > > 
> > >  Link: 
> > > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
> > >   
> > 
> > hm, may I ask why? is there a new rule now to percent-encode commit 
> > messages?  
> 
> IMHO, the most important thing is that Steven's link is based
> on the Message-ID and the stable redirector
> https://lkml.kernel.org/. It has a better chance to work
> even in the future.

Exactly. There's an effort to avoid any outside link dependencies in
the Linux git history. No one expected gmane to end (although it
appears to be making a comeback), but we don't want to get stuck if
marc.info disappears one day.

> > > 
> > > Should we Cc sta...@vger.kernel.org?  
> > 
> > that's a good question... maybe yes, maybe no... I'd say this
> > change is "safer" when we have hand-off.  
> 
> I would keep it as is in stable kernels unless there are
> many bug reports.

Agreed.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Tejun Heo
Hello,

On Wed, Jan 17, 2018 at 10:12:08AM +0100, Petr Mladek wrote:
> IMHO, the bad scenario with OOM was that any printk() called in
> the OOM report became console_lock owner and was responsible
> for pushing all new messages to the console. There was a possible
> livelock because OOM Killer was blocked in console_unlock() while
> other CPUs repeatedly complained about failed allocations.

I don't know why we're constantly back into this same loop on this
topic but that's not the problem we've been seeing.  There are no
other CPUs involved.

It's great that Steven's patches solve a good number of problems.  It
is also true that there's a class of problems that it doesn't solve,
which other approaches do.  The productive thing to do here is trying
to solve the unsolved one too, especially given that it doesn't seem
too difficuilt to do so on top of what's proposed.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Tejun Heo
Hello,

On Wed, Jan 17, 2018 at 10:12:08AM +0100, Petr Mladek wrote:
> IMHO, the bad scenario with OOM was that any printk() called in
> the OOM report became console_lock owner and was responsible
> for pushing all new messages to the console. There was a possible
> livelock because OOM Killer was blocked in console_unlock() while
> other CPUs repeatedly complained about failed allocations.

I don't know why we're constantly back into this same loop on this
topic but that's not the problem we've been seeing.  There are no
other CPUs involved.

It's great that Steven's patches solve a good number of problems.  It
is also true that there's a class of problems that it doesn't solve,
which other approaches do.  The productive thing to do here is trying
to solve the unsolved one too, especially given that it doesn't seem
too difficuilt to do so on top of what's proposed.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Petr Mladek
On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> On (01/16/18 10:45), Steven Rostedt wrote:
> [..]
> > > [1] https://marc.info/?l=linux-mm=145692016122716
> > 
> > Especially since Konstantin is working on pulling in all LKML archives,
> > the above should be denoted as:
> > 
> >  Link: 
> > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
> 
> hm, may I ask why? is there a new rule now to percent-encode commit messages?

IMHO, the most important thing is that Steven's link is based
on the Message-ID and the stable redirector
https://lkml.kernel.org/. It has a better chance to work
even in the future.

I have been asked by other people to use this type
of links as well.

> > Although the above is for linux-mm and not LKML (it still works), I
> > should ask Konstantin if he will be pulling in any of the other
> > archives. Perhaps have both? (in case marc.info goes away).
> > 
> > > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of 
> > > console_trylock() callers")
> > 
> > Should we Cc sta...@vger.kernel.org?
> 
> that's a good question... maybe yes, maybe no... I'd say this
> change is "safer" when we have hand-off.

I would keep it as is in stable kernels unless there are
many bug reports.

Best Regards,
Petr


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Petr Mladek
On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> On (01/16/18 10:45), Steven Rostedt wrote:
> [..]
> > > [1] https://marc.info/?l=linux-mm=145692016122716
> > 
> > Especially since Konstantin is working on pulling in all LKML archives,
> > the above should be denoted as:
> > 
> >  Link: 
> > http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
> 
> hm, may I ask why? is there a new rule now to percent-encode commit messages?

IMHO, the most important thing is that Steven's link is based
on the Message-ID and the stable redirector
https://lkml.kernel.org/. It has a better chance to work
even in the future.

I have been asked by other people to use this type
of links as well.

> > Although the above is for linux-mm and not LKML (it still works), I
> > should ask Konstantin if he will be pulling in any of the other
> > archives. Perhaps have both? (in case marc.info goes away).
> > 
> > > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of 
> > > console_trylock() callers")
> > 
> > Should we Cc sta...@vger.kernel.org?
> 
> that's a good question... maybe yes, maybe no... I'd say this
> change is "safer" when we have hand-off.

I would keep it as is in stable kernels unless there are
many bug reports.

Best Regards,
Petr


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Petr Mladek
On Tue 2018-01-16 11:44:56, Tejun Heo wrote:
> Hello, Steven.
> 
> On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote:
> > All I did was start off a work queue on each CPU, and each CPU does one
> > printk() followed by a millisecond sleep. No 10,000 printks, nothing
> > in an interrupt handler. Preemption is disabled while the printk
> > happens, but that's normal.
> > 
> > This is much closer to an OOM happening all over the system, where OOMs
> > stack dumps are occurring on different CPUS.
> 
> OOMs can't happen all over the system.  It can only happen on a single
> CPU at a time.  If you're printing from multiple CPUs, your solution
> would work great.  That is the situation your patches are designed to
> address to begin with.  That isn't the problem that I reported tho.  I
> understand that your solution works for that class of problems and
> that is great.  I really wish that it could address the other class of
> problems too tho, and it doesn't seem like it would be that difficult
> to cover both cases, right?

IMHO, the bad scenario with OOM was that any printk() called in
the OOM report became console_lock owner and was responsible
for pushing all new messages to the console. There was a possible
livelock because OOM Killer was blocked in console_unlock() while
other CPUs repeatedly complained about failed allocations.

Even the current patch should help. It allows to hand off
the console_lock to another CPU and OOM killer could eventually
continue.

Of course, it is possible that it might not be enough. For example,
there might still be too many messages to print when the memory is
freed. Therefore there will be no more complains, no more
hand offs and the last console_lock owner might still
cause softlockup. But it still will be better than
the livelockup. Of course, we will need to address
the softlockup. But let's see how this works in practice.

Best Regards,
Petr


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-17 Thread Petr Mladek
On Tue 2018-01-16 11:44:56, Tejun Heo wrote:
> Hello, Steven.
> 
> On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote:
> > All I did was start off a work queue on each CPU, and each CPU does one
> > printk() followed by a millisecond sleep. No 10,000 printks, nothing
> > in an interrupt handler. Preemption is disabled while the printk
> > happens, but that's normal.
> > 
> > This is much closer to an OOM happening all over the system, where OOMs
> > stack dumps are occurring on different CPUS.
> 
> OOMs can't happen all over the system.  It can only happen on a single
> CPU at a time.  If you're printing from multiple CPUs, your solution
> would work great.  That is the situation your patches are designed to
> address to begin with.  That isn't the problem that I reported tho.  I
> understand that your solution works for that class of problems and
> that is great.  I really wish that it could address the other class of
> problems too tho, and it doesn't seem like it would be that difficult
> to cover both cases, right?

IMHO, the bad scenario with OOM was that any printk() called in
the OOM report became console_lock owner and was responsible
for pushing all new messages to the console. There was a possible
livelock because OOM Killer was blocked in console_unlock() while
other CPUs repeatedly complained about failed allocations.

Even the current patch should help. It allows to hand off
the console_lock to another CPU and OOM killer could eventually
continue.

Of course, it is possible that it might not be enough. For example,
there might still be too many messages to print when the memory is
freed. Therefore there will be no more complains, no more
hand offs and the last console_lock owner might still
cause softlockup. But it still will be better than
the livelockup. Of course, we will need to address
the softlockup. But let's see how this works in practice.

Best Regards,
Petr


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Sergey Senozhatsky
On (01/16/18 11:13), Petr Mladek wrote:
[..]
> IMHO, it would make sense if flushing the printk buffer behaves
> the same when called either from printk() or from any other path.
> I mean that it should be aggressive and allow an effective
> hand off.
> 
> It should be safe as long as foo_specific_work() does not take
> too much time.
> 
> From other side. The cond_resched() in console_unlock() should
> be obsoleted by the hand-shake code.

hm, let's not have too optimistic expectations. hand off works in very
specific conditions. console is not exclusively owned by printk, and
console_sem is not printk's own lock. even things like

systemd -> n_tty_write -> do_output_char -> con_write

involves console_lock() and console_unlock(). IOW user space
logging/debugging can cause printk stalls, and vice versa.

by the way, do_con_write() explicitly calls console_conditional_schedule()
under console_sem, before it goes to console_unlock(). so the scope of
"situation normal, console_sem locked, the owner scheduled out" is much
bigger than just vprintk_emit() -> console_unlock(). IMHO.

and there are even more things there. personally, I don't think
that hand off is enough to obsolete anything in that area.

[...]
> They both were obvious solutions that helped to reduce the risk
> of soft-lockups. The first one handled evidently safe scenarios.
> The second one was even more aggressive. I would say that
> they both were more or less add-hoc solutions that did not
> take into account the other side effects (delaying output,
> even loosing messages).

agreed.

> I would not say that one is a diametric difference between them.
> Therefore if we remove one for a reason, we should think about
> reverting the other as well. But again. I am fine if we remove
> only one now.
> 
> Does this make any sense?

I see cond_resched() as a mirroring of console_lock()->console_unlock()
behaviour on PREEMPT systems, and as such it looks valid to me, so we
probably better keep it there. IMHO.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Sergey Senozhatsky
On (01/16/18 11:13), Petr Mladek wrote:
[..]
> IMHO, it would make sense if flushing the printk buffer behaves
> the same when called either from printk() or from any other path.
> I mean that it should be aggressive and allow an effective
> hand off.
> 
> It should be safe as long as foo_specific_work() does not take
> too much time.
> 
> From other side. The cond_resched() in console_unlock() should
> be obsoleted by the hand-shake code.

hm, let's not have too optimistic expectations. hand off works in very
specific conditions. console is not exclusively owned by printk, and
console_sem is not printk's own lock. even things like

systemd -> n_tty_write -> do_output_char -> con_write

involves console_lock() and console_unlock(). IOW user space
logging/debugging can cause printk stalls, and vice versa.

by the way, do_con_write() explicitly calls console_conditional_schedule()
under console_sem, before it goes to console_unlock(). so the scope of
"situation normal, console_sem locked, the owner scheduled out" is much
bigger than just vprintk_emit() -> console_unlock(). IMHO.

and there are even more things there. personally, I don't think
that hand off is enough to obsolete anything in that area.

[...]
> They both were obvious solutions that helped to reduce the risk
> of soft-lockups. The first one handled evidently safe scenarios.
> The second one was even more aggressive. I would say that
> they both were more or less add-hoc solutions that did not
> take into account the other side effects (delaying output,
> even loosing messages).

agreed.

> I would not say that one is a diametric difference between them.
> Therefore if we remove one for a reason, we should think about
> reverting the other as well. But again. I am fine if we remove
> only one now.
> 
> Does this make any sense?

I see cond_resched() as a mirroring of console_lock()->console_unlock()
behaviour on PREEMPT systems, and as such it looks valid to me, so we
probably better keep it there. IMHO.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Sergey Senozhatsky
On (01/16/18 11:19), Petr Mladek wrote:
[..]
> > [1] https://marc.info/?l=linux-mm=145692016122716
> > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of 
> > console_trylock() callers")
> > Signed-off-by: Sergey Senozhatsky 
> > Reported-by: Tetsuo Handa 
> 
> IMHO, this is a step in the right direction.
> 
> Reviewed-by: Petr Mladek 
> 
> I'll wait for Steven's review and push this into printk.git.
> I'll also add your Acks for the other patches.
> 
> Thanks for the patch and the various observations.

thanks!


a side note,

our console output is still largely preemptible. a typical system
acquires console_sem via console_lock() all the time, so we still
can have "where is my printk output?" cases.


for instance, my IDLE PREEMPT x86 box, has the following stats

uptime 15 min

# of console_lock() calls: 10981  // can sleep under console_sem
# of vprintk_emit() calls: 825// cannot sleep under console_sem

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Sergey Senozhatsky
On (01/16/18 11:19), Petr Mladek wrote:
[..]
> > [1] https://marc.info/?l=linux-mm=145692016122716
> > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of 
> > console_trylock() callers")
> > Signed-off-by: Sergey Senozhatsky 
> > Reported-by: Tetsuo Handa 
> 
> IMHO, this is a step in the right direction.
> 
> Reviewed-by: Petr Mladek 
> 
> I'll wait for Steven's review and push this into printk.git.
> I'll also add your Acks for the other patches.
> 
> Thanks for the patch and the various observations.

thanks!


a side note,

our console output is still largely preemptible. a typical system
acquires console_sem via console_lock() all the time, so we still
can have "where is my printk output?" cases.


for instance, my IDLE PREEMPT x86 box, has the following stats

uptime 15 min

# of console_lock() calls: 10981  // can sleep under console_sem
# of vprintk_emit() calls: 825// cannot sleep under console_sem

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Sergey Senozhatsky
On (01/16/18 10:45), Steven Rostedt wrote:
[..]
> > [1] https://marc.info/?l=linux-mm=145692016122716
> 
> Especially since Konstantin is working on pulling in all LKML archives,
> the above should be denoted as:
> 
>  Link: 
> http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp

hm, may I ask why? is there a new rule now to percent-encode commit messages?

> Although the above is for linux-mm and not LKML (it still works), I
> should ask Konstantin if he will be pulling in any of the other
> archives. Perhaps have both? (in case marc.info goes away).
> 
> > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of 
> > console_trylock() callers")
> 
> Should we Cc sta...@vger.kernel.org?

that's a good question... maybe yes, maybe no... I'd say this
change is "safer" when we have hand-off.

> Reviewed-by: Steven Rostedt (VMware) 
> 
> Thanks Sergey!

thanks.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Sergey Senozhatsky
On (01/16/18 10:45), Steven Rostedt wrote:
[..]
> > [1] https://marc.info/?l=linux-mm=145692016122716
> 
> Especially since Konstantin is working on pulling in all LKML archives,
> the above should be denoted as:
> 
>  Link: 
> http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp

hm, may I ask why? is there a new rule now to percent-encode commit messages?

> Although the above is for linux-mm and not LKML (it still works), I
> should ask Konstantin if he will be pulling in any of the other
> archives. Perhaps have both? (in case marc.info goes away).
> 
> > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of 
> > console_trylock() callers")
> 
> Should we Cc sta...@vger.kernel.org?

that's a good question... maybe yes, maybe no... I'd say this
change is "safer" when we have hand-off.

> Reviewed-by: Steven Rostedt (VMware) 
> 
> Thanks Sergey!

thanks.

-ss


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Tejun Heo
Hello, Steven.

On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote:
> All I did was start off a work queue on each CPU, and each CPU does one
> printk() followed by a millisecond sleep. No 10,000 printks, nothing
> in an interrupt handler. Preemption is disabled while the printk
> happens, but that's normal.
> 
> This is much closer to an OOM happening all over the system, where OOMs
> stack dumps are occurring on different CPUS.

OOMs can't happen all over the system.  It can only happen on a single
CPU at a time.  If you're printing from multiple CPUs, your solution
would work great.  That is the situation your patches are designed to
address to begin with.  That isn't the problem that I reported tho.  I
understand that your solution works for that class of problems and
that is great.  I really wish that it could address the other class of
problems too tho, and it doesn't seem like it would be that difficult
to cover both cases, right?

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Tejun Heo
Hello, Steven.

On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote:
> All I did was start off a work queue on each CPU, and each CPU does one
> printk() followed by a millisecond sleep. No 10,000 printks, nothing
> in an interrupt handler. Preemption is disabled while the printk
> happens, but that's normal.
> 
> This is much closer to an OOM happening all over the system, where OOMs
> stack dumps are occurring on different CPUS.

OOMs can't happen all over the system.  It can only happen on a single
CPU at a time.  If you're printing from multiple CPUs, your solution
would work great.  That is the situation your patches are designed to
address to begin with.  That isn't the problem that I reported tho.  I
understand that your solution works for that class of problems and
that is great.  I really wish that it could address the other class of
problems too tho, and it doesn't seem like it would be that difficult
to cover both cases, right?

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Steven Rostedt
On Tue, 16 Jan 2018 15:10:13 +0900
Sergey Senozhatsky  wrote:

> overall that's very close to what I have in one of my private branches.
> console_trylock_spinning() for some reason does not perform really
> well on my made-up internal printk torture tests. it seems that I

One thing I noticed in my test with the module that does printks on all
cpus, was that the patch spreads out the processing of the consoles.
Before my patch, one printk user would be doing all the work, and all
the other printks only had to load their data into the logbuf then
exit. The majority of printks took a few microseconds, which looks
great if you ignore the one worker that is taking milliseconds to
complete. After my patch, since a printk that comes in while another
one was running would block, then it would start printing, it did
lengthen the time for individual printks to finish. Worst case it
would double the time to do printk. But it removed the burden of a
single printk doing all the work for all new printks that came in.

In other words, I would expect this to make printk on average slower.
But no longer unlimited.

-- Steve


> have a much better stability (no lockups and so on) when I also let
> printk_kthread to sleep on console_sem(). but I will look further.



Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Steven Rostedt
On Tue, 16 Jan 2018 15:10:13 +0900
Sergey Senozhatsky  wrote:

> overall that's very close to what I have in one of my private branches.
> console_trylock_spinning() for some reason does not perform really
> well on my made-up internal printk torture tests. it seems that I

One thing I noticed in my test with the module that does printks on all
cpus, was that the patch spreads out the processing of the consoles.
Before my patch, one printk user would be doing all the work, and all
the other printks only had to load their data into the logbuf then
exit. The majority of printks took a few microseconds, which looks
great if you ignore the one worker that is taking milliseconds to
complete. After my patch, since a printk that comes in while another
one was running would block, then it would start printing, it did
lengthen the time for individual printks to finish. Worst case it
would double the time to do printk. But it removed the burden of a
single printk doing all the work for all new printks that came in.

In other words, I would expect this to make printk on average slower.
But no longer unlimited.

-- Steve


> have a much better stability (no lockups and so on) when I also let
> printk_kthread to sleep on console_sem(). but I will look further.



Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Steven Rostedt
On Tue, 16 Jan 2018 13:47:16 +0900
Sergey Senozhatsky  wrote:

> From: Sergey Senozhatsky 
> Subject: [PATCH] printk: never set console_may_schedule in console_trylock()
> 
> This patch, basically, reverts commit 6b97a20d3a79 ("printk:
> set may_schedule for some of console_trylock() callers").
> That commit was a mistake, it introduced a big dependency
> on the scheduler, by enabling preemption under console_sem
> in printk()->console_unlock() path, which is rather too
> critical. The patch did not significantly reduce the
> possibilities of printk() lockups, but made it possible to
> stall printk(), as has been reported by Tetsuo Handa [1].
> 
> Another issues is that preemption under console_sem also
> messes up with Steven Rostedt's hand off scheme, by making
> it possible to sleep with console_sem both in console_unlock()
> and in vprintk_emit(), after acquiring the console_sem
> ownership (anywhere between printk_safe_exit_irqrestore() in
> console_trylock_spinning() and printk_safe_enter_irqsave()
> in console_unlock()). This makes hand off less likely and,
> at the same time, may result in a significant amount of
> pending logbuf messages. Preempted console_sem owner makes
> it impossible for other CPUs to emit logbuf messages, but
> does not make it impossible for other CPUs to append new
> messages to the logbuf.
> 
> Reinstate the old behavior and make printk() non-preemptible.
> Should any printk() lockup reports arrive they must be handled
> in a different way.
> 
> [1] https://marc.info/?l=linux-mm=145692016122716

Especially since Konstantin is working on pulling in all LKML archives,
the above should be denoted as:

 Link: 
http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp

Although the above is for linux-mm and not LKML (it still works), I
should ask Konstantin if he will be pulling in any of the other
archives. Perhaps have both? (in case marc.info goes away).

> Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() 
> callers")

Should we Cc sta...@vger.kernel.org?

> Signed-off-by: Sergey Senozhatsky 
> Reported-by: Tetsuo Handa 
> ---
>  kernel/printk/printk.c | 22 --
>  1 file changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ffe05024c622..9cb943c90d98 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1895,6 +1895,12 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>   /* If called from the scheduler, we can not call up(). */
>   if (!in_sched) {
> + /*
> +  * Disable preemption to avoid being preempted while holding
> +  * console_sem which would prevent anyone from printing to
> +  * console
> +  */
> + preempt_disable();
>   /*
>* Try to acquire and then immediately release the console
>* semaphore.  The release will print out buffers and wake up
> @@ -1902,6 +1908,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>*/
>   if (console_trylock_spinning())
>   console_unlock();
> + preempt_enable();
>   }
>  
>   return printed_len;
> @@ -2229,20 +2236,7 @@ int console_trylock(void)
>   return 0;
>   }
>   console_locked = 1;
> - /*
> -  * When PREEMPT_COUNT disabled we can't reliably detect if it's
> -  * safe to schedule (e.g. calling printk while holding a spin_lock),
> -  * because preempt_disable()/preempt_enable() are just barriers there
> -  * and preempt_count() is always 0.
> -  *
> -  * RCU read sections have a separate preemption counter when
> -  * PREEMPT_RCU enabled thus we must take extra care and check
> -  * rcu_preempt_depth(), otherwise RCU read sections modify
> -  * preempt_count().
> -  */
> - console_may_schedule = !oops_in_progress &&
> - preemptible() &&
> - !rcu_preempt_depth();
> + console_may_schedule = 0;
>   return 1;
>  }
>  EXPORT_SYMBOL(console_trylock);

Reviewed-by: Steven Rostedt (VMware) 

Thanks Sergey!

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Steven Rostedt
On Tue, 16 Jan 2018 13:47:16 +0900
Sergey Senozhatsky  wrote:

> From: Sergey Senozhatsky 
> Subject: [PATCH] printk: never set console_may_schedule in console_trylock()
> 
> This patch, basically, reverts commit 6b97a20d3a79 ("printk:
> set may_schedule for some of console_trylock() callers").
> That commit was a mistake, it introduced a big dependency
> on the scheduler, by enabling preemption under console_sem
> in printk()->console_unlock() path, which is rather too
> critical. The patch did not significantly reduce the
> possibilities of printk() lockups, but made it possible to
> stall printk(), as has been reported by Tetsuo Handa [1].
> 
> Another issues is that preemption under console_sem also
> messes up with Steven Rostedt's hand off scheme, by making
> it possible to sleep with console_sem both in console_unlock()
> and in vprintk_emit(), after acquiring the console_sem
> ownership (anywhere between printk_safe_exit_irqrestore() in
> console_trylock_spinning() and printk_safe_enter_irqsave()
> in console_unlock()). This makes hand off less likely and,
> at the same time, may result in a significant amount of
> pending logbuf messages. Preempted console_sem owner makes
> it impossible for other CPUs to emit logbuf messages, but
> does not make it impossible for other CPUs to append new
> messages to the logbuf.
> 
> Reinstate the old behavior and make printk() non-preemptible.
> Should any printk() lockup reports arrive they must be handled
> in a different way.
> 
> [1] https://marc.info/?l=linux-mm=145692016122716

Especially since Konstantin is working on pulling in all LKML archives,
the above should be denoted as:

 Link: 
http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp

Although the above is for linux-mm and not LKML (it still works), I
should ask Konstantin if he will be pulling in any of the other
archives. Perhaps have both? (in case marc.info goes away).

> Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() 
> callers")

Should we Cc sta...@vger.kernel.org?

> Signed-off-by: Sergey Senozhatsky 
> Reported-by: Tetsuo Handa 
> ---
>  kernel/printk/printk.c | 22 --
>  1 file changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ffe05024c622..9cb943c90d98 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1895,6 +1895,12 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>   /* If called from the scheduler, we can not call up(). */
>   if (!in_sched) {
> + /*
> +  * Disable preemption to avoid being preempted while holding
> +  * console_sem which would prevent anyone from printing to
> +  * console
> +  */
> + preempt_disable();
>   /*
>* Try to acquire and then immediately release the console
>* semaphore.  The release will print out buffers and wake up
> @@ -1902,6 +1908,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>*/
>   if (console_trylock_spinning())
>   console_unlock();
> + preempt_enable();
>   }
>  
>   return printed_len;
> @@ -2229,20 +2236,7 @@ int console_trylock(void)
>   return 0;
>   }
>   console_locked = 1;
> - /*
> -  * When PREEMPT_COUNT disabled we can't reliably detect if it's
> -  * safe to schedule (e.g. calling printk while holding a spin_lock),
> -  * because preempt_disable()/preempt_enable() are just barriers there
> -  * and preempt_count() is always 0.
> -  *
> -  * RCU read sections have a separate preemption counter when
> -  * PREEMPT_RCU enabled thus we must take extra care and check
> -  * rcu_preempt_depth(), otherwise RCU read sections modify
> -  * preempt_count().
> -  */
> - console_may_schedule = !oops_in_progress &&
> - preemptible() &&
> - !rcu_preempt_depth();
> + console_may_schedule = 0;
>   return 1;
>  }
>  EXPORT_SYMBOL(console_trylock);

Reviewed-by: Steven Rostedt (VMware) 

Thanks Sergey!

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Petr Mladek
On Tue 2018-01-16 13:47:16, Sergey Senozhatsky wrote:
> if you don't mind, let me fix the thing that I broke.
> that would be responsible. I believe I also must say the following:
>   Tetsuo, many thanks for reporting the issues for song long, and
>   sorry that it took quite a while to revert that change.
> 
> 8<
> 
> From: Sergey Senozhatsky 
> Subject: [PATCH] printk: never set console_may_schedule in console_trylock()
> 
> This patch, basically, reverts commit 6b97a20d3a79 ("printk:
> set may_schedule for some of console_trylock() callers").
> That commit was a mistake, it introduced a big dependency
> on the scheduler, by enabling preemption under console_sem
> in printk()->console_unlock() path, which is rather too
> critical. The patch did not significantly reduce the
> possibilities of printk() lockups, but made it possible to
> stall printk(), as has been reported by Tetsuo Handa [1].
> 
> Another issues is that preemption under console_sem also
> messes up with Steven Rostedt's hand off scheme, by making
> it possible to sleep with console_sem both in console_unlock()
> and in vprintk_emit(), after acquiring the console_sem
> ownership (anywhere between printk_safe_exit_irqrestore() in
> console_trylock_spinning() and printk_safe_enter_irqsave()
> in console_unlock()). This makes hand off less likely and,
> at the same time, may result in a significant amount of
> pending logbuf messages. Preempted console_sem owner makes
> it impossible for other CPUs to emit logbuf messages, but
> does not make it impossible for other CPUs to append new
> messages to the logbuf.
> 
> Reinstate the old behavior and make printk() non-preemptible.
> Should any printk() lockup reports arrive they must be handled
> in a different way.
> 
> [1] https://marc.info/?l=linux-mm=145692016122716
> Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() 
> callers")
> Signed-off-by: Sergey Senozhatsky 
> Reported-by: Tetsuo Handa 

IMHO, this is a step in the right direction.

Reviewed-by: Petr Mladek 

I'll wait for Steven's review and push this into printk.git.
I'll also add your Acks for the other patches.

Thanks for the patch and the various observations.

Best Regards,
Petr


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Petr Mladek
On Tue 2018-01-16 13:47:16, Sergey Senozhatsky wrote:
> if you don't mind, let me fix the thing that I broke.
> that would be responsible. I believe I also must say the following:
>   Tetsuo, many thanks for reporting the issues for song long, and
>   sorry that it took quite a while to revert that change.
> 
> 8<
> 
> From: Sergey Senozhatsky 
> Subject: [PATCH] printk: never set console_may_schedule in console_trylock()
> 
> This patch, basically, reverts commit 6b97a20d3a79 ("printk:
> set may_schedule for some of console_trylock() callers").
> That commit was a mistake, it introduced a big dependency
> on the scheduler, by enabling preemption under console_sem
> in printk()->console_unlock() path, which is rather too
> critical. The patch did not significantly reduce the
> possibilities of printk() lockups, but made it possible to
> stall printk(), as has been reported by Tetsuo Handa [1].
> 
> Another issues is that preemption under console_sem also
> messes up with Steven Rostedt's hand off scheme, by making
> it possible to sleep with console_sem both in console_unlock()
> and in vprintk_emit(), after acquiring the console_sem
> ownership (anywhere between printk_safe_exit_irqrestore() in
> console_trylock_spinning() and printk_safe_enter_irqsave()
> in console_unlock()). This makes hand off less likely and,
> at the same time, may result in a significant amount of
> pending logbuf messages. Preempted console_sem owner makes
> it impossible for other CPUs to emit logbuf messages, but
> does not make it impossible for other CPUs to append new
> messages to the logbuf.
> 
> Reinstate the old behavior and make printk() non-preemptible.
> Should any printk() lockup reports arrive they must be handled
> in a different way.
> 
> [1] https://marc.info/?l=linux-mm=145692016122716
> Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() 
> callers")
> Signed-off-by: Sergey Senozhatsky 
> Reported-by: Tetsuo Handa 

IMHO, this is a step in the right direction.

Reviewed-by: Petr Mladek 

I'll wait for Steven's review and push this into printk.git.
I'll also add your Acks for the other patches.

Thanks for the patch and the various observations.

Best Regards,
Petr


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Petr Mladek
On Tue 2018-01-16 11:23:49, Sergey Senozhatsky wrote:
> On (01/15/18 15:45), Petr Mladek wrote:
> > > I think adding the preempt_disable() would fix printk() but let non
> > > printk console_unlock() still preempt.
> > 
> > I would personally remove cond_resched() from console_unlock()
> > completely.
> 
> hmm, not so sure. I think it's there for !PREEMPT systems which have
> to print a lot of messages. the case I'm speaking about in particular
> is when we register a CON_PRINTBUFFER console and need to console_unlock()
> (flush) all of the messages we currently have in the logbuf. we better
> have that cond_resched() there, I think.

Good point. I agree that we should keep the cond_resched() there
at least for now.


> > Sleeping in console_unlock() increases the chance that more messages
> > would need to be handled. And more importantly it reduces the chance
> > of a successful handover.
> > 
> > As a result, the caller might spend there very long time, it might
> > be getting increasingly far behind. There is higher risk of lost
> > messages. Also the eventual taker might have too much to proceed
> > in preemption disabled context.
> 
> yes.
> 
> > Removing cond_resched() is in sync with printk() priorities.
> 
> hmm, not sure. we have sleeping console_lock()->console_unlock() path
> for PREEMPT kernels, that cond_resched() makes the !PREEMPT kernels to
> have the same sleeping console_lock()->console_unlock().
> 
> printk()->console_unlock() seems to be a pretty independent thing,
> unfortunately (!), yet sleeping console_lock()->console_unlock()
> messes up with it a lot.

IMHO, the problem here is that console_lock is used to synchronize
too many things. It would be great to separate printk() duties
into a separate lock in the long term.

Anyway, I see it the following way. Most console_lock() callers
do the following things:

void foo()
{
console_lock()
foo_specific_work();
console_unlock();
}

where console_unlock() flushes the printk buffer before actually
releasing the lock.

IMHO, it would make sense if flushing the printk buffer behaves
the same when called either from printk() or from any other path.
I mean that it should be aggressive and allow an effective
hand off.

It should be safe as long as foo_specific_work() does not take
too much time.

>From other side. The cond_resched() in console_unlock() should
be obsoleted by the hand-shake code.


> > The highest one is to get the messages out.
> > 
> > Finally, removing cond_resched() should make the behavior more
> > predictable (never preempted)
> 
> but we are always preempted in PREEMPT kernels when the current
> console_sem owner acquired the lock via console_lock(), not via
> console_trylock(). cond_resched() does the same, but for !PREEMPT.

I agree that the situation is more complicated for cond_resched()
called after console_lock(). I do not resist on removing it now.

Just one more thing. The time axe looks like:

+ cond_resched added into console_unlock in v4.5-rc1, Jan 15, 2016
 (commit 8d91f8b15361dfb438ab6)

+ preemtion enabled in printk in, v4.6-rc1, Mar 17, 2016
 (commit 6b97a20d3a7909daa0662)

They both were obvious solutions that helped to reduce the risk
of soft-lockups. The first one handled evidently safe scenarios.
The second one was even more aggressive. I would say that
they both were more or less add-hoc solutions that did not
take into account the other side effects (delaying output,
even loosing messages).

I would not say that one is a diametric difference between them.
Therefore if we remove one for a reason, we should think about
reverting the other as well. But again. I am fine if we remove
only one now.

Does this make any sense?

Best Regard,
Petr


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-16 Thread Petr Mladek
On Tue 2018-01-16 11:23:49, Sergey Senozhatsky wrote:
> On (01/15/18 15:45), Petr Mladek wrote:
> > > I think adding the preempt_disable() would fix printk() but let non
> > > printk console_unlock() still preempt.
> > 
> > I would personally remove cond_resched() from console_unlock()
> > completely.
> 
> hmm, not so sure. I think it's there for !PREEMPT systems which have
> to print a lot of messages. the case I'm speaking about in particular
> is when we register a CON_PRINTBUFFER console and need to console_unlock()
> (flush) all of the messages we currently have in the logbuf. we better
> have that cond_resched() there, I think.

Good point. I agree that we should keep the cond_resched() there
at least for now.


> > Sleeping in console_unlock() increases the chance that more messages
> > would need to be handled. And more importantly it reduces the chance
> > of a successful handover.
> > 
> > As a result, the caller might spend there very long time, it might
> > be getting increasingly far behind. There is higher risk of lost
> > messages. Also the eventual taker might have too much to proceed
> > in preemption disabled context.
> 
> yes.
> 
> > Removing cond_resched() is in sync with printk() priorities.
> 
> hmm, not sure. we have sleeping console_lock()->console_unlock() path
> for PREEMPT kernels, that cond_resched() makes the !PREEMPT kernels to
> have the same sleeping console_lock()->console_unlock().
> 
> printk()->console_unlock() seems to be a pretty independent thing,
> unfortunately (!), yet sleeping console_lock()->console_unlock()
> messes up with it a lot.

IMHO, the problem here is that console_lock is used to synchronize
too many things. It would be great to separate printk() duties
into a separate lock in the long term.

Anyway, I see it the following way. Most console_lock() callers
do the following things:

void foo()
{
console_lock()
foo_specific_work();
console_unlock();
}

where console_unlock() flushes the printk buffer before actually
releasing the lock.

IMHO, it would make sense if flushing the printk buffer behaves
the same when called either from printk() or from any other path.
I mean that it should be aggressive and allow an effective
hand off.

It should be safe as long as foo_specific_work() does not take
too much time.

>From other side. The cond_resched() in console_unlock() should
be obsoleted by the hand-shake code.


> > The highest one is to get the messages out.
> > 
> > Finally, removing cond_resched() should make the behavior more
> > predictable (never preempted)
> 
> but we are always preempted in PREEMPT kernels when the current
> console_sem owner acquired the lock via console_lock(), not via
> console_trylock(). cond_resched() does the same, but for !PREEMPT.

I agree that the situation is more complicated for cond_resched()
called after console_lock(). I do not resist on removing it now.

Just one more thing. The time axe looks like:

+ cond_resched added into console_unlock in v4.5-rc1, Jan 15, 2016
 (commit 8d91f8b15361dfb438ab6)

+ preemtion enabled in printk in, v4.6-rc1, Mar 17, 2016
 (commit 6b97a20d3a7909daa0662)

They both were obvious solutions that helped to reduce the risk
of soft-lockups. The first one handled evidently safe scenarios.
The second one was even more aggressive. I would say that
they both were more or less add-hoc solutions that did not
take into account the other side effects (delaying output,
even loosing messages).

I would not say that one is a diametric difference between them.
Therefore if we remove one for a reason, we should think about
reverting the other as well. But again. I am fine if we remove
only one now.

Does this make any sense?

Best Regard,
Petr


  1   2   3   >