Re: [RFC 0/5] printk: Implement WARN_*DEFERRED()

2016-10-05 Thread Petr Mladek
On Fri 2016-09-30 09:48:32, Sergey Senozhatsky wrote:
> On (09/29/16 13:28), Petr Mladek wrote:
> > On Wed 2016-09-28 10:18:45, Sergey Senozhatsky wrote:
> > > On (09/27/16 18:02), Petr Mladek wrote:
> > > > The main trick is that we replace the per-CPU function pointer
> > > > by a preempt_count-like variable that could track the printk context.
> > > > 
> > > > I know that Sergey has another ideas in this area. But I wanted to see
> > > > how this approach would look like.
> > > 
> > > well, yes. I was looking at WARN_*_DEFERRED [1] for some time, and, I
> > > think, the maintenance cost of that solution is just too high:
> > > 
> > > a) every existing WARN_* in sched/timekeeping/who knows where else
> > >must be evaluated to ensure that in can't be called from printk()
> > >path. if `false' - then the corresponding macro must be replaced
> > >with _DEFERRED flavor.
> > > 
> > > b) any patch that adds new WARN_* usages must be additionally checked
> > >to ensure that each of new WARN_* macros cannot be called from printk
> > >path. if `false' -- the corresponding macro must be replaced with
> > >_DEFERRED flavor.
> > > 
> > > c) any patch that refactors the code or moves some function calls around
> > >etc. must be additionally checked for any accidental WARN_* from printk
> > >path. even though if none of the patches added any new WARN_* to the 
> > > code.
> > > 
> > > b) apart from WARN_* there can be `accidental' pr_err/pr_debug/etc. not
> > >necessarily newly added (see 'c').
> > > 
> > > 
> > > that's too much.
> > > 
> > > it takes a lot of additional effort, because both reviewer and contributor
> > > must consider printk() internals. and, what's worse, if something goes
> > > unnoticed we end up having a printk() deadlock again.
> > > 
> > > so I decided to address some of printk() issues in printk.c, not in
> > > kernel/time/timekeeping.c or kernel/sched/core.c or anywhere else.

I do not longer see how this might be achieved. If a printk()/WARN()
in the scheduler/timekeeping code can be reached from printk() then
it might too be reached outside printk. In this case, printk()
will not know about it and will happily call the scheduler/timekeeping
code recursively. This might still cause deadlock. 


> > I see the point.
> 
> well, just my 5 cents.
> 
> > Your approach (alt buffer) adds some complexity to the printk code
> 
> it does.
> the other thing is that there are several ways to deadlock printk().
> alt_printk is addressing deadlocks that were caused by printk()
> recursion only.
> 
>printk()
>  acquire_lock()
>printk()
>  acquire_lock()

This looks theoretical. The recursion in printk() is not easily
possible at the moment. It is prevented by logbuf_cpu check when
logbug_lock is taken. It is prevented by console_trylock() when
console_sem is taken.

> which is a sub-set of all of the printk() deadlock scenarios. all of
> the locks that printk() acquires can be taken outside of printk() path.
> 
> for example, cat /proc/console locks the console_lock() for seq output.
> thus we can have something like
> 
> console_unlock()  // lock  >lock
>   up()
> activate_task()
>   WARN_ON()
> printk()
>   console_trylock() // lock >lock

The WARN_ON() here is called under >pi_lock that is taken
by try_to_wake_up(). This WARN_ON() can be triggered also
outside printk()/console_unlock(). Therefore it needs to get
replaced by WARN_DEFERRED() anyway.


> DEFERRED_WARN is a good thing; it's just quite hard to keep everything
> working, given that any of those "9 patches per hour" can break something
> with just one WARN_ON().
> 
> 
> I assume that doing something like this
> 
> #define WARN_ON(condition, format...) ({  \
>   printk_deferred_enter();\
>   WARN(condition, ##format);  \
>   printk_deferred_exit(); \
> })
>
> is less than exciting because WARN_ON from irq won't immediately print
> the backtrace anymore.

Yup, we might need WARN_ON_DEFERRED() variant.

Best Regards,
Petr


Re: [RFC 0/5] printk: Implement WARN_*DEFERRED()

2016-10-05 Thread Petr Mladek
On Fri 2016-09-30 09:48:32, Sergey Senozhatsky wrote:
> On (09/29/16 13:28), Petr Mladek wrote:
> > On Wed 2016-09-28 10:18:45, Sergey Senozhatsky wrote:
> > > On (09/27/16 18:02), Petr Mladek wrote:
> > > > The main trick is that we replace the per-CPU function pointer
> > > > by a preempt_count-like variable that could track the printk context.
> > > > 
> > > > I know that Sergey has another ideas in this area. But I wanted to see
> > > > how this approach would look like.
> > > 
> > > well, yes. I was looking at WARN_*_DEFERRED [1] for some time, and, I
> > > think, the maintenance cost of that solution is just too high:
> > > 
> > > a) every existing WARN_* in sched/timekeeping/who knows where else
> > >must be evaluated to ensure that in can't be called from printk()
> > >path. if `false' - then the corresponding macro must be replaced
> > >with _DEFERRED flavor.
> > > 
> > > b) any patch that adds new WARN_* usages must be additionally checked
> > >to ensure that each of new WARN_* macros cannot be called from printk
> > >path. if `false' -- the corresponding macro must be replaced with
> > >_DEFERRED flavor.
> > > 
> > > c) any patch that refactors the code or moves some function calls around
> > >etc. must be additionally checked for any accidental WARN_* from printk
> > >path. even though if none of the patches added any new WARN_* to the 
> > > code.
> > > 
> > > b) apart from WARN_* there can be `accidental' pr_err/pr_debug/etc. not
> > >necessarily newly added (see 'c').
> > > 
> > > 
> > > that's too much.
> > > 
> > > it takes a lot of additional effort, because both reviewer and contributor
> > > must consider printk() internals. and, what's worse, if something goes
> > > unnoticed we end up having a printk() deadlock again.
> > > 
> > > so I decided to address some of printk() issues in printk.c, not in
> > > kernel/time/timekeeping.c or kernel/sched/core.c or anywhere else.

I do not longer see how this might be achieved. If a printk()/WARN()
in the scheduler/timekeeping code can be reached from printk() then
it might too be reached outside printk. In this case, printk()
will not know about it and will happily call the scheduler/timekeeping
code recursively. This might still cause deadlock. 


> > I see the point.
> 
> well, just my 5 cents.
> 
> > Your approach (alt buffer) adds some complexity to the printk code
> 
> it does.
> the other thing is that there are several ways to deadlock printk().
> alt_printk is addressing deadlocks that were caused by printk()
> recursion only.
> 
>printk()
>  acquire_lock()
>printk()
>  acquire_lock()

This looks theoretical. The recursion in printk() is not easily
possible at the moment. It is prevented by logbuf_cpu check when
logbug_lock is taken. It is prevented by console_trylock() when
console_sem is taken.

> which is a sub-set of all of the printk() deadlock scenarios. all of
> the locks that printk() acquires can be taken outside of printk() path.
> 
> for example, cat /proc/console locks the console_lock() for seq output.
> thus we can have something like
> 
> console_unlock()  // lock  >lock
>   up()
> activate_task()
>   WARN_ON()
> printk()
>   console_trylock() // lock >lock

The WARN_ON() here is called under >pi_lock that is taken
by try_to_wake_up(). This WARN_ON() can be triggered also
outside printk()/console_unlock(). Therefore it needs to get
replaced by WARN_DEFERRED() anyway.


> DEFERRED_WARN is a good thing; it's just quite hard to keep everything
> working, given that any of those "9 patches per hour" can break something
> with just one WARN_ON().
> 
> 
> I assume that doing something like this
> 
> #define WARN_ON(condition, format...) ({  \
>   printk_deferred_enter();\
>   WARN(condition, ##format);  \
>   printk_deferred_exit(); \
> })
>
> is less than exciting because WARN_ON from irq won't immediately print
> the backtrace anymore.

Yup, we might need WARN_ON_DEFERRED() variant.

Best Regards,
Petr


Re: [RFC 0/5] printk: Implement WARN_*DEFERRED()

2016-09-29 Thread Sergey Senozhatsky
On (09/29/16 13:28), Petr Mladek wrote:
> On Wed 2016-09-28 10:18:45, Sergey Senozhatsky wrote:
> > On (09/27/16 18:02), Petr Mladek wrote:
> > > The main trick is that we replace the per-CPU function pointer
> > > by a preempt_count-like variable that could track the printk context.
> > > 
> > > I know that Sergey has another ideas in this area. But I wanted to see
> > > how this approach would look like.
> > 
> > well, yes. I was looking at WARN_*_DEFERRED [1] for some time, and, I
> > think, the maintenance cost of that solution is just too high:
> > 
> > a) every existing WARN_* in sched/timekeeping/who knows where else
> >must be evaluated to ensure that in can't be called from printk()
> >path. if `false' - then the corresponding macro must be replaced
> >with _DEFERRED flavor.
> > 
> > b) any patch that adds new WARN_* usages must be additionally checked
> >to ensure that each of new WARN_* macros cannot be called from printk
> >path. if `false' -- the corresponding macro must be replaced with
> >_DEFERRED flavor.
> > 
> > c) any patch that refactors the code or moves some function calls around
> >etc. must be additionally checked for any accidental WARN_* from printk
> >path. even though if none of the patches added any new WARN_* to the 
> > code.
> > 
> > b) apart from WARN_* there can be `accidental' pr_err/pr_debug/etc. not
> >necessarily newly added (see 'c').
> > 
> > 
> > that's too much.
> > 
> > it takes a lot of additional effort, because both reviewer and contributor
> > must consider printk() internals. and, what's worse, if something goes
> > unnoticed we end up having a printk() deadlock again.
> > 
> > so I decided to address some of printk() issues in printk.c, not in
> > kernel/time/timekeeping.c or kernel/sched/core.c or anywhere else.
> 
> I see the point.

well, just my 5 cents.

> Your approach (alt buffer) adds some complexity to the printk code

it does.
the other thing is that there are several ways to deadlock printk().
alt_printk is addressing deadlocks that were caused by printk()
recursion only.

   printk()
 acquire_lock()
   printk()
 acquire_lock()

which is a sub-set of all of the printk() deadlock scenarios. all of
the locks that printk() acquires can be taken outside of printk() path.

for example, cat /proc/console locks the console_lock() for seq output.
thus we can have something like

console_unlock()// lock  >lock
  up()
activate_task()
  WARN_ON()
printk()
  console_trylock() // lock >lock


DEFERRED_WARN is a good thing; it's just quite hard to keep everything
working, given that any of those "9 patches per hour" can break something
with just one WARN_ON().


I assume that doing something like this

#define WARN_ON(condition, format...) ({\
printk_deferred_enter();\
WARN(condition, ##format);  \
printk_deferred_exit(); \
})

is less than exciting because WARN_ON from irq won't immediately print
the backtrace anymore.

thoughts?

> but it allows to remove printk_deferred()/WARN_DEFERRED() and all
> the risk of it.

at some point we even can drop the entire deferred_printk() thing.
but alt_printk needs some love and care first.

> I am going to look closely on it.

thanks.

-ss


Re: [RFC 0/5] printk: Implement WARN_*DEFERRED()

2016-09-29 Thread Sergey Senozhatsky
On (09/29/16 13:28), Petr Mladek wrote:
> On Wed 2016-09-28 10:18:45, Sergey Senozhatsky wrote:
> > On (09/27/16 18:02), Petr Mladek wrote:
> > > The main trick is that we replace the per-CPU function pointer
> > > by a preempt_count-like variable that could track the printk context.
> > > 
> > > I know that Sergey has another ideas in this area. But I wanted to see
> > > how this approach would look like.
> > 
> > well, yes. I was looking at WARN_*_DEFERRED [1] for some time, and, I
> > think, the maintenance cost of that solution is just too high:
> > 
> > a) every existing WARN_* in sched/timekeeping/who knows where else
> >must be evaluated to ensure that in can't be called from printk()
> >path. if `false' - then the corresponding macro must be replaced
> >with _DEFERRED flavor.
> > 
> > b) any patch that adds new WARN_* usages must be additionally checked
> >to ensure that each of new WARN_* macros cannot be called from printk
> >path. if `false' -- the corresponding macro must be replaced with
> >_DEFERRED flavor.
> > 
> > c) any patch that refactors the code or moves some function calls around
> >etc. must be additionally checked for any accidental WARN_* from printk
> >path. even though if none of the patches added any new WARN_* to the 
> > code.
> > 
> > b) apart from WARN_* there can be `accidental' pr_err/pr_debug/etc. not
> >necessarily newly added (see 'c').
> > 
> > 
> > that's too much.
> > 
> > it takes a lot of additional effort, because both reviewer and contributor
> > must consider printk() internals. and, what's worse, if something goes
> > unnoticed we end up having a printk() deadlock again.
> > 
> > so I decided to address some of printk() issues in printk.c, not in
> > kernel/time/timekeeping.c or kernel/sched/core.c or anywhere else.
> 
> I see the point.

well, just my 5 cents.

> Your approach (alt buffer) adds some complexity to the printk code

it does.
the other thing is that there are several ways to deadlock printk().
alt_printk is addressing deadlocks that were caused by printk()
recursion only.

   printk()
 acquire_lock()
   printk()
 acquire_lock()

which is a sub-set of all of the printk() deadlock scenarios. all of
the locks that printk() acquires can be taken outside of printk() path.

for example, cat /proc/console locks the console_lock() for seq output.
thus we can have something like

console_unlock()// lock  >lock
  up()
activate_task()
  WARN_ON()
printk()
  console_trylock() // lock >lock


DEFERRED_WARN is a good thing; it's just quite hard to keep everything
working, given that any of those "9 patches per hour" can break something
with just one WARN_ON().


I assume that doing something like this

#define WARN_ON(condition, format...) ({\
printk_deferred_enter();\
WARN(condition, ##format);  \
printk_deferred_exit(); \
})

is less than exciting because WARN_ON from irq won't immediately print
the backtrace anymore.

thoughts?

> but it allows to remove printk_deferred()/WARN_DEFERRED() and all
> the risk of it.

at some point we even can drop the entire deferred_printk() thing.
but alt_printk needs some love and care first.

> I am going to look closely on it.

thanks.

-ss


Re: [RFC 0/5] printk: Implement WARN_*DEFERRED()

2016-09-29 Thread Petr Mladek
On Wed 2016-09-28 10:18:45, Sergey Senozhatsky wrote:
> On (09/27/16 18:02), Petr Mladek wrote:
> > The main trick is that we replace the per-CPU function pointer
> > by a preempt_count-like variable that could track the printk context.
> > 
> > I know that Sergey has another ideas in this area. But I wanted to see
> > how this approach would look like.
> 
> well, yes. I was looking at WARN_*_DEFERRED [1] for some time, and, I
> think, the maintenance cost of that solution is just too high:
> 
> a) every existing WARN_* in sched/timekeeping/who knows where else
>must be evaluated to ensure that in can't be called from printk()
>path. if `false' - then the corresponding macro must be replaced
>with _DEFERRED flavor.
> 
> b) any patch that adds new WARN_* usages must be additionally checked
>to ensure that each of new WARN_* macros cannot be called from printk
>path. if `false' -- the corresponding macro must be replaced with
>_DEFERRED flavor.
> 
> c) any patch that refactors the code or moves some function calls around
>etc. must be additionally checked for any accidental WARN_* from printk
>path. even though if none of the patches added any new WARN_* to the code.
> 
> b) apart from WARN_* there can be `accidental' pr_err/pr_debug/etc. not
>necessarily newly added (see 'c').
> 
> 
> that's too much.
> 
> it takes a lot of additional effort, because both reviewer and contributor
> must consider printk() internals. and, what's worse, if something goes
> unnoticed we end up having a printk() deadlock again.
> 
> so I decided to address some of printk() issues in printk.c, not in
> kernel/time/timekeeping.c or kernel/sched/core.c or anywhere else.

I see the point.

Your approach (alt buffer) adds some complexity to the printk code but
it allows to remove printk_deferred()/WARN_DEFERRED() and all the risk
of it. I am going to look closely on it.

Best Regards,
Petr


Re: [RFC 0/5] printk: Implement WARN_*DEFERRED()

2016-09-29 Thread Petr Mladek
On Wed 2016-09-28 10:18:45, Sergey Senozhatsky wrote:
> On (09/27/16 18:02), Petr Mladek wrote:
> > The main trick is that we replace the per-CPU function pointer
> > by a preempt_count-like variable that could track the printk context.
> > 
> > I know that Sergey has another ideas in this area. But I wanted to see
> > how this approach would look like.
> 
> well, yes. I was looking at WARN_*_DEFERRED [1] for some time, and, I
> think, the maintenance cost of that solution is just too high:
> 
> a) every existing WARN_* in sched/timekeeping/who knows where else
>must be evaluated to ensure that in can't be called from printk()
>path. if `false' - then the corresponding macro must be replaced
>with _DEFERRED flavor.
> 
> b) any patch that adds new WARN_* usages must be additionally checked
>to ensure that each of new WARN_* macros cannot be called from printk
>path. if `false' -- the corresponding macro must be replaced with
>_DEFERRED flavor.
> 
> c) any patch that refactors the code or moves some function calls around
>etc. must be additionally checked for any accidental WARN_* from printk
>path. even though if none of the patches added any new WARN_* to the code.
> 
> b) apart from WARN_* there can be `accidental' pr_err/pr_debug/etc. not
>necessarily newly added (see 'c').
> 
> 
> that's too much.
> 
> it takes a lot of additional effort, because both reviewer and contributor
> must consider printk() internals. and, what's worse, if something goes
> unnoticed we end up having a printk() deadlock again.
> 
> so I decided to address some of printk() issues in printk.c, not in
> kernel/time/timekeeping.c or kernel/sched/core.c or anywhere else.

I see the point.

Your approach (alt buffer) adds some complexity to the printk code but
it allows to remove printk_deferred()/WARN_DEFERRED() and all the risk
of it. I am going to look closely on it.

Best Regards,
Petr


Re: [RFC 0/5] printk: Implement WARN_*DEFERRED()

2016-09-27 Thread Sergey Senozhatsky
On (09/27/16 18:02), Petr Mladek wrote:
> The main trick is that we replace the per-CPU function pointer
> by a preempt_count-like variable that could track the printk context.
> 
> I know that Sergey has another ideas in this area. But I wanted to see
> how this approach would look like.

well, yes. I was looking at WARN_*_DEFERRED [1] for some time, and, I
think, the maintenance cost of that solution is just too high:

a) every existing WARN_* in sched/timekeeping/who knows where else
   must be evaluated to ensure that in can't be called from printk()
   path. if `false' - then the corresponding macro must be replaced
   with _DEFERRED flavor.

b) any patch that adds new WARN_* usages must be additionally checked
   to ensure that each of new WARN_* macros cannot be called from printk
   path. if `false' -- the corresponding macro must be replaced with
   _DEFERRED flavor.

c) any patch that refactors the code or moves some function calls around
   etc. must be additionally checked for any accidental WARN_* from printk
   path. even though if none of the patches added any new WARN_* to the code.

b) apart from WARN_* there can be `accidental' pr_err/pr_debug/etc. not
   necessarily newly added (see 'c').


that's too much.
for example [not blaming anyone], a recent patch [2] that added a reasonable
WARN_ON_ONCE to assert_clock_updated() which, however, can result in a
possible printk() deadlock scenario that you, Petr, outlined [3]:

:+ printk()
:  + vprintk_func -> vprintk_default()
:+ vprinkt_emit()
:  + console_unlock()
:+ up_console_sem()
:  + up()# takes >lock
:+ __up()
:  + wake_up_process()
:+ try_to_wake_up()
:  + ttwu_queue()
:+ ttwu_do_activate()
:  + ttwu_do_wakeup()
:+ rq_clock()
:  + lockdep_assert_held()
:+ WARN_ON_ONCE()
:  + printk()
:+ vprintk_func -> vprintk_default()
:  + vprintk_emit()
:+ console_try_lock()
:  + down_trylock_console_sem()
:+ __down_trylock_console_sem()
:  + down_trylock()


it takes a lot of additional effort, because both reviewer and contributor
must consider printk() internals. and, what's worse, if something goes
unnoticed we end up having a printk() deadlock again.

so I decided to address some of printk() issues in printk.c, not in
kernel/time/timekeeping.c or kernel/sched/core.c or anywhere else.


> Mid-air collision:
>
> I have just realized that Sergey sent another patchset that was
> more generic, complicated, and had some similarities, see
> https://lkml.kernel.org/r/20160927142237.5539-1-sergey.senozhat...@gmail.com

yeah, I should have Cc-ed a wider audience. do I need to resend the
patch set with the `extended' Cc list?


[1] https://marc.info/?l=linux-kernel=147158843319944
[2] https://marc.info/?l=linux-kernel=147446511924573
[3] https://marc.info/?l=linux-kernel=147447352127741

-ss


Re: [RFC 0/5] printk: Implement WARN_*DEFERRED()

2016-09-27 Thread Sergey Senozhatsky
On (09/27/16 18:02), Petr Mladek wrote:
> The main trick is that we replace the per-CPU function pointer
> by a preempt_count-like variable that could track the printk context.
> 
> I know that Sergey has another ideas in this area. But I wanted to see
> how this approach would look like.

well, yes. I was looking at WARN_*_DEFERRED [1] for some time, and, I
think, the maintenance cost of that solution is just too high:

a) every existing WARN_* in sched/timekeeping/who knows where else
   must be evaluated to ensure that in can't be called from printk()
   path. if `false' - then the corresponding macro must be replaced
   with _DEFERRED flavor.

b) any patch that adds new WARN_* usages must be additionally checked
   to ensure that each of new WARN_* macros cannot be called from printk
   path. if `false' -- the corresponding macro must be replaced with
   _DEFERRED flavor.

c) any patch that refactors the code or moves some function calls around
   etc. must be additionally checked for any accidental WARN_* from printk
   path. even though if none of the patches added any new WARN_* to the code.

b) apart from WARN_* there can be `accidental' pr_err/pr_debug/etc. not
   necessarily newly added (see 'c').


that's too much.
for example [not blaming anyone], a recent patch [2] that added a reasonable
WARN_ON_ONCE to assert_clock_updated() which, however, can result in a
possible printk() deadlock scenario that you, Petr, outlined [3]:

:+ printk()
:  + vprintk_func -> vprintk_default()
:+ vprinkt_emit()
:  + console_unlock()
:+ up_console_sem()
:  + up()# takes >lock
:+ __up()
:  + wake_up_process()
:+ try_to_wake_up()
:  + ttwu_queue()
:+ ttwu_do_activate()
:  + ttwu_do_wakeup()
:+ rq_clock()
:  + lockdep_assert_held()
:+ WARN_ON_ONCE()
:  + printk()
:+ vprintk_func -> vprintk_default()
:  + vprintk_emit()
:+ console_try_lock()
:  + down_trylock_console_sem()
:+ __down_trylock_console_sem()
:  + down_trylock()


it takes a lot of additional effort, because both reviewer and contributor
must consider printk() internals. and, what's worse, if something goes
unnoticed we end up having a printk() deadlock again.

so I decided to address some of printk() issues in printk.c, not in
kernel/time/timekeeping.c or kernel/sched/core.c or anywhere else.


> Mid-air collision:
>
> I have just realized that Sergey sent another patchset that was
> more generic, complicated, and had some similarities, see
> https://lkml.kernel.org/r/20160927142237.5539-1-sergey.senozhat...@gmail.com

yeah, I should have Cc-ed a wider audience. do I need to resend the
patch set with the `extended' Cc list?


[1] https://marc.info/?l=linux-kernel=147158843319944
[2] https://marc.info/?l=linux-kernel=147446511924573
[3] https://marc.info/?l=linux-kernel=147447352127741

-ss