Re: [RFC][PATCHv4 4/6] printk: report lost messages in printk safe/nmi contexts

2016-12-01 Thread Petr Mladek
On Thu 2016-12-01 11:10:42, Sergey Senozhatsky wrote:
> On (11/25/16 12:07), Petr Mladek wrote:
> [..]
> > > +static void report_message_lost(atomic_t *num_lost, char *fmt)
> > > +{
> > > + int lost = atomic_xchg(num_lost, 0);
> > > +
> > > + if (lost) {
> > > + char msg[56];
>
> > I would really like to avoid a hard coded buffer size. Such things
> > are likely to bite us in the future.
> 
> why would scnprintf() overflow.

Sure, it will not overflow. But still, such a small hard coded buffer
size looks too hacky to me.


> > I thought about reshuffling a lot of logic, adding more wrappers,
> > ... But the solution might be easy in the end, see below.
> > 
> > > +
> > > + scnprintf(msg, sizeof(msg), fmt, lost);
> > > +
> > > + printk_safe_flush_line(msg, strlen(msg));
> > 
> > This made my brain spin a lot. I wondered if it did what we wanted
> > and it was safe.
> > 
> > On one hand, it is supposed to work because use exactly this
> > function in __printk_safe_flush() where you call this from.
> > 
> > One question is if it does what we want in different contexts.
> > Let's look at it:
> > 
> > 1. It calls printk_deferred() in NMI context. There is a risk
> >of a deadlock. But it is called only from
> >printk_safe_flush_on_panic() which is the last resort. Therefore
> >it does exactly what we want.
> > 
> > 2. It calls printk()->printk_func()->vprintk_emit() in normal context.
> >It is what we want in normal context.
> > 
> > 3. It calls printk()->printk_func()->v printk_safe() in printk_safe
> >context. This does not look correct. IMHO, this might happen
> >only printk_safe_flush_on_panic() and we want to use
> >printk_deferred() here as well.
> [..]
> > The easiest solution would be to simply call printk_deferred()
> > here. Everything will be deferred after the async printk() patchset
> > anyway.
> > 
> > I would even use printk_deferred() in printk_safe_flush_line()
> > for each context. It is not optimal but it works very well
> > and it makes the code much more straightforward.
> 
> yes, good point.
> we can call deferred printk for anything there; or replace that in_nmi()
> check with the `printk_safe_context != 0' one, and then route the message
> via printk or printk_deferred.

Yup, it might be an option and sounds good.

Anyway, I would use printk_deferred() to print the warnings about lost
messages. It is perfectly fine and you will not need the hard coded
temporary buffer.

Best regards,
Petr


Re: [RFC][PATCHv4 4/6] printk: report lost messages in printk safe/nmi contexts

2016-12-01 Thread Petr Mladek
On Thu 2016-12-01 11:10:42, Sergey Senozhatsky wrote:
> On (11/25/16 12:07), Petr Mladek wrote:
> [..]
> > > +static void report_message_lost(atomic_t *num_lost, char *fmt)
> > > +{
> > > + int lost = atomic_xchg(num_lost, 0);
> > > +
> > > + if (lost) {
> > > + char msg[56];
>
> > I would really like to avoid a hard coded buffer size. Such things
> > are likely to bite us in the future.
> 
> why would scnprintf() overflow.

Sure, it will not overflow. But still, such a small hard coded buffer
size looks too hacky to me.


> > I thought about reshuffling a lot of logic, adding more wrappers,
> > ... But the solution might be easy in the end, see below.
> > 
> > > +
> > > + scnprintf(msg, sizeof(msg), fmt, lost);
> > > +
> > > + printk_safe_flush_line(msg, strlen(msg));
> > 
> > This made my brain spin a lot. I wondered if it did what we wanted
> > and it was safe.
> > 
> > On one hand, it is supposed to work because use exactly this
> > function in __printk_safe_flush() where you call this from.
> > 
> > One question is if it does what we want in different contexts.
> > Let's look at it:
> > 
> > 1. It calls printk_deferred() in NMI context. There is a risk
> >of a deadlock. But it is called only from
> >printk_safe_flush_on_panic() which is the last resort. Therefore
> >it does exactly what we want.
> > 
> > 2. It calls printk()->printk_func()->vprintk_emit() in normal context.
> >It is what we want in normal context.
> > 
> > 3. It calls printk()->printk_func()->v printk_safe() in printk_safe
> >context. This does not look correct. IMHO, this might happen
> >only printk_safe_flush_on_panic() and we want to use
> >printk_deferred() here as well.
> [..]
> > The easiest solution would be to simply call printk_deferred()
> > here. Everything will be deferred after the async printk() patchset
> > anyway.
> > 
> > I would even use printk_deferred() in printk_safe_flush_line()
> > for each context. It is not optimal but it works very well
> > and it makes the code much more straightforward.
> 
> yes, good point.
> we can call deferred printk for anything there; or replace that in_nmi()
> check with the `printk_safe_context != 0' one, and then route the message
> via printk or printk_deferred.

Yup, it might be an option and sounds good.

Anyway, I would use printk_deferred() to print the warnings about lost
messages. It is perfectly fine and you will not need the hard coded
temporary buffer.

Best regards,
Petr


Re: [RFC][PATCHv4 4/6] printk: report lost messages in printk safe/nmi contexts

2016-11-30 Thread Sergey Senozhatsky
On (11/25/16 12:07), Petr Mladek wrote:
[..]
> > +static void report_message_lost(atomic_t *num_lost, char *fmt)
> > +{
> > +   int lost = atomic_xchg(num_lost, 0);
> > +
> > +   if (lost) {
> > +   char msg[56];
> 
> I would really like to avoid a hard coded buffer size. Such things
> are likely to bite us in the future.

why would scnprintf() overflow.

> I thought about reshuffling a lot of logic, adding more wrappers,
> ... But the solution might be easy in the end, see below.
> 
> > +
> > +   scnprintf(msg, sizeof(msg), fmt, lost);
> > +
> > +   printk_safe_flush_line(msg, strlen(msg));
> 
> This made my brain spin a lot. I wondered if it did what we wanted
> and it was safe.
> 
> On one hand, it is supposed to work because use exactly this
> function in __printk_safe_flush() where you call this from.
> 
> One question is if it does what we want in different contexts.
> Let's look at it:
> 
> 1. It calls printk_deferred() in NMI context. There is a risk
>of a deadlock. But it is called only from
>printk_safe_flush_on_panic() which is the last resort. Therefore
>it does exactly what we want.
> 
> 2. It calls printk()->printk_func()->vprintk_emit() in normal context.
>It is what we want in normal context.
> 
> 3. It calls printk()->printk_func()->v printk_safe() in printk_safe
>context. This does not look correct. IMHO, this might happen
>only printk_safe_flush_on_panic() and we want to use
>printk_deferred() here as well.
[..]
> The easiest solution would be to simply call printk_deferred()
> here. Everything will be deferred after the async printk() patchset
> anyway.
> 
> I would even use printk_deferred() in printk_safe_flush_line()
> for each context. It is not optimal but it works very well
> and it makes the code much more straightforward.

yes, good point.
we can call deferred printk for anything there; or replace that in_nmi()
check with the `printk_safe_context != 0' one, and then route the message
via printk or printk_deferred.


[..]
> >   * Flush data from the associated per-CPU buffer. The function
> >   * can be called either via IRQ work or independently.
> > @@ -147,6 +183,9 @@ static void __printk_safe_flush(struct irq_work *work)
> >  
> > i = 0;
> >  more:
> > +   report_nmi_message_lost();
> > +   report_safe_message_lost();
> 
> Please, move this at the end of this function.

ok.

-ss


Re: [RFC][PATCHv4 4/6] printk: report lost messages in printk safe/nmi contexts

2016-11-30 Thread Sergey Senozhatsky
On (11/25/16 12:07), Petr Mladek wrote:
[..]
> > +static void report_message_lost(atomic_t *num_lost, char *fmt)
> > +{
> > +   int lost = atomic_xchg(num_lost, 0);
> > +
> > +   if (lost) {
> > +   char msg[56];
> 
> I would really like to avoid a hard coded buffer size. Such things
> are likely to bite us in the future.

why would scnprintf() overflow.

> I thought about reshuffling a lot of logic, adding more wrappers,
> ... But the solution might be easy in the end, see below.
> 
> > +
> > +   scnprintf(msg, sizeof(msg), fmt, lost);
> > +
> > +   printk_safe_flush_line(msg, strlen(msg));
> 
> This made my brain spin a lot. I wondered if it did what we wanted
> and it was safe.
> 
> On one hand, it is supposed to work because use exactly this
> function in __printk_safe_flush() where you call this from.
> 
> One question is if it does what we want in different contexts.
> Let's look at it:
> 
> 1. It calls printk_deferred() in NMI context. There is a risk
>of a deadlock. But it is called only from
>printk_safe_flush_on_panic() which is the last resort. Therefore
>it does exactly what we want.
> 
> 2. It calls printk()->printk_func()->vprintk_emit() in normal context.
>It is what we want in normal context.
> 
> 3. It calls printk()->printk_func()->v printk_safe() in printk_safe
>context. This does not look correct. IMHO, this might happen
>only printk_safe_flush_on_panic() and we want to use
>printk_deferred() here as well.
[..]
> The easiest solution would be to simply call printk_deferred()
> here. Everything will be deferred after the async printk() patchset
> anyway.
> 
> I would even use printk_deferred() in printk_safe_flush_line()
> for each context. It is not optimal but it works very well
> and it makes the code much more straightforward.

yes, good point.
we can call deferred printk for anything there; or replace that in_nmi()
check with the `printk_safe_context != 0' one, and then route the message
via printk or printk_deferred.


[..]
> >   * Flush data from the associated per-CPU buffer. The function
> >   * can be called either via IRQ work or independently.
> > @@ -147,6 +183,9 @@ static void __printk_safe_flush(struct irq_work *work)
> >  
> > i = 0;
> >  more:
> > +   report_nmi_message_lost();
> > +   report_safe_message_lost();
> 
> Please, move this at the end of this function.

ok.

-ss


Re: [RFC][PATCHv4 4/6] printk: report lost messages in printk safe/nmi contexts

2016-11-25 Thread Petr Mladek
On Fri 2016-10-28 00:49:31, Sergey Senozhatsky wrote:
> Account lost messages in pritk-safe and printk-safe-nmi
> contexts and report those numbers during printk_safe_flush().
> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  kernel/printk/internal.h| 17 ---
>  kernel/printk/printk.c  | 10 -
>  kernel/printk/printk_safe.c | 50 
> +++--
>  3 files changed, 48 insertions(+), 29 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 1878b2d..4675b8d 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1795,7 +1795,6 @@ asmlinkage int vprintk_emit(int facility, int level,
>   unsigned long flags;
>   int this_cpu;
>   int printed_len = 0;
> - int nmi_message_lost;
>   bool in_sched = false;
>   /* cpu currently holding logbuf_lock in this function */
>   static unsigned int logbuf_cpu = UINT_MAX;
> @@ -1846,15 +1845,6 @@ asmlinkage int vprintk_emit(int facility, int level,
>strlen(recursion_msg));
>   }
>  
> - nmi_message_lost = get_nmi_message_lost();
> - if (unlikely(nmi_message_lost)) {
> - text_len = scnprintf(textbuf, sizeof(textbuf),
> -  "BAD LUCK: lost %d message(s) from NMI 
> context!",
> -  nmi_message_lost);
> - printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
> -  NULL, 0, textbuf, text_len);
> - }

I really like that we are moving this out of vprintk_emit() but...

> -
>   /*
>* The printf needs to come first; we need the syslog
>* prefix which might be passed-in as a parameter.
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index af74d9c..721b091 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -52,10 +52,11 @@ struct printk_safe_seq_buf {
>  
>  static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
>  static DEFINE_PER_CPU(int, printk_safe_context);
> +static atomic_t safe_message_lost;
>  
>  #ifdef CONFIG_PRINTK_NMI
>  static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
> -atomic_t nmi_message_lost;
> +static atomic_t nmi_message_lost;
>  #endif
>  
>  static int printk_safe_log_store(struct printk_safe_seq_buf *s,
> @@ -122,6 +123,41 @@ static void printk_safe_flush_seq_line(struct 
> printk_safe_seq_buf *s,
>   printk_safe_flush_line(buf, (end - start) + 1);
>  }
>  
> +static void report_message_lost(atomic_t *num_lost, char *fmt)
> +{
> + int lost = atomic_xchg(num_lost, 0);
> +
> + if (lost) {
> + char msg[56];

I would really like to avoid a hard coded buffer size. Such things
are likely to bite us in the future.

I thought about reshuffling a lot of logic, adding more wrappers,
... But the solution might be easy in the end, see below.

> +
> + scnprintf(msg, sizeof(msg), fmt, lost);
> +
> + printk_safe_flush_line(msg, strlen(msg));

This made my brain spin a lot. I wondered if it did what we wanted
and it was safe.

On one hand, it is supposed to work because use exactly this
function in __printk_safe_flush() where you call this from.

One question is if it does what we want in different contexts.
Let's look at it:

1. It calls printk_deferred() in NMI context. There is a risk
   of a deadlock. But it is called only from
   printk_safe_flush_on_panic() which is the last resort. Therefore
   it does exactly what we want.

2. It calls printk()->printk_func()->vprintk_emit() in normal context.
   It is what we want in normal context.

3. It calls printk()->printk_func()->v printk_safe() in printk_safe
   context. This does not look correct. IMHO, this might happen
   only printk_safe_flush_on_panic() and we want to use
   printk_deferred() here as well.

   Also note that printk_safe buffer might still be full at this
   point. It means that the warning will not be stored. There also
   might be a risk of an infinite loop. Both problems are
   solved by printk_deferred().


The easiest solution would be to simply call printk_deferred()
here. Everything will be deferred after the async printk() patchset
anyway.

I would even use printk_deferred() in printk_safe_flush_line()
for each context. It is not optimal but it works very well
and it makes the code much more straightforward.


> + }
> +}
> +
> +#ifdef CONFIG_PRINTK_NMI
> +
> +static void report_nmi_message_lost(void)
> +{
> + report_message_lost(_message_lost,
> + "Lost %d message(s) from NMI context!");
> +}
> +
> +#else
> +
> +static void report_nmi_message_lost(void)
> +{
> +}
> +
> +#endif /* CONFIG_PRINTK_NMI */
> +
> +static void report_safe_message_lost(void)
> +{
> + report_message_lost(_message_lost,
> + "Lost %d message(s) from printk-safe context!");
> +}

Re: [RFC][PATCHv4 4/6] printk: report lost messages in printk safe/nmi contexts

2016-11-25 Thread Petr Mladek
On Fri 2016-10-28 00:49:31, Sergey Senozhatsky wrote:
> Account lost messages in pritk-safe and printk-safe-nmi
> contexts and report those numbers during printk_safe_flush().
> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  kernel/printk/internal.h| 17 ---
>  kernel/printk/printk.c  | 10 -
>  kernel/printk/printk_safe.c | 50 
> +++--
>  3 files changed, 48 insertions(+), 29 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 1878b2d..4675b8d 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1795,7 +1795,6 @@ asmlinkage int vprintk_emit(int facility, int level,
>   unsigned long flags;
>   int this_cpu;
>   int printed_len = 0;
> - int nmi_message_lost;
>   bool in_sched = false;
>   /* cpu currently holding logbuf_lock in this function */
>   static unsigned int logbuf_cpu = UINT_MAX;
> @@ -1846,15 +1845,6 @@ asmlinkage int vprintk_emit(int facility, int level,
>strlen(recursion_msg));
>   }
>  
> - nmi_message_lost = get_nmi_message_lost();
> - if (unlikely(nmi_message_lost)) {
> - text_len = scnprintf(textbuf, sizeof(textbuf),
> -  "BAD LUCK: lost %d message(s) from NMI 
> context!",
> -  nmi_message_lost);
> - printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
> -  NULL, 0, textbuf, text_len);
> - }

I really like that we are moving this out of vprintk_emit() but...

> -
>   /*
>* The printf needs to come first; we need the syslog
>* prefix which might be passed-in as a parameter.
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index af74d9c..721b091 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -52,10 +52,11 @@ struct printk_safe_seq_buf {
>  
>  static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
>  static DEFINE_PER_CPU(int, printk_safe_context);
> +static atomic_t safe_message_lost;
>  
>  #ifdef CONFIG_PRINTK_NMI
>  static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
> -atomic_t nmi_message_lost;
> +static atomic_t nmi_message_lost;
>  #endif
>  
>  static int printk_safe_log_store(struct printk_safe_seq_buf *s,
> @@ -122,6 +123,41 @@ static void printk_safe_flush_seq_line(struct 
> printk_safe_seq_buf *s,
>   printk_safe_flush_line(buf, (end - start) + 1);
>  }
>  
> +static void report_message_lost(atomic_t *num_lost, char *fmt)
> +{
> + int lost = atomic_xchg(num_lost, 0);
> +
> + if (lost) {
> + char msg[56];

I would really like to avoid a hard coded buffer size. Such things
are likely to bite us in the future.

I thought about reshuffling a lot of logic, adding more wrappers,
... But the solution might be easy in the end, see below.

> +
> + scnprintf(msg, sizeof(msg), fmt, lost);
> +
> + printk_safe_flush_line(msg, strlen(msg));

This made my brain spin a lot. I wondered if it did what we wanted
and it was safe.

On one hand, it is supposed to work because use exactly this
function in __printk_safe_flush() where you call this from.

One question is if it does what we want in different contexts.
Let's look at it:

1. It calls printk_deferred() in NMI context. There is a risk
   of a deadlock. But it is called only from
   printk_safe_flush_on_panic() which is the last resort. Therefore
   it does exactly what we want.

2. It calls printk()->printk_func()->vprintk_emit() in normal context.
   It is what we want in normal context.

3. It calls printk()->printk_func()->v printk_safe() in printk_safe
   context. This does not look correct. IMHO, this might happen
   only printk_safe_flush_on_panic() and we want to use
   printk_deferred() here as well.

   Also note that printk_safe buffer might still be full at this
   point. It means that the warning will not be stored. There also
   might be a risk of an infinite loop. Both problems are
   solved by printk_deferred().


The easiest solution would be to simply call printk_deferred()
here. Everything will be deferred after the async printk() patchset
anyway.

I would even use printk_deferred() in printk_safe_flush_line()
for each context. It is not optimal but it works very well
and it makes the code much more straightforward.


> + }
> +}
> +
> +#ifdef CONFIG_PRINTK_NMI
> +
> +static void report_nmi_message_lost(void)
> +{
> + report_message_lost(_message_lost,
> + "Lost %d message(s) from NMI context!");
> +}
> +
> +#else
> +
> +static void report_nmi_message_lost(void)
> +{
> +}
> +
> +#endif /* CONFIG_PRINTK_NMI */
> +
> +static void report_safe_message_lost(void)
> +{
> + report_message_lost(_message_lost,
> + "Lost %d message(s) from printk-safe context!");
> +}
> +
>  /*
>   * Flush data