Re: linux.git: printk() problem

2016-11-09 Thread Petr Mladek
On Mon 2016-10-24 19:22:59, Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
>  wrote:
> > On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
> >  wrote:
> >>
> >> I think cont_flush() should grab the logbuf_lock lock, because
> >> it does log_store() and touches the cont.len. so something like
> >> this perhaps
> >
> > Absolutely. Good catch.
> 
> Actually, you can't do it the way you did (inside cont_flush), because
> "cont_flush()" is already called with logbuf_lock held in most cases
> (see "cont_add()").
> 
> So it's really just the timer function that needs to take the
> logbuf_lock before it calls cont_flush().
> 
> So here's a new version. How does this look to you?
> 
> Again, this still tests "cont.len" outside the lock (not just in
> console_unlock(), but also in deferred_cont_flush()). And it's fine:
> even if it sees the "wrong" value due to some race, it does so either
> because cont.len was just set to non-zero (and whoever set it will
> force the re-check anyway), or it got cleared just as it was tested
> (and at worst you end up with an extra timer invocation).

This patch really seems to reduce the number of too-early flushed
continuous lines. It reduces the scattered output. But I am not sure
if we want to add a timer code into the printk calls at this stage
(for 4.9-rc5).

Well, the patch looks fine, except that we call cont_flush() without
poking console. It is not a regression because only newlines triggered
console in the past and they still do but...

I would suggest to revert the commit bfd8d3f23b51018388be
("printk: make reading the kernel log flush pending lines") for
4.9. Then we could  test/fix it properly for 4.10. Me and Sergey would
happily help with it.

Just in case, you still want to commit this patch. I would suggest
to apply the one below on top.


>From 7a0ad7ce2347346fc81872fe42a95ad5dfba4098 Mon Sep 17 00:00:00 2001
From: Petr Mladek 
Date: Tue, 25 Oct 2016 15:23:13 +0200
Subject: [PATCH] printk: Poke console and other loggers when cont buffer is
 flushed

The commit bfd8d3f23b51018388be041 ("printk: make reading the kernel
log flush pending lines") allows to add new message into the log
buffer without flushing it to the console and waking other loggers.

This patch adds wake_up_console() and calls it when the cont
buffer is flushed. The function name will make more sense once
we switch to the async printk.

Note that it is enough to poke console. The other loggers
are waken from console_unlock() when there is a new message.

The patch also renames PRINTK_PENDING flags and wake_up_klogd_work*
stuff to reduce confusion. First, there are more possible log
daemons, e.g. klogd, syslogd. Second, the word "console" is more
descriptive than "output".

This patch is based on top of the fix proposed at
https://lkml.kernel.org/r/CA+55aFwKYnrMJr_vSE+GfDGszeUGyd=cpud15-zz8ywqw61...@mail.gmail.com

Signed-off-by: Petr Mladek 
---
 kernel/printk/printk.c | 60 ++
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e63aa679614e..f0e72de6ddbc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -780,6 +780,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct 
iov_iter *from)
 }
 
 static void deferred_cont_flush(void);
+static void __wake_up_console(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -1620,13 +1621,11 @@ static bool cont_flush(void)
 static void flush_timer(unsigned long data)
 {
unsigned long flags;
-   bool did_flush;
 
raw_spin_lock_irqsave(_lock, flags);
-   did_flush = cont_flush();
+   if (cont_flush())
+   __wake_up_console();
raw_spin_unlock_irqrestore(_lock, flags);
-   if (did_flush)
-   wake_up_klogd();
 }
 
 static void deferred_cont_flush(void)
@@ -2730,40 +2729,57 @@ static int __init printk_late_init(void)
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
+ * Handle console and wakeup loggers via IRQ work so that it can be done
+ * from any context.
  */
-#define PRINTK_PENDING_WAKEUP  0x01
-#define PRINTK_PENDING_OUTPUT  0x02
+#define PRINTK_PENDING_LOGGERS 0x01
+#define PRINTK_PENDING_CONSOLE 0x02
 
 static DEFINE_PER_CPU(int, printk_pending);
 
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
+static void printk_pending_func(struct irq_work *irq_work)
 {
int pending = __this_cpu_xchg(printk_pending, 0);
 
-   if (pending & PRINTK_PENDING_OUTPUT) {
+   if (pending & PRINTK_PENDING_CONSOLE) {
/* If trylock fails, someone else is doing the printing */
if (console_trylock())
console_unlock();
}
 
-   if 

Re: linux.git: printk() problem

2016-11-09 Thread Petr Mladek
On Mon 2016-10-24 19:22:59, Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
>  wrote:
> > On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
> >  wrote:
> >>
> >> I think cont_flush() should grab the logbuf_lock lock, because
> >> it does log_store() and touches the cont.len. so something like
> >> this perhaps
> >
> > Absolutely. Good catch.
> 
> Actually, you can't do it the way you did (inside cont_flush), because
> "cont_flush()" is already called with logbuf_lock held in most cases
> (see "cont_add()").
> 
> So it's really just the timer function that needs to take the
> logbuf_lock before it calls cont_flush().
> 
> So here's a new version. How does this look to you?
> 
> Again, this still tests "cont.len" outside the lock (not just in
> console_unlock(), but also in deferred_cont_flush()). And it's fine:
> even if it sees the "wrong" value due to some race, it does so either
> because cont.len was just set to non-zero (and whoever set it will
> force the re-check anyway), or it got cleared just as it was tested
> (and at worst you end up with an extra timer invocation).

This patch really seems to reduce the number of too-early flushed
continuous lines. It reduces the scattered output. But I am not sure
if we want to add a timer code into the printk calls at this stage
(for 4.9-rc5).

Well, the patch looks fine, except that we call cont_flush() without
poking console. It is not a regression because only newlines triggered
console in the past and they still do but...

I would suggest to revert the commit bfd8d3f23b51018388be
("printk: make reading the kernel log flush pending lines") for
4.9. Then we could  test/fix it properly for 4.10. Me and Sergey would
happily help with it.

Just in case, you still want to commit this patch. I would suggest
to apply the one below on top.


>From 7a0ad7ce2347346fc81872fe42a95ad5dfba4098 Mon Sep 17 00:00:00 2001
From: Petr Mladek 
Date: Tue, 25 Oct 2016 15:23:13 +0200
Subject: [PATCH] printk: Poke console and other loggers when cont buffer is
 flushed

The commit bfd8d3f23b51018388be041 ("printk: make reading the kernel
log flush pending lines") allows to add new message into the log
buffer without flushing it to the console and waking other loggers.

This patch adds wake_up_console() and calls it when the cont
buffer is flushed. The function name will make more sense once
we switch to the async printk.

Note that it is enough to poke console. The other loggers
are waken from console_unlock() when there is a new message.

The patch also renames PRINTK_PENDING flags and wake_up_klogd_work*
stuff to reduce confusion. First, there are more possible log
daemons, e.g. klogd, syslogd. Second, the word "console" is more
descriptive than "output".

This patch is based on top of the fix proposed at
https://lkml.kernel.org/r/CA+55aFwKYnrMJr_vSE+GfDGszeUGyd=cpud15-zz8ywqw61...@mail.gmail.com

Signed-off-by: Petr Mladek 
---
 kernel/printk/printk.c | 60 ++
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e63aa679614e..f0e72de6ddbc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -780,6 +780,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct 
iov_iter *from)
 }
 
 static void deferred_cont_flush(void);
+static void __wake_up_console(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -1620,13 +1621,11 @@ static bool cont_flush(void)
 static void flush_timer(unsigned long data)
 {
unsigned long flags;
-   bool did_flush;
 
raw_spin_lock_irqsave(_lock, flags);
-   did_flush = cont_flush();
+   if (cont_flush())
+   __wake_up_console();
raw_spin_unlock_irqrestore(_lock, flags);
-   if (did_flush)
-   wake_up_klogd();
 }
 
 static void deferred_cont_flush(void)
@@ -2730,40 +2729,57 @@ static int __init printk_late_init(void)
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
+ * Handle console and wakeup loggers via IRQ work so that it can be done
+ * from any context.
  */
-#define PRINTK_PENDING_WAKEUP  0x01
-#define PRINTK_PENDING_OUTPUT  0x02
+#define PRINTK_PENDING_LOGGERS 0x01
+#define PRINTK_PENDING_CONSOLE 0x02
 
 static DEFINE_PER_CPU(int, printk_pending);
 
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
+static void printk_pending_func(struct irq_work *irq_work)
 {
int pending = __this_cpu_xchg(printk_pending, 0);
 
-   if (pending & PRINTK_PENDING_OUTPUT) {
+   if (pending & PRINTK_PENDING_CONSOLE) {
/* If trylock fails, someone else is doing the printing */
if (console_trylock())
console_unlock();
}
 
-   if (pending & PRINTK_PENDING_WAKEUP)
+   if (pending & PRINTK_PENDING_LOGGERS)

Re: linux.git: printk() problem

2016-10-25 Thread Petr Mladek
On Mon 2016-10-24 19:22:59, Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
>  wrote:
> > On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
> >  wrote:
> >>
> >> I think cont_flush() should grab the logbuf_lock lock, because
> >> it does log_store() and touches the cont.len. so something like
> >> this perhaps
> >
> > Absolutely. Good catch.
> 
> Actually, you can't do it the way you did (inside cont_flush), because
> "cont_flush()" is already called with logbuf_lock held in most cases
> (see "cont_add()").
> 
> So it's really just the timer function that needs to take the
> logbuf_lock before it calls cont_flush().
> 
> So here's a new version. How does this look to you?
> 
> Again, this still tests "cont.len" outside the lock (not just in
> console_unlock(), but also in deferred_cont_flush()). And it's fine:
> even if it sees the "wrong" value due to some race, it does so either
> because cont.len was just set to non-zero (and whoever set it will
> force the re-check anyway), or it got cleared just as it was tested
> (and at worst you end up with an extra timer invocation).
> 
> +static void flush_timer(unsigned long data)
> +{
> + unsigned long flags;
> + bool did_flush;
> +
> + raw_spin_lock_irqsave(_lock, flags);
> + did_flush = cont_flush();
> + raw_spin_unlock_irqrestore(_lock, flags);
> + if (did_flush)
> + wake_up_klogd();

We wake only klogd and syslog but not console here. Same problem is
also with some other recently added cont_flush() calls, e.g. in
kmsg_dump(), kmsg_dump_get_line().

BTW: It should be safe to call wake_up_klogd() in the locked area.
It just modifies a per-CPU variable and queues IRQ work.

I have played with it a bit and the result is below. Feel free
to just use the pieces or the idea if you like a part of it.


>From 226c4bfbe9f81bdd3b91b6b0ff3e268b1c09085f Mon Sep 17 00:00:00 2001
From: Petr Mladek 
Date: Tue, 25 Oct 2016 15:23:13 +0200
Subject: [PATCH] printk: Poke console when cont buffer is flushed

The commit bfd8d3f23b51018388be041 ("printk: make reading the kernel
log flush pending lines") allows to add new message into the log
buffer without flushing it to the console.

This patch reworks wake_up_klogd() to a generic printk_poke().
It pokes the loggers or console or both where necessary.

I have renamed some some variables and functions. I am not
if it is better readable now. Anyway, the old names came
from times before printk_deferred() when the irq work
handled only klogd.

I am not super happy the new state because the irq work is
called from many locations and there are cycles.
The cycle breaks when cont.len is zero but it might
be worth some clean up. For example, deferred_cont_flush()
call in printk_poke_func() looks superfluous. But
I wanted to be rather conservative for the moment.

Also I wonder if we really need per-CPU variable for
the printk_pending/printk_need_poke flag. A global
variable might help to reduce some calls and it would
not complicate the code.

This patch is based on top of the fix proposed at
https://lkml.kernel.org/r/CA+55aFwKYnrMJr_vSE+GfDGszeUGyd=cpud15-zz8ywqw61...@mail.gmail.com

Signed-off-by: Petr Mladek 
---
 include/linux/printk.h |  7 --
 kernel/printk/printk.c | 67 --
 lib/bust_spinlocks.c   |  2 +-
 3 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index eac1af8502bb..6fdced13d2c9 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -31,6 +31,9 @@ static inline const char *printk_skip_level(const char 
*buffer)
return buffer;
 }
 
+#define PRINTK_POKE_CONSOLE0x01
+#define PRINTK_POKE_LOGGERS0x02
+
 #define CONSOLE_EXT_LOG_MAX8192
 
 /* printk's without a loglevel use this.. */
@@ -185,7 +188,7 @@ extern bool printk_timed_ratelimit(unsigned long 
*caller_jiffies,
 devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void __user *buf,
  size_t *lenp, loff_t *ppos);
 
-extern void wake_up_klogd(void);
+extern void printk_poke(int poke);
 
 char *log_buf_addr_get(void);
 u32 log_buf_len_get(void);
@@ -220,7 +223,7 @@ static inline bool printk_timed_ratelimit(unsigned long 
*caller_jiffies,
return false;
 }
 
-static inline void wake_up_klogd(void)
+static inline void printk_poke(int poke)
 {
 }
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e63aa679614e..2c231bb73744 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1620,13 +1620,11 @@ static bool cont_flush(void)
 static void flush_timer(unsigned long data)
 {
unsigned long flags;
-   bool did_flush;
 
raw_spin_lock_irqsave(_lock, flags);
-   did_flush = cont_flush();
+   if (cont_flush())
+   printk_poke(PRINTK_POKE_CONSOLE | PRINTK_POKE_LOGGERS);
   

Re: linux.git: printk() problem

2016-10-25 Thread Petr Mladek
On Mon 2016-10-24 19:22:59, Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
>  wrote:
> > On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
> >  wrote:
> >>
> >> I think cont_flush() should grab the logbuf_lock lock, because
> >> it does log_store() and touches the cont.len. so something like
> >> this perhaps
> >
> > Absolutely. Good catch.
> 
> Actually, you can't do it the way you did (inside cont_flush), because
> "cont_flush()" is already called with logbuf_lock held in most cases
> (see "cont_add()").
> 
> So it's really just the timer function that needs to take the
> logbuf_lock before it calls cont_flush().
> 
> So here's a new version. How does this look to you?
> 
> Again, this still tests "cont.len" outside the lock (not just in
> console_unlock(), but also in deferred_cont_flush()). And it's fine:
> even if it sees the "wrong" value due to some race, it does so either
> because cont.len was just set to non-zero (and whoever set it will
> force the re-check anyway), or it got cleared just as it was tested
> (and at worst you end up with an extra timer invocation).
> 
> +static void flush_timer(unsigned long data)
> +{
> + unsigned long flags;
> + bool did_flush;
> +
> + raw_spin_lock_irqsave(_lock, flags);
> + did_flush = cont_flush();
> + raw_spin_unlock_irqrestore(_lock, flags);
> + if (did_flush)
> + wake_up_klogd();

We wake only klogd and syslog but not console here. Same problem is
also with some other recently added cont_flush() calls, e.g. in
kmsg_dump(), kmsg_dump_get_line().

BTW: It should be safe to call wake_up_klogd() in the locked area.
It just modifies a per-CPU variable and queues IRQ work.

I have played with it a bit and the result is below. Feel free
to just use the pieces or the idea if you like a part of it.


>From 226c4bfbe9f81bdd3b91b6b0ff3e268b1c09085f Mon Sep 17 00:00:00 2001
From: Petr Mladek 
Date: Tue, 25 Oct 2016 15:23:13 +0200
Subject: [PATCH] printk: Poke console when cont buffer is flushed

The commit bfd8d3f23b51018388be041 ("printk: make reading the kernel
log flush pending lines") allows to add new message into the log
buffer without flushing it to the console.

This patch reworks wake_up_klogd() to a generic printk_poke().
It pokes the loggers or console or both where necessary.

I have renamed some some variables and functions. I am not
if it is better readable now. Anyway, the old names came
from times before printk_deferred() when the irq work
handled only klogd.

I am not super happy the new state because the irq work is
called from many locations and there are cycles.
The cycle breaks when cont.len is zero but it might
be worth some clean up. For example, deferred_cont_flush()
call in printk_poke_func() looks superfluous. But
I wanted to be rather conservative for the moment.

Also I wonder if we really need per-CPU variable for
the printk_pending/printk_need_poke flag. A global
variable might help to reduce some calls and it would
not complicate the code.

This patch is based on top of the fix proposed at
https://lkml.kernel.org/r/CA+55aFwKYnrMJr_vSE+GfDGszeUGyd=cpud15-zz8ywqw61...@mail.gmail.com

Signed-off-by: Petr Mladek 
---
 include/linux/printk.h |  7 --
 kernel/printk/printk.c | 67 --
 lib/bust_spinlocks.c   |  2 +-
 3 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index eac1af8502bb..6fdced13d2c9 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -31,6 +31,9 @@ static inline const char *printk_skip_level(const char 
*buffer)
return buffer;
 }
 
+#define PRINTK_POKE_CONSOLE0x01
+#define PRINTK_POKE_LOGGERS0x02
+
 #define CONSOLE_EXT_LOG_MAX8192
 
 /* printk's without a loglevel use this.. */
@@ -185,7 +188,7 @@ extern bool printk_timed_ratelimit(unsigned long 
*caller_jiffies,
 devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void __user *buf,
  size_t *lenp, loff_t *ppos);
 
-extern void wake_up_klogd(void);
+extern void printk_poke(int poke);
 
 char *log_buf_addr_get(void);
 u32 log_buf_len_get(void);
@@ -220,7 +223,7 @@ static inline bool printk_timed_ratelimit(unsigned long 
*caller_jiffies,
return false;
 }
 
-static inline void wake_up_klogd(void)
+static inline void printk_poke(int poke)
 {
 }
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e63aa679614e..2c231bb73744 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1620,13 +1620,11 @@ static bool cont_flush(void)
 static void flush_timer(unsigned long data)
 {
unsigned long flags;
-   bool did_flush;
 
raw_spin_lock_irqsave(_lock, flags);
-   did_flush = cont_flush();
+   if (cont_flush())
+   printk_poke(PRINTK_POKE_CONSOLE | PRINTK_POKE_LOGGERS);
raw_spin_unlock_irqrestore(_lock, flags);
-   if (did_flush)
-   

Re: linux.git: printk() problem

2016-10-25 Thread Steven Rostedt
On Sun, Oct 23, 2016 at 11:11:18AM -0700, Linus Torvalds wrote:
> 
> For example, one of the really historical uses for partial lines is this:
> 
>pr_info("Checking 'hlt' instruction... ");
> 
>if (!boot_cpu_data.hlt_works_ok) {
>pr_cont("disabled\n");
>return;
>}
>halt();
>halt();
>halt();
>halt();
>pr_cont("OK\n");
> 
> and the point was that there used to be some really old i386 machines
> that hung on the "hlt" instruction (probably not because of a CPU bug,
> but because of either power supply issues or some DMA issues).
> 
> To support that, we really *had* to print out the continuation lines
> even when they were partial. And that complicates the printk logic a
> lot.

Note, my ftrace start up tests still does exactly this (e.g.):

pr_info("Testing dynamic ftrace ops: #%d: ", cnt);

[ do lots of testing ]

printk(KERN_CONT "PASSED\n");

Previously a change was made to buffer lines without \n, and I wasted an
entire day debugging why a crash happened because I was looking at the wrong
test.

-- Steve

> 
> Now, that "hlt" case is long long gone, and maybe we should just say
> "screw that". It would be really quite easy to say "we don't print out
> continuation lines immediately, we just buffer them for 0.1s instead,
> and KERN_CONT only works for things that really happen more or less
> immediately".
> 
> Maybe that really is the right answer. Because the original cause of
> us having to bend over backwards in this case is really no longer
> there. And it would simplify printk a *lot*.
> 
> Let me whip up a minimal patch for you to try.


Re: linux.git: printk() problem

2016-10-25 Thread Steven Rostedt
On Sun, Oct 23, 2016 at 11:11:18AM -0700, Linus Torvalds wrote:
> 
> For example, one of the really historical uses for partial lines is this:
> 
>pr_info("Checking 'hlt' instruction... ");
> 
>if (!boot_cpu_data.hlt_works_ok) {
>pr_cont("disabled\n");
>return;
>}
>halt();
>halt();
>halt();
>halt();
>pr_cont("OK\n");
> 
> and the point was that there used to be some really old i386 machines
> that hung on the "hlt" instruction (probably not because of a CPU bug,
> but because of either power supply issues or some DMA issues).
> 
> To support that, we really *had* to print out the continuation lines
> even when they were partial. And that complicates the printk logic a
> lot.

Note, my ftrace start up tests still does exactly this (e.g.):

pr_info("Testing dynamic ftrace ops: #%d: ", cnt);

[ do lots of testing ]

printk(KERN_CONT "PASSED\n");

Previously a change was made to buffer lines without \n, and I wasted an
entire day debugging why a crash happened because I was looking at the wrong
test.

-- Steve

> 
> Now, that "hlt" case is long long gone, and maybe we should just say
> "screw that". It would be really quite easy to say "we don't print out
> continuation lines immediately, we just buffer them for 0.1s instead,
> and KERN_CONT only works for things that really happen more or less
> immediately".
> 
> Maybe that really is the right answer. Because the original cause of
> us having to bend over backwards in this case is really no longer
> there. And it would simplify printk a *lot*.
> 
> Let me whip up a minimal patch for you to try.


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
On (10/24/16 21:15), Linus Torvalds wrote:
[..]
> No. Most cont lines never hit the delay, because when the line is
> completed, it is flushed (and then printed synchronously, assuming it
> can get the console lock).
> 
> So the timeout only ever comes into effect if the line isn't completed
> in time at all. Which is actually very rare, and never happens for the
> "let's print things out in multiple chinks because we're using a
> loop".
> 
> Similarly, if a new printk() happens due to interleaving, the previous
> buffered line is always flushed first, so buffering never causes
> out-of-order behavior.

thanks. the patch works fine on my x86 box.

> Basically, the only time the timer actually does anything is if
> something just does a printk() without a newline, and no other
> printouts happen for the next 0.1s.

ok. perhaps, like slow serial console. will test on arm board later.

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
On (10/24/16 21:15), Linus Torvalds wrote:
[..]
> No. Most cont lines never hit the delay, because when the line is
> completed, it is flushed (and then printed synchronously, assuming it
> can get the console lock).
> 
> So the timeout only ever comes into effect if the line isn't completed
> in time at all. Which is actually very rare, and never happens for the
> "let's print things out in multiple chinks because we're using a
> loop".
> 
> Similarly, if a new printk() happens due to interleaving, the previous
> buffered line is always flushed first, so buffering never causes
> out-of-order behavior.

thanks. the patch works fine on my x86 box.

> Basically, the only time the timer actually does anything is if
> something just does a printk() without a newline, and no other
> printouts happen for the next 0.1s.

ok. perhaps, like slow serial console. will test on arm board later.

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Linus Torvalds
On Mon, Oct 24, 2016 at 9:06 PM, Sergey Senozhatsky
 wrote:
>
> 1) the way we dumpstack on x86 (at least on x86) is a spaghetti of
> printk() and pr_cont() calls. for instance, arch/x86/kernel/dumpstack_64.c
> show_regs() does pr_cont() to print out the registers, while the stack and
> backtrace are printed with printk(). so, I assume, the backtrace now will
> look a bit upside-down, because cont lines are printed with the delay.
> correct?

No. Most cont lines never hit the delay, because when the line is
completed, it is flushed (and then printed synchronously, assuming it
can get the console lock).

So the timeout only ever comes into effect if the line isn't completed
in time at all. Which is actually very rare, and never happens for the
"let's print things out in multiple chinks because we're using a
loop".

Similarly, if a new printk() happens due to interleaving, the previous
buffered line is always flushed first, so buffering never causes
out-of-order behavior.

Basically, the only time the timer actually does anything is if
something just does a printk() without a newline, and no other
printouts happen for the next 0.1s.

> 2) flush on oops.

Again, really not an issue for the exact reason above: nothing is ever
buffered when something new is printed.

And all you need to guarantee that last line of the oops itself is
printed is that it has a newline.

So again, the timer only matters for the exceptional case, not for the
normal situation. It's literally there to guarantee basic timeliness.

 Linus


Re: linux.git: printk() problem

2016-10-24 Thread Linus Torvalds
On Mon, Oct 24, 2016 at 9:06 PM, Sergey Senozhatsky
 wrote:
>
> 1) the way we dumpstack on x86 (at least on x86) is a spaghetti of
> printk() and pr_cont() calls. for instance, arch/x86/kernel/dumpstack_64.c
> show_regs() does pr_cont() to print out the registers, while the stack and
> backtrace are printed with printk(). so, I assume, the backtrace now will
> look a bit upside-down, because cont lines are printed with the delay.
> correct?

No. Most cont lines never hit the delay, because when the line is
completed, it is flushed (and then printed synchronously, assuming it
can get the console lock).

So the timeout only ever comes into effect if the line isn't completed
in time at all. Which is actually very rare, and never happens for the
"let's print things out in multiple chinks because we're using a
loop".

Similarly, if a new printk() happens due to interleaving, the previous
buffered line is always flushed first, so buffering never causes
out-of-order behavior.

Basically, the only time the timer actually does anything is if
something just does a printk() without a newline, and no other
printouts happen for the next 0.1s.

> 2) flush on oops.

Again, really not an issue for the exact reason above: nothing is ever
buffered when something new is printed.

And all you need to guarantee that last line of the oops itself is
printed is that it has a newline.

So again, the timer only matters for the exceptional case, not for the
normal situation. It's literally there to guarantee basic timeliness.

 Linus


Re: linux.git: printk() problem

2016-10-24 Thread Joe Perches
On Tue, 2016-10-25 at 13:06 +0900, Sergey Senozhatsky wrote:
>   so how about skipping mod_timer in deferred_cont_flush() and just
>   cont_flush() when we are in oops? here is probably one more thing we
>   need to "fix" first. oops_in_progress is unreliable. x86 oops_end()
>   does bust_spinlocks(0) before it calls panic(). panic() increments
>   oops_in_progress but decrements it back to 0 (bust_spinlocks(0)) before
>   it does console_flush_on_panic(). so there is (almost) no way
>   console_flush_on_panic() can see oops_in_progress != 0.

I think Linus' insistence on not requiring EOL/newline is misguided.
It'd be simpler to avoid the flushes and have a require EOL/newline.



Re: linux.git: printk() problem

2016-10-24 Thread Joe Perches
On Tue, 2016-10-25 at 13:06 +0900, Sergey Senozhatsky wrote:
>   so how about skipping mod_timer in deferred_cont_flush() and just
>   cont_flush() when we are in oops? here is probably one more thing we
>   need to "fix" first. oops_in_progress is unreliable. x86 oops_end()
>   does bust_spinlocks(0) before it calls panic(). panic() increments
>   oops_in_progress but decrements it back to 0 (bust_spinlocks(0)) before
>   it does console_flush_on_panic(). so there is (almost) no way
>   console_flush_on_panic() can see oops_in_progress != 0.

I think Linus' insistence on not requiring EOL/newline is misguided.
It'd be simpler to avoid the flushes and have a require EOL/newline.



Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
On (10/24/16 19:22), Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
>  wrote:
> > On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
> >  wrote:
> >>
> >> I think cont_flush() should grab the logbuf_lock lock, because
> >> it does log_store() and touches the cont.len. so something like
> >> this perhaps
> >
> > Absolutely. Good catch.
> 
> Actually, you can't do it the way you did (inside cont_flush), because
> "cont_flush()" is already called with logbuf_lock held in most cases
> (see "cont_add()").

right. my bad, realized too late.

> So it's really just the timer function that needs to take the
> logbuf_lock before it calls cont_flush().

yes.

> So here's a new version. How does this look to you?

ok, looks much better.

there are several things that I want to mention here, just to make
sure we don't miss anything (just my 5 cents).

1) the way we dumpstack on x86 (at least on x86) is a spaghetti of
printk() and pr_cont() calls. for instance, arch/x86/kernel/dumpstack_64.c
show_regs() does pr_cont() to print out the registers, while the stack and
backtrace are printed with printk(). so, I assume, the backtrace now will
look a bit upside-down, because cont lines are printed with the delay.
correct?

2) flush on oops. not that panic printk is deadlock proof (not at all)
   but:
 a) rather unlikely, but what if BUG_ON() or panic() happens
  under lock_timer_base()?
 b) what if timer event never happens? (we are in panic, who knows)

  so how about skipping mod_timer in deferred_cont_flush() and just
  cont_flush() when we are in oops? here is probably one more thing we
  need to "fix" first. oops_in_progress is unreliable. x86 oops_end()
  does bust_spinlocks(0) before it calls panic(). panic() increments
  oops_in_progress but decrements it back to 0 (bust_spinlocks(0)) before
  it does console_flush_on_panic(). so there is (almost) no way
  console_flush_on_panic() can see oops_in_progress != 0.


diff --git a/kernel/panic.c b/kernel/panic.c
index e6480e2..8e17540 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -228,7 +228,6 @@ void panic(const char *fmt, ...)
if (_crash_kexec_post_notifiers)
__crash_kexec(NULL);
 
-   bust_spinlocks(0);
 
/*
 * We may have ended up stopping the CPU holding the lock (in
@@ -240,6 +239,7 @@ void panic(const char *fmt, ...)
 */
debug_locks_off();
console_flush_on_panic();
+   bust_spinlocks(0);
 
if (!panic_blink)
panic_blink = no_blink;
---

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
On (10/24/16 19:22), Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
>  wrote:
> > On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
> >  wrote:
> >>
> >> I think cont_flush() should grab the logbuf_lock lock, because
> >> it does log_store() and touches the cont.len. so something like
> >> this perhaps
> >
> > Absolutely. Good catch.
> 
> Actually, you can't do it the way you did (inside cont_flush), because
> "cont_flush()" is already called with logbuf_lock held in most cases
> (see "cont_add()").

right. my bad, realized too late.

> So it's really just the timer function that needs to take the
> logbuf_lock before it calls cont_flush().

yes.

> So here's a new version. How does this look to you?

ok, looks much better.

there are several things that I want to mention here, just to make
sure we don't miss anything (just my 5 cents).

1) the way we dumpstack on x86 (at least on x86) is a spaghetti of
printk() and pr_cont() calls. for instance, arch/x86/kernel/dumpstack_64.c
show_regs() does pr_cont() to print out the registers, while the stack and
backtrace are printed with printk(). so, I assume, the backtrace now will
look a bit upside-down, because cont lines are printed with the delay.
correct?

2) flush on oops. not that panic printk is deadlock proof (not at all)
   but:
 a) rather unlikely, but what if BUG_ON() or panic() happens
  under lock_timer_base()?
 b) what if timer event never happens? (we are in panic, who knows)

  so how about skipping mod_timer in deferred_cont_flush() and just
  cont_flush() when we are in oops? here is probably one more thing we
  need to "fix" first. oops_in_progress is unreliable. x86 oops_end()
  does bust_spinlocks(0) before it calls panic(). panic() increments
  oops_in_progress but decrements it back to 0 (bust_spinlocks(0)) before
  it does console_flush_on_panic(). so there is (almost) no way
  console_flush_on_panic() can see oops_in_progress != 0.


diff --git a/kernel/panic.c b/kernel/panic.c
index e6480e2..8e17540 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -228,7 +228,6 @@ void panic(const char *fmt, ...)
if (_crash_kexec_post_notifiers)
__crash_kexec(NULL);
 
-   bust_spinlocks(0);
 
/*
 * We may have ended up stopping the CPU holding the lock (in
@@ -240,6 +239,7 @@ void panic(const char *fmt, ...)
 */
debug_locks_off();
console_flush_on_panic();
+   bust_spinlocks(0);
 
if (!panic_blink)
panic_blink = no_blink;
---

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
On (10/25/16 10:55), Sergey Senozhatsky wrote:
> I think cont_flush() should grab the logbuf_lock lock, because
> it does log_store() and touches the cont.len. so something like
> this perhaps
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c7f490f..47f887c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1608,13 +1608,20 @@ static struct cont {
>  
>  static bool cont_flush(void)
>  {
> +   unsigned long flags;
> +   bool flushed = false;
> +
> +   raw_spin_lock_irqsave(_lock, flags);
> if (!cont.len)
> -   return false;
> +   goto out;
>  
> log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
>   NULL, 0, cont.buf, cont.len);
> cont.len = 0;
> -   return true;
> +   flushed = true;
> +out:
> +   raw_spin_unlock_irqrestore(_lock, flags);
> +   return flushed;
>  }


... clearly, wasn't tested at all! sorry about that.
what I meant was cont_flush() from deferred_cont_flush()->flush_timer().


diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7f490f..bd7841c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1619,7 +1619,14 @@ static bool cont_flush(void)
 
 static void flush_timer(unsigned long data)
 {
-   if (cont_flush())
+   unsigned long flags;
+   bool flushed;
+
+   raw_spin_lock_irqsave(_lock, flags);
+   flushed = cont_flush();
+   raw_spin_unlock_irqrestore(_lock, flags);
+
+   if (flushed)
wake_up_klogd();
 }

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
On (10/25/16 10:55), Sergey Senozhatsky wrote:
> I think cont_flush() should grab the logbuf_lock lock, because
> it does log_store() and touches the cont.len. so something like
> this perhaps
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c7f490f..47f887c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1608,13 +1608,20 @@ static struct cont {
>  
>  static bool cont_flush(void)
>  {
> +   unsigned long flags;
> +   bool flushed = false;
> +
> +   raw_spin_lock_irqsave(_lock, flags);
> if (!cont.len)
> -   return false;
> +   goto out;
>  
> log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
>   NULL, 0, cont.buf, cont.len);
> cont.len = 0;
> -   return true;
> +   flushed = true;
> +out:
> +   raw_spin_unlock_irqrestore(_lock, flags);
> +   return flushed;
>  }


... clearly, wasn't tested at all! sorry about that.
what I meant was cont_flush() from deferred_cont_flush()->flush_timer().


diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7f490f..bd7841c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1619,7 +1619,14 @@ static bool cont_flush(void)
 
 static void flush_timer(unsigned long data)
 {
-   if (cont_flush())
+   unsigned long flags;
+   bool flushed;
+
+   raw_spin_lock_irqsave(_lock, flags);
+   flushed = cont_flush();
+   raw_spin_unlock_irqrestore(_lock, flags);
+
+   if (flushed)
wake_up_klogd();
 }

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Linus Torvalds
On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
 wrote:
> On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
>  wrote:
>>
>> I think cont_flush() should grab the logbuf_lock lock, because
>> it does log_store() and touches the cont.len. so something like
>> this perhaps
>
> Absolutely. Good catch.

Actually, you can't do it the way you did (inside cont_flush), because
"cont_flush()" is already called with logbuf_lock held in most cases
(see "cont_add()").

So it's really just the timer function that needs to take the
logbuf_lock before it calls cont_flush().

So here's a new version. How does this look to you?

Again, this still tests "cont.len" outside the lock (not just in
console_unlock(), but also in deferred_cont_flush()). And it's fine:
even if it sees the "wrong" value due to some race, it does so either
because cont.len was just set to non-zero (and whoever set it will
force the re-check anyway), or it got cleared just as it was tested
(and at worst you end up with an extra timer invocation).

  Linus
 kernel/printk/printk.c | 255 +++--
 1 file changed, 58 insertions(+), 197 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de08fc90baaf..e63aa679614e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -367,7 +367,6 @@ DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
-static enum log_flags syslog_prev;
 static size_t syslog_partial;
 
 /* index and sequence number of the first record stored in the buffer */
@@ -381,7 +380,6 @@ static u32 log_next_idx;
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u32 console_idx;
-static enum log_flags console_prev;
 
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
@@ -650,27 +648,15 @@ static void append_char(char **pp, char *e, char c)
 }
 
 static ssize_t msg_print_ext_header(char *buf, size_t size,
-   struct printk_log *msg, u64 seq,
-   enum log_flags prev_flags)
+   struct printk_log *msg, u64 seq)
 {
u64 ts_usec = msg->ts_nsec;
-   char cont = '-';
 
do_div(ts_usec, 1000);
 
-   /*
-* If we couldn't merge continuation line fragments during the print,
-* export the stored flags to allow an optional external merge of the
-* records. Merging the records isn't always neccessarily correct, like
-* when we hit a race during printing. In most cases though, it produces
-* better readable output. 'c' in the record flags mark the first
-* fragment of a line, '+' the following.
-*/
-   if (msg->flags & LOG_CONT)
-   cont = (prev_flags & LOG_CONT) ? '+' : 'c';
-
return scnprintf(buf, size, "%u,%llu,%llu,%c;",
-  (msg->facility << 3) | msg->level, seq, ts_usec, cont);
+  (msg->facility << 3) | msg->level, seq, ts_usec,
+  msg->flags & LOG_CONT ? 'c' : '-');
 }
 
 static ssize_t msg_print_ext_body(char *buf, size_t size,
@@ -725,7 +711,6 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 struct devkmsg_user {
u64 seq;
u32 idx;
-   enum log_flags prev;
struct ratelimit_state rs;
struct mutex lock;
char buf[CONSOLE_EXT_LOG_MAX];
@@ -794,7 +779,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct 
iov_iter *from)
return ret;
 }
 
-static void cont_flush(void);
+static void deferred_cont_flush(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -811,7 +796,6 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
if (ret)
return ret;
raw_spin_lock_irq(_lock);
-   cont_flush();
while (user->seq == log_next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
@@ -838,12 +822,11 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
 
msg = log_from_idx(user->idx);
len = msg_print_ext_header(user->buf, sizeof(user->buf),
-  msg, user->seq, user->prev);
+  msg, user->seq);
len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
  log_dict(msg), msg->dict_len,
  log_text(msg), msg->text_len);
 
-   user->prev = msg->flags;
user->idx = log_next(user->idx);
user->seq++;
raw_spin_unlock_irq(_lock);
@@ -860,6 +843,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
ret = len;
 out:

Re: linux.git: printk() problem

2016-10-24 Thread Linus Torvalds
On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
 wrote:
> On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
>  wrote:
>>
>> I think cont_flush() should grab the logbuf_lock lock, because
>> it does log_store() and touches the cont.len. so something like
>> this perhaps
>
> Absolutely. Good catch.

Actually, you can't do it the way you did (inside cont_flush), because
"cont_flush()" is already called with logbuf_lock held in most cases
(see "cont_add()").

So it's really just the timer function that needs to take the
logbuf_lock before it calls cont_flush().

So here's a new version. How does this look to you?

Again, this still tests "cont.len" outside the lock (not just in
console_unlock(), but also in deferred_cont_flush()). And it's fine:
even if it sees the "wrong" value due to some race, it does so either
because cont.len was just set to non-zero (and whoever set it will
force the re-check anyway), or it got cleared just as it was tested
(and at worst you end up with an extra timer invocation).

  Linus
 kernel/printk/printk.c | 255 +++--
 1 file changed, 58 insertions(+), 197 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de08fc90baaf..e63aa679614e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -367,7 +367,6 @@ DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
-static enum log_flags syslog_prev;
 static size_t syslog_partial;
 
 /* index and sequence number of the first record stored in the buffer */
@@ -381,7 +380,6 @@ static u32 log_next_idx;
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u32 console_idx;
-static enum log_flags console_prev;
 
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
@@ -650,27 +648,15 @@ static void append_char(char **pp, char *e, char c)
 }
 
 static ssize_t msg_print_ext_header(char *buf, size_t size,
-   struct printk_log *msg, u64 seq,
-   enum log_flags prev_flags)
+   struct printk_log *msg, u64 seq)
 {
u64 ts_usec = msg->ts_nsec;
-   char cont = '-';
 
do_div(ts_usec, 1000);
 
-   /*
-* If we couldn't merge continuation line fragments during the print,
-* export the stored flags to allow an optional external merge of the
-* records. Merging the records isn't always neccessarily correct, like
-* when we hit a race during printing. In most cases though, it produces
-* better readable output. 'c' in the record flags mark the first
-* fragment of a line, '+' the following.
-*/
-   if (msg->flags & LOG_CONT)
-   cont = (prev_flags & LOG_CONT) ? '+' : 'c';
-
return scnprintf(buf, size, "%u,%llu,%llu,%c;",
-  (msg->facility << 3) | msg->level, seq, ts_usec, cont);
+  (msg->facility << 3) | msg->level, seq, ts_usec,
+  msg->flags & LOG_CONT ? 'c' : '-');
 }
 
 static ssize_t msg_print_ext_body(char *buf, size_t size,
@@ -725,7 +711,6 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 struct devkmsg_user {
u64 seq;
u32 idx;
-   enum log_flags prev;
struct ratelimit_state rs;
struct mutex lock;
char buf[CONSOLE_EXT_LOG_MAX];
@@ -794,7 +779,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct 
iov_iter *from)
return ret;
 }
 
-static void cont_flush(void);
+static void deferred_cont_flush(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -811,7 +796,6 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
if (ret)
return ret;
raw_spin_lock_irq(_lock);
-   cont_flush();
while (user->seq == log_next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
@@ -838,12 +822,11 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
 
msg = log_from_idx(user->idx);
len = msg_print_ext_header(user->buf, sizeof(user->buf),
-  msg, user->seq, user->prev);
+  msg, user->seq);
len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
  log_dict(msg), msg->dict_len,
  log_text(msg), msg->text_len);
 
-   user->prev = msg->flags;
user->idx = log_next(user->idx);
user->seq++;
raw_spin_unlock_irq(_lock);
@@ -860,6 +843,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
ret = len;
 out:
mutex_unlock(>lock);
+   deferred_cont_flush();
return ret;
 }
 

Re: linux.git: printk() problem

2016-10-24 Thread Linus Torvalds
On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
 wrote:
>
> I think cont_flush() should grab the logbuf_lock lock, because
> it does log_store() and touches the cont.len. so something like
> this perhaps

Absolutely. Good catch.

>> - if (wake_klogd)
>> + if (wake_klogd || cont.len)
> ^^
> this _technically_ can result in additional spurious wakeups - cont.len
> check is done outside of console_sem && logbuf_lock - but I don't think
> this is a huge problem.

Yes. We could easily just do it inside the spinlock and make it part
of the wake_klogd logic, but as you say, it doesn't actually matter.

And yes, it would be lovely if people who use the network console
(which I think is the only user of the whole extended console thing)
would test this too.

It really makes things potentially much simpler. It's not just that it
removes 150 lines, it's 150 lines of  really nasty complex special
case crud.

I'm obviously not going to apply it for 4.9, but it might be "do it
for 4.10 and mark it for stable if nobody notices any issues", because
apart from the code simplification it also gets more cases "right".
Right now the console flushing often means that we end up not merging
continuation lines as much as we could.

   Linus


Re: linux.git: printk() problem

2016-10-24 Thread Linus Torvalds
On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
 wrote:
>
> I think cont_flush() should grab the logbuf_lock lock, because
> it does log_store() and touches the cont.len. so something like
> this perhaps

Absolutely. Good catch.

>> - if (wake_klogd)
>> + if (wake_klogd || cont.len)
> ^^
> this _technically_ can result in additional spurious wakeups - cont.len
> check is done outside of console_sem && logbuf_lock - but I don't think
> this is a huge problem.

Yes. We could easily just do it inside the spinlock and make it part
of the wake_klogd logic, but as you say, it doesn't actually matter.

And yes, it would be lovely if people who use the network console
(which I think is the only user of the whole extended console thing)
would test this too.

It really makes things potentially much simpler. It's not just that it
removes 150 lines, it's 150 lines of  really nasty complex special
case crud.

I'm obviously not going to apply it for 4.9, but it might be "do it
for 4.10 and mark it for stable if nobody notices any issues", because
apart from the code simplification it also gets more cases "right".
Right now the console flushing often means that we end up not merging
continuation lines as much as we could.

   Linus


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
Hello,

Cc more people

report: https://marc.info/?l=linux-kernel=147721454506634=2

patch: https://marc.info/?l=linux-kernel=147733173800859


FB is using ext header a lot (afaik), so may be Tejun or Calvin
will also be interested.

On (10/24/16 10:55), Linus Torvalds wrote:
> > Note the "totally untested" part. It compiles for me. But it may do
> > unspeakably stupid things. Caveat applior.
> 
> Well, it is hard to apply a patch that I didn't even attach. Blush.

[..]

> -static void cont_flush(void)
> +static bool cont_flush(void)
>  {
> - if (cont.flushed)
> - return;
> - if (cont.len == 0)
> + if (!cont.len)
> + return false;
> +
> + log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
> +   NULL, 0, cont.buf, cont.len);
> + cont.len = 0;
> + return true;
> +}


I think cont_flush() should grab the logbuf_lock lock, because
it does log_store() and touches the cont.len. so something like
this perhaps

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7f490f..47f887c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1608,13 +1608,20 @@ static struct cont {
 
 static bool cont_flush(void)
 {
+   unsigned long flags;
+   bool flushed = false;
+
+   raw_spin_lock_irqsave(_lock, flags);
if (!cont.len)
-   return false;
+   goto out;
 
log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
  NULL, 0, cont.buf, cont.len);
cont.len = 0;
-   return true;
+   flushed = true;
+out:
+   raw_spin_unlock_irqrestore(_lock, flags);
+   return flushed;
 }


[..]
> @@ -2449,7 +2311,6 @@ void console_unlock(void)
>   }
>   console_idx = log_next(console_idx);
>   console_seq++;
> - console_prev = msg->flags;
>   raw_spin_unlock(_lock);
>  
>   stop_critical_timings();/* don't trace print latency */
> @@ -2483,7 +2344,7 @@ void console_unlock(void)
>   if (retry && console_trylock())
>   goto again;
>  
> - if (wake_klogd)
> + if (wake_klogd || cont.len)
^^
this _technically_ can result in additional spurious wakeups - cont.len
check is done outside of console_sem && logbuf_lock - but I don't think
this is a huge problem.

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
Hello,

Cc more people

report: https://marc.info/?l=linux-kernel=147721454506634=2

patch: https://marc.info/?l=linux-kernel=147733173800859


FB is using ext header a lot (afaik), so may be Tejun or Calvin
will also be interested.

On (10/24/16 10:55), Linus Torvalds wrote:
> > Note the "totally untested" part. It compiles for me. But it may do
> > unspeakably stupid things. Caveat applior.
> 
> Well, it is hard to apply a patch that I didn't even attach. Blush.

[..]

> -static void cont_flush(void)
> +static bool cont_flush(void)
>  {
> - if (cont.flushed)
> - return;
> - if (cont.len == 0)
> + if (!cont.len)
> + return false;
> +
> + log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
> +   NULL, 0, cont.buf, cont.len);
> + cont.len = 0;
> + return true;
> +}


I think cont_flush() should grab the logbuf_lock lock, because
it does log_store() and touches the cont.len. so something like
this perhaps

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7f490f..47f887c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1608,13 +1608,20 @@ static struct cont {
 
 static bool cont_flush(void)
 {
+   unsigned long flags;
+   bool flushed = false;
+
+   raw_spin_lock_irqsave(_lock, flags);
if (!cont.len)
-   return false;
+   goto out;
 
log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
  NULL, 0, cont.buf, cont.len);
cont.len = 0;
-   return true;
+   flushed = true;
+out:
+   raw_spin_unlock_irqrestore(_lock, flags);
+   return flushed;
 }


[..]
> @@ -2449,7 +2311,6 @@ void console_unlock(void)
>   }
>   console_idx = log_next(console_idx);
>   console_seq++;
> - console_prev = msg->flags;
>   raw_spin_unlock(_lock);
>  
>   stop_critical_timings();/* don't trace print latency */
> @@ -2483,7 +2344,7 @@ void console_unlock(void)
>   if (retry && console_trylock())
>   goto again;
>  
> - if (wake_klogd)
> + if (wake_klogd || cont.len)
^^
this _technically_ can result in additional spurious wakeups - cont.len
check is done outside of console_sem && logbuf_lock - but I don't think
this is a huge problem.

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Linus Torvalds
On Mon, Oct 24, 2016 at 10:54 AM, Linus Torvalds
 wrote:
>
> Note the "totally untested" part. It compiles for me. But it may do
> unspeakably stupid things. Caveat applior.

Well, it is hard to apply a patch that I didn't even attach. Blush.

   Linus
 kernel/printk/printk.c | 249 +++--
 1 file changed, 52 insertions(+), 197 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de08fc90baaf..6df4d3f6c276 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -367,7 +367,6 @@ DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
-static enum log_flags syslog_prev;
 static size_t syslog_partial;
 
 /* index and sequence number of the first record stored in the buffer */
@@ -381,7 +380,6 @@ static u32 log_next_idx;
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u32 console_idx;
-static enum log_flags console_prev;
 
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
@@ -650,27 +648,15 @@ static void append_char(char **pp, char *e, char c)
 }
 
 static ssize_t msg_print_ext_header(char *buf, size_t size,
-   struct printk_log *msg, u64 seq,
-   enum log_flags prev_flags)
+   struct printk_log *msg, u64 seq)
 {
u64 ts_usec = msg->ts_nsec;
-   char cont = '-';
 
do_div(ts_usec, 1000);
 
-   /*
-* If we couldn't merge continuation line fragments during the print,
-* export the stored flags to allow an optional external merge of the
-* records. Merging the records isn't always neccessarily correct, like
-* when we hit a race during printing. In most cases though, it produces
-* better readable output. 'c' in the record flags mark the first
-* fragment of a line, '+' the following.
-*/
-   if (msg->flags & LOG_CONT)
-   cont = (prev_flags & LOG_CONT) ? '+' : 'c';
-
return scnprintf(buf, size, "%u,%llu,%llu,%c;",
-  (msg->facility << 3) | msg->level, seq, ts_usec, cont);
+  (msg->facility << 3) | msg->level, seq, ts_usec,
+  msg->flags & LOG_CONT ? 'c' : '-');
 }
 
 static ssize_t msg_print_ext_body(char *buf, size_t size,
@@ -725,7 +711,6 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 struct devkmsg_user {
u64 seq;
u32 idx;
-   enum log_flags prev;
struct ratelimit_state rs;
struct mutex lock;
char buf[CONSOLE_EXT_LOG_MAX];
@@ -794,7 +779,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct 
iov_iter *from)
return ret;
 }
 
-static void cont_flush(void);
+static void deferred_cont_flush(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -811,7 +796,6 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
if (ret)
return ret;
raw_spin_lock_irq(_lock);
-   cont_flush();
while (user->seq == log_next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
@@ -838,12 +822,11 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
 
msg = log_from_idx(user->idx);
len = msg_print_ext_header(user->buf, sizeof(user->buf),
-  msg, user->seq, user->prev);
+  msg, user->seq);
len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
  log_dict(msg), msg->dict_len,
  log_text(msg), msg->text_len);
 
-   user->prev = msg->flags;
user->idx = log_next(user->idx);
user->seq++;
raw_spin_unlock_irq(_lock);
@@ -860,6 +843,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
ret = len;
 out:
mutex_unlock(>lock);
+   deferred_cont_flush();
return ret;
 }
 
@@ -874,7 +858,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
return -ESPIPE;
 
raw_spin_lock_irq(_lock);
-   cont_flush();
switch (whence) {
case SEEK_SET:
/* the first record */
@@ -913,7 +896,6 @@ static unsigned int devkmsg_poll(struct file *file, 
poll_table *wait)
poll_wait(file, _wait, wait);
 
raw_spin_lock_irq(_lock);
-   cont_flush();
if (user->seq < log_next_seq) {
/* return error when data has vanished underneath us */
if (user->seq < log_first_seq)
@@ -922,6 +904,7 @@ static unsigned int devkmsg_poll(struct file *file, 
poll_table *wait)
ret = 

Re: linux.git: printk() problem

2016-10-24 Thread Linus Torvalds
On Mon, Oct 24, 2016 at 10:54 AM, Linus Torvalds
 wrote:
>
> Note the "totally untested" part. It compiles for me. But it may do
> unspeakably stupid things. Caveat applior.

Well, it is hard to apply a patch that I didn't even attach. Blush.

   Linus
 kernel/printk/printk.c | 249 +++--
 1 file changed, 52 insertions(+), 197 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de08fc90baaf..6df4d3f6c276 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -367,7 +367,6 @@ DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
-static enum log_flags syslog_prev;
 static size_t syslog_partial;
 
 /* index and sequence number of the first record stored in the buffer */
@@ -381,7 +380,6 @@ static u32 log_next_idx;
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u32 console_idx;
-static enum log_flags console_prev;
 
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
@@ -650,27 +648,15 @@ static void append_char(char **pp, char *e, char c)
 }
 
 static ssize_t msg_print_ext_header(char *buf, size_t size,
-   struct printk_log *msg, u64 seq,
-   enum log_flags prev_flags)
+   struct printk_log *msg, u64 seq)
 {
u64 ts_usec = msg->ts_nsec;
-   char cont = '-';
 
do_div(ts_usec, 1000);
 
-   /*
-* If we couldn't merge continuation line fragments during the print,
-* export the stored flags to allow an optional external merge of the
-* records. Merging the records isn't always neccessarily correct, like
-* when we hit a race during printing. In most cases though, it produces
-* better readable output. 'c' in the record flags mark the first
-* fragment of a line, '+' the following.
-*/
-   if (msg->flags & LOG_CONT)
-   cont = (prev_flags & LOG_CONT) ? '+' : 'c';
-
return scnprintf(buf, size, "%u,%llu,%llu,%c;",
-  (msg->facility << 3) | msg->level, seq, ts_usec, cont);
+  (msg->facility << 3) | msg->level, seq, ts_usec,
+  msg->flags & LOG_CONT ? 'c' : '-');
 }
 
 static ssize_t msg_print_ext_body(char *buf, size_t size,
@@ -725,7 +711,6 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 struct devkmsg_user {
u64 seq;
u32 idx;
-   enum log_flags prev;
struct ratelimit_state rs;
struct mutex lock;
char buf[CONSOLE_EXT_LOG_MAX];
@@ -794,7 +779,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct 
iov_iter *from)
return ret;
 }
 
-static void cont_flush(void);
+static void deferred_cont_flush(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -811,7 +796,6 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
if (ret)
return ret;
raw_spin_lock_irq(_lock);
-   cont_flush();
while (user->seq == log_next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
@@ -838,12 +822,11 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
 
msg = log_from_idx(user->idx);
len = msg_print_ext_header(user->buf, sizeof(user->buf),
-  msg, user->seq, user->prev);
+  msg, user->seq);
len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
  log_dict(msg), msg->dict_len,
  log_text(msg), msg->text_len);
 
-   user->prev = msg->flags;
user->idx = log_next(user->idx);
user->seq++;
raw_spin_unlock_irq(_lock);
@@ -860,6 +843,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
ret = len;
 out:
mutex_unlock(>lock);
+   deferred_cont_flush();
return ret;
 }
 
@@ -874,7 +858,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
return -ESPIPE;
 
raw_spin_lock_irq(_lock);
-   cont_flush();
switch (whence) {
case SEEK_SET:
/* the first record */
@@ -913,7 +896,6 @@ static unsigned int devkmsg_poll(struct file *file, 
poll_table *wait)
poll_wait(file, _wait, wait);
 
raw_spin_lock_irq(_lock);
-   cont_flush();
if (user->seq < log_next_seq) {
/* return error when data has vanished underneath us */
if (user->seq < log_first_seq)
@@ -922,6 +904,7 @@ static unsigned int devkmsg_poll(struct file *file, 
poll_table *wait)
ret = POLLIN|POLLRDNORM;
}
 

Re: linux.git: printk() problem

2016-10-24 Thread Linus Torvalds
On Mon, Oct 24, 2016 at 7:08 AM, Sergey Senozhatsky
 wrote:
>
> is mod_timer() safe enough to rely on/call from
> panic()->console_flush_on_panic()->console_unlock() ?

I don't think that's a big issue: the whole "panic()" siotuation is
very much about best effort.

> shouldn't deferred_cont_flush() be called every time we jump
> to `again' label in console_unlock()?

actually, I think we should just do it at the end.

And your "spinlock bad magic" reports says to me that we should
actually delay it even more, and just do it in klogd, to avoid the
whole issue with "printk() is used very _very_ early during boot".

Because if klogd isn't even running yet, then we clearly don't care
about timely flushing to the logs.

So here's a new version - TOTALLY UNTESTED! - that removes even more
redundant code (all the "test previous record flags" crud - it is not
sensible to think that the previous record was what a continuation was
all about, since the whole reason the continuation didn't get merged
was that something else happened in between).

Note the "totally untested" part. It compiles for me. But it may do
unspeakably stupid things. Caveat applior.

 Linus


Re: linux.git: printk() problem

2016-10-24 Thread Linus Torvalds
On Mon, Oct 24, 2016 at 7:08 AM, Sergey Senozhatsky
 wrote:
>
> is mod_timer() safe enough to rely on/call from
> panic()->console_flush_on_panic()->console_unlock() ?

I don't think that's a big issue: the whole "panic()" siotuation is
very much about best effort.

> shouldn't deferred_cont_flush() be called every time we jump
> to `again' label in console_unlock()?

actually, I think we should just do it at the end.

And your "spinlock bad magic" reports says to me that we should
actually delay it even more, and just do it in klogd, to avoid the
whole issue with "printk() is used very _very_ early during boot".

Because if klogd isn't even running yet, then we clearly don't care
about timely flushing to the logs.

So here's a new version - TOTALLY UNTESTED! - that removes even more
redundant code (all the "test previous record flags" crud - it is not
sensible to think that the previous record was what a continuation was
all about, since the whole reason the continuation didn't get merged
was that something else happened in between).

Note the "totally untested" part. It compiles for me. But it may do
unspeakably stupid things. Caveat applior.

 Linus


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
> On (10/23/16 12:46), Linus Torvalds wrote:
> > +static void deferred_cont_flush(void)
> > +{
> > +   static DEFINE_TIMER(timer, flush_timer, 0, 0);
> > +
> > +   if (!cont.len)
> > return;
> > +   mod_timer(, jiffies + HZ/10);
> >  }
> 
> [..]
> 
> > @@ -2360,6 +2285,8 @@ void console_unlock(void)
> > return;
> > }
> >  
> > +   deferred_cont_flush();
> > +
> 

[0.00] BUG: spinlock bad magic on CPU#0, swapper/0
[0.00]  lock: __init_begin+0x1cebc0/0x1da000, .magic: , .owner: 
swapper/0, .owner_cpu: 0
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
4.9.0-rc1-next-20161024-dbg-dirty #150
[0.00]  81803c18 81221216 8180a540 
81ac7bc0
[0.00]  81803c38 81081414 81ac7bc0 
0046
[0.00]  81803c58 81081444 81ac7bc0 
81737e85
[0.00] Call Trace:
[0.00]  [] dump_stack+0x4f/0x65
[0.00]  [] spin_dump+0x8a/0x8f
[0.00]  [] spin_bug+0x2b/0x2d
[0.00]  [] do_raw_spin_unlock+0x22/0x81
[0.00]  [] _raw_spin_unlock_irqrestore+0x2c/0x4e
[0.00]  [] mod_timer+0x266/0x2a0
[0.00]  [] deferred_cont_flush+0x2a/0x2c
[0.00]  [] console_unlock+0x28/0x45c
[0.00]  [] ? vprintk_emit+0x3a6/0x3c5
[0.00]  [] vprintk_emit+0x3af/0x3c5
[0.00]  [] vprintk_default+0x1d/0x1f
[0.00]  [] printk+0x48/0x50
[0.00]  [] ? vprintk_default+0x1d/0x1f
[0.00]  [] e820_print_map+0x4a/0x70
[0.00]  [] ? 0x8100
[0.00]  [] setup_memory_map+0x37/0x3a
[0.00]  [] setup_arch+0x16d/0x890
[0.00]  [] start_kernel+0x59/0x40c
[0.00]  [] x86_64_start_reservations+0x2a/0x2c
[0.00]  [] x86_64_start_kernel+0x17a/0x18d
[0.00]  [] start_cpu+0x5/0x14
[0.00]  [] ? start_cpu+0x5/0x14
[0.00] usable

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
> On (10/23/16 12:46), Linus Torvalds wrote:
> > +static void deferred_cont_flush(void)
> > +{
> > +   static DEFINE_TIMER(timer, flush_timer, 0, 0);
> > +
> > +   if (!cont.len)
> > return;
> > +   mod_timer(, jiffies + HZ/10);
> >  }
> 
> [..]
> 
> > @@ -2360,6 +2285,8 @@ void console_unlock(void)
> > return;
> > }
> >  
> > +   deferred_cont_flush();
> > +
> 

[0.00] BUG: spinlock bad magic on CPU#0, swapper/0
[0.00]  lock: __init_begin+0x1cebc0/0x1da000, .magic: , .owner: 
swapper/0, .owner_cpu: 0
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
4.9.0-rc1-next-20161024-dbg-dirty #150
[0.00]  81803c18 81221216 8180a540 
81ac7bc0
[0.00]  81803c38 81081414 81ac7bc0 
0046
[0.00]  81803c58 81081444 81ac7bc0 
81737e85
[0.00] Call Trace:
[0.00]  [] dump_stack+0x4f/0x65
[0.00]  [] spin_dump+0x8a/0x8f
[0.00]  [] spin_bug+0x2b/0x2d
[0.00]  [] do_raw_spin_unlock+0x22/0x81
[0.00]  [] _raw_spin_unlock_irqrestore+0x2c/0x4e
[0.00]  [] mod_timer+0x266/0x2a0
[0.00]  [] deferred_cont_flush+0x2a/0x2c
[0.00]  [] console_unlock+0x28/0x45c
[0.00]  [] ? vprintk_emit+0x3a6/0x3c5
[0.00]  [] vprintk_emit+0x3af/0x3c5
[0.00]  [] vprintk_default+0x1d/0x1f
[0.00]  [] printk+0x48/0x50
[0.00]  [] ? vprintk_default+0x1d/0x1f
[0.00]  [] e820_print_map+0x4a/0x70
[0.00]  [] ? 0x8100
[0.00]  [] setup_memory_map+0x37/0x3a
[0.00]  [] setup_arch+0x16d/0x890
[0.00]  [] start_kernel+0x59/0x40c
[0.00]  [] x86_64_start_reservations+0x2a/0x2c
[0.00]  [] x86_64_start_kernel+0x17a/0x18d
[0.00]  [] start_cpu+0x5/0x14
[0.00]  [] ? start_cpu+0x5/0x14
[0.00] usable

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
Hello,

thanks for Cc-ing.

On (10/23/16 12:46), Linus Torvalds wrote:
> +static void deferred_cont_flush(void)
> +{
> + static DEFINE_TIMER(timer, flush_timer, 0, 0);
> +
> + if (!cont.len)
>   return;
> + mod_timer(, jiffies + HZ/10);
>  }

[..]

> @@ -2360,6 +2285,8 @@ void console_unlock(void)
>   return;
>   }
>  
> + deferred_cont_flush();
> +

is mod_timer() safe enough to rely on/call from
panic()->console_flush_on_panic()->console_unlock() ?

shouldn't deferred_cont_flush() be called every time we jump
to `again' label in console_unlock()?

timer has debug object support, which probably can printk(), but
that shouldn't cause any troubles, I suppose.

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Sergey Senozhatsky
Hello,

thanks for Cc-ing.

On (10/23/16 12:46), Linus Torvalds wrote:
> +static void deferred_cont_flush(void)
> +{
> + static DEFINE_TIMER(timer, flush_timer, 0, 0);
> +
> + if (!cont.len)
>   return;
> + mod_timer(, jiffies + HZ/10);
>  }

[..]

> @@ -2360,6 +2285,8 @@ void console_unlock(void)
>   return;
>   }
>  
> + deferred_cont_flush();
> +

is mod_timer() safe enough to rely on/call from
panic()->console_flush_on_panic()->console_unlock() ?

shouldn't deferred_cont_flush() be called every time we jump
to `again' label in console_unlock()?

timer has debug object support, which probably can printk(), but
that shouldn't cause any troubles, I suppose.

-ss


Re: linux.git: printk() problem

2016-10-24 Thread Geert Uytterhoeven
Hi Linus,

On Sun, Oct 23, 2016 at 9:46 PM, Linus Torvalds
 wrote:
> On Sun, Oct 23, 2016 at 12:32 PM, Linus Torvalds
>  wrote:
>>
>> No, the real complexity comes from that interaction with the console
>> output, which is done outside the core log locks, and which currently
>> has the added thing where we have a "has this line fragment been
>> flushed or not".
>
> Ok, so here's the stupid patch that removes all the partial line flushing.
>
> NOTE! It still leaves all the games with LOG_NEWLINE and LOG_NOCONS
> that are pretty much pointless with it. So there's room for more
> simplification here.
>
> In particular, the games with LOG_NEWLINE is what Geert's "console and
> dmesg output looks different" at least partially comes from. What
> happens is that "dmesg" always shows the records as one line (so it
> effectively ignores LOG_NEWLINE), but the console output (in
> msg_print_text() still has that LOG_NEWLINE logic.
>
> In particular, msg_print_text() looks at the *previous* logged line to
> decide whether it should do newlines etc, which is why Geert gets that
> odd "two continuations per line" pattern on the console, but "one
> continuation per line" in dmesg. That comes from the interaction with
> flushing to the console and LOG_NEWLINE and just general complexity.

Thanks, Linux kernel output is again in sync with dmesg output.

Tested-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: linux.git: printk() problem

2016-10-24 Thread Geert Uytterhoeven
Hi Linus,

On Sun, Oct 23, 2016 at 9:46 PM, Linus Torvalds
 wrote:
> On Sun, Oct 23, 2016 at 12:32 PM, Linus Torvalds
>  wrote:
>>
>> No, the real complexity comes from that interaction with the console
>> output, which is done outside the core log locks, and which currently
>> has the added thing where we have a "has this line fragment been
>> flushed or not".
>
> Ok, so here's the stupid patch that removes all the partial line flushing.
>
> NOTE! It still leaves all the games with LOG_NEWLINE and LOG_NOCONS
> that are pretty much pointless with it. So there's room for more
> simplification here.
>
> In particular, the games with LOG_NEWLINE is what Geert's "console and
> dmesg output looks different" at least partially comes from. What
> happens is that "dmesg" always shows the records as one line (so it
> effectively ignores LOG_NEWLINE), but the console output (in
> msg_print_text() still has that LOG_NEWLINE logic.
>
> In particular, msg_print_text() looks at the *previous* logged line to
> decide whether it should do newlines etc, which is why Geert gets that
> odd "two continuations per line" pattern on the console, but "one
> continuation per line" in dmesg. That comes from the interaction with
> flushing to the console and LOG_NEWLINE and just general complexity.

Thanks, Linux kernel output is again in sync with dmesg output.

Tested-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: linux.git: printk() problem

2016-10-23 Thread Linus Torvalds
On Sun, Oct 23, 2016 at 1:33 PM, Joe Perches  wrote:
>
> Perhaps it could be a pool of active thread
> continuation buffers.

Yes, we could probably do with just a couple of entries. Even just two
would probably catch almost all cases.

That said, having dealt with that code, I'd hate to make it more
complex again for something where the preferred solution is "don't do
it".

 Linus


Re: linux.git: printk() problem

2016-10-23 Thread Linus Torvalds
On Sun, Oct 23, 2016 at 1:33 PM, Joe Perches  wrote:
>
> Perhaps it could be a pool of active thread
> continuation buffers.

Yes, we could probably do with just a couple of entries. Even just two
would probably catch almost all cases.

That said, having dealt with that code, I'd hate to make it more
complex again for something where the preferred solution is "don't do
it".

 Linus


Re: linux.git: printk() problem

2016-10-23 Thread Joe Perches
On Sun, 2016-10-23 at 12:32 -0700, Linus Torvalds wrote:
> On Sun, Oct 23, 2016 at 12:06 PM, Joe Perches  wrote:
> > On Sun, 2016-10-23 at 11:11 -0700, Linus Torvalds wrote:
> > > 
> > > And those two per se sound fairly easy to handle ("KERN_CONT means
> > > append to the line buffer, otherwise flush the line buffer and move to
> > > the record buffer").
> > > 
> > > But what complicates things more is then the "console output", which
> > > has two issues:
> > > 
> > >  - it is done outside the locking regime for the line buffer and the
> > > record buffer.
> > > 
> > >  - it is done on _partial_ line buffers.
> > 
> > 
> > EOL KERN_ and thread interleaving still exists.
> 
> 
> Note that the thread interleaving is still trivial: it's easily done
> at the point where we decide "can we append to the line buffer or
> not". That's pretty simple. Just flush the record when the thread
> changes.
> 
> So the interleaving will never go away, it's very fundamental - unless
> we make the line buffer just be a per-thread thing. And yes, that
> would be the cleanest solution, but it's also an extra buffer for each
> thread, so realistically it's just not going to happen.

I doubt there are cases where more than a few of
these interleaving threads are simultaneous.

Perhaps it could be a pool of active thread
continuation buffers.  

> End result: I'm not worried about the interleaving. It will cause ugly
> output, but we've always had that, and the solution to it is "if you
> absolutely don't want interleaving, then don't try to print partial
> lines!".
> The classic "don't do that then" response, in other world.

Yup, best solution.



Re: linux.git: printk() problem

2016-10-23 Thread Joe Perches
On Sun, 2016-10-23 at 12:32 -0700, Linus Torvalds wrote:
> On Sun, Oct 23, 2016 at 12:06 PM, Joe Perches  wrote:
> > On Sun, 2016-10-23 at 11:11 -0700, Linus Torvalds wrote:
> > > 
> > > And those two per se sound fairly easy to handle ("KERN_CONT means
> > > append to the line buffer, otherwise flush the line buffer and move to
> > > the record buffer").
> > > 
> > > But what complicates things more is then the "console output", which
> > > has two issues:
> > > 
> > >  - it is done outside the locking regime for the line buffer and the
> > > record buffer.
> > > 
> > >  - it is done on _partial_ line buffers.
> > 
> > 
> > EOL KERN_ and thread interleaving still exists.
> 
> 
> Note that the thread interleaving is still trivial: it's easily done
> at the point where we decide "can we append to the line buffer or
> not". That's pretty simple. Just flush the record when the thread
> changes.
> 
> So the interleaving will never go away, it's very fundamental - unless
> we make the line buffer just be a per-thread thing. And yes, that
> would be the cleanest solution, but it's also an extra buffer for each
> thread, so realistically it's just not going to happen.

I doubt there are cases where more than a few of
these interleaving threads are simultaneous.

Perhaps it could be a pool of active thread
continuation buffers.  

> End result: I'm not worried about the interleaving. It will cause ugly
> output, but we've always had that, and the solution to it is "if you
> absolutely don't want interleaving, then don't try to print partial
> lines!".
> The classic "don't do that then" response, in other world.

Yup, best solution.



Re: linux.git: printk() problem

2016-10-23 Thread Linus Torvalds
On Sun, Oct 23, 2016 at 12:32 PM, Linus Torvalds
 wrote:
>
> No, the real complexity comes from that interaction with the console
> output, which is done outside the core log locks, and which currently
> has the added thing where we have a "has this line fragment been
> flushed or not".

Ok, so here's the stupid patch that removes all the partial line flushing.

NOTE! It still leaves all the games with LOG_NEWLINE and LOG_NOCONS
that are pretty much pointless with it. So there's room for more
simplification here.

In particular, the games with LOG_NEWLINE is what Geert's "console and
dmesg output looks different" at least partially comes from. What
happens is that "dmesg" always shows the records as one line (so it
effectively ignores LOG_NEWLINE), but the console output (in
msg_print_text() still has that LOG_NEWLINE logic.

In particular, msg_print_text() looks at the *previous* logged line to
decide whether it should do newlines etc, which is why Geert gets that
odd "two continuations per line" pattern on the console, but "one
continuation per line" in dmesg. That comes from the interaction with
flushing to the console and LOG_NEWLINE and just general complexity.

All of that LOG_NEWLINE code could be removed. But again, this patch
doesn't do that removal. It just removes the partial console flushing
and simplifies that part of the code.

(This patch removes way more lines than it adds, but the *real*
advantage is that it removes complexity. The rules for
console_cont_flush() really were _very_ hard to grok, it has subtle
interactions with cont_add() and cont_flush() through that "cont.cons"
and "cont.flushed" logic that is all removed by this patch).

Linus
 kernel/printk/printk.c | 132 +++--
 1 file changed, 28 insertions(+), 104 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de08fc90baaf..ed1bfa774427 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -794,7 +794,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct 
iov_iter *from)
return ret;
 }
 
-static void cont_flush(void);
+static void deferred_cont_flush(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -811,7 +811,6 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
if (ret)
return ret;
raw_spin_lock_irq(_lock);
-   cont_flush();
while (user->seq == log_next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
@@ -860,6 +859,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
ret = len;
 out:
mutex_unlock(>lock);
+   deferred_cont_flush();
return ret;
 }
 
@@ -874,7 +874,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
return -ESPIPE;
 
raw_spin_lock_irq(_lock);
-   cont_flush();
switch (whence) {
case SEEK_SET:
/* the first record */
@@ -913,7 +912,6 @@ static unsigned int devkmsg_poll(struct file *file, 
poll_table *wait)
poll_wait(file, _wait, wait);
 
raw_spin_lock_irq(_lock);
-   cont_flush();
if (user->seq < log_next_seq) {
/* return error when data has vanished underneath us */
if (user->seq < log_first_seq)
@@ -922,6 +920,7 @@ static unsigned int devkmsg_poll(struct file *file, 
poll_table *wait)
ret = POLLIN|POLLRDNORM;
}
raw_spin_unlock_irq(_lock);
+   deferred_cont_flush();
 
return ret;
 }
@@ -1300,7 +1299,6 @@ static int syslog_print(char __user *buf, int size)
size_t skip;
 
raw_spin_lock_irq(_lock);
-   cont_flush();
if (syslog_seq < log_first_seq) {
/* messages are gone, move to first one */
syslog_seq = log_first_seq;
@@ -1360,7 +1358,6 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
return -ENOMEM;
 
raw_spin_lock_irq(_lock);
-   cont_flush();
if (buf) {
u64 next_seq;
u64 seq;
@@ -1522,7 +1519,6 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
raw_spin_lock_irq(_lock);
-   cont_flush();
if (syslog_seq < log_first_seq) {
/* messages are gone, move to first one */
syslog_seq = log_first_seq;
@@ -1563,6 +1559,7 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
error = -EINVAL;
break;
}
+   deferred_cont_flush();
 out:
return error;
 }
@@ -1650,46 +1647,41 @@ static inline void 

Re: linux.git: printk() problem

2016-10-23 Thread Linus Torvalds
On Sun, Oct 23, 2016 at 12:32 PM, Linus Torvalds
 wrote:
>
> No, the real complexity comes from that interaction with the console
> output, which is done outside the core log locks, and which currently
> has the added thing where we have a "has this line fragment been
> flushed or not".

Ok, so here's the stupid patch that removes all the partial line flushing.

NOTE! It still leaves all the games with LOG_NEWLINE and LOG_NOCONS
that are pretty much pointless with it. So there's room for more
simplification here.

In particular, the games with LOG_NEWLINE is what Geert's "console and
dmesg output looks different" at least partially comes from. What
happens is that "dmesg" always shows the records as one line (so it
effectively ignores LOG_NEWLINE), but the console output (in
msg_print_text() still has that LOG_NEWLINE logic.

In particular, msg_print_text() looks at the *previous* logged line to
decide whether it should do newlines etc, which is why Geert gets that
odd "two continuations per line" pattern on the console, but "one
continuation per line" in dmesg. That comes from the interaction with
flushing to the console and LOG_NEWLINE and just general complexity.

All of that LOG_NEWLINE code could be removed. But again, this patch
doesn't do that removal. It just removes the partial console flushing
and simplifies that part of the code.

(This patch removes way more lines than it adds, but the *real*
advantage is that it removes complexity. The rules for
console_cont_flush() really were _very_ hard to grok, it has subtle
interactions with cont_add() and cont_flush() through that "cont.cons"
and "cont.flushed" logic that is all removed by this patch).

Linus
 kernel/printk/printk.c | 132 +++--
 1 file changed, 28 insertions(+), 104 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de08fc90baaf..ed1bfa774427 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -794,7 +794,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct 
iov_iter *from)
return ret;
 }
 
-static void cont_flush(void);
+static void deferred_cont_flush(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -811,7 +811,6 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
if (ret)
return ret;
raw_spin_lock_irq(_lock);
-   cont_flush();
while (user->seq == log_next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
@@ -860,6 +859,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
ret = len;
 out:
mutex_unlock(>lock);
+   deferred_cont_flush();
return ret;
 }
 
@@ -874,7 +874,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
return -ESPIPE;
 
raw_spin_lock_irq(_lock);
-   cont_flush();
switch (whence) {
case SEEK_SET:
/* the first record */
@@ -913,7 +912,6 @@ static unsigned int devkmsg_poll(struct file *file, 
poll_table *wait)
poll_wait(file, _wait, wait);
 
raw_spin_lock_irq(_lock);
-   cont_flush();
if (user->seq < log_next_seq) {
/* return error when data has vanished underneath us */
if (user->seq < log_first_seq)
@@ -922,6 +920,7 @@ static unsigned int devkmsg_poll(struct file *file, 
poll_table *wait)
ret = POLLIN|POLLRDNORM;
}
raw_spin_unlock_irq(_lock);
+   deferred_cont_flush();
 
return ret;
 }
@@ -1300,7 +1299,6 @@ static int syslog_print(char __user *buf, int size)
size_t skip;
 
raw_spin_lock_irq(_lock);
-   cont_flush();
if (syslog_seq < log_first_seq) {
/* messages are gone, move to first one */
syslog_seq = log_first_seq;
@@ -1360,7 +1358,6 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
return -ENOMEM;
 
raw_spin_lock_irq(_lock);
-   cont_flush();
if (buf) {
u64 next_seq;
u64 seq;
@@ -1522,7 +1519,6 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
raw_spin_lock_irq(_lock);
-   cont_flush();
if (syslog_seq < log_first_seq) {
/* messages are gone, move to first one */
syslog_seq = log_first_seq;
@@ -1563,6 +1559,7 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
error = -EINVAL;
break;
}
+   deferred_cont_flush();
 out:
return error;
 }
@@ -1650,46 +1647,41 @@ static inline void printk_delay(void)
 static struct 

Re: linux.git: printk() problem

2016-10-23 Thread Linus Torvalds
On Sun, Oct 23, 2016 at 12:06 PM, Joe Perches  wrote:
> On Sun, 2016-10-23 at 11:11 -0700, Linus Torvalds wrote:
>>
>> And those two per se sound fairly easy to handle ("KERN_CONT means
>> append to the line buffer, otherwise flush the line buffer and move to
>> the record buffer").
>>
>> But what complicates things more is then the "console output", which
>> has two issues:
>>
>>  - it is done outside the locking regime for the line buffer and the
>> record buffer.
>>
>>  - it is done on _partial_ line buffers.
>
> EOL KERN_ and thread interleaving still exists.

Note that the thread interleaving is still trivial: it's easily done
at the point where we decide "can we append to the line buffer or
not". That's pretty simple. Just flush the record when the thread
changes.

So the interleaving will never go away, it's very fundamental - unless
we make the line buffer just be a per-thread thing. And yes, that
would be the cleanest solution, but it's also an extra buffer for each
thread, so realistically it's just not going to happen.

End result: I'm not worried about the interleaving. It will cause ugly
output, but we've always had that, and the solution to it is "if you
absolutely don't want interleaving, then don't try to print partial
lines!".

The classic "don't do that then" response, in other world.

No, the real complexity comes from that interaction with the console
output, which is done outside the core log locks, and which currently
has the added thing where we have a "has this line fragment been
flushed or not".

That "has this line fragment been flushed or not" is particularly
painful, because we may have flushed it *partially*. That "cont.cons"
thing is a counter of how many bytes have been flushed, and we can be
in the situation where we have had multiple continuations added to the
line buffer, and only *some* of them have been flushed to the console.

(Reasons for not flushing: we couldn't get the console lock because
another process held it due to logging or whatever).

And then the interface to the actual record logging only has a "all or
nothing was flushed" flag (LOG_NOCONS) to avoid flushing things twice.
So when we actually log the record, we lose the "this line was only
partially printed".

That whole "we've flushed part of the line to the console" thing is
why it would make things so much easier to just log full records to
the console. Getting rid of that gets rid of a lot of ugly and
hard-to-read crap. Yes, the line buffer would still remain, and yes,
you'd still see the interleaving with threads, but that's not
"complexity", that's just "visually ugly output".

 Linus


Re: linux.git: printk() problem

2016-10-23 Thread Linus Torvalds
On Sun, Oct 23, 2016 at 12:06 PM, Joe Perches  wrote:
> On Sun, 2016-10-23 at 11:11 -0700, Linus Torvalds wrote:
>>
>> And those two per se sound fairly easy to handle ("KERN_CONT means
>> append to the line buffer, otherwise flush the line buffer and move to
>> the record buffer").
>>
>> But what complicates things more is then the "console output", which
>> has two issues:
>>
>>  - it is done outside the locking regime for the line buffer and the
>> record buffer.
>>
>>  - it is done on _partial_ line buffers.
>
> EOL KERN_ and thread interleaving still exists.

Note that the thread interleaving is still trivial: it's easily done
at the point where we decide "can we append to the line buffer or
not". That's pretty simple. Just flush the record when the thread
changes.

So the interleaving will never go away, it's very fundamental - unless
we make the line buffer just be a per-thread thing. And yes, that
would be the cleanest solution, but it's also an extra buffer for each
thread, so realistically it's just not going to happen.

End result: I'm not worried about the interleaving. It will cause ugly
output, but we've always had that, and the solution to it is "if you
absolutely don't want interleaving, then don't try to print partial
lines!".

The classic "don't do that then" response, in other world.

No, the real complexity comes from that interaction with the console
output, which is done outside the core log locks, and which currently
has the added thing where we have a "has this line fragment been
flushed or not".

That "has this line fragment been flushed or not" is particularly
painful, because we may have flushed it *partially*. That "cont.cons"
thing is a counter of how many bytes have been flushed, and we can be
in the situation where we have had multiple continuations added to the
line buffer, and only *some* of them have been flushed to the console.

(Reasons for not flushing: we couldn't get the console lock because
another process held it due to logging or whatever).

And then the interface to the actual record logging only has a "all or
nothing was flushed" flag (LOG_NOCONS) to avoid flushing things twice.
So when we actually log the record, we lose the "this line was only
partially printed".

That whole "we've flushed part of the line to the console" thing is
why it would make things so much easier to just log full records to
the console. Getting rid of that gets rid of a lot of ugly and
hard-to-read crap. Yes, the line buffer would still remain, and yes,
you'd still see the interleaving with threads, but that's not
"complexity", that's just "visually ugly output".

 Linus


Re: linux.git: printk() problem

2016-10-23 Thread Joe Perches
On Sun, 2016-10-23 at 11:11 -0700, Linus Torvalds wrote:
> On Sun, Oct 23, 2016 at 2:22 AM, Geert Uytterhoeven
>  wrote:
> > 
> > These changes have an interesting side-effect on sequences of printk()s that
> > lack proper continuation: they introduced a discrepancy between dmesg output
> > and the actual kernel output.
> 
> Yes.
> 
> So the "print vs log" handling is really really horrible. I cleaned up
> some of it, but left the really fundamental problems. I wanted to just
> rewrite it all, but didn't quite have the heart for it.
> 
> The best solution by far would be to just not support KERN_CONT at
> all,  but there's too many "silly details" things that keep it from
> being possible.
> 
> The basic issue is that we have the line buffer that is used for
> continuations, and then the record buffer that is used for logging.
> 
> And those two per se sound fairly easy to handle ("KERN_CONT means
> append to the line buffer, otherwise flush the line buffer and move to
> the record buffer").
> 
> But what complicates things more is then the "console output", which
> has two issues:
> 
>  - it is done outside the locking regime for the line buffer and the
> record buffer.
> 
>  - it is done on _partial_ line buffers.

EOL KERN_ and thread interleaving still exists.

> It would be really quite easy to say "we don't print out
> continuation lines immediately, we just buffer them for 0.1s instead,
> and KERN_CONT only works for things that really happen more or less
> immediately".

Or use to a start/stop buffer (maybe via KERN_ and \n) with
PID/TIDs added to /dev/kmsg and that short-term timer to reassemble.

> Maybe that really is the right answer. Because the original cause of
> us having to bend over backwards in this case is really no longer
> there. And it would simplify printk a *lot*.

A timer might be a good idea, but perhaps Sergey and Petr might
have some interest in that too. (added to cc's)


Re: linux.git: printk() problem

2016-10-23 Thread Joe Perches
On Sun, 2016-10-23 at 11:11 -0700, Linus Torvalds wrote:
> On Sun, Oct 23, 2016 at 2:22 AM, Geert Uytterhoeven
>  wrote:
> > 
> > These changes have an interesting side-effect on sequences of printk()s that
> > lack proper continuation: they introduced a discrepancy between dmesg output
> > and the actual kernel output.
> 
> Yes.
> 
> So the "print vs log" handling is really really horrible. I cleaned up
> some of it, but left the really fundamental problems. I wanted to just
> rewrite it all, but didn't quite have the heart for it.
> 
> The best solution by far would be to just not support KERN_CONT at
> all,  but there's too many "silly details" things that keep it from
> being possible.
> 
> The basic issue is that we have the line buffer that is used for
> continuations, and then the record buffer that is used for logging.
> 
> And those two per se sound fairly easy to handle ("KERN_CONT means
> append to the line buffer, otherwise flush the line buffer and move to
> the record buffer").
> 
> But what complicates things more is then the "console output", which
> has two issues:
> 
>  - it is done outside the locking regime for the line buffer and the
> record buffer.
> 
>  - it is done on _partial_ line buffers.

EOL KERN_ and thread interleaving still exists.

> It would be really quite easy to say "we don't print out
> continuation lines immediately, we just buffer them for 0.1s instead,
> and KERN_CONT only works for things that really happen more or less
> immediately".

Or use to a start/stop buffer (maybe via KERN_ and \n) with
PID/TIDs added to /dev/kmsg and that short-term timer to reassemble.

> Maybe that really is the right answer. Because the original cause of
> us having to bend over backwards in this case is really no longer
> there. And it would simplify printk a *lot*.

A timer might be a good idea, but perhaps Sergey and Petr might
have some interest in that too. (added to cc's)


Re: linux.git: printk() problem

2016-10-23 Thread Linus Torvalds
On Sun, Oct 23, 2016 at 2:22 AM, Geert Uytterhoeven
 wrote:
>
> These changes have an interesting side-effect on sequences of printk()s that
> lack proper continuation: they introduced a discrepancy between dmesg output
> and the actual kernel output.

Yes.

So the "print vs log" handling is really really horrible. I cleaned up
some of it, but left the really fundamental problems. I wanted to just
rewrite it all, but didn't quite have the heart for it.

The best solution by far would be to just not support KERN_CONT at
all, but there's too many "silly details" things that keep it from
being possible.

The basic issue is that we have the line buffer that is used for
continuations, and then the record buffer that is used for logging.

And those two per se sound fairly easy to handle ("KERN_CONT means
append to the line buffer, otherwise flush the line buffer and move to
the record buffer").

But what complicates things more is then the "console output", which
has two issues:

 - it is done outside the locking regime for the line buffer and the
record buffer.

 - it is done on _partial_ line buffers.

Again, this would be absolutely trivial if we just said "we only print
the record buffer". Easy, and solves all the problems. Except for
_one_ problem:

 - if a hang occurs in the middle of a continuation, we historically
*really* want that continuation to have been printed out.

For example, one of the really historical uses for partial lines is this:

   pr_info("Checking 'hlt' instruction... ");

   if (!boot_cpu_data.hlt_works_ok) {
   pr_cont("disabled\n");
   return;
   }
   halt();
   halt();
   halt();
   halt();
   pr_cont("OK\n");

and the point was that there used to be some really old i386 machines
that hung on the "hlt" instruction (probably not because of a CPU bug,
but because of either power supply issues or some DMA issues).

To support that, we really *had* to print out the continuation lines
even when they were partial. And that complicates the printk logic a
lot.

Now, that "hlt" case is long long gone, and maybe we should just say
"screw that". It would be really quite easy to say "we don't print out
continuation lines immediately, we just buffer them for 0.1s instead,
and KERN_CONT only works for things that really happen more or less
immediately".

Maybe that really is the right answer. Because the original cause of
us having to bend over backwards in this case is really no longer
there. And it would simplify printk a *lot*.

Let me whip up a minimal patch for you to try.

Linus


Re: linux.git: printk() problem

2016-10-23 Thread Linus Torvalds
On Sun, Oct 23, 2016 at 2:22 AM, Geert Uytterhoeven
 wrote:
>
> These changes have an interesting side-effect on sequences of printk()s that
> lack proper continuation: they introduced a discrepancy between dmesg output
> and the actual kernel output.

Yes.

So the "print vs log" handling is really really horrible. I cleaned up
some of it, but left the really fundamental problems. I wanted to just
rewrite it all, but didn't quite have the heart for it.

The best solution by far would be to just not support KERN_CONT at
all, but there's too many "silly details" things that keep it from
being possible.

The basic issue is that we have the line buffer that is used for
continuations, and then the record buffer that is used for logging.

And those two per se sound fairly easy to handle ("KERN_CONT means
append to the line buffer, otherwise flush the line buffer and move to
the record buffer").

But what complicates things more is then the "console output", which
has two issues:

 - it is done outside the locking regime for the line buffer and the
record buffer.

 - it is done on _partial_ line buffers.

Again, this would be absolutely trivial if we just said "we only print
the record buffer". Easy, and solves all the problems. Except for
_one_ problem:

 - if a hang occurs in the middle of a continuation, we historically
*really* want that continuation to have been printed out.

For example, one of the really historical uses for partial lines is this:

   pr_info("Checking 'hlt' instruction... ");

   if (!boot_cpu_data.hlt_works_ok) {
   pr_cont("disabled\n");
   return;
   }
   halt();
   halt();
   halt();
   halt();
   pr_cont("OK\n");

and the point was that there used to be some really old i386 machines
that hung on the "hlt" instruction (probably not because of a CPU bug,
but because of either power supply issues or some DMA issues).

To support that, we really *had* to print out the continuation lines
even when they were partial. And that complicates the printk logic a
lot.

Now, that "hlt" case is long long gone, and maybe we should just say
"screw that". It would be really quite easy to say "we don't print out
continuation lines immediately, we just buffer them for 0.1s instead,
and KERN_CONT only works for things that really happen more or less
immediately".

Maybe that really is the right answer. Because the original cause of
us having to bend over backwards in this case is really no longer
there. And it would simplify printk a *lot*.

Let me whip up a minimal patch for you to try.

Linus


Re: linux.git: printk() problem

2016-10-23 Thread Geert Uytterhoeven
Hi Linus,

On Wed, Oct 12, 2016 at 5:47 PM, Linus Torvalds
 wrote:
> On Wed, Oct 12, 2016 at 6:30 AM, Tetsuo Handa
>  wrote:
>>
>> I noticed that current linux.git generates hardly readable console output
>> due to KERN_CONT changes. Are you suggesting developers that output like
>> this be fixed?
>
> Yes. Needing to add a few KERN_CONT markers was pretty much expected,
> since it's about 5 years since we enfroced it and new code won't
> necessarily have it (and even back then I don't think we _always_ got
> it right).
>
> That said, looking at the printk's in the lowmem code, I think it
> could be useful if there was some effort to see if the code could
> somehow avoid the multi-printk thing. This is actually one area where
>
>  (a) the problem actually happens while the system is running, rather
> than booting
>
>  (b) I've seen line mixing in the past
>
> but the short term fix is to just add KERN_CONT markers to the lines
> that are continuations.
>
> NOTE! The reason I mention that (a) thing that it has traditionally
> made it much messier to do logging of continuation lines in the first
> place (because more things are going on and often one problem leads to
> another and then the mixing is much more likely), but I actually
> intentionally made it more likely to trigger the flushing issue in
> commit bfd8d3f23b51 ("printk: make reading the kernel log flush
> pending lines").
>
> So if there is an active system logger that is reading messages *when*
> one of those "one line in multiple printk's" things happen, that log
> reader will now potentially cause the logging to be broken up because
> the act of reading will flush the pending lines.

These changes have an interesting side-effect on sequences of printk()s that
lack proper continuation: they introduced a discrepancy between dmesg output
and the actual kernel output.

Before:

Atari hardware found: VIDEL STDMA-SCSI ST_MFP YM2149 PCM CODEC
DSP56K SCC ANALOG_JOY BLITTER IDE TT_CLK FDC_SPEED

Output of "dmesg" after:

Atari hardware found:
VIDEL
STDMA-SCSI
ST_MFP
YM2149
PCM
CODEC
DSP56K
SCC
ANALOG_JOY
BLITTER
IDE
TT_CLK
FDC_SPEED

Actual kernel output after:

Atari hardware found: VIDEL
STDMA-SCSI ST_MFP
YM2149 PCM
CODEC DSP56K
SCC ANALOG_JOY
BLITTER IDE
TT_CLK FDC_SPEED

Note that the above is really early in the boot process, right after the debug
console is enabled, and before any system log consumer is running,

Of course I'm converting this code to use pr_cont() anyway...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: linux.git: printk() problem

2016-10-23 Thread Geert Uytterhoeven
Hi Linus,

On Wed, Oct 12, 2016 at 5:47 PM, Linus Torvalds
 wrote:
> On Wed, Oct 12, 2016 at 6:30 AM, Tetsuo Handa
>  wrote:
>>
>> I noticed that current linux.git generates hardly readable console output
>> due to KERN_CONT changes. Are you suggesting developers that output like
>> this be fixed?
>
> Yes. Needing to add a few KERN_CONT markers was pretty much expected,
> since it's about 5 years since we enfroced it and new code won't
> necessarily have it (and even back then I don't think we _always_ got
> it right).
>
> That said, looking at the printk's in the lowmem code, I think it
> could be useful if there was some effort to see if the code could
> somehow avoid the multi-printk thing. This is actually one area where
>
>  (a) the problem actually happens while the system is running, rather
> than booting
>
>  (b) I've seen line mixing in the past
>
> but the short term fix is to just add KERN_CONT markers to the lines
> that are continuations.
>
> NOTE! The reason I mention that (a) thing that it has traditionally
> made it much messier to do logging of continuation lines in the first
> place (because more things are going on and often one problem leads to
> another and then the mixing is much more likely), but I actually
> intentionally made it more likely to trigger the flushing issue in
> commit bfd8d3f23b51 ("printk: make reading the kernel log flush
> pending lines").
>
> So if there is an active system logger that is reading messages *when*
> one of those "one line in multiple printk's" things happen, that log
> reader will now potentially cause the logging to be broken up because
> the act of reading will flush the pending lines.

These changes have an interesting side-effect on sequences of printk()s that
lack proper continuation: they introduced a discrepancy between dmesg output
and the actual kernel output.

Before:

Atari hardware found: VIDEL STDMA-SCSI ST_MFP YM2149 PCM CODEC
DSP56K SCC ANALOG_JOY BLITTER IDE TT_CLK FDC_SPEED

Output of "dmesg" after:

Atari hardware found:
VIDEL
STDMA-SCSI
ST_MFP
YM2149
PCM
CODEC
DSP56K
SCC
ANALOG_JOY
BLITTER
IDE
TT_CLK
FDC_SPEED

Actual kernel output after:

Atari hardware found: VIDEL
STDMA-SCSI ST_MFP
YM2149 PCM
CODEC DSP56K
SCC ANALOG_JOY
BLITTER IDE
TT_CLK FDC_SPEED

Note that the above is really early in the boot process, right after the debug
console is enabled, and before any system log consumer is running,

Of course I'm converting this code to use pr_cont() anyway...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: linux.git: printk() problem

2016-10-13 Thread Michal Hocko
On Thu 13-10-16 03:20:50, Joe Perches wrote:
> On Thu, 2016-10-13 at 12:04 +0200, Michal Hocko wrote:
[...]
> > I would be really surprised if we really had that many continuation
> > lines. They should be avoided as much as possible. Hundreds of thousands
> > just sounds more than over exaggerated...
> 
> Hey Michal.
> 
> "Hundreds _to_ thousands" of instances.  Not "hundreds _of_ thousands".

my bad I misread your words. Sorry about that

-- 
Michal Hocko
SUSE Labs


Re: linux.git: printk() problem

2016-10-13 Thread Michal Hocko
On Thu 13-10-16 03:20:50, Joe Perches wrote:
> On Thu, 2016-10-13 at 12:04 +0200, Michal Hocko wrote:
[...]
> > I would be really surprised if we really had that many continuation
> > lines. They should be avoided as much as possible. Hundreds of thousands
> > just sounds more than over exaggerated...
> 
> Hey Michal.
> 
> "Hundreds _to_ thousands" of instances.  Not "hundreds _of_ thousands".

my bad I misread your words. Sorry about that

-- 
Michal Hocko
SUSE Labs


Re: linux.git: printk() problem

2016-10-13 Thread Joe Perches
On Thu, 2016-10-13 at 12:04 +0200, Michal Hocko wrote:
> On Thu 13-10-16 02:29:46, Joe Perches wrote:
> > On Thu, 2016-10-13 at 08:26 +0200, Michal Hocko wrote:
> 
> [...]
> > > I think they are not critical and can be fix once somebody notices.
> > 
> >  
> > As do I, but Linus objected to applying a patch when Colin Ian King
> > noticed one.
> > 
> > I think the 250,000 or so uses with newlines are enough of a
> > precedence to keep using newlines everywhere.
> 
> 
> or simply fix missing KERN_CONTs and simply do not add any new missing \n
> 
> > Now we'll have to have patches adding hundreds to thousands of the
> > missing KERN_CONTs for continuation lines that weren't previously a
> > problem in logging output but are now.
> 
> 
> I would be really surprised if we really had that many continuation
> lines. They should be avoided as much as possible. Hundreds of thousands
> just sounds more than over exaggerated...

Hey Michal.

"Hundreds _to_ thousands" of instances.  Not "hundreds _of_ thousands".

> Not requiring \n at the end of strings just makes a lot of sense if we
> have a KERN_CONT with a well defined semantic.

True enough.  And I am not at all arguing against having a
well defined KERN_CONT semantic.

But using KERN_CONT alone is not enough information to be able to
perfectly reassemble message fragments post hoc given multiple
threads possibly interleaving KERN_CONT.

I do think the inconsistency of mixing styles with and without
newlines not particularly good.

cheers, Joe


Re: linux.git: printk() problem

2016-10-13 Thread Joe Perches
On Thu, 2016-10-13 at 12:04 +0200, Michal Hocko wrote:
> On Thu 13-10-16 02:29:46, Joe Perches wrote:
> > On Thu, 2016-10-13 at 08:26 +0200, Michal Hocko wrote:
> 
> [...]
> > > I think they are not critical and can be fix once somebody notices.
> > 
> >  
> > As do I, but Linus objected to applying a patch when Colin Ian King
> > noticed one.
> > 
> > I think the 250,000 or so uses with newlines are enough of a
> > precedence to keep using newlines everywhere.
> 
> 
> or simply fix missing KERN_CONTs and simply do not add any new missing \n
> 
> > Now we'll have to have patches adding hundreds to thousands of the
> > missing KERN_CONTs for continuation lines that weren't previously a
> > problem in logging output but are now.
> 
> 
> I would be really surprised if we really had that many continuation
> lines. They should be avoided as much as possible. Hundreds of thousands
> just sounds more than over exaggerated...

Hey Michal.

"Hundreds _to_ thousands" of instances.  Not "hundreds _of_ thousands".

> Not requiring \n at the end of strings just makes a lot of sense if we
> have a KERN_CONT with a well defined semantic.

True enough.  And I am not at all arguing against having a
well defined KERN_CONT semantic.

But using KERN_CONT alone is not enough information to be able to
perfectly reassemble message fragments post hoc given multiple
threads possibly interleaving KERN_CONT.

I do think the inconsistency of mixing styles with and without
newlines not particularly good.

cheers, Joe


Re: linux.git: printk() problem

2016-10-13 Thread Michal Hocko
On Thu 13-10-16 02:29:46, Joe Perches wrote:
> On Thu, 2016-10-13 at 08:26 +0200, Michal Hocko wrote:
[...]
> > I think they are not critical and can be fix once somebody notices.
>  
> As do I, but Linus objected to applying a patch when Colin Ian King
> noticed one.
> 
> I think the 250,000 or so uses with newlines are enough of a
> precedence to keep using newlines everywhere.

or simply fix missing KERN_CONTs and simply do not add any new missing \n

> Now we'll have to have patches adding hundreds to thousands of the
> missing KERN_CONTs for continuation lines that weren't previously a
> problem in logging output but are now.

I would be really surprised if we really had that many continuation
lines. They should be avoided as much as possible. Hundreds of thousands
just sounds more than over exaggerated...

I think you are just making much bigger deal from this than necessary.
Not requiring \n at the end of strings just makes a lot of sense if we
have a KERN_CONT with a well defined semantic. Which was the whole point
of the patch from Linus AFAIU. If there are some left overs, so what, we
can fix them as soon as somebody notices. The worst thing we will get is
a messy output but no information should be lost. We used to have a
messy output in the past regardless and we could live with it...

That being said I would be happier to know about this change before I
had to scratch my head when seeing this for the first time so a heads up
would be more than appreciated but fixing these issues is trivial not
not worth making a lot of noise about.

-- 
Michal Hocko
SUSE Labs


Re: linux.git: printk() problem

2016-10-13 Thread Michal Hocko
On Thu 13-10-16 02:29:46, Joe Perches wrote:
> On Thu, 2016-10-13 at 08:26 +0200, Michal Hocko wrote:
[...]
> > I think they are not critical and can be fix once somebody notices.
>  
> As do I, but Linus objected to applying a patch when Colin Ian King
> noticed one.
> 
> I think the 250,000 or so uses with newlines are enough of a
> precedence to keep using newlines everywhere.

or simply fix missing KERN_CONTs and simply do not add any new missing \n

> Now we'll have to have patches adding hundreds to thousands of the
> missing KERN_CONTs for continuation lines that weren't previously a
> problem in logging output but are now.

I would be really surprised if we really had that many continuation
lines. They should be avoided as much as possible. Hundreds of thousands
just sounds more than over exaggerated...

I think you are just making much bigger deal from this than necessary.
Not requiring \n at the end of strings just makes a lot of sense if we
have a KERN_CONT with a well defined semantic. Which was the whole point
of the patch from Linus AFAIU. If there are some left overs, so what, we
can fix them as soon as somebody notices. The worst thing we will get is
a messy output but no information should be lost. We used to have a
messy output in the past regardless and we could live with it...

That being said I would be happier to know about this change before I
had to scratch my head when seeing this for the first time so a heads up
would be more than appreciated but fixing these issues is trivial not
not worth making a lot of noise about.

-- 
Michal Hocko
SUSE Labs


Re: linux.git: printk() problem

2016-10-13 Thread Joe Perches
On Thu, 2016-10-13 at 08:26 +0200, Michal Hocko wrote:
> On Wed 12-10-16 09:08:58, Joe Perches wrote:
> > On Wed, 2016-10-12 at 16:35 +0200, Michal Hocko wrote:
> > > On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> > > > Hello.
> > > > 
> > > > I noticed that current linux.git generates hardly readable console 
> > > > output
> > > > due to KERN_CONT changes. Are you suggesting developers that output like
> > > > this be fixed?
> > > 
> > > 
> > > 
> > > Joe has already posted a patch
> > > http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git@perches.com
> > > 
> > > And in general I think that adding those KERN_CONT is a good thing to
> > > do.
> > 
> > 
> > As do I, but there are about a quarter _million_ uses of
> > printk/logging messages in the kernel tree with newlines.
> > 
> > Maybe 100 or so of the existing messages lack terminating
> > newlines.
> 
> 
> I think they are not critical and can be fix once somebody notices.
 
As do I, but Linus objected to applying a patch when Colin Ian King
noticed one.

I think the 250,000 or so uses with newlines are enough of a
precedence to keep using newlines everywhere.

Now we'll have to have patches adding hundreds to thousands of the
missing KERN_CONTs for continuation lines that weren't previously a
problem in logging output but are now.




Re: linux.git: printk() problem

2016-10-13 Thread Joe Perches
On Thu, 2016-10-13 at 08:26 +0200, Michal Hocko wrote:
> On Wed 12-10-16 09:08:58, Joe Perches wrote:
> > On Wed, 2016-10-12 at 16:35 +0200, Michal Hocko wrote:
> > > On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> > > > Hello.
> > > > 
> > > > I noticed that current linux.git generates hardly readable console 
> > > > output
> > > > due to KERN_CONT changes. Are you suggesting developers that output like
> > > > this be fixed?
> > > 
> > > 
> > > 
> > > Joe has already posted a patch
> > > http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git@perches.com
> > > 
> > > And in general I think that adding those KERN_CONT is a good thing to
> > > do.
> > 
> > 
> > As do I, but there are about a quarter _million_ uses of
> > printk/logging messages in the kernel tree with newlines.
> > 
> > Maybe 100 or so of the existing messages lack terminating
> > newlines.
> 
> 
> I think they are not critical and can be fix once somebody notices.
 
As do I, but Linus objected to applying a patch when Colin Ian King
noticed one.

I think the 250,000 or so uses with newlines are enough of a
precedence to keep using newlines everywhere.

Now we'll have to have patches adding hundreds to thousands of the
missing KERN_CONTs for continuation lines that weren't previously a
problem in logging output but are now.




Re: linux.git: printk() problem

2016-10-13 Thread Michal Hocko
On Wed 12-10-16 09:08:58, Joe Perches wrote:
> On Wed, 2016-10-12 at 16:35 +0200, Michal Hocko wrote:
> > On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> > > Hello.
> > > 
> > > I noticed that current linux.git generates hardly readable console output
> > > due to KERN_CONT changes. Are you suggesting developers that output like
> > > this be fixed?
> > 
> > 
> > Joe has already posted a patch
> > http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git@perches.com
> > 
> > And in general I think that adding those KERN_CONT is a good thing to
> > do.
> 
> As do I, but there are about a quarter _million_ uses of
> printk/logging messages in the kernel tree with newlines.
> 
> Maybe 100 or so of the existing messages lack terminating
> newlines.

I think they are not critical and can be fix once somebody notices.

-- 
Michal Hocko
SUSE Labs


Re: linux.git: printk() problem

2016-10-13 Thread Michal Hocko
On Wed 12-10-16 09:08:58, Joe Perches wrote:
> On Wed, 2016-10-12 at 16:35 +0200, Michal Hocko wrote:
> > On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> > > Hello.
> > > 
> > > I noticed that current linux.git generates hardly readable console output
> > > due to KERN_CONT changes. Are you suggesting developers that output like
> > > this be fixed?
> > 
> > 
> > Joe has already posted a patch
> > http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git@perches.com
> > 
> > And in general I think that adding those KERN_CONT is a good thing to
> > do.
> 
> As do I, but there are about a quarter _million_ uses of
> printk/logging messages in the kernel tree with newlines.
> 
> Maybe 100 or so of the existing messages lack terminating
> newlines.

I think they are not critical and can be fix once somebody notices.

-- 
Michal Hocko
SUSE Labs


Re: linux.git: printk() problem

2016-10-12 Thread Linus Torvalds
On Wed, Oct 12, 2016 at 9:16 AM, Joe Perches  wrote:
>> (but with the new world order you actually *can* combine KERN_CONT
>> with a loglevel, so that if the beginning od the line got flushed, the
>> continuation can still be printed with the right log level).
>
> I think that might not be a good idea.
>
> Anything that uses a KERN_CONT with a new log level
> might as well be converted into multiple printks.

Generally yes.

The immediate reason for it was some really really nasty and horrible
indirection in the ACPI layer, which goes through something like
fifteen million different abstraction layers before it actually hits
"printk()", and some of them do vsprintf() in the middle.

It was something like ACPI_INFO() -> acpi_os_printf() ->
acpi_os_vprintf() which is completely broken and *always* has that
KERN_CONT marker at the beginning, but earlier phases can have the
level marker in it already..

In other words, it was a terminally broken piece of nasty code, but
there was no way I was going to touch the ACPI "OS independent"
layers, so I just said "screw this, if you want to mix KERN_CONT with
a level marker, you can".

So I agree, you normally shouldn't need to. In fact, the fewer
KERN_CONT's I see in the kernel, the better off we are. But
_sometimes_ KERN_CONT makes sense (and the OOM code is likely one of
the few places where it really might be the best of a number of bad
options: lots of information that needs to be pretty dense, and we
obviously can't afford to allocate a buffer for it dynamically, and
doing so statically isn't a great option either..)

 Linus


Re: linux.git: printk() problem

2016-10-12 Thread Linus Torvalds
On Wed, Oct 12, 2016 at 9:16 AM, Joe Perches  wrote:
>> (but with the new world order you actually *can* combine KERN_CONT
>> with a loglevel, so that if the beginning od the line got flushed, the
>> continuation can still be printed with the right log level).
>
> I think that might not be a good idea.
>
> Anything that uses a KERN_CONT with a new log level
> might as well be converted into multiple printks.

Generally yes.

The immediate reason for it was some really really nasty and horrible
indirection in the ACPI layer, which goes through something like
fifteen million different abstraction layers before it actually hits
"printk()", and some of them do vsprintf() in the middle.

It was something like ACPI_INFO() -> acpi_os_printf() ->
acpi_os_vprintf() which is completely broken and *always* has that
KERN_CONT marker at the beginning, but earlier phases can have the
level marker in it already..

In other words, it was a terminally broken piece of nasty code, but
there was no way I was going to touch the ACPI "OS independent"
layers, so I just said "screw this, if you want to mix KERN_CONT with
a level marker, you can".

So I agree, you normally shouldn't need to. In fact, the fewer
KERN_CONT's I see in the kernel, the better off we are. But
_sometimes_ KERN_CONT makes sense (and the OOM code is likely one of
the few places where it really might be the best of a number of bad
options: lots of information that needs to be pretty dense, and we
obviously can't afford to allocate a buffer for it dynamically, and
doing so statically isn't a great option either..)

 Linus


Re: linux.git: printk() problem

2016-10-12 Thread Joe Perches
On Wed, 2016-10-12 at 08:47 -0700, Linus Torvalds wrote:
< We'll see. But the other issues are easily fixed by just adding
> KERN_CONT where appropriate. It was actually very much what you were
> supposed to do before too, if only as a marker to others that "yes,
> I'm actually doing this, and no, I'm not supposed to have a log level"
> (but with the new world order you actually *can* combine KERN_CONT
> with a loglevel, so that if the beginning od the line got flushed, the
> continuation can still be printed with the right log level).

I think that might not be a good idea.

Anything that uses a KERN_CONT with a new log level
might as well be converted into multiple printks.



Re: linux.git: printk() problem

2016-10-12 Thread Joe Perches
On Wed, 2016-10-12 at 08:47 -0700, Linus Torvalds wrote:
< We'll see. But the other issues are easily fixed by just adding
> KERN_CONT where appropriate. It was actually very much what you were
> supposed to do before too, if only as a marker to others that "yes,
> I'm actually doing this, and no, I'm not supposed to have a log level"
> (but with the new world order you actually *can* combine KERN_CONT
> with a loglevel, so that if the beginning od the line got flushed, the
> continuation can still be printed with the right log level).

I think that might not be a good idea.

Anything that uses a KERN_CONT with a new log level
might as well be converted into multiple printks.



Re: linux.git: printk() problem

2016-10-12 Thread Joe Perches
On Wed, 2016-10-12 at 16:35 +0200, Michal Hocko wrote:
> On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> > Hello.
> > 
> > I noticed that current linux.git generates hardly readable console output
> > due to KERN_CONT changes. Are you suggesting developers that output like
> > this be fixed?
> 
> 
> Joe has already posted a patch
> http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git@perches.com
> 
> And in general I think that adding those KERN_CONT is a good thing to
> do.

As do I, but there are about a quarter _million_ uses of
printk/logging messages in the kernel tree with newlines.

Maybe 100 or so of the existing messages lack terminating
newlines.

There are about 1800 uses of KERN_CONT/pr_cont today.

There are multiple thousands of uses of bare printks used as
printk line continuations that would need updating.

Most of those multiple thousands are in drivers of ancient and
out of manufacture devices or are subsystems that are old, crufty
and effectively unmaintained.  These might not need to be updated.

Still, there are probably hundreds to low thousands of actively
used device drivers that will need KERN?CONT or pr_cont update to
avoid logging output defects.



Re: linux.git: printk() problem

2016-10-12 Thread Joe Perches
On Wed, 2016-10-12 at 16:35 +0200, Michal Hocko wrote:
> On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> > Hello.
> > 
> > I noticed that current linux.git generates hardly readable console output
> > due to KERN_CONT changes. Are you suggesting developers that output like
> > this be fixed?
> 
> 
> Joe has already posted a patch
> http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git@perches.com
> 
> And in general I think that adding those KERN_CONT is a good thing to
> do.

As do I, but there are about a quarter _million_ uses of
printk/logging messages in the kernel tree with newlines.

Maybe 100 or so of the existing messages lack terminating
newlines.

There are about 1800 uses of KERN_CONT/pr_cont today.

There are multiple thousands of uses of bare printks used as
printk line continuations that would need updating.

Most of those multiple thousands are in drivers of ancient and
out of manufacture devices or are subsystems that are old, crufty
and effectively unmaintained.  These might not need to be updated.

Still, there are probably hundreds to low thousands of actively
used device drivers that will need KERN?CONT or pr_cont update to
avoid logging output defects.



Re: linux.git: printk() problem

2016-10-12 Thread Linus Torvalds
On Wed, Oct 12, 2016 at 6:30 AM, Tetsuo Handa
 wrote:
>
> I noticed that current linux.git generates hardly readable console output
> due to KERN_CONT changes. Are you suggesting developers that output like
> this be fixed?

Yes. Needing to add a few KERN_CONT markers was pretty much expected,
since it's about 5 years since we enfroced it and new code won't
necessarily have it (and even back then I don't think we _always_ got
it right).

That said, looking at the printk's in the lowmem code, I think it
could be useful if there was some effort to see if the code could
somehow avoid the multi-printk thing. This is actually one area where

 (a) the problem actually happens while the system is running, rather
than booting

 (b) I've seen line mixing in the past

but the short term fix is to just add KERN_CONT markers to the lines
that are continuations.

NOTE! The reason I mention that (a) thing that it has traditionally
made it much messier to do logging of continuation lines in the first
place (because more things are going on and often one problem leads to
another and then the mixing is much more likely), but I actually
intentionally made it more likely to trigger the flushing issue in
commit bfd8d3f23b51 ("printk: make reading the kernel log flush
pending lines").

So if there is an active system logger that is reading messages *when*
one of those "one line in multiple printk's" things happen, that log
reader will now potentially cause the logging to be broken up because
the act of reading will flush the pending lines.

Now, honestly, that is something that we may end up reverting, but I'd
_like_ to try not to. Because without that flushing, there might be
one last partial line that the logger never sees. So it was me trying
to be aggressive about those partial lines, and the *hope* is that we
can just keep it, and that we can look at areas that have problems
with it.

We'll see. But the other issues are easily fixed by just adding
KERN_CONT where appropriate. It was actually very much what you were
supposed to do before too, if only as a marker to others that "yes,
I'm actually doing this, and no, I'm not supposed to have a log level"
(but with the new world order you actually *can* combine KERN_CONT
with a loglevel, so that if the beginning od the line got flushed, the
continuation can still be printed with the right log level).

Linus


Re: linux.git: printk() problem

2016-10-12 Thread Linus Torvalds
On Wed, Oct 12, 2016 at 6:30 AM, Tetsuo Handa
 wrote:
>
> I noticed that current linux.git generates hardly readable console output
> due to KERN_CONT changes. Are you suggesting developers that output like
> this be fixed?

Yes. Needing to add a few KERN_CONT markers was pretty much expected,
since it's about 5 years since we enfroced it and new code won't
necessarily have it (and even back then I don't think we _always_ got
it right).

That said, looking at the printk's in the lowmem code, I think it
could be useful if there was some effort to see if the code could
somehow avoid the multi-printk thing. This is actually one area where

 (a) the problem actually happens while the system is running, rather
than booting

 (b) I've seen line mixing in the past

but the short term fix is to just add KERN_CONT markers to the lines
that are continuations.

NOTE! The reason I mention that (a) thing that it has traditionally
made it much messier to do logging of continuation lines in the first
place (because more things are going on and often one problem leads to
another and then the mixing is much more likely), but I actually
intentionally made it more likely to trigger the flushing issue in
commit bfd8d3f23b51 ("printk: make reading the kernel log flush
pending lines").

So if there is an active system logger that is reading messages *when*
one of those "one line in multiple printk's" things happen, that log
reader will now potentially cause the logging to be broken up because
the act of reading will flush the pending lines.

Now, honestly, that is something that we may end up reverting, but I'd
_like_ to try not to. Because without that flushing, there might be
one last partial line that the logger never sees. So it was me trying
to be aggressive about those partial lines, and the *hope* is that we
can just keep it, and that we can look at areas that have problems
with it.

We'll see. But the other issues are easily fixed by just adding
KERN_CONT where appropriate. It was actually very much what you were
supposed to do before too, if only as a marker to others that "yes,
I'm actually doing this, and no, I'm not supposed to have a log level"
(but with the new world order you actually *can* combine KERN_CONT
with a loglevel, so that if the beginning od the line got flushed, the
continuation can still be printed with the right log level).

Linus


Re: linux.git: printk() problem

2016-10-12 Thread Michal Hocko
On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> Hello.
> 
> I noticed that current linux.git generates hardly readable console output
> due to KERN_CONT changes. Are you suggesting developers that output like
> this be fixed?

Joe has already posted a patch
http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git@perches.com

And in general I think that adding those KERN_CONT is a good thing to
do.
-- 
Michal Hocko
SUSE Labs


Re: linux.git: printk() problem

2016-10-12 Thread Michal Hocko
On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> Hello.
> 
> I noticed that current linux.git generates hardly readable console output
> due to KERN_CONT changes. Are you suggesting developers that output like
> this be fixed?

Joe has already posted a patch
http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git@perches.com

And in general I think that adding those KERN_CONT is a good thing to
do.
-- 
Michal Hocko
SUSE Labs


linux.git: printk() problem

2016-10-12 Thread Tetsuo Handa
Hello.

I noticed that current linux.git generates hardly readable console output
due to KERN_CONT changes. Are you suggesting developers that output like
this be fixed?

[   93.723582] Mem-Info:
[   93.725919] active_anon:380970 inactive_anon:2098 isolated_anon:0
[   93.725919]  active_file:242 inactive_file:275 isolated_file:0
[   93.725919]  unevictable:0 dirty:0 writeback:65 unstable:0
[   93.725919]  slab_reclaimable:3825 slab_unreclaimable:12527
[   93.725919]  mapped:760 shmem:2164 pagetables:9297 bounce:0
[   93.725919]  free:14046 free_pcp:74 free_cma:0
[   93.735626] Node 0 active_anon:1523880kB inactive_anon:8392kB 
active_file:968kB inactive_file:1100kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB mapped:3040kB dirty:0kB writeback:244kB shmem:0kB shmem_thp: 
0kB shmem_pmdmapped: 1056768kB anon_thp: 8656kB writeback_tmp:0kB unstable:0kB 
pages_scanned:0 all_unreclaimable? no
[   93.742894] Node 0 [   93.743319] DMA free:7156kB min:408kB low:508kB 
high:608kB active_anon:8672kB inactive_anon:0kB active_file:0kB 
inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB 
managed:15904kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:40kB 
kernel_stack:0kB pagetables:36kB bounce:0kB free_pcp:0kB local_pcp:0kB 
free_cma:0kB
[   93.750951] lowmem_reserve[]:[   93.751597]  0
 1699[   93.752678]  1699
 1699[   93.753910] 
[   93.754822] Node 0 [   93.755216] DMA32 free:49028kB min:44644kB low:55804kB 
high:66964kB active_anon:1515208kB inactive_anon:8392kB active_file:968kB 
inactive_file:960kB unevictable:0kB writepending:232kB present:2080640kB 
managed:1760736kB mlocked:0kB slab_reclaimable:15300kB 
slab_unreclaimable:50068kB kernel_stack:20320kB pagetables:37152kB bounce:0kB 
free_pcp:436kB local_pcp:240kB free_cma:0kB
[   93.763709] lowmem_reserve[]:[   93.764235]  0
 0[   93.765630]  0
 0[   93.767544] 
[   93.768676] Node 0 [   93.769135] DMA: 
1*4kB [   93.770932] (U) 
0*8kB [   93.772192] 1*16kB 
(U) [   93.773438] 1*32kB 
(M) [   93.774574] 1*64kB 
(U) [   93.775997] 1*128kB 
(U) [   93.777665] 1*256kB 
(U) [   93.778861] 1*512kB 
(M) [   93.780004] 2*1024kB 
(UM) [   93.781225] 0*2048kB 
1*4096kB [   93.782422] (M) 
[   93.783423] = 7156kB
[   93.784445] Node 0 [   93.784844] DMA32: 
1265*4kB [   93.786204] (UMEH) 
790*8kB [   93.787412] (UMEH) 
335*16kB [   93.788634] (UEH) 
265*32kB [   93.790037] (UE) 
89*64kB [   93.791254] (UEH) 
23*128kB [   93.792541] (UME) 
9*256kB [   93.793787] (ME) 
13*512kB [   93.794956] (UME) 
6*1024kB [   93.796094] (UME) 
0*2048kB [   93.797435] 0*4096kB 
[   93.798413] = 48964kB
[   93.799357] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=1048576kB
[   93.802221] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=2048kB
[   93.804476] 2700 total pagecache pages
[   93.805949] 0 pages in swap cache
[   93.807557] Swap cache stats: add 0, delete 0, find 0/0
[   93.809417] Free swap  = 0kB
[   93.810573] Total swap = 0kB
[   93.811788] 524157 pages RAM
[   93.812903] 0 pages HighMem/MovableOnly
[   93.814216] 79997 pages reserved
[   93.815292] 0 pages hwpoisoned

[  185.497015] Showing all locks held in the system:
[  185.59] 1 lock held by systemd/1:
[  185.501377]  #0: [  185.501717]  (
&(>i_mmaplock)->mr_lock[  185.503347] ){++}
, at: [  185.504566] [] xfs_ilock+0xd0/0xe0
[  185.506157] 2 locks held by khungtaskd/46:
[  185.507553]  #0: [  185.507896]  (
rcu_read_lock[  185.509091] ){..}
, at: [  185.510277] [] watchdog+0x9e/0x480
[  185.512072]  #1: [  185.512413]  (
tasklist_lock[  185.513602] ){.+.+..}
, at: [  185.514802] [] debug_show_all_locks+0x3f/0x1b0


linux.git: printk() problem

2016-10-12 Thread Tetsuo Handa
Hello.

I noticed that current linux.git generates hardly readable console output
due to KERN_CONT changes. Are you suggesting developers that output like
this be fixed?

[   93.723582] Mem-Info:
[   93.725919] active_anon:380970 inactive_anon:2098 isolated_anon:0
[   93.725919]  active_file:242 inactive_file:275 isolated_file:0
[   93.725919]  unevictable:0 dirty:0 writeback:65 unstable:0
[   93.725919]  slab_reclaimable:3825 slab_unreclaimable:12527
[   93.725919]  mapped:760 shmem:2164 pagetables:9297 bounce:0
[   93.725919]  free:14046 free_pcp:74 free_cma:0
[   93.735626] Node 0 active_anon:1523880kB inactive_anon:8392kB 
active_file:968kB inactive_file:1100kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB mapped:3040kB dirty:0kB writeback:244kB shmem:0kB shmem_thp: 
0kB shmem_pmdmapped: 1056768kB anon_thp: 8656kB writeback_tmp:0kB unstable:0kB 
pages_scanned:0 all_unreclaimable? no
[   93.742894] Node 0 [   93.743319] DMA free:7156kB min:408kB low:508kB 
high:608kB active_anon:8672kB inactive_anon:0kB active_file:0kB 
inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB 
managed:15904kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:40kB 
kernel_stack:0kB pagetables:36kB bounce:0kB free_pcp:0kB local_pcp:0kB 
free_cma:0kB
[   93.750951] lowmem_reserve[]:[   93.751597]  0
 1699[   93.752678]  1699
 1699[   93.753910] 
[   93.754822] Node 0 [   93.755216] DMA32 free:49028kB min:44644kB low:55804kB 
high:66964kB active_anon:1515208kB inactive_anon:8392kB active_file:968kB 
inactive_file:960kB unevictable:0kB writepending:232kB present:2080640kB 
managed:1760736kB mlocked:0kB slab_reclaimable:15300kB 
slab_unreclaimable:50068kB kernel_stack:20320kB pagetables:37152kB bounce:0kB 
free_pcp:436kB local_pcp:240kB free_cma:0kB
[   93.763709] lowmem_reserve[]:[   93.764235]  0
 0[   93.765630]  0
 0[   93.767544] 
[   93.768676] Node 0 [   93.769135] DMA: 
1*4kB [   93.770932] (U) 
0*8kB [   93.772192] 1*16kB 
(U) [   93.773438] 1*32kB 
(M) [   93.774574] 1*64kB 
(U) [   93.775997] 1*128kB 
(U) [   93.777665] 1*256kB 
(U) [   93.778861] 1*512kB 
(M) [   93.780004] 2*1024kB 
(UM) [   93.781225] 0*2048kB 
1*4096kB [   93.782422] (M) 
[   93.783423] = 7156kB
[   93.784445] Node 0 [   93.784844] DMA32: 
1265*4kB [   93.786204] (UMEH) 
790*8kB [   93.787412] (UMEH) 
335*16kB [   93.788634] (UEH) 
265*32kB [   93.790037] (UE) 
89*64kB [   93.791254] (UEH) 
23*128kB [   93.792541] (UME) 
9*256kB [   93.793787] (ME) 
13*512kB [   93.794956] (UME) 
6*1024kB [   93.796094] (UME) 
0*2048kB [   93.797435] 0*4096kB 
[   93.798413] = 48964kB
[   93.799357] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=1048576kB
[   93.802221] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=2048kB
[   93.804476] 2700 total pagecache pages
[   93.805949] 0 pages in swap cache
[   93.807557] Swap cache stats: add 0, delete 0, find 0/0
[   93.809417] Free swap  = 0kB
[   93.810573] Total swap = 0kB
[   93.811788] 524157 pages RAM
[   93.812903] 0 pages HighMem/MovableOnly
[   93.814216] 79997 pages reserved
[   93.815292] 0 pages hwpoisoned

[  185.497015] Showing all locks held in the system:
[  185.59] 1 lock held by systemd/1:
[  185.501377]  #0: [  185.501717]  (
&(>i_mmaplock)->mr_lock[  185.503347] ){++}
, at: [  185.504566] [] xfs_ilock+0xd0/0xe0
[  185.506157] 2 locks held by khungtaskd/46:
[  185.507553]  #0: [  185.507896]  (
rcu_read_lock[  185.509091] ){..}
, at: [  185.510277] [] watchdog+0x9e/0x480
[  185.512072]  #1: [  185.512413]  (
tasklist_lock[  185.513602] ){.+.+..}
, at: [  185.514802] [] debug_show_all_locks+0x3f/0x1b0