Re: [PATCH] printk: add cpu id information to printk() output

2023-09-15 Thread John Ogness
On 2023-09-15, Enlin Mu  wrote:
> Sometimes we want to print the cpu id of printk() messages to the consoles
>
> diff --git a/include/linux/threads.h b/include/linux/threads.h
> index c34173e6c5f1..6700bd9a174f 100644
> --- a/include/linux/threads.h
> +++ b/include/linux/threads.h
> @@ -34,6 +34,9 @@
>  #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
>   (sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
>  
> +#define CPU_ID_SHIFT 23
> +#define CPU_ID_MASK  0xff80

This only supports 256 CPUs. I think it doesn't make sense to try to
squish CPU and Task IDs into 32 bits.
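
As a rough illustration of why the bit budget is so tight (this helper is
hypothetical, not part of the patch): with PID_MAX_LIMIT at 4M on 64-bit
(see the hunk above), the task ID already needs 22 bits, so only a few
high bits of a 32-bit caller id are left for the CPU number:

  #define CPU_ID_SHIFT 23

  /* Hypothetical helper: task ID in the low bits, CPU ID above them. */
  static inline u32 pack_caller_id(u32 task_id, unsigned int cpu)
  {
          return task_id | ((u32)cpu << CPU_ID_SHIFT);
  }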

What about introducing a caller_id option to always only print the CPU
ID? Or do you really need Task _and_ CPU?

John Ogness


Re: [PATCH] iommu/amd: Fix extended features logging

2021-04-11 Thread John Ogness
On 2021-04-11, Alexander Monakov  wrote:
>>> The second line is emitted via 'pr_cont', which causes it to have a
>>> different ('warn') loglevel compared to the previous line ('info').
>>> 
>>> Commit 9a295ff0ffc9 attempted to rectify this by removing the newline
>>> from the pci_info format string, but this doesn't work, as pci_info
>>> calls implicitly append a newline anyway.
>> 
>> Hmm, did I really screw that up during my testing? I am sorry about that.
>> 
>> I tried to wrap my head around where the newline is implicitly appended, and
>> only found the definitions below.
>> 
>> include/linux/pci.h:#define pci_info(pdev, fmt, arg...)	dev_info(&(pdev)->dev, fmt, ##arg)
>> 
>> include/linux/dev_printk.h:#define dev_info(dev, fmt, ...)	\
>> include/linux/dev_printk.h:	_dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
>> 
>> include/linux/dev_printk.h:__printf(2, 3) __cold
>> include/linux/dev_printk.h:void _dev_info(const struct device *dev, const char *fmt, ...);
>> 
>> include/linux/compiler_attributes.h:#define __printf(a, b)	__attribute__((__format__(printf, a, b)))
>
> Yeah, it's not obvious: it happens in kernel/printk/printk.c:vprintk_store
> where it does
>
>   if (dev_info)
>   lflags |= LOG_NEWLINE;
>
> It doesn't seem to be documented; I think it prevents using pr_cont with
> "rich" printk facilities that go via _dev_info.
>
> I suspect it quietly changed in commit c313af145b9bc ("printk() - isolate
> KERN_CONT users from ordinary complete lines").

Yes, this behavior has been around for a while. I see no reason why it
should be that way. These days printk does not care if there is dev_info
included or not.

>> In the discussion *smpboot: CPU numbers printed as warning* [1] John wrote:
>> 
>>> It is supported to provide loglevels for CONT messages. The loglevel is
>>> then only used if the append fails:
>>> 
>>> pr_cont(KERN_INFO "message part");
>>> 
>>> I don't know if we want to go down that path. But it is supported.
>
> Yeah, I saw that, but decided to go with the 'pr_info("")' solution, because
> it is less magic, and already used in two other drivers.

Note that what I was suggesting was to fix a different issue: If the
pr_cont() caller is interrupted by another printk user, then the
following pr_cont() calls will use the default loglevel. By explicitly
specifying the loglevel in pr_cont(), you can be sure that those pieces
get the desired loglevel, even if those pieces get separated off because
of an interrupting printk user.

So even if we fix dev_info to allow pr_cont continuation, it still may
be desirable to specify the loglevel in the pr_cont pieces.
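
As a minimal sketch (the feature names and flags below are made up; only
the pr_cont(KERN_INFO ...) form is the point), that would look like:

  pr_info("AMD-Vi: Extended features:");
  if (feat_ppr)
          pr_cont(KERN_INFO " PPR");
  if (feat_gt)
          pr_cont(KERN_INFO " GT");
  pr_cont(KERN_INFO "\n");

If an interrupting printk user splits the line, the detached pieces are
then emitted at KERN_INFO instead of the default loglevel.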

> pr_info("") will also prepend 'AMD-Vi:' to the feature list, which is
> nice.

I'd rather fix dev_info callers to allow pr_cont and then fix any code
that is using this workaround.

And if the printk maintainers agree it is OK to encourage
pr_cont(LOGLEVEL "...") usage, then people should really start using
that if the loglevel on those pieces is important.

John Ogness


Re: [PATCH printk v2 2/5] printk: remove safe buffers

2021-04-06 Thread John Ogness
On 2021-04-01, Petr Mladek  wrote:
>> Caller-id solves this problem and is easy to sort for anyone with
>> `grep'. Yes, it is a shame that `dmesg' does not show it, but
>> directly using any of the printk interfaces does show it (kmsg_dump,
>> /dev/kmsg, syslog, console).
>
> True but frankly, the current situation is _far_ from convenient:
>
>+ consoles do not show it by default
>+ no userspace tool (dmesg, journalctl, crash) is able to show it
>+ grep is a nightmare, especially if you have more than a handful of CPUs
>
> Yes, everything is solvable but not easily.
>
>> > I get this with "echo l >/proc/sysrq-trigger" and this patchset:
>> 
>> Of course. Without caller-id, it is a mess. But this has nothing to do
>> with NMI. The same problem exists for WARN_ON() on multiple CPUs
>> simultaneously. If the user is not using caller-id, they are
>> lost. Caller-id is the current solution to the interlaced logs.
>
> Sure. But in reality, the risk of mixed WARN_ONs is small, while
> this patch always makes backtraces from all CPUs unusable without
> caller_id and non-trivial effort.

I would prefer we solve the situation for non-NMI as well, not just for
the sysrq "l" case.

>> For the long term, we should introduce a printk-context API that allows
>> callers to perfectly pack their multi-line output into a single
>> entry. We discussed [0][1] this back in August 2020.
>
> We need a "short" term solution. There are currently 3 solutions:
>
> 1. Keep nmi_safe() and all the hacks around.
>
> 2. Serialize nmi_cpu_backtrace() by a spin lock and later by
>the special lock used also by atomic consoles.
>
> 3. Tell complaining people how to sort the messed logs.

Or we look into the long term solution now. If caller-ids cannot be
used as the solution (because nobody turns it on, nobody knows about it,
and/or distros do not enable it), then we should look at how to make at
least the backtraces contiguous. I have a few ideas here.

John Ogness


Re: [PATCH printk v2 2/5] printk: remove safe buffers

2021-04-01 Thread John Ogness
On 2021-04-01, Petr Mladek  wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
>>   new_descs, ilog2(new_descs_count),
>>   new_infos);
>>  
>> -printk_safe_enter_irqsave(flags);
>> +local_irq_save(flags);
>
> IMHO, we actually do not have to disable IRQ here. We already copy
> messages that might appear in the small race window in NMI. It would
> work the same way also for IRQ context.

We do not have to, but why open up this window? We are still in early
boot and interrupts have always been disabled here. I am not happy that
this window even exists. I really prefer to keep it NMI-only.

>> --- a/lib/nmi_backtrace.c
>> +++ b/lib/nmi_backtrace.c
>> @@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>>  touch_softlockup_watchdog();
>>  }
>>  
>> -/*
>> - * Force flush any remote buffers that might be stuck in IRQ context
>> - * and therefore could not run their irq_work.
>> - */
>> -printk_safe_flush();
>
> Sigh, this reminds me that the nmi_safe buffers serialized backtraces
> from all CPUs.
>
> I am afraid that we have to put back the spinlock into
> nmi_cpu_backtrace().

Please no. That spinlock is a disaster. It can cause deadlocks with
other cpu-locks (such as in kdb) and it will cause a major problem for
atomic consoles. We need to be very careful about introducing locks
where NMIs are waiting on other CPUs.

> It has been repeatedly added and removed depending
> on whether the backtrace was printed into the main log buffer
> or into the per-CPU buffers. Last time it was removed by
> the commit 03fc7f9c99c1e7ae2925d ("printk/nmi: Prevent deadlock
> when accessing the main log buffer in NMI").
>
> It should be safe because there should not be any other locks in the
> code path. Note that only one backtrace might be triggered at the same
> time, see @backtrace_flag in nmi_trigger_cpumask_backtrace().

It is adding a lock around a lockless ringbuffer. For me that is a step
backwards.

> We _must_ serialize it somehow[*]. The lock in nmi_cpu_backtrace()
> looks less evil than the nmi_safe machinery. nmi_safe() shrinks
> too-long backtraces, loses timestamps, needs to be explicitly
> flushed here and there, and is non-trivial code.
>
> [*] Non-serialized backtraces are a real mess. Caller-id is visible
> only on consoles or via the syslogd interface. And it is not very
> convenient.

Caller-id solves this problem and is easy to sort for anyone with
`grep'. Yes, it is a shame that `dmesg' does not show it, but directly
using any of the printk interfaces does show it (kmsg_dump, /dev/kmsg,
syslog, console).

> I get this with "echo l >/proc/sysrq-trigger" and this patchset:

Of course. Without caller-id, it is a mess. But this has nothing to do
with NMI. The same problem exists for WARN_ON() on multiple CPUs
simultaneously. If the user is not using caller-id, they are
lost. Caller-id is the current solution to the interlaced logs.

For the long term, we should introduce a printk-context API that allows
callers to perfectly pack their multi-line output into a single
entry. We discussed [0][1] this back in August 2020.
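
As a purely hypothetical sketch of what such an API could look like (none
of these functions exist; the names are invented for illustration), a
caller would open a context, emit its lines, and commit everything as one
ringbuffer entry that cannot be interleaved with output from other CPUs:

  struct printk_buffer pbuf;

  printk_buffer_begin(&pbuf, KERN_WARNING);
  printk_buffer_printf(&pbuf, "CPU: %d PID: %d\n", cpu, task_pid_nr(current));
  printk_buffer_printf(&pbuf, "Call Trace:\n");
  /* ... one call per backtrace line ... */
  printk_buffer_end(&pbuf);       /* commit as a single record */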

John Ogness

[0] https://lore.kernel.org/lkml/472f2e553805b52d9834d64e4056db965edee329.ca...@perches.com
[1] offlist message-id: 87d03k9ymz@jogness.linutronix.de


Re: [PATCH printk v2 2/5] printk: remove safe buffers

2021-03-31 Thread John Ogness
On 2021-03-30, John Ogness  wrote:
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index e971c0a9ec9e..f090d6a1b39e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1772,16 +1759,21 @@ static struct task_struct *console_owner;
>  static bool console_waiter;
>  
>  /**
> - * console_lock_spinning_enable - mark beginning of code where another
> + * console_lock_spinning_enable_irqsave - mark beginning of code where another
>   *   thread might safely busy wait
>   *
>   * This basically converts console_lock into a spinlock. This marks
>   * the section where the console_lock owner can not sleep, because
>   * there may be a waiter spinning (like a spinlock). Also it must be
>   * ready to hand over the lock at the end of the section.
> + *
> + * This disables interrupts because the hand over to a waiter must not be
> + * interrupted until the hand over is completed (@console_waiter is cleared).
>   */
> -static void console_lock_spinning_enable(void)
> +static void console_lock_spinning_enable_irqsave(unsigned long *flags)

I missed the prototype change for the !CONFIG_PRINTK case, resulting in:

linux/kernel/printk/printk.c:2707:3: error: implicit declaration of function ‘console_lock_spinning_enable_irqsave’; did you mean ‘console_lock_spinning_enable’? [-Werror=implicit-function-declaration]
   console_lock_spinning_enable_irqsave();
   ^~~~
   console_lock_spinning_enable

Will be fixed for v3.

(I have now officially added !CONFIG_PRINTK to my CI tests.)

John Ogness


[PATCH printk v2 4/5] printk: convert @syslog_lock to mutex

2021-03-30 Thread John Ogness
@syslog_lock was a raw_spin_lock to simplify the transition of
removing @logbuf_lock and the safe buffers. With that transition
complete, and since all uses of @syslog_lock are within sleepable
contexts, @syslog_lock can become a mutex.

Note that until now register_console() would disable interrupts
using irqsave, which implies that it may be called with interrupts
disabled. And indeed, there is one possible call chain on parisc
where this happens:

handle_interruption(code=1) /* High-priority machine check (HPMC) */
  pdc_console_restart()
pdc_console_init_force()
  register_console()

However, register_console() calls console_lock(), which might sleep.
So it has never been allowed to call register_console() from an
atomic context and the above call chain is a bug.

Signed-off-by: John Ogness 
---
 Note: The removal of read_syslog_seq_irq() is technically a small
   step backwards. But the follow-up patch moves forward again
   and closes a window that existed with read_syslog_seq_irq()
   and @syslog_lock as a spin_lock.

 kernel/printk/printk.c | 49 +-
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f090d6a1b39e..b771aae46445 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -356,7 +356,7 @@ enum log_flags {
 };
 
 /* syslog_lock protects syslog_* variables and write access to clear_seq. */
-static DEFINE_RAW_SPINLOCK(syslog_lock);
+static DEFINE_MUTEX(syslog_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
@@ -1497,9 +1497,9 @@ static int syslog_print(char __user *buf, int size)
size_t n;
size_t skip;
 
-   raw_spin_lock_irq(&syslog_lock);
+   mutex_lock(&syslog_lock);
if (!prb_read_valid(prb, syslog_seq, &r)) {
-   raw_spin_unlock_irq(&syslog_lock);
+   mutex_unlock(&syslog_lock);
break;
}
if (r.info->seq != syslog_seq) {
@@ -1528,7 +1528,7 @@ static int syslog_print(char __user *buf, int size)
syslog_partial += n;
} else
n = 0;
-   raw_spin_unlock_irq(&syslog_lock);
+   mutex_unlock(&syslog_lock);
 
if (!n)
break;
@@ -1592,9 +1592,9 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
}
 
if (clear) {
-   raw_spin_lock_irq(&syslog_lock);
+   mutex_lock(&syslog_lock);
latched_seq_write(&clear_seq, seq);
-   raw_spin_unlock_irq(&syslog_lock);
+   mutex_unlock(&syslog_lock);
}
 
kfree(text);
@@ -1603,21 +1603,9 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 
 static void syslog_clear(void)
 {
-   raw_spin_lock_irq(&syslog_lock);
+   mutex_lock(&syslog_lock);
latched_seq_write(&clear_seq, prb_next_seq(prb));
-   raw_spin_unlock_irq(&syslog_lock);
-}
-
-/* Return a consistent copy of @syslog_seq. */
-static u64 read_syslog_seq_irq(void)
-{
-   u64 seq;
-
-   raw_spin_lock_irq(&syslog_lock);
-   seq = syslog_seq;
-   raw_spin_unlock_irq(&syslog_lock);
-
-   return seq;
+   mutex_unlock(&syslog_lock);
 }
 
 int do_syslog(int type, char __user *buf, int len, int source)
@@ -1626,6 +1614,7 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
bool clear = false;
static int saved_console_loglevel = LOGLEVEL_DEFAULT;
int error;
+   u64 seq;
 
error = check_syslog_permissions(type, source);
if (error)
@@ -1644,8 +1633,12 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
if (!access_ok(buf, len))
return -EFAULT;
 
-   error = wait_event_interruptible(log_wait,
-   prb_read_valid(prb, read_syslog_seq_irq(), NULL));
+   /* Get a consistent copy of @syslog_seq. */
+   mutex_lock(&syslog_lock);
+   seq = syslog_seq;
+   mutex_unlock(&syslog_lock);
+
+   error = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
if (error)
return error;
error = syslog_print(buf, len);
@@ -1693,10 +1686,10 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
break;
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
-   raw_spin_lock_irq(&syslog_lock);
+   mutex_lock(&syslog_lock);
if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
/* No unread messages. */
-   raw_spin_unlock_irq(&syslog_lock);
+   mutex_unlock(&syslog_lock);
return 0;
}
if (info.seq != syslog_seq) {
@@ -1714,7 +1707,6 @@ int do_syslog(int type, char __user *buf, int len, int 
sou

[PATCH printk v2 5/5] printk: syslog: close window between wait and read

2021-03-30 Thread John Ogness
Syslog's SYSLOG_ACTION_READ is supposed to block until the next
syslog record can be read, and then it should read that record.
However, because @syslog_lock is not held between waking up and
reading the record, another reader could read the record first,
thus causing SYSLOG_ACTION_READ to return with a value of 0, never
having read _anything_.

By holding @syslog_lock between waking up and reading, it can be
guaranteed that SYSLOG_ACTION_READ blocks until it successfully
reads a syslog record (or a real error occurs).

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 50 +++---
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b771aae46445..bd23f00ebc32 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1486,6 +1486,7 @@ static int syslog_print(char __user *buf, int size)
struct printk_record r;
char *text;
int len = 0;
+   u64 seq;
 
text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
if (!text)
@@ -1493,11 +1494,38 @@ static int syslog_print(char __user *buf, int size)
 
prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
-   while (size > 0) {
+   /* Get a consistent copy of @syslog_seq. */
+   mutex_lock(&syslog_lock);
+   seq = syslog_seq;
+   mutex_unlock(&syslog_lock);
+
+   /* Wait for the @syslog_seq record to be available. */
+   for (;;) {
+   len = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
+   if (len)
+   goto out;
+
+   /*
+* @syslog_seq may have changed while waiting. If so, wait
+* for the new @syslog_seq record.
+*/
+
+   mutex_lock(&syslog_lock);
+   if (syslog_seq == seq)
+   break;
+   seq = syslog_seq;
+   mutex_unlock(&syslog_lock);
+   }
+
+   /*
+* @syslog_lock is held when entering the read loop to prevent
+* another reader from modifying @syslog_seq.
+*/
+
+   for (;;) {
size_t n;
size_t skip;
 
-   mutex_lock(&syslog_lock);
if (!prb_read_valid(prb, syslog_seq, &r)) {
mutex_unlock(&syslog_lock);
break;
@@ -1542,8 +1570,13 @@ static int syslog_print(char __user *buf, int size)
len += n;
size -= n;
buf += n;
-   }
 
+   if (!size)
+   break;
+
+   mutex_lock(&syslog_lock);
+   }
+out:
kfree(text);
return len;
 }
@@ -1614,7 +1647,6 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
bool clear = false;
static int saved_console_loglevel = LOGLEVEL_DEFAULT;
int error;
-   u64 seq;
 
error = check_syslog_permissions(type, source);
if (error)
@@ -1632,15 +1664,6 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
return 0;
if (!access_ok(buf, len))
return -EFAULT;
-
-   /* Get a consistent copy of @syslog_seq. */
-   mutex_lock(&syslog_lock);
-   seq = syslog_seq;
-   mutex_unlock(&syslog_lock);
-
-   error = wait_event_interruptible(log_wait, prb_read_valid(prb, seq, NULL));
-   if (error)
-   return error;
error = syslog_print(buf, len);
break;
/* Read/clear last kernel messages */
@@ -1707,6 +1730,7 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
} else {
bool time = syslog_partial ? syslog_time : printk_time;
unsigned int line_count;
+   u64 seq;
 
prb_for_each_info(syslog_seq, prb, seq, &info, &line_count) {
-- 
2.20.1



[PATCH printk v2 1/5] printk: track/limit recursion

2021-03-30 Thread John Ogness
Currently the printk safe buffers provide a form of recursion
protection by redirecting to the safe buffers whenever printk() is
recursively called.

In preparation for removal of the safe buffers, provide an alternate
explicit recursion protection. Recursion is limited to 3 levels
per-CPU and per-context.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 83 --
 1 file changed, 80 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 421c35571797..e971c0a9ec9e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1940,6 +1940,74 @@ static void call_console_drivers(const char *ext_text, 
size_t ext_len,
}
 }
 
+/*
+ * Recursion is tracked separately on each CPU. If NMIs are supported, an
+ * additional NMI context per CPU is also separately tracked. Until per-CPU
+ * is available, a separate "early tracking" is performed.
+ */
+static DEFINE_PER_CPU(u8, printk_count);
+static u8 printk_count_early;
+#ifdef CONFIG_HAVE_NMI
+static DEFINE_PER_CPU(u8, printk_count_nmi);
+static u8 printk_count_nmi_early;
+#endif
+
+/*
+ * Recursion is limited to keep the output sane. printk() should not require
+ * more than 1 level of recursion (allowing, for example, printk() to trigger
+ * a WARN), but a higher value is used in case some printk-internal errors
+ * exist, such as the ringbuffer validation checks failing.
+ */
+#define PRINTK_MAX_RECURSION 3
+
+/*
+ * Return a pointer to the dedicated counter for the CPU+context of the
+ * caller.
+ */
+static u8 *printk_recursion_counter(void)
+{
+#ifdef CONFIG_HAVE_NMI
+   if (in_nmi()) {
+   if (printk_percpu_data_ready())
+   return this_cpu_ptr(&printk_count_nmi);
+   return &printk_count_nmi_early;
+   }
+#endif
+   if (printk_percpu_data_ready())
+   return this_cpu_ptr(&printk_count);
+   return &printk_count_early;
+}
+
+/*
+ * Enter recursion tracking. Interrupts are disabled to simplify tracking.
+ * The caller must check the return value to see if the recursion is allowed.
+ * On failure, interrupts are not disabled.
+ */
+static bool printk_enter_irqsave(unsigned long *flags)
+{
+   u8 *count;
+
+   local_irq_save(*flags);
+   count = printk_recursion_counter();
+   if (*count > PRINTK_MAX_RECURSION) {
+   local_irq_restore(*flags);
+   return false;
+   }
+   (*count)++;
+
+   return true;
+}
+
+/* Exit recursion tracking, restoring interrupts. */
+static void printk_exit_irqrestore(unsigned long flags)
+{
+   u8 *count;
+
+   count = printk_recursion_counter();
+   (*count)--;
+   local_irq_restore(flags);
+}
+
 int printk_delay_msec __read_mostly;
 
 static inline void printk_delay(void)
@@ -2040,11 +2108,13 @@ int vprintk_store(int facility, int level,
struct prb_reserved_entry e;
enum log_flags lflags = 0;
struct printk_record r;
+   unsigned long irqflags;
u16 trunc_msg_len = 0;
char prefix_buf[8];
u16 reserve_size;
va_list args2;
u16 text_len;
+   int ret = 0;
u64 ts_nsec;
 
/*
@@ -2055,6 +2125,9 @@ int vprintk_store(int facility, int level,
 */
ts_nsec = local_clock();
 
+   if (!printk_enter_irqsave(&irqflags))
+   return 0;
+
/*
 * The sprintf needs to come first since the syslog prefix might be
 * passed in as a parameter. An extra byte must be reserved so that
@@ -2092,7 +2165,8 @@ int vprintk_store(int facility, int level,
prb_commit(&e);
}
 
-   return text_len;
+   ret = text_len;
+   goto out;
}
}
 
@@ -2108,7 +2182,7 @@ int vprintk_store(int facility, int level,
 
prb_rec_init_wr(&r, reserve_size + trunc_msg_len);
if (!prb_reserve(&e, prb, &r))
-   return 0;
+   goto out;
}
 
/* fill message */
@@ -2130,7 +2204,10 @@ int vprintk_store(int facility, int level,
else
prb_final_commit();
 
-   return (text_len + trunc_msg_len);
+   ret = text_len + trunc_msg_len;
+out:
+   printk_exit_irqrestore(irqflags);
+   return ret;
 }
 
 asmlinkage int vprintk_emit(int facility, int level,
-- 
2.20.1



[PATCH printk v2 0/5] printk: remove safe buffers

2021-03-30 Thread John Ogness
Hi,

Here is v2 of a series to remove the safe buffers. v1 can be
found here [0]. The safe buffers are no longer needed because
messages can be stored directly into the log buffer from any
context.

However, the safe buffers also provided a form of recursion
protection. For that reason, explicit recursion protection is
also implemented for this series.

And finally, with the removal of the safe buffers, there is no
need for extra NMI enter/exit tracking. So this is also removed
(which includes removing config option CONFIG_PRINTK_NMI).

This series is based on the printk-rework branch of
printk/linux.git:

commit acebb5597ff1 ("kernel/printk.c: Fixed mundane typos")

Changes since v1:

- remove the printk nmi enter/exit tracking

- remove CONFIG_PRINTK_NMI config option

- use in_nmi() to detect NMI context

- remove unused printk_safe_enter/exit macros

- after switching to the dynamic buffer, copy over NMI records
  that may have arrived during the switch window

- use local_irq_*() instead of printk_safe_*() for console
  spinning

- use separate variables rather than arrays for the per-cpu
  recursion tracking

- make @syslog_lock a mutex instead of a spin_lock

- close the wait-read window for SYSLOG_ACTION_READ

- adjust various comments and commit messages as requested

John Ogness

[0] https://lore.kernel.org/lkml/20210316233326.10778-1-john.ogn...@linutronix.de

John Ogness (5):
  printk: track/limit recursion
  printk: remove safe buffers
  printk: remove NMI tracking
  printk: convert @syslog_lock to mutex
  printk: syslog: close window between wait and read

 arch/arm/kernel/smp.c  |   2 -
 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 arch/powerpc/kexec/crash.c |   3 -
 include/linux/hardirq.h|   2 -
 include/linux/printk.h |  22 --
 init/Kconfig   |   5 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |  23 ---
 kernel/printk/printk.c | 281 +++--
 kernel/printk/printk_safe.c| 362 +
 kernel/trace/trace.c   |   2 -
 lib/nmi_backtrace.c|   6 -
 14 files changed, 171 insertions(+), 547 deletions(-)

-- 
2.20.1



[PATCH printk v2 2/5] printk: remove safe buffers

2021-03-30 Thread John Ogness
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console lock is left in place. This is
because the console lock is needed for the actual printing.

Signed-off-by: John Ogness 
---
 Note: The follow-up patch removes the NMI tracking.

 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h |  10 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |  17 --
 kernel/printk/printk.c | 137 +-
 kernel/printk/printk_safe.c| 333 +
 lib/nmi_backtrace.c|   6 -
 9 files changed, 56 insertions(+), 457 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 3ec7b443fe6b..7d2b339afcb0 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -170,7 +170,6 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-   printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(0);
debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index af3c15a1d41e..8ae46c5945d0 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -181,11 +181,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
wd_smp_unlock(&flags);
 
-   printk_safe_flush();
-   /*
-* printk_safe_flush() seems to require another print
-* before anything actually goes out to console.
-*/
if (sysctl_hardlockup_all_cpu_backtrace)
trigger_allbutself_cpu_backtrace();
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index fe7eb2351610..2476796c1150 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -207,8 +207,6 @@ __printf(1, 2) void dump_stack_set_arch_desc(const char 
*fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -272,14 +270,6 @@ static inline void show_regs_print_info(const char 
*log_lvl)
 static inline void dump_stack(void)
 {
 }
-
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
-{
-}
 #endif
 
 extern int kptr_restrict;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index a0b6780740c8..480d5f77ef4f 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -977,7 +977,6 @@ void crash_kexec(struct pt_regs *regs)
old_cpu = atomic_cmpxchg(_cpu, PANIC_CPU_INVALID, this_cpu);
if (old_cpu == PANIC_CPU_INVALID) {
/* This is the 1st CPU which comes here, so go ahead. */
-   printk_safe_flush_on_panic();
__crash_kexec(regs);
 
/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a72a58..1f0df42f8d0c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -247,7 +247,6 @@ void panic(const char *fmt, ...)
 * Bypass the panic_cpu check and call __crash_kexec directly.
 */
if (!_crash_kexec_post_notifiers) {
-   printk_safe_flush_on_panic();
__crash_kexec(NULL);
 
/*
@@ -271,8 +270,6 @@ void panic(const char *fmt, ...)
 */
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-   /* Call flush even twice. It tries harder with a single online CPU */
-   printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
 
/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 51615c909b2f..6cc35c5de890 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,7 +22,6 @@ __printf(1, 0) int vprintk_deferred(const char *fmt, va_list 
args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
-void printk_safe_init(void);
 bool printk_percpu_data_ready(void);
 
 #define printk_safe_enter_irqsave(flags)   \
@@ -37,18 +36,6 @@ bool printk_percpu_data_ready(void);
local_irq_restore(flags);   \
} while (0)
 
-#define printk_safe_enter_irq()\
-   do {\
-   local_irq_disable();\
-   __printk_safe_enter

[PATCH printk v2 3/5] printk: remove NMI tracking

2021-03-30 Thread John Ogness
All NMI contexts are handled the same as the safe context: store the
message and defer printing. There is no need to have special NMI
context tracking for this. Using in_nmi() is enough.

Signed-off-by: John Ogness 
---
 arch/arm/kernel/smp.c   |  2 --
 arch/powerpc/kexec/crash.c  |  3 ---
 include/linux/hardirq.h |  2 --
 include/linux/printk.h  | 12 
 init/Kconfig|  5 -
 kernel/printk/internal.h|  6 --
 kernel/printk/printk_safe.c | 37 +
 kernel/trace/trace.c|  2 --
 8 files changed, 1 insertion(+), 68 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 5c48eb4fd0e5..77a720c1f402 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -671,9 +671,7 @@ static void do_handle_IPI(int ipinr)
break;
 
case IPI_CPU_BACKTRACE:
-   printk_nmi_enter();
nmi_cpu_backtrace(get_irq_regs());
-   printk_nmi_exit();
break;
 
default:
diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index c9a889880214..d488311efab1 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -311,9 +311,6 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
unsigned int i;
int (*old_handler)(struct pt_regs *regs);
 
-   /* Avoid hardlocking with irresponsive CPU holding logbuf_lock */
-   printk_nmi_enter();
-
/*
 * This function is only called after the system
 * has panicked or is otherwise in a critical state.
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 7c9d6a2d7e90..0926e9ca4d85 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -115,7 +115,6 @@ extern void rcu_nmi_exit(void);
do {\
lockdep_off();  \
arch_nmi_enter();   \
-   printk_nmi_enter(); \
BUG_ON(in_nmi() == NMI_MASK);   \
__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);   \
} while (0)
@@ -134,7 +133,6 @@ extern void rcu_nmi_exit(void);
do {\
BUG_ON(!in_nmi());  \
__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);   \
-   printk_nmi_exit();  \
arch_nmi_exit();\
lockdep_on();   \
} while (0)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 2476796c1150..77f66625706e 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -149,18 +149,6 @@ static inline __printf(1, 2) __cold
 void early_printk(const char *s, ...) { }
 #endif
 
-#ifdef CONFIG_PRINTK_NMI
-extern void printk_nmi_enter(void);
-extern void printk_nmi_exit(void);
-extern void printk_nmi_direct_enter(void);
-extern void printk_nmi_direct_exit(void);
-#else
-static inline void printk_nmi_enter(void) { }
-static inline void printk_nmi_exit(void) { }
-static inline void printk_nmi_direct_enter(void) { }
-static inline void printk_nmi_direct_exit(void) { }
-#endif /* PRINTK_NMI */
-
 struct dev_printk_info;
 
 #ifdef CONFIG_PRINTK
diff --git a/init/Kconfig b/init/Kconfig
index 096e1af5c586..ea58c0d30a97 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1482,11 +1482,6 @@ config PRINTK
  very difficult to diagnose system problems, saying N here is
  strongly discouraged.
 
-config PRINTK_NMI
-   def_bool y
-   depends on PRINTK
-   depends on HAVE_NMI
-
 config BUG
bool "BUG() support" if EXPERT
default y
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 6cc35c5de890..ff890ae3ee6a 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -6,12 +6,6 @@
 
 #ifdef CONFIG_PRINTK
 
-#define PRINTK_SAFE_CONTEXT_MASK   0x007ff
-#define PRINTK_NMI_DIRECT_CONTEXT_MASK 0x00800
-#define PRINTK_NMI_CONTEXT_MASK0xff000
-
-#define PRINTK_NMI_CONTEXT_OFFSET  0x01000
-
 __printf(4, 0)
 int vprintk_store(int facility, int level,
  const struct dev_printk_info *dev_info,
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 4b5df5c27334..4da1ab3332d6 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -4,12 +4,9 @@
  */
 
 #include 
-#include 
-#include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -17,35 +14,6 @@
 
 static DEFINE_PER_CPU(int, printk_context);
 
-#ifdef CONFIG_PRINTK_NMI
-void noinstr printk_nmi_enter(void)
-{
-   this_cpu_add(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
-}
-
-void noinstr printk_nmi

Re: [PATCH] printk: rename vprintk_func to vprintk

2021-03-30 Thread John Ogness
On 2021-03-30, Petr Mladek  wrote:
> On Tue 2021-03-23 15:42:01, Rasmus Villemoes wrote:
>> The printk code is already hard enough to understand. Remove an
>> unnecessary indirection by renaming vprintk_func to vprintk (adding
>> the asmlinkage annotation), and removing the vprintk definition from
>> printk.c. That way, printk is implemented in terms of vprintk as one
>> would expect, and there's no "vprintk_func, what's that? Some function
>> pointer that gets set where?"
>> 
>> The declaration of vprintk in linux/printk.h already has the
>> __printf(1,0) attribute, there's no point repeating that with the
>> definition - it's for diagnostics in callers.
>> 
>> linux/printk.h already contains a static inline {return 0;} definition
>> of vprintk when !CONFIG_PRINTK.
>> 
>> Since the corresponding stub definition of vprintk_func was not marked
>> "static inline", any translation unit including internal.h would get a
>> definition of vprintk_func - it just so happens that for
>> !CONFIG_PRINTK, there is precisely one such TU, namely printk.c. Had
>> there been more, it would be a link error; now it's just a silly waste
>> of a few bytes of .text, which one must assume are rather precious to
>> anyone disabling PRINTK.
>> 
>> $ objdump -dr kernel/printk/printk.o
>> 0330 <vprintk_func>:
>>  330:   31 c0                   xor    %eax,%eax
>>  332:   c3                      ret
>>  333:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
>>  33a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
>> 
>> Signed-off-by: Rasmus Villemoes 
>
> Nice clean up!
>
> Reviewed-by: Petr Mladek 
>
> John,
>
> it conflicts with the patchset removing printk safe buffers[1].
> Would you prefer to queue this into the patchset?
> Or should I push it into printk/linux.git, printk-rework and you would
> base v2 on top of it?

Please push it to printk-rework. I will base my v2 on top of it.

Thanks.

John


Re: [PATCH] kernel/printk.c: Fixed mundane typos

2021-03-30 Thread John Ogness
On 2021-03-30, Petr Mladek  wrote:
> On Sun 2021-03-28 10:09:32, Bhaskar Chowdhury wrote:
>> 
>> s/sempahore/semaphore/
>> s/exacly/exactly/
>> s/unregistred/unregistered/
>> s/interation/iteration/
>> 
>> 
>> Signed-off-by: Bhaskar Chowdhury 
>
> Reviewed-by: Petr Mladek 
>
> John,
>
> it conflicts with the patchset removing printk safe buffers[1].
> Would you prefer to queue this into the patchset?
> Or should I push it into printk/linux.git, printk-rework and you would
> base v2 on top of it?

Go ahead and push it to printk-rework. I'll base v2 on top of it.

Thanks.

John


Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-29 Thread John Ogness
On 2021-03-29, John Ogness  wrote:
>> Will you call console write() callback with irq enabled from the
>> kthread?
>
> No. That defeats the fundamental purpose of this entire rework
> exercise. ;-)

Sorry, I misread your question. The answer is "yes". We want to avoid a
local_irq_save() when calling into console->write().

John Ogness


Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-29 Thread John Ogness
On 2021-03-29, Petr Mladek  wrote:
> I wonder if some console drivers rely on the fact that the write()
> callback is called with interrupts disabled.
>
> IMHO, it would be a bug when any write() callback expects that
> callers disabled the interrupts.

Agreed.

> Do you plan to remove the console-spinning stuff after offloading
> consoles to the kthreads?

Yes. Although a similar concept will be introduced to allow the threaded
printers and the atomic consoles to compete.

> Will you call console write() callback with irq enabled from the
> kthread?

No. That defeats the fundamental purpose of this entire rework
exercise. ;-)

> Anyway, we should at least add a comment why the interrupts are
> disabled.

I decided to move the local_irq_save/restore inside the console-spinning
functions and added a comment for v2.

John Ogness


Re: [PATCH next v1 3/3] printk: convert @syslog_lock to spin_lock

2021-03-26 Thread John Ogness
On 2021-03-23, Petr Mladek  wrote:

> On Wed 2021-03-17 00:33:26, John Ogness wrote:
>> @syslog_log was a raw_spin_lock to simplify the transition of
>
> s/syslog_log/syslog_lock/
>
> Same problem is also below.

Right.

>> removing @logbuf_lock and the safe buffers. With that transition
>> complete, @syslog_log can become a spin_lock.
>
> I know that we already talked about this. But I want to be sure
> that this patch makes sense.
>
> It will actually not change the behavior because we always
> take the lock with interrupts disabled.
>
> We disable the interrupts because register_console() is called
> in IRQ context on parisc in handle_interruption() when it is
> going to panic (code == 1 => will call parisc_terminate()).

Yes. [0]

> Disabling IRQ will not help in parisc_terminate(). This code
> path is non-maskable and never returns. The deadlock might be
> prevented only by trylock.
>
> trylock on syslog_lock is only a small problem. Much bigger
> is a deadlock on console_lock. Fixing this is beyond this
> patchset.
>
> Summary:
>
> + disabling IRQ does not help for parisc
>
> + register_console() is not irq safe in general because
>   of the sleeping console_lock.
>
>
> I would personally prefer to remove both "raw" and "irq"
> in this patch and just document the problem with parisc
> in the commit message.
>
> IMHO, it does not make sense to keep _irq when it neither
> helps nor makes sense.

I agree. I will change it for v2 and note in the commit message that the
parisc call chain:

handle_interruption(code=1) /* High-priority machine check (HPMC) */
  pdc_console_restart()
pdc_console_init_force()
  register_console()

is unsafe and is the only register_console() user in atomic context.

John Ogness

[0] https://lore.kernel.org/lkml/8735xs10hi@jogness.linutronix.de


Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-26 Thread John Ogness
On 2021-03-23, Petr Mladek  wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1142,8 +1126,6 @@ void __init setup_log_buf(int early)
>>   new_descs, ilog2(new_descs_count),
>>   new_infos);
>>  
>> -printk_safe_enter_irqsave(flags);
>> -
>>  log_buf_len = new_log_buf_len;
>>  log_buf = new_log_buf;
>>  new_log_buf_len = 0;
>> @@ -1159,8 +1141,6 @@ void __init setup_log_buf(int early)
>>   */
>>  prb = &printk_rb_dynamic;
>>  
>> -printk_safe_exit_irqrestore(flags);
>
> This will allow new messages to be added from IRQ context while we
> are copying them to the new buffer. They might get lost in
> the small race window.
>
> Also the messages from NMI might get lost because they are no
> longer stored in the per-CPU buffer.
>
> A possible solution might be to do something like this:
>
>   prb_for_each_record(0, &printk_rb_static, seq, &r)
>   free -= add_to_rb(&printk_rb_dynamic, &r);
>
>   prb = &printk_rb_dynamic;
>
>   /*
>* Copy the remaining messages that might have appeared
>* from IRQ or NMI context after we ended copying and
>* before we switched the buffers. They must be finalized
>* because only one CPU is up at this stage.
>*/
>   prb_for_each_record(seq, &printk_rb_static, seq, &r)
>   free -= add_to_rb(&printk_rb_dynamic, &r);

OK. I'll probably rework it some and combine it with the "dropped" test
so that we can identify if messages were dropped during the transition
(because of static ringbuffer overrun).

>> -
>>  if (seq != prb_next_seq(&printk_rb_static)) {
>>  pr_err("dropped %llu messages\n",
>> prb_next_seq(&printk_rb_static) - seq);
>> @@ -2666,7 +2631,6 @@ void console_unlock(void)
>>  size_t ext_len = 0;
>>  size_t len;
>>  
>> -printk_safe_enter_irqsave(flags);
>>  skip:
>>  if (!prb_read_valid(prb, console_seq, &r))
>>  break;
>> @@ -2711,6 +2675,8 @@ void console_unlock(void)
>>  printk_time);
>>  console_seq++;
>>  
>> +printk_safe_enter_irqsave(flags);
>
> What is the purpose of the printk_safe context here, please?

console_lock_spinning_enable() needs to be called with interrupts
disabled. I should have just used local_irq_save().

I could add local_irq_save() to console_lock_spinning_enable() and
restore them at the end of console_lock_spinning_disable_and_check(),
but then I would need to add a @flags argument to both functions. I
think it is simpler to just do the disable/enable from the caller,
console_unlock().

BTW, I could not find any sane way of disabling interrupts via a
raw_spin_lock_irqsave() of @console_owner_lock because of how it is
used with lockdep. In particular for
console_lock_spinning_disable_and_check().

John Ogness


Re: [PATCH next v1 1/3] printk: track/limit recursion

2021-03-23 Thread John Ogness
On 2021-03-22, Petr Mladek  wrote:
> On Wed 2021-03-17 00:33:24, John Ogness wrote:
>> Track printk() recursion and limit it to 3 levels per-CPU and per-context.
>
> Please, explain why it is added. I mean that it will
> allow removing printk_safe, which provides recursion protection at the
> moment.

OK.

>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 2f829fbf0a13..c666e3e43f0c 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1940,6 +1940,71 @@ static void call_console_drivers(const char 
>> *ext_text, size_t ext_len,
>>  }
>>  }
>>  
>> +/*
>> + * Recursion is tracked separately on each CPU. If NMIs are supported, an
>> + * additional NMI context per CPU is also separately tracked. Until per-CPU
>> + * is available, a separate "early tracking" is performed.
>> + */
>> +#ifdef CONFIG_PRINTK_NMI
>
> CONFIG_PRINTK_NMI is a shortcut for CONFIG_PRINTK && CONFIG_HAVE_NMI.
> It should be possible to use CONFIG_HAVE_NMI here because this should
> be in section where CONFIG_PRINTK is defined.
>
> This would make sense if it allows to remove CONFIG_PRINTK_NMI
> entirely. IMHO, it would be nice to remove one layer in the
> config options of possible.

OK. I will remove CONFIG_PRINTK_NMI for v2.

>> +#define PRINTK_CTX_NUM 2
>> +#else
>> +#define PRINTK_CTX_NUM 1
>> +#endif
>> +static DEFINE_PER_CPU(char [PRINTK_CTX_NUM], printk_count);
>> +static char printk_count_early[PRINTK_CTX_NUM];
>> +
>> +/*
>> + * Recursion is limited to keep the output sane. printk() should not require
>> + * more than 1 level of recursion (allowing, for example, printk() to 
>> trigger
>> + * a WARN), but a higher value is used in case some printk-internal errors
>> + * exist, such as the ringbuffer validation checks failing.
>> + */
>> +#define PRINTK_MAX_RECURSION 3
>> +
>> +/* Return a pointer to the dedicated counter for the CPU+context of the caller. */
>> +static char *printk_recursion_counter(void)
>> +{
>> +int ctx = 0;
>> +
>> +#ifdef CONFIG_PRINTK_NMI
>> +if (in_nmi())
>> +ctx = 1;
>> +#endif
>> +if (!printk_percpu_data_ready())
>> +return &printk_count_early[ctx];
>> +return &((*this_cpu_ptr(&printk_count))[ctx]);
>> +}
>
> It is not a big deal. But using an array for two contexts looks strange
> especially when only one is used on some architectures.
> Also &((*this_cpu_ptr(&printk_count))[ctx]) is quite tricky ;-)
>
> What do you think about the following, please?
>
> static DEFINE_PER_CPU(u8, printk_count);
> static u8 printk_count_early;
>
> #ifdef CONFIG_HAVE_NMI
> static DEFINE_PER_CPU(u8, printk_count_nmi);
> static u8 printk_count_nmi_early;
> #endif
>
> static u8 *printk_recursion_counter(void)
> {
>   if (IS_ENABLED(CONFIG_HAVE_NMI) && in_nmi()) {
>   if (printk_cpu_data_ready())
>   return this_cpu_ptr(&printk_count_nmi);
>   return &printk_count_nmi_early;
>   }
>
>   if (printk_cpu_data_ready())
>   return this_cpu_ptr(&printk_count);
>   return &printk_count_early;
> }

I can split it into explicit variables. But is the use of the IS_ENABLED
macro preferred over ifdef? I would prefer:

static u8 *printk_recursion_counter(void)
{
#ifdef CONFIG_HAVE_NMI
if (in_nmi()) {
if (printk_cpu_data_ready())
return this_cpu_ptr(&printk_count_nmi);
return &printk_count_nmi_early;
}
#endif
if (printk_cpu_data_ready())
return this_cpu_ptr(&printk_count);
return &printk_count_early;
}

Since @printk_count_nmi and @printk_count_nmi_early would not exist, I
would prefer the pre-processor removes that code block rather than
relying on compiler optimization.

John Ogness


Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-22 Thread John Ogness
On 2021-03-22, Petr Mladek  wrote:
> On Mon 2021-03-22 12:16:15, John Ogness wrote:
>> On 2021-03-21, Sergey Senozhatsky  wrote:
>> >> @@ -369,7 +70,10 @@ __printf(1, 0) int vprintk_func(const char *fmt, 
>> >> va_list args)
>> >>* Use the main logbuf even in NMI. But avoid calling console
>> >>* drivers that might have their own locks.
>> >>*/
>> >> - if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
>> >> + if (this_cpu_read(printk_context) &
>> >> + (PRINTK_NMI_DIRECT_CONTEXT_MASK |
>> >> +  PRINTK_NMI_CONTEXT_MASK |
>> >> +  PRINTK_SAFE_CONTEXT_MASK)) {
>> >
>> > Do we need printk_nmi_direct_enter/exit() and
>> > PRINTK_NMI_DIRECT_CONTEXT_MASK?  Seems like all printk_safe() paths
>> > are now DIRECT - we store messages to the prb, but don't call console
>> > drivers.
>>
>> I was planning on waiting until the kthreads are introduced, in which
>> case printk_safe.c is completely removed.
>
> You want to keep printk_safe() context because it prevents calling
> consoles even in normal context. Namely, it prevents deadlock by
> recursively taking, for example, sem->lock in console_lock() or
> console_owner_lock in console_trylock_spinning(). Am I right?

Correct.

>> But I suppose I could switch
>> the 1 printk_nmi_direct_enter() user to printk_nmi_enter() so that
>> PRINTK_NMI_DIRECT_CONTEXT_MASK can be removed now. I would do this in a
>> 4th patch of the series.
>
> Yes, please unify the PRINTK_NMI_CONTEXT. One is enough.

Agreed. (But I'll go even further. See below.)

> I wonder if it would make sense to go even further at this stage.
> There will still be 4 contexts that modify the printk behavior after
> this patchset:
>
>   + printk_count set by printk_enter()/exit()
>   + prevents: infinite recursion
>   + context: any context
>   + action: skips entire printk at 3rd recursion level
>
>   + printk_context set by printk_safe_enter()/exit()
>   + prevents: dead lock caused by recursion into some
>   console code in any context
>   + context: any
>   + action: skips console call at 1st recursion level

Technically, at this point printk_safe_enter() behavior is identical to
printk_nmi_enter(). Namely, prevent any recursive printk calls from
calling into the console code.

>   + printk_context set by printk_nmi_enter()/exit()
>   + prevents: dead lock caused by any console lock recursion
>   + context: NMI
>   + action: skips console calls at 0th recursion level
>
>   + kdb_trap_printk
>   + redirects printk() to kdb_printk() in kdb context
>
>
> What is possible?
>
> 1. We could get rid of printk_nmi_enter()/exit() and
>PRINTK_NMI_CONTEXT completely already now. It is enough
>to check in_nmi() in printk_func().
>
>printk_nmi_enter() was added by the commit 42a0bb3f71383b457a7db362
>("printk/nmi: generic solution for safe printk in NMI"). It was
>really needed to modify @printk_func pointer.
>
>We did not remove it later when printk_function became a real
>function. The idea was to track all printk contexts in a single
>variable. But we never added kdb context.
>
>It might make sense to remove it now. Peter Zijlstra would be happy.
>There already were some churns with tracking printk_context in NMI.
>For example, see
>https://lore.kernel.org/r/20200219150744.428764...@infradead.org
>
>IMHO, it does not make sense to wait until the entire console-stuff
>rework is done in this case.

Agreed. in_nmi() within vprintk_emit() is enough to detect if the
console code should be skipped:

if (!in_sched && !in_nmi()) {
...
}

> 2. I thought about unifying printk_safe_enter()/exit() and
>printk_enter()/exit(). They both count recursion with
>IRQs disabled, have similar name. But they are used
>different way.
>
>But better might be to rename printk_safe_enter()/exit() to
>console_enter()/exit() or to printk_deferred_enter()/exit().
>It would make more clear what it does now. And it might help
>to better distinguish it from the new printk_enter()/exit().
>
>This patchset actually splits the original printk_safe()
>functionality into two:
>
>+ printk_count prevents infinite recursion
>+ printk_deferred_enter() defers console handling.
>
>I am not sure if it is worth it. But it might help people (even me)
>when digging into the printk history. Different name will help to
>understand the functionality at the given time.

I am also not sure if it is worth the extra "noise" just to give the
function a more appropriate name. The plan is to remove it completely
soon anyway. My vote is to leave the name as it is.

But I am willing to do the rename in an additional patch if you
want. printk_deferred_enter() sounds fine to me. Please confirm if you
want me to do this.

John Ogness


Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-22 Thread John Ogness
On 2021-03-21, Sergey Senozhatsky  wrote:
>> @@ -369,7 +70,10 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list 
>> args)
>>   * Use the main logbuf even in NMI. But avoid calling console
>>   * drivers that might have their own locks.
>>   */
>> -if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
>> +if (this_cpu_read(printk_context) &
>> +(PRINTK_NMI_DIRECT_CONTEXT_MASK |
>> + PRINTK_NMI_CONTEXT_MASK |
>> + PRINTK_SAFE_CONTEXT_MASK)) {
>
> Do we need printk_nmi_direct_enter/exit() and
> PRINTK_NMI_DIRECT_CONTEXT_MASK?  Seems like all printk_safe() paths
> are now DIRECT - we store messages to the prb, but don't call console
> drivers.

I was planning on waiting until the kthreads are introduced, in which
case printk_safe.c is completely removed. But I suppose I could switch
the 1 printk_nmi_direct_enter() user to printk_nmi_enter() so that
PRINTK_NMI_DIRECT_CONTEXT_MASK can be removed now. I would do this in a
4th patch of the series.

John Ogness


Re: [PATCH next v1 1/3] printk: track/limit recursion

2021-03-22 Thread John Ogness
On 2021-03-21, Sergey Senozhatsky  wrote:
>> @@ -2055,6 +2122,9 @@ int vprintk_store(int facility, int level,
>>   */
>>  ts_nsec = local_clock();
>>  
>> +if (!printk_enter_irqsave(&irqflags))
>> +return 0;
>
> I guess it can be interesting to somehow signal us that we had
> printk() recursion overflow, and how many messages we lost.

Honestly, if we hit 3 levels of recursion, we are probably dealing with
an infinite recursion issue. I do not see the value of counting the
overflows in that case. The logged messages at that recursion level
would be enough to point us to the problem.

> 3 levels of recursion seem like a reasonable limit, but I maybe wouldn't
> mind one extra level.

With 3 levels, we will see all the messages of:

printk -> WARN_ON -> WARN_ON -> WARN_ON

Keep in mind that each additional level causes the reading of the logs
to be significantly more complex. Each level increases the output
exponentially:

for every line1 in 1st_WARN_ON {
   for every line2 in 2nd_WARN_ON {
      for every line3 in 3rd_WARN_ON {
         print $line3
      }
      print $line2
   }
   print $line1
}
print $line0

IMHO 2 levels is enough because we should _never_ hit 2 levels of
recursion. If we do, the log output at that second level should be
enough to point to the bug. IMHO printing a third level just makes
things unnecessarily difficult to read. (My series uses 3 levels as a
compromise on my part. I would prefer reducing it to 2.)

> And maybe we could add some sort of message prefix for high levels of
> recursion nesting (levels 3+), so that things that should not be normal
> will be on the radar and, possibly, will be reported.

I considered this, but am very hesitant to change the output
format. Also, the CUT_HERE usage (combined with PRINTK_CALLER) seems to
be enough.

John Ogness


[PATCH next v1 2/3] printk: remove safe buffers

2021-03-16 Thread John Ogness
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console lock is left in place. This is
because the console lock is needed for the actual printing.

Signed-off-by: John Ogness 
---
 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h |  10 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |   2 -
 kernel/printk/printk.c |  81 ++--
 kernel/printk/printk_safe.c| 332 +
 lib/nmi_backtrace.c|   6 -
 9 files changed, 18 insertions(+), 423 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a44a30b0688c..5828c83eaca6 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -171,7 +171,6 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-   printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(0);
debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index c9a8f4781a10..dc17d8903d4f 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -183,11 +183,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
wd_smp_unlock(&flags);
 
-   printk_safe_flush();
-   /*
-* printk_safe_flush() seems to require another print
-* before anything actually goes out to console.
-*/
if (sysctl_hardlockup_all_cpu_backtrace)
trigger_allbutself_cpu_backtrace();
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index fe7eb2351610..2476796c1150 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -207,8 +207,6 @@ __printf(1, 2) void dump_stack_set_arch_desc(const char 
*fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -272,14 +270,6 @@ static inline void show_regs_print_info(const char 
*log_lvl)
 static inline void dump_stack(void)
 {
 }
-
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
-{
-}
 #endif
 
 extern int kptr_restrict;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index f04d04d1b855..64bf5d5cdd06 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -977,7 +977,6 @@ void crash_kexec(struct pt_regs *regs)
old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
if (old_cpu == PANIC_CPU_INVALID) {
/* This is the 1st CPU which comes here, so go ahead. */
-   printk_safe_flush_on_panic();
__crash_kexec(regs);
 
/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a72a58..1f0df42f8d0c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -247,7 +247,6 @@ void panic(const char *fmt, ...)
 * Bypass the panic_cpu check and call __crash_kexec directly.
 */
if (!_crash_kexec_post_notifiers) {
-   printk_safe_flush_on_panic();
__crash_kexec(NULL);
 
/*
@@ -271,8 +270,6 @@ void panic(const char *fmt, ...)
 */
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-   /* Call flush even twice. It tries harder with a single online CPU */
-   printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
 
/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index e7acc2888c8e..e108b2ece8c7 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -23,7 +23,6 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list 
args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
-void printk_safe_init(void);
 bool printk_percpu_data_ready(void);
 
 #define printk_safe_enter_irqsave(flags)   \
@@ -67,6 +66,5 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list 
args) { return 0; }
 #define printk_safe_enter_irq() local_irq_disable()
 #define printk_safe_exit_irq() local_irq_enable()
 
-static inline void printk_safe_init(void) { }
 static inline bool printk_percpu_data_ready(void) { return false; }
 #endif /* CONFIG_PRINTK */
diff --git a/kernel/printk/printk.c b

[PATCH next v1 1/3] printk: track/limit recursion

2021-03-16 Thread John Ogness
Track printk() recursion and limit it to 3 levels per-CPU and per-context.
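For reference, the intended call pattern inside vprintk_store() is
roughly (sketch only; the real hunk is below):

    unsigned long irqflags;

    if (!printk_enter_irqsave(&irqflags))
            return 0;       /* recursion limit hit, drop this message */

    /* ... reserve, fill and commit the ringbuffer record ... */

    printk_exit_irqrestore(irqflags);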

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 80 --
 1 file changed, 77 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2f829fbf0a13..c666e3e43f0c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1940,6 +1940,71 @@ static void call_console_drivers(const char *ext_text, 
size_t ext_len,
}
 }
 
+/*
+ * Recursion is tracked separately on each CPU. If NMIs are supported, an
+ * additional NMI context per CPU is also separately tracked. Until per-CPU
+ * is available, a separate "early tracking" is performed.
+ */
+#ifdef CONFIG_PRINTK_NMI
+#define PRINTK_CTX_NUM 2
+#else
+#define PRINTK_CTX_NUM 1
+#endif
+static DEFINE_PER_CPU(char [PRINTK_CTX_NUM], printk_count);
+static char printk_count_early[PRINTK_CTX_NUM];
+
+/*
+ * Recursion is limited to keep the output sane. printk() should not require
+ * more than 1 level of recursion (allowing, for example, printk() to trigger
+ * a WARN), but a higher value is used in case some printk-internal errors
+ * exist, such as the ringbuffer validation checks failing.
+ */
+#define PRINTK_MAX_RECURSION 3
+
+/* Return a pointer to the dedicated counter for the CPU+context of the 
caller. */
+static char *printk_recursion_counter(void)
+{
+   int ctx = 0;
+
+#ifdef CONFIG_PRINTK_NMI
+   if (in_nmi())
+   ctx = 1;
+#endif
+   if (!printk_percpu_data_ready())
+   return &printk_count_early[ctx];
+   return &((*this_cpu_ptr(&printk_count))[ctx]);
+}
+
+/*
+ * Enter recursion tracking. Interrupts are disabled to simplify tracking.
+ * The caller must check the return value to see if the recursion is allowed.
+ * On failure, interrupts are not disabled.
+ */
+static bool printk_enter_irqsave(unsigned long *flags)
+{
+   char *count;
+
+   local_irq_save(*flags);
+   count = printk_recursion_counter();
+   if (*count > PRINTK_MAX_RECURSION) {
+   local_irq_restore(*flags);
+   return false;
+   }
+   (*count)++;
+
+   return true;
+}
+
+/* Exit recursion tracking, restoring interrupts. */
+static void printk_exit_irqrestore(unsigned long flags)
+{
+   char *count;
+
+   count = printk_recursion_counter();
+   (*count)--;
+   local_irq_restore(flags);
+}
+
 int printk_delay_msec __read_mostly;
 
 static inline void printk_delay(void)
@@ -2040,11 +2105,13 @@ int vprintk_store(int facility, int level,
struct prb_reserved_entry e;
enum log_flags lflags = 0;
struct printk_record r;
+   unsigned long irqflags;
u16 trunc_msg_len = 0;
char prefix_buf[8];
u16 reserve_size;
va_list args2;
u16 text_len;
+   int ret = 0;
u64 ts_nsec;
 
/*
@@ -2055,6 +2122,9 @@ int vprintk_store(int facility, int level,
 */
ts_nsec = local_clock();
 
+   if (!printk_enter_irqsave(&irqflags))
+   return 0;
+
/*
 * The sprintf needs to come first since the syslog prefix might be
 * passed in as a parameter. An extra byte must be reserved so that
@@ -2092,7 +2162,8 @@ int vprintk_store(int facility, int level,
prb_commit(&e);
}
 
-   return text_len;
+   ret = text_len;
+   goto out;
}
}
 
@@ -2108,7 +2179,7 @@ int vprintk_store(int facility, int level,
 
prb_rec_init_wr(&r, reserve_size + trunc_msg_len);
if (!prb_reserve(&e, prb, &r))
-   return 0;
+   goto out;
}
 
/* fill message */
@@ -2130,7 +2201,10 @@ int vprintk_store(int facility, int level,
else
prb_final_commit(&e);
 
-   return (text_len + trunc_msg_len);
+   ret = text_len + trunc_msg_len;
+out:
+   printk_exit_irqrestore(irqflags);
+   return ret;
 }
 
 asmlinkage int vprintk_emit(int facility, int level,
-- 
2.20.1



[PATCH next v1 3/3] printk: convert @syslog_lock to spin_lock

2021-03-16 Thread John Ogness
@syslog_lock was a raw_spin_lock to simplify the transition of
removing @logbuf_lock and the safe buffers. With that transition
complete, @syslog_lock can become a spin_lock.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fa52a5daa232..1e38174583c5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -356,7 +356,7 @@ enum log_flags {
 };
 
 /* syslog_lock protects syslog_* variables and write access to clear_seq. */
-static DEFINE_RAW_SPINLOCK(syslog_lock);
+static DEFINE_SPINLOCK(syslog_lock);
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
@@ -1478,9 +1478,9 @@ static int syslog_print(char __user *buf, int size)
size_t n;
size_t skip;
 
-   raw_spin_lock_irq(&syslog_lock);
+   spin_lock_irq(&syslog_lock);
if (!prb_read_valid(prb, syslog_seq, &r)) {
-   raw_spin_unlock_irq(&syslog_lock);
+   spin_unlock_irq(&syslog_lock);
break;
}
if (r.info->seq != syslog_seq) {
@@ -1509,7 +1509,7 @@ static int syslog_print(char __user *buf, int size)
syslog_partial += n;
} else
n = 0;
-   raw_spin_unlock_irq(&syslog_lock);
+   spin_unlock_irq(&syslog_lock);
 
if (!n)
break;
@@ -1573,9 +1573,9 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
}
 
if (clear) {
-   raw_spin_lock_irq(&syslog_lock);
+   spin_lock_irq(&syslog_lock);
latched_seq_write(&clear_seq, seq);
-   raw_spin_unlock_irq(&syslog_lock);
+   spin_unlock_irq(&syslog_lock);
}
 
kfree(text);
@@ -1584,9 +1584,9 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 
 static void syslog_clear(void)
 {
-   raw_spin_lock_irq(&syslog_lock);
+   spin_lock_irq(&syslog_lock);
latched_seq_write(&clear_seq, prb_next_seq(prb));
-   raw_spin_unlock_irq(&syslog_lock);
+   spin_unlock_irq(&syslog_lock);
 }
 
 /* Return a consistent copy of @syslog_seq. */
@@ -1594,9 +1594,9 @@ static u64 read_syslog_seq_irq(void)
 {
u64 seq;
 
-   raw_spin_lock_irq(&syslog_lock);
+   spin_lock_irq(&syslog_lock);
seq = syslog_seq;
-   raw_spin_unlock_irq(&syslog_lock);
+   spin_unlock_irq(&syslog_lock);
 
return seq;
 }
@@ -1674,10 +1674,10 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
break;
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
-   raw_spin_lock_irq(&syslog_lock);
+   spin_lock_irq(&syslog_lock);
if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
/* No unread messages. */
-   raw_spin_unlock_irq(&syslog_lock);
+   spin_unlock_irq(&syslog_lock);
return 0;
}
if (info.seq != syslog_seq) {
@@ -1705,7 +1705,7 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
}
error -= syslog_partial;
}
-   raw_spin_unlock_irq(&syslog_lock);
+   spin_unlock_irq(&syslog_lock);
break;
/* Size of the log buffer */
case SYSLOG_ACTION_SIZE_BUFFER:
@@ -3013,9 +3013,9 @@ void register_console(struct console *newcon)
exclusive_console_stop_seq = console_seq;
 
/* Get a consistent copy of @syslog_seq. */
-   raw_spin_lock_irqsave(&syslog_lock, flags);
+   spin_lock_irqsave(&syslog_lock, flags);
console_seq = syslog_seq;
-   raw_spin_unlock_irqrestore(&syslog_lock, flags);
+   spin_unlock_irqrestore(&syslog_lock, flags);
}
console_unlock();
console_sysfs_notify();
-- 
2.20.1



[PATCH next v1 0/3] printk: remove safe buffers

2021-03-16 Thread John Ogness
Hello,

Here is v1 of a series to remove the safe buffers. They are no
longer needed because messages can be stored directly into the
log buffer from any context.

However, the safe buffers also provided a form of recursion
protection. For that reason, explicit recursion protection is
also implemented for this series.

This series falls in line with the printk-rework plan as
presented [0] at Linux Plumbers in Lisbon 2019.

This series is based on next-20210316.

John Ogness

[0] 
https://linuxplumbersconf.org/event/4/contributions/290/attachments/276/463/lpc2019_jogness_printk.pdf
 (slide 23)

John Ogness (3):
  printk: track/limit recursion
  printk: remove safe buffers
  printk: convert @syslog_lock to spin_lock

 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h |  10 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |   2 -
 kernel/printk/printk.c | 171 +
 kernel/printk/printk_safe.c| 332 +
 lib/nmi_backtrace.c|   6 -
 9 files changed, 100 insertions(+), 431 deletions(-)

-- 
2.20.1



[PATCH next v4 12/15] printk: introduce a kmsg_dump iterator

2021-03-03 Thread John Ogness
Rather than storing the iterator information in the registered
kmsg_dumper structure, create a separate iterator structure. The
kmsg_dump_iter structure can reside on the stack of the caller, thus
allowing lockless use of the kmsg_dump functions.

Update code that accesses the kernel logs using the kmsg_dumper
structure to use the new kmsg_dump_iter structure. For kmsg_dumpers,
this also means adding a call to kmsg_dump_rewind() to initialize
the iterator.

All this is in preparation for removal of @logbuf_lock.
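For reference, a kmsg_dumper callback then follows roughly this pattern
(sketch only; example_dump() is a made-up name, see the per-driver hunks
below for the real conversions):

    static void example_dump(struct kmsg_dumper *dumper,
                             enum kmsg_dump_reason reason)
    {
            struct kmsg_dump_iter iter;
            static char line[1024];
            size_t len;

            kmsg_dump_rewind(&iter);        /* initialize the iterator */
            while (kmsg_dump_get_line(&iter, true, line, sizeof(line), &len)) {
                    /* consume line[0..len) */
            }
    }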

Signed-off-by: John Ogness 
Reviewed-by: Kees Cook  # pstore
---
 arch/powerpc/kernel/nvram_64.c |  8 +++--
 arch/powerpc/xmon/xmon.c   |  6 ++--
 arch/um/kernel/kmsg_dump.c |  5 ++-
 drivers/hv/vmbus_drv.c |  4 ++-
 drivers/mtd/mtdoops.c  |  5 ++-
 fs/pstore/platform.c   |  5 ++-
 include/linux/kmsg_dump.h  | 36 ++-
 kernel/debug/kdb/kdb_main.c| 10 +++---
 kernel/printk/printk.c | 63 +-
 9 files changed, 80 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
index 532f22637783..3c8d9bbb51cf 100644
--- a/arch/powerpc/kernel/nvram_64.c
+++ b/arch/powerpc/kernel/nvram_64.c
@@ -647,6 +647,7 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
 {
struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf;
static unsigned int oops_count = 0;
+   static struct kmsg_dump_iter iter;
static bool panicking = false;
static DEFINE_SPINLOCK(lock);
unsigned long flags;
@@ -681,13 +682,14 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
return;
 
if (big_oops_buf) {
-   kmsg_dump_get_buffer(dumper, false,
+   kmsg_dump_rewind(&iter);
+   kmsg_dump_get_buffer(&iter, false,
 big_oops_buf, big_oops_buf_sz, &text_len);
rc = zip_oops(text_len);
}
if (rc != 0) {
-   kmsg_dump_rewind(dumper);
-   kmsg_dump_get_buffer(dumper, false,
+   kmsg_dump_rewind(&iter);
+   kmsg_dump_get_buffer(&iter, false,
 oops_data, oops_data_sz, &text_len);
err_type = ERR_TYPE_KERNEL_PANIC;
oops_hdr->version = cpu_to_be16(OOPS_HDR_VERSION);
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 80ed3e1becf9..5978b90a885f 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3001,7 +3001,7 @@ print_address(unsigned long addr)
 static void
 dump_log_buf(void)
 {
-   struct kmsg_dumper dumper;
+   struct kmsg_dump_iter iter;
unsigned char buf[128];
size_t len;
 
@@ -3013,9 +3013,9 @@ dump_log_buf(void)
catch_memory_errors = 1;
sync();
 
-   kmsg_dump_rewind_nolock(&dumper);
+   kmsg_dump_rewind_nolock(&iter);
xmon_start_pagination();
-   while (kmsg_dump_get_line_nolock(&dumper, false, buf, sizeof(buf), &len)) {
+   while (kmsg_dump_get_line_nolock(&iter, false, buf, sizeof(buf), &len)) {
buf[len] = '\0';
printf("%s", buf);
}
diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c
index a765d235e50e..0224fcb36e22 100644
--- a/arch/um/kernel/kmsg_dump.c
+++ b/arch/um/kernel/kmsg_dump.c
@@ -10,6 +10,7 @@
 static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
enum kmsg_dump_reason reason)
 {
+   static struct kmsg_dump_iter iter;
static DEFINE_SPINLOCK(lock);
static char line[1024];
struct console *con;
@@ -35,8 +36,10 @@ static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
if (!spin_trylock_irqsave(&lock, flags))
return;
 
+   kmsg_dump_rewind(&iter);
+
printf("kmsg_dump:\n");
-   while (kmsg_dump_get_line(dumper, true, line, sizeof(line), &len)) {
+   while (kmsg_dump_get_line(&iter, true, line, sizeof(line), &len)) {
line[len] = '\0';
printf("%s", line);
}
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 10dce9f91216..b341b144bde8 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1391,6 +1391,7 @@ static void vmbus_isr(void)
 static void hv_kmsg_dump(struct kmsg_dumper *dumper,
 enum kmsg_dump_reason reason)
 {
+   struct kmsg_dump_iter iter;
size_t bytes_written;
phys_addr_t panic_pa;
 
@@ -1404,7 +1405,8 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
 * Write dump contents to the page. No need to synchronize; panic should
 * be single-threaded.
 */
-   kmsg_dump_get_buffer(dumper, false, hv_panic_page, HV_HYP_PAGE_SIZE,
+   kmsg_dump_rewind(&iter);
+   kmsg_dump_get_buffer(&iter, false, hv_panic_page, HV_HYP_PAGE_SIZE,
 &bytes_written);
if (bytes_written)
hyperv_report_panic_m

[PATCH next v4 13/15] printk: remove logbuf_lock

2021-03-03 Thread John Ogness
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock.

@console_seq, @exclusive_console_stop_seq, @console_dropped are
protected by @console_lock.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/internal.h|   4 +-
 kernel/printk/printk.c  | 112 
 kernel/printk/printk_safe.c |  27 +++--
 3 files changed, 46 insertions(+), 97 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 3a8fd491758c..e7acc2888c8e 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -12,8 +12,6 @@
 
 #define PRINTK_NMI_CONTEXT_OFFSET  0x01000
 
-extern raw_spinlock_t logbuf_lock;
-
 __printf(4, 0)
 int vprintk_store(int facility, int level,
  const struct dev_printk_info *dev_info,
@@ -59,7 +57,7 @@ void defer_console_output(void);
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; }
 
 /*
- * In !PRINTK builds we still export logbuf_lock spin_lock, console_sem
+ * In !PRINTK builds we still export console_sem
  * semaphore and some of console functions (console_unlock()/etc.), so
  * printk-safe must preserve the existing local IRQ guarantees.
  */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b49dee256947..8994bc192b88 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -355,41 +355,6 @@ enum log_flags {
LOG_CONT= 8,/* text is a fragment of a continuation line */
 };
 
-/*
- * The logbuf_lock protects kmsg buffer, indices, counters.  This can be taken
- * within the scheduler's rq lock. It must be released before calling
- * console_unlock() or anything else that might wake up a process.
- */
-DEFINE_RAW_SPINLOCK(logbuf_lock);
-
-/*
- * Helper macros to lock/unlock logbuf_lock and switch between
- * printk-safe/unsafe modes.
- */
-#define logbuf_lock_irq()  \
-   do {\
-   printk_safe_enter_irq();\
-   raw_spin_lock(&logbuf_lock);\
-   } while (0)
-
-#define logbuf_unlock_irq()\
-   do {\
-   raw_spin_unlock(&logbuf_lock);  \
-   printk_safe_exit_irq(); \
-   } while (0)
-
-#define logbuf_lock_irqsave(flags) \
-   do {\
-   printk_safe_enter_irqsave(flags);   \
-   raw_spin_lock(&logbuf_lock);\
-   } while (0)
-
-#define logbuf_unlock_irqrestore(flags)\
-   do {\
-   raw_spin_unlock(&logbuf_lock);  \
-   printk_safe_exit_irqrestore(flags); \
-   } while (0)
-
 /* syslog_lock protects syslog_* variables and write access to clear_seq. */
 static DEFINE_RAW_SPINLOCK(syslog_lock);
 
@@ -401,6 +366,7 @@ static u64 syslog_seq;
 static size_t syslog_partial;
 static bool syslog_time;
 
+/* All 3 protected by @console_sem. */
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u64 exclusive_console_stop_seq;
@@ -766,27 +732,27 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
if (ret)
return ret;
 
-   logbuf_lock_irq();
+   printk_safe_enter_irq();
if (!prb_read_valid(prb, atomic64_read(&user->seq), r)) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
goto out;
}
 
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
ret = wait_event_interruptible(log_wait,
prb_read_valid(prb, atomic64_read(&user->seq), r));
if (ret)
goto out;
-   logbuf_lock_irq();
+   printk_safe_enter_irq();
}
 
if (r->info->seq != atomic64_read(&user->seq)) {
/* our last seen message is gone, return error and reset */
atomic64_set(&user->seq, r->info->seq);
ret = -EPIPE;
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
goto out;
}
 
@@ -796,7 +762,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
  &r->info->dev_info);
 
atomic64_set(&user->seq, r->info->seq + 1);
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
 
if (len > count) {
ret = -EINVAL;
@@ -831,7 +797,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
if (offset)
return -ESPIPE;
 
-   logbuf_lock_irq()

[PATCH next v4 11/15] printk: kmsg_dumper: remove @active field

2021-03-03 Thread John Ogness
All 6 kmsg_dumpers do not benefit from the @active flag:

  (provide their own synchronization)
  - arch/powerpc/kernel/nvram_64.c
  - arch/um/kernel/kmsg_dump.c
  - drivers/mtd/mtdoops.c
  - fs/pstore/platform.c

  (only dump on KMSG_DUMP_PANIC, which does not require
  synchronization)
  - arch/powerpc/platforms/powernv/opal-kmsg.c
  - drivers/hv/vmbus_drv.c

The other 2 kmsg_dump users also do not rely on @active:

  (hard-code @active to always be true)
  - arch/powerpc/xmon/xmon.c
  - kernel/debug/kdb/kdb_main.c

Therefore, @active can be removed.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 arch/powerpc/xmon/xmon.c|  2 +-
 include/linux/kmsg_dump.h   |  2 --
 kernel/debug/kdb/kdb_main.c |  2 +-
 kernel/printk/printk.c  | 10 +-
 4 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 3fe37495f63d..80ed3e1becf9 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3001,7 +3001,7 @@ print_address(unsigned long addr)
 static void
 dump_log_buf(void)
 {
-   struct kmsg_dumper dumper = { .active = 1 };
+   struct kmsg_dumper dumper;
unsigned char buf[128];
size_t len;
 
diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 070c994ff19f..84eaa2090efa 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -36,7 +36,6 @@ enum kmsg_dump_reason {
  * through the record iterator
  * @max_reason:filter for highest reason number that should be dumped
  * @registered:Flag that specifies if this is already registered
- * @active:Flag that specifies if this is currently dumping
  * @cur_seq:   Points to the oldest message to dump
  * @next_seq:  Points after the newest message to dump
  */
@@ -44,7 +43,6 @@ struct kmsg_dumper {
struct list_head list;
void (*dump)(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason);
enum kmsg_dump_reason max_reason;
-   bool active;
bool registered;
 
/* private state of the kmsg iterator */
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 930ac1b25ec7..315169d5e119 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2101,7 +2101,7 @@ static int kdb_dmesg(int argc, const char **argv)
int adjust = 0;
int n = 0;
int skip = 0;
-   struct kmsg_dumper dumper = { .active = 1 };
+   struct kmsg_dumper dumper;
size_t len;
char buf[201];
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e794a08de00f..ce4cc64ba7c9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3408,8 +3408,6 @@ void kmsg_dump(enum kmsg_dump_reason reason)
continue;
 
/* initialize iterator with data about the stored records */
-   dumper->active = true;
-
logbuf_lock_irqsave(flags);
dumper->cur_seq = latched_seq_read_nolock(&clear_seq);
dumper->next_seq = prb_next_seq(prb);
@@ -3417,9 +3415,6 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 
/* invoke dumper which will iterate over records */
dumper->dump(dumper, reason);
-
-   /* reset iterator */
-   dumper->active = false;
}
rcu_read_unlock();
 }
@@ -3454,9 +3449,6 @@ bool kmsg_dump_get_line_nolock(struct kmsg_dumper 
*dumper, bool syslog,
 
prb_rec_init_rd(&r, &info, line, size);
 
-   if (!dumper->active)
-   goto out;
-
/* Read text or count text lines? */
if (line) {
if (!prb_read_valid(prb, dumper->cur_seq, &r))
@@ -3542,7 +3534,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
bool ret = false;
bool time = printk_time;
 
-   if (!dumper->active || !buf || !size)
+   if (!buf || !size)
goto out;
 
logbuf_lock_irqsave(flags);
-- 
2.20.1



[PATCH next v4 10/15] printk: add syslog_lock

2021-03-03 Thread John Ogness
The global variables @syslog_seq, @syslog_partial, @syslog_time
and write access to @clear_seq are protected by @logbuf_lock.
Once @logbuf_lock is removed, these variables will need their
own synchronization method. Introduce @syslog_lock for this
purpose.

@syslog_lock is a raw_spin_lock for now. This simplifies the
transition to removing @logbuf_lock. Once @logbuf_lock and the
safe buffers are removed, @syslog_lock can change to spin_lock.
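For now the new lock simply nests inside the existing @logbuf_lock
sections, roughly (sketch only; the real hunks are below):

    logbuf_lock_irq();
    raw_spin_lock(&syslog_lock);

    if (prb_read_valid(prb, syslog_seq, &r)) {
            /* ... copy out text and advance @syslog_seq ... */
    }

    raw_spin_unlock(&syslog_lock);
    logbuf_unlock_irq();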

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 41 +
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 65e216ca6ca6..e794a08de00f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -390,8 +390,12 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
printk_safe_exit_irqrestore(flags); \
} while (0)
 
+/* syslog_lock protects syslog_* variables and write access to clear_seq. */
+static DEFINE_RAW_SPINLOCK(syslog_lock);
+
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
+/* All 3 protected by @syslog_lock. */
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static size_t syslog_partial;
@@ -410,7 +414,7 @@ struct latched_seq {
 /*
  * The next printk record to read after the last 'clear' command. There are
  * two copies (updated with seqcount_latch) so that reads can locklessly
- * access a valid value. Writers are synchronized by @logbuf_lock.
+ * access a valid value. Writers are synchronized by @syslog_lock.
  */
 static struct latched_seq clear_seq = {
.latch  = SEQCNT_LATCH_ZERO(clear_seq.latch),
@@ -470,7 +474,7 @@ bool printk_percpu_data_ready(void)
return __printk_percpu_data_ready;
 }
 
-/* Must be called under logbuf_lock. */
+/* Must be called under syslog_lock. */
 static void latched_seq_write(struct latched_seq *ls, u64 val)
 {
raw_write_seqcount_latch(&ls->latch);
@@ -1529,7 +1533,9 @@ static int syslog_print(char __user *buf, int size)
size_t skip;
 
logbuf_lock_irq();
+   raw_spin_lock(&syslog_lock);
if (!prb_read_valid(prb, syslog_seq, &r)) {
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
break;
}
@@ -1559,6 +1565,7 @@ static int syslog_print(char __user *buf, int size)
syslog_partial += n;
} else
n = 0;
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
 
if (!n)
@@ -1625,8 +1632,11 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
break;
}
 
-   if (clear)
+   if (clear) {
+   raw_spin_lock(&syslog_lock);
latched_seq_write(&clear_seq, seq);
+   raw_spin_unlock(&syslog_lock);
+   }
logbuf_unlock_irq();
 
kfree(text);
@@ -1636,10 +1646,24 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 static void syslog_clear(void)
 {
logbuf_lock_irq();
+   raw_spin_lock(&syslog_lock);
latched_seq_write(&clear_seq, prb_next_seq(prb));
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
 }
 
+/* Return a consistent copy of @syslog_seq. */
+static u64 read_syslog_seq_irq(void)
+{
+   u64 seq;
+
+   raw_spin_lock_irq(&syslog_lock);
+   seq = syslog_seq;
+   raw_spin_unlock_irq(&syslog_lock);
+
+   return seq;
+}
+
 int do_syslog(int type, char __user *buf, int len, int source)
 {
struct printk_info info;
@@ -1663,8 +1687,9 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
return 0;
if (!access_ok(buf, len))
return -EFAULT;
+
error = wait_event_interruptible(log_wait,
-   prb_read_valid(prb, syslog_seq, NULL));
+   prb_read_valid(prb, read_syslog_seq_irq(), NULL));
if (error)
return error;
error = syslog_print(buf, len);
@@ -1713,8 +1738,10 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
logbuf_lock_irq();
+   raw_spin_lock(&syslog_lock);
if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
/* No unread messages. */
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
return 0;
}
@@ -1743,6 +1770,7 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
}
error -= syslog_partial;
}
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
break;
/* Size of the 

[PATCH next v4 03/15] printk: limit second loop of syslog_print_all

2021-03-03 Thread John Ogness
The second loop of syslog_print_all() subtracts lengths that were
added in the first loop. With commit b031a684bfd0 ("printk: remove
logbuf_lock writer-protection of ringbuffer") it is possible that
records are (over)written during syslog_print_all(). This allows the
possibility of the second loop subtracting lengths that were never
added in the first loop.

This situation can result in syslog_print_all() filling the buffer
starting from a later record, even though there may have been room
to fit the earlier record(s) as well.

Fixes: b031a684bfd0 ("printk: remove logbuf_lock writer-protection of 
ringbuffer")
Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 575a34b88936..77ae2704e979 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1494,6 +1494,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
struct printk_info info;
unsigned int line_count;
struct printk_record r;
+   u64 max_seq;
char *text;
int len = 0;
u64 seq;
@@ -1512,9 +1513,15 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
prb_for_each_info(clear_seq, prb, seq, &info, &line_count)
len += get_record_print_text_size(&info, line_count, true, time);
 
+   /*
+* Set an upper bound for the next loop to avoid subtracting lengths
+* that were never added.
+*/
+   max_seq = seq;
+
/* move first record forward until length fits into the buffer */
prb_for_each_info(clear_seq, prb, seq, &info, &line_count) {
-   if (len <= size)
+   if (len <= size || info.seq >= max_seq)
break;
len -= get_record_print_text_size(&info, line_count, true, time);
}
-- 
2.20.1



[PATCH next v4 02/15] mtd: mtdoops: synchronize kmsg_dumper

2021-03-03 Thread John Ogness
The kmsg_dumper can be called from any context and CPU, possibly
from multiple CPUs simultaneously. Since the writing of the buffer
can occur from a later scheduled work queue, the oops buffer must
be protected against simultaneous dumping.

Use an atomic bit to mark when the buffer is protected. Release the
protection in between setting the buffer and the actual writing in
order for a possible panic (immediate write) to be written during
the scheduling of a previous oops (delayed write).

An atomic bit (rather than a spinlock) was chosen so that no
scheduling or preemption side-effects would be introduced. The MTD
kmsg_dumper may dump directly or it may be delayed (via scheduled
work). Depending on the context, different MTD callbacks are used.
For example, mtd_write() expects to be called in a non-atomic
context and may take a mutex.
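The guard is essentially a try-lock built from one atomic bit, roughly
(sketch only; the real hunks are below):

    if (test_and_set_bit(0, &cxt->oops_buf_busy))
            return;         /* another context is using the buffer */

    /* ... fill or write cxt->oops_buf ... */

    clear_bit(0, &cxt->oops_buf_busy);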

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 drivers/mtd/mtdoops.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/mtdoops.c b/drivers/mtd/mtdoops.c
index 774970bfcf85..8bbfba40a554 100644
--- a/drivers/mtd/mtdoops.c
+++ b/drivers/mtd/mtdoops.c
@@ -52,6 +52,7 @@ static struct mtdoops_context {
int nextcount;
unsigned long *oops_page_used;
 
+   unsigned long oops_buf_busy;
void *oops_buf;
 } oops_cxt;
 
@@ -180,6 +181,9 @@ static void mtdoops_write(struct mtdoops_context *cxt, int 
panic)
u32 *hdr;
int ret;
 
+   if (test_and_set_bit(0, &cxt->oops_buf_busy))
+   return;
+
/* Add mtdoops header to the buffer */
hdr = cxt->oops_buf;
hdr[0] = cxt->nextcount;
@@ -190,7 +194,7 @@ static void mtdoops_write(struct mtdoops_context *cxt, int 
panic)
  record_size, &retlen, cxt->oops_buf);
if (ret == -EOPNOTSUPP) {
printk(KERN_ERR "mtdoops: Cannot write from panic 
without panic_write\n");
-   return;
+   goto out;
}
} else
ret = mtd_write(mtd, cxt->nextpage * record_size,
@@ -203,6 +207,8 @@ static void mtdoops_write(struct mtdoops_context *cxt, int 
panic)
memset(cxt->oops_buf, 0xff, record_size);
 
mtdoops_inc_counter(cxt);
+out:
+   clear_bit(0, &cxt->oops_buf_busy);
 }
 
 static void mtdoops_workfunc_write(struct work_struct *work)
@@ -276,8 +282,11 @@ static void mtdoops_do_dump(struct kmsg_dumper *dumper,
if (reason == KMSG_DUMP_OOPS && !dump_oops)
return;
 
+   if (test_and_set_bit(0, &cxt->oops_buf_busy))
+   return;
kmsg_dump_get_buffer(dumper, true, cxt->oops_buf + MTDOOPS_HEADER_SIZE,
 record_size - MTDOOPS_HEADER_SIZE, NULL);
+   clear_bit(0, &cxt->oops_buf_busy);
 
if (reason != KMSG_DUMP_OOPS) {
/* Panics must be written immediately */
@@ -394,6 +403,7 @@ static int __init mtdoops_init(void)
return -ENOMEM;
}
memset(cxt->oops_buf, 0xff, record_size);
+   cxt->oops_buf_busy = 0;
 
INIT_WORK(&cxt->work_erase, mtdoops_workfunc_erase);
INIT_WORK(&cxt->work_write, mtdoops_workfunc_write);
-- 
2.20.1



[PATCH next v4 00/15] printk: remove logbuf_lock

2021-03-03 Thread John Ogness
Hello,

Here is v4 of a series to remove @logbuf_lock, exposing the
ringbuffer locklessly to both readers and writers. v3 is
here [0].

Since @logbuf_lock was protecting much more than just the
ringbuffer, this series clarifies and cleans up the various
protections using comments, lockless accessors, atomic types,
and a new finer-grained @syslog_lock.

Removing @logbuf_lock required changing the semantics of the
kmsg_dumper callback in order to work locklessly. This series
adjusts all kmsg_dumpers and users of the kmsg_dump_get_*()
functions for the new semantics.

This series is based on next-20210303.

Changes since v3:

- disable interrupts in the arch/um kmsg_dumper

- reduce CONSOLE_LOG_MAX value from 4096 back to 1024 to revert
  the increased 3KiB static memory footprint

- change the kmsg_dumper() callback prototype back to how it
  was because some dumpers need the registered object for
  container_of() usage

- for kmsg_dump_get_line()/kmsg_dump_get_buffer() restrict the
  minimal allowed sequence number to the cleared sequence number

John Ogness

[0] 
https://lore.kernel.org/lkml/20210225202438.28985-1-john.ogn...@linutronix.de/

John Ogness (15):
  um: synchronize kmsg_dumper
  mtd: mtdoops: synchronize kmsg_dumper
  printk: limit second loop of syslog_print_all
  printk: kmsg_dump: remove unused fields
  printk: refactor kmsg_dump_get_buffer()
  printk: consolidate kmsg_dump_get_buffer/syslog_print_all code
  printk: introduce CONSOLE_LOG_MAX
  printk: use seqcount_latch for clear_seq
  printk: use atomic64_t for devkmsg_user.seq
  printk: add syslog_lock
  printk: kmsg_dumper: remove @active field
  printk: introduce a kmsg_dump iterator
  printk: remove logbuf_lock
  printk: kmsg_dump: remove _nolock() variants
  printk: console: remove unnecessary safe buffer usage

 arch/powerpc/kernel/nvram_64.c |   8 +-
 arch/powerpc/xmon/xmon.c   |   6 +-
 arch/um/kernel/kmsg_dump.c |  13 +-
 drivers/hv/vmbus_drv.c |   4 +-
 drivers/mtd/mtdoops.c  |  17 +-
 fs/pstore/platform.c   |   5 +-
 include/linux/kmsg_dump.h  |  47 ++--
 kernel/debug/kdb/kdb_main.c|  10 +-
 kernel/printk/internal.h   |   4 +-
 kernel/printk/printk.c | 464 +
 kernel/printk/printk_safe.c|  27 +-
 11 files changed, 310 insertions(+), 295 deletions(-)

-- 
2.20.1



[PATCH next v4 01/15] um: synchronize kmsg_dumper

2021-03-03 Thread John Ogness
The kmsg_dumper can be called from any context and CPU, possibly
from multiple CPUs simultaneously. Since a static buffer is used
to retrieve the kernel logs, this buffer must be protected against
simultaneous dumping. Skip dumping if another context is already
dumping.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 arch/um/kernel/kmsg_dump.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c
index 6516ef1f8274..a765d235e50e 100644
--- a/arch/um/kernel/kmsg_dump.c
+++ b/arch/um/kernel/kmsg_dump.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -9,8 +10,10 @@
 static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
enum kmsg_dump_reason reason)
 {
+   static DEFINE_SPINLOCK(lock);
static char line[1024];
struct console *con;
+   unsigned long flags;
size_t len = 0;
 
/* only dump kmsg when no console is available */
@@ -29,11 +32,16 @@ static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
if (con)
return;
 
+   if (!spin_trylock_irqsave(&lock, flags))
+   return;
+
printf("kmsg_dump:\n");
while (kmsg_dump_get_line(dumper, true, line, sizeof(line), &len)) {
line[len] = '\0';
printf("%s", line);
}
+
+   spin_unlock_irqrestore(&lock, flags);
 }
 
 static struct kmsg_dumper kmsg_dumper = {
-- 
2.20.1



[PATCH next v4 06/15] printk: consolidate kmsg_dump_get_buffer/syslog_print_all code

2021-03-03 Thread John Ogness
The logic for finding records to fit into a buffer is the same for
kmsg_dump_get_buffer() and syslog_print_all(). Introduce a helper
function find_first_fitting_seq() to handle this logic.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 87 --
 1 file changed, 50 insertions(+), 37 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ed678d84dc51..9a5f9ccc46ea 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1421,6 +1421,50 @@ static size_t get_record_print_text_size(struct 
printk_info *info,
return ((prefix_len * line_count) + info->text_len + 1);
 }
 
+/*
+ * Beginning with @start_seq, find the first record where it and all following
+ * records up to (but not including) @max_seq fit into @size.
+ *
+ * @max_seq is simply an upper bound and does not need to exist. If the caller
+ * does not require an upper bound, -1 can be used for @max_seq.
+ */
+static u64 find_first_fitting_seq(u64 start_seq, u64 max_seq, size_t size,
+ bool syslog, bool time)
+{
+   struct printk_info info;
+   unsigned int line_count;
+   size_t len = 0;
+   u64 seq;
+
+   /* Determine the size of the records up to @max_seq. */
+   prb_for_each_info(start_seq, prb, seq, &info, &line_count) {
+   if (info.seq >= max_seq)
+   break;
+   len += get_record_print_text_size(&info, line_count, syslog, time);
+   }
+
+   /*
+* Adjust the upper bound for the next loop to avoid subtracting
+* lengths that were never added.
+*/
+   if (seq < max_seq)
+   max_seq = seq;
+
+   /*
+* Move first record forward until length fits into the buffer. Ignore
+* newest messages that were not counted in the above cycle. Messages
+* might appear and get lost in the meantime. This is a best effort
+* that prevents an infinite loop that could occur with a retry.
+*/
+   prb_for_each_info(start_seq, prb, seq, &info, &line_count) {
+   if (len <= size || info.seq >= max_seq)
+   break;
+   len -= get_record_print_text_size(&info, line_count, syslog, time);
+   }
+
+   return seq;
+}
+
 static int syslog_print(char __user *buf, int size)
 {
struct printk_info info;
@@ -1492,9 +1536,7 @@ static int syslog_print(char __user *buf, int size)
 static int syslog_print_all(char __user *buf, int size, bool clear)
 {
struct printk_info info;
-   unsigned int line_count;
struct printk_record r;
-   u64 max_seq;
char *text;
int len = 0;
u64 seq;
@@ -1510,21 +1552,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 * Find first record that fits, including all following records,
 * into the user-provided buffer for this dump.
 */
-   prb_for_each_info(clear_seq, prb, seq, &info, &line_count)
-   len += get_record_print_text_size(&info, line_count, true, time);
-
-   /*
-* Set an upper bound for the next loop to avoid subtracting lengths
-* that were never added.
-*/
-   max_seq = seq;
-
-   /* move first record forward until length fits into the buffer */
-   prb_for_each_info(clear_seq, prb, seq, &info, &line_count) {
-   if (len <= size || info.seq >= max_seq)
-   break;
-   len -= get_record_print_text_size(&info, line_count, true, time);
-   }
+   seq = find_first_fitting_seq(clear_seq, -1, size, true, time);
 
prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX);
 
@@ -3427,7 +3455,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
  char *buf, size_t size, size_t *len_out)
 {
struct printk_info info;
-   unsigned int line_count;
struct printk_record r;
unsigned long flags;
u64 seq;
@@ -3455,26 +3482,12 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
 
/*
 * Find first record that fits, including all following records,
-* into the user-provided buffer for this dump.
+* into the user-provided buffer for this dump. Pass in size-1
+* because this function (by way of record_print_text()) will
+* not write more than size-1 bytes of text into @buf.
 */
-
-   prb_for_each_info(dumper->cur_seq, prb, seq, &info, &line_count) {
-   if (info.seq >= dumper->next_seq)
-   break;
-   len += get_record_print_text_size(&info, line_count, syslog, time);
-   }
-
-   /*
-* Move first record forward until length fits into the buffer. Ignore
-* newest messages that were not counted in the above cycle. Messages
-* might appear and get lost in the meantime. This is the best effort
-* 

[PATCH next v4 07/15] printk: introduce CONSOLE_LOG_MAX

2021-03-03 Thread John Ogness
Instead of using "LOG_LINE_MAX + PREFIX_MAX" for temporary buffer
sizes, introduce CONSOLE_LOG_MAX. This represents the maximum size
that is allowed to be printed to the console for a single record.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9a5f9ccc46ea..2c8873fa2f29 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -410,7 +410,12 @@ static u64 clear_seq;
 #else
 #define PREFIX_MAX 32
 #endif
-#define LOG_LINE_MAX   (1024 - PREFIX_MAX)
+
+/* the maximum size of a formatted record (i.e. with prefix added per line) */
+#define CONSOLE_LOG_MAX 1024
+
+/* the maximum size allowed to be reserved for a record */
+#define LOG_LINE_MAX   (CONSOLE_LOG_MAX - PREFIX_MAX)
 
 #define LOG_LEVEL(v)   ((v) & 0x07)
 #define LOG_FACILITY(v)((v) >> 3 & 0xff)
@@ -1472,11 +1477,11 @@ static int syslog_print(char __user *buf, int size)
char *text;
int len = 0;
 
-   text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
+   text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
if (!text)
return -ENOMEM;
 
-   prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX);
+   prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
while (size > 0) {
size_t n;
@@ -1542,7 +1547,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
u64 seq;
bool time;
 
-   text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
+   text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
if (!text)
return -ENOMEM;
 
@@ -1554,7 +1559,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 */
seq = find_first_fitting_seq(clear_seq, -1, size, true, time);
 
-   prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX);
+   prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
len = 0;
prb_for_each_record(seq, prb, seq, &r) {
@@ -2187,8 +2192,7 @@ EXPORT_SYMBOL(printk);
 
 #else /* CONFIG_PRINTK */
 
-#define LOG_LINE_MAX   0
-#define PREFIX_MAX 0
+#define CONSOLE_LOG_MAX 0
 #define printk_time false
 
 #define prb_read_valid(rb, seq, r) false
@@ -2506,7 +2510,7 @@ static inline int can_use_console(void)
 void console_unlock(void)
 {
static char ext_text[CONSOLE_EXT_LOG_MAX];
-   static char text[LOG_LINE_MAX + PREFIX_MAX];
+   static char text[CONSOLE_LOG_MAX];
unsigned long flags;
bool do_cond_resched, retry;
struct printk_info info;
-- 
2.20.1



[PATCH next v4 04/15] printk: kmsg_dump: remove unused fields

2021-03-03 Thread John Ogness
struct kmsg_dumper still contains some fields that were used to
iterate the old ringbuffer. They are no longer used. Remove them
and update the struct documentation.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 include/linux/kmsg_dump.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 3378bcbe585e..ae38035f1dca 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -36,6 +36,9 @@ enum kmsg_dump_reason {
  * through the record iterator
  * @max_reason:filter for highest reason number that should be dumped
  * @registered:Flag that specifies if this is already registered
+ * @active:Flag that specifies if this is currently dumping
+ * @cur_seq:   Points to the oldest message to dump
+ * @next_seq:  Points after the newest message to dump
  */
 struct kmsg_dumper {
struct list_head list;
@@ -45,8 +48,6 @@ struct kmsg_dumper {
bool registered;
 
/* private state of the kmsg iterator */
-   u32 cur_idx;
-   u32 next_idx;
u64 cur_seq;
u64 next_seq;
 };
-- 
2.20.1



[PATCH next v4 15/15] printk: console: remove unnecessary safe buffer usage

2021-03-03 Thread John Ogness
Upon registering a console, safe buffers are activated when setting
up the sequence number to replay the log. However, these are already
protected by @console_sem and @syslog_lock. Remove the unnecessary
safe buffer usage.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 602de86d4e76..2f829fbf0a13 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2967,9 +2967,7 @@ void register_console(struct console *newcon)
/*
 * console_unlock(); will print out the buffered messages
 * for us.
-*/
-   printk_safe_enter_irqsave(flags);
-   /*
+*
 * We're about to replay the log buffer.  Only do this to the
 * just-registered console to avoid excessive message spam to
 * the already-registered consoles.
@@ -2982,11 +2980,9 @@ void register_console(struct console *newcon)
exclusive_console_stop_seq = console_seq;
 
/* Get a consistent copy of @syslog_seq. */
-   raw_spin_lock(&syslog_lock);
+   raw_spin_lock_irqsave(&syslog_lock, flags);
console_seq = syslog_seq;
-   raw_spin_unlock(&syslog_lock);
-
-   printk_safe_exit_irqrestore(flags);
+   raw_spin_unlock_irqrestore(&syslog_lock, flags);
}
console_unlock();
console_sysfs_notify();
-- 
2.20.1



[PATCH next v4 09/15] printk: use atomic64_t for devkmsg_user.seq

2021-03-03 Thread John Ogness
@user->seq is indirectly protected by @logbuf_lock. Once @logbuf_lock
is removed, @user->seq will be no longer safe from an atomicity point
of view.

In preparation for the removal of @logbuf_lock, change it to
atomic64_t to provide this safety.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1b4bb88c3547..65e216ca6ca6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -662,7 +662,7 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
-   u64 seq;
+   atomic64_t seq;
struct ratelimit_state rs;
struct mutex lock;
char buf[CONSOLE_EXT_LOG_MAX];
@@ -763,7 +763,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
return ret;
 
logbuf_lock_irq();
-   if (!prb_read_valid(prb, user->seq, r)) {
+   if (!prb_read_valid(prb, atomic64_read(&user->seq), r)) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
logbuf_unlock_irq();
@@ -772,15 +772,15 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
 
logbuf_unlock_irq();
ret = wait_event_interruptible(log_wait,
-   prb_read_valid(prb, user->seq, r));
+   prb_read_valid(prb, atomic64_read(&user->seq), r));
if (ret)
goto out;
logbuf_lock_irq();
}
 
-   if (r->info->seq != user->seq) {
+   if (r->info->seq != atomic64_read(&user->seq)) {
/* our last seen message is gone, return error and reset */
-   user->seq = r->info->seq;
+   atomic64_set(&user->seq, r->info->seq);
ret = -EPIPE;
logbuf_unlock_irq();
goto out;
@@ -791,7 +791,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
  &r->text_buf[0], r->info->text_len,
  &r->info->dev_info);
 
-   user->seq = r->info->seq + 1;
+   atomic64_set(&user->seq, r->info->seq + 1);
logbuf_unlock_irq();
 
if (len > count) {
@@ -831,7 +831,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
switch (whence) {
case SEEK_SET:
/* the first record */
-   user->seq = prb_first_valid_seq(prb);
+   atomic64_set(&user->seq, prb_first_valid_seq(prb));
break;
case SEEK_DATA:
/*
@@ -839,11 +839,11 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
 * changes no global state, and does not clear anything.
 */
-   user->seq = latched_seq_read_nolock(_seq);
+   atomic64_set(&user->seq, latched_seq_read_nolock(&clear_seq));
break;
case SEEK_END:
/* after the last record */
-   user->seq = prb_next_seq(prb);
+   atomic64_set(&user->seq, prb_next_seq(prb));
break;
default:
ret = -EINVAL;
@@ -864,9 +864,9 @@ static __poll_t devkmsg_poll(struct file *file, poll_table 
*wait)
poll_wait(file, &log_wait, wait);
 
logbuf_lock_irq();
-   if (prb_read_valid_info(prb, user->seq, &info, NULL)) {
+   if (prb_read_valid_info(prb, atomic64_read(&user->seq), &info, NULL)) {
/* return error when data has vanished underneath us */
-   if (info.seq != user->seq)
+   if (info.seq != atomic64_read(&user->seq))
ret = EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
else
ret = EPOLLIN|EPOLLRDNORM;
@@ -905,7 +905,7 @@ static int devkmsg_open(struct inode *inode, struct file 
*file)
 &user->text_buf[0], sizeof(user->text_buf));
 
logbuf_lock_irq();
-   user->seq = prb_first_valid_seq(prb);
+   atomic64_set(&user->seq, prb_first_valid_seq(prb));
logbuf_unlock_irq();
 
file->private_data = user;
-- 
2.20.1



[PATCH next v4 08/15] printk: use seqcount_latch for clear_seq

2021-03-03 Thread John Ogness
kmsg_dump_rewind_nolock() locklessly reads @clear_seq. However,
this is not done atomically. Since @clear_seq is 64-bit, this
cannot be an atomic operation for all platforms. Therefore, use
a seqcount_latch to allow readers to always read a consistent
value.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 58 --
 1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2c8873fa2f29..1b4bb88c3547 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -402,8 +402,21 @@ static u64 console_seq;
 static u64 exclusive_console_stop_seq;
 static unsigned long console_dropped;
 
-/* the next printk record to read after the last 'clear' command */
-static u64 clear_seq;
+struct latched_seq {
+   seqcount_latch_t latch;
+   u64 val[2];
+};
+
+/*
+ * The next printk record to read after the last 'clear' command. There are
+ * two copies (updated with seqcount_latch) so that reads can locklessly
+ * access a valid value. Writers are synchronized by @logbuf_lock.
+ */
+static struct latched_seq clear_seq = {
+   .latch  = SEQCNT_LATCH_ZERO(clear_seq.latch),
+   .val[0] = 0,
+   .val[1] = 0,
+};
 
 #ifdef CONFIG_PRINTK_CALLER
 #define PREFIX_MAX 48
@@ -457,6 +470,31 @@ bool printk_percpu_data_ready(void)
return __printk_percpu_data_ready;
 }
 
+/* Must be called under logbuf_lock. */
+static void latched_seq_write(struct latched_seq *ls, u64 val)
+{
+   raw_write_seqcount_latch(&ls->latch);
+   ls->val[0] = val;
+   raw_write_seqcount_latch(&ls->latch);
+   ls->val[1] = val;
+}
+
+/* Can be called from any context. */
+static u64 latched_seq_read_nolock(struct latched_seq *ls)
+{
+   unsigned int seq;
+   unsigned int idx;
+   u64 val;
+
+   do {
+   seq = raw_read_seqcount_latch(&ls->latch);
+   idx = seq & 0x1;
+   val = ls->val[idx];
+   } while (read_seqcount_latch_retry(&ls->latch, seq));
+
+   return val;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -801,7 +839,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
 * changes no global state, and does not clear anything.
 */
-   user->seq = clear_seq;
+   user->seq = latched_seq_read_nolock(&clear_seq);
break;
case SEEK_END:
/* after the last record */
@@ -960,6 +998,9 @@ void log_buf_vmcoreinfo_setup(void)
 
VMCOREINFO_SIZE(atomic_long_t);
VMCOREINFO_TYPE_OFFSET(atomic_long_t, counter);
+
+   VMCOREINFO_STRUCT_SIZE(latched_seq);
+   VMCOREINFO_OFFSET(latched_seq, val);
 }
 #endif
 
@@ -1557,7 +1598,8 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 * Find first record that fits, including all following records,
 * into the user-provided buffer for this dump.
 */
-   seq = find_first_fitting_seq(clear_seq, -1, size, true, time);
+   seq = find_first_fitting_seq(latched_seq_read_nolock(&clear_seq), -1,
+size, true, time);
 
prb_rec_init_rd(, , text, CONSOLE_LOG_MAX);
 
@@ -1584,7 +1626,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
}
 
if (clear)
-   clear_seq = seq;
+   latched_seq_write(&clear_seq, seq);
logbuf_unlock_irq();
 
kfree(text);
@@ -1594,7 +1636,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 static void syslog_clear(void)
 {
logbuf_lock_irq();
-   clear_seq = prb_next_seq(prb);
+   latched_seq_write(&clear_seq, prb_next_seq(prb));
logbuf_unlock_irq();
 }
 
@@ -3336,7 +3378,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
dumper->active = true;
 
logbuf_lock_irqsave(flags);
-   dumper->cur_seq = clear_seq;
+   dumper->cur_seq = latched_seq_read_nolock(&clear_seq);
dumper->next_seq = prb_next_seq(prb);
logbuf_unlock_irqrestore(flags);
 
@@ -3534,7 +3576,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
  */
 void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
 {
-   dumper->cur_seq = clear_seq;
+   dumper->cur_seq = latched_seq_read_nolock(&clear_seq);
dumper->next_seq = prb_next_seq(prb);
 }
 
-- 
2.20.1



[PATCH next v4 14/15] printk: kmsg_dump: remove _nolock() variants

2021-03-03 Thread John Ogness
kmsg_dump_rewind() and kmsg_dump_get_line() are lockless, so there is
no need for _nolock() variants. Remove these functions and switch all
callers of the _nolock() variants.

The functions without _nolock() were chosen because they are already
exported to kernel modules.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 arch/powerpc/xmon/xmon.c|  4 +--
 include/linux/kmsg_dump.h   | 16 --
 kernel/debug/kdb/kdb_main.c |  8 ++---
 kernel/printk/printk.c  | 60 +
 4 files changed, 14 insertions(+), 74 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 5978b90a885f..bf7d69625a2e 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3013,9 +3013,9 @@ dump_log_buf(void)
catch_memory_errors = 1;
sync();
 
-   kmsg_dump_rewind_nolock(&iter);
+   kmsg_dump_rewind(&iter);
xmon_start_pagination();
-   while (kmsg_dump_get_line_nolock(&iter, false, buf, sizeof(buf), &len)) {
+   while (kmsg_dump_get_line(&iter, false, buf, sizeof(buf), &len)) {
buf[len] = '\0';
printf("%s", buf);
}
diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 36c8c57e1051..906521c2329c 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -57,17 +57,12 @@ struct kmsg_dumper {
 #ifdef CONFIG_PRINTK
 void kmsg_dump(enum kmsg_dump_reason reason);
 
-bool kmsg_dump_get_line_nolock(struct kmsg_dump_iter *iter, bool syslog,
-  char *line, size_t size, size_t *len);
-
 bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
char *line, size_t size, size_t *len);
 
 bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
  char *buf, size_t size, size_t *len_out);
 
-void kmsg_dump_rewind_nolock(struct kmsg_dump_iter *iter);
-
 void kmsg_dump_rewind(struct kmsg_dump_iter *iter);
 
 int kmsg_dump_register(struct kmsg_dumper *dumper);
@@ -80,13 +75,6 @@ static inline void kmsg_dump(enum kmsg_dump_reason reason)
 {
 }
 
-static inline bool kmsg_dump_get_line_nolock(struct kmsg_dump_iter *iter,
-bool syslog, const char *line,
-size_t size, size_t *len)
-{
-   return false;
-}
-
 static inline bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
const char *line, size_t size, size_t *len)
 {
@@ -99,10 +87,6 @@ static inline bool kmsg_dump_get_buffer(struct 
kmsg_dump_iter *iter, bool syslog
return false;
 }
 
-static inline void kmsg_dump_rewind_nolock(struct kmsg_dump_iter *iter)
-{
-}
-
 static inline void kmsg_dump_rewind(struct kmsg_dump_iter *iter)
 {
 }
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 8544d7a55a57..67d9f2403b52 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2126,8 +2126,8 @@ static int kdb_dmesg(int argc, const char **argv)
kdb_set(2, setargs);
}
 
-   kmsg_dump_rewind_nolock(&iter);
+   kmsg_dump_rewind(&iter);
-   while (kmsg_dump_get_line_nolock(&iter, 1, NULL, 0, NULL))
+   while (kmsg_dump_get_line(&iter, 1, NULL, 0, NULL))
n++;
 
if (lines < 0) {
@@ -2159,8 +2159,8 @@ static int kdb_dmesg(int argc, const char **argv)
if (skip >= n || skip < 0)
return 0;
 
-   kmsg_dump_rewind_nolock(&iter);
+   kmsg_dump_rewind(&iter);
-   while (kmsg_dump_get_line_nolock(&iter, 1, buf, sizeof(buf), &len)) {
+   while (kmsg_dump_get_line(&iter, 1, buf, sizeof(buf), &len)) {
if (skip) {
skip--;
continue;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8994bc192b88..602de86d4e76 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3373,7 +3373,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 }
 
 /**
- * kmsg_dump_get_line_nolock - retrieve one kmsg log line (unlocked version)
+ * kmsg_dump_get_line - retrieve one kmsg log line
  * @iter: kmsg dump iterator
  * @syslog: include the "<4>" prefixes
  * @line: buffer to copy the line to
@@ -3388,22 +3388,22 @@ void kmsg_dump(enum kmsg_dump_reason reason)
  *
  * A return value of FALSE indicates that there are no more records to
  * read.
- *
- * The function is similar to kmsg_dump_get_line(), but grabs no locks.
  */
-bool kmsg_dump_get_line_nolock(struct kmsg_dump_iter *iter, bool syslog,
-  char *line, size_t size, size_t *len)
+bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
+   char *line, size_t size, size_t *len)
 {
u64 min_seq = latched_seq_read_nolock(&clear_seq);
struct printk_info info;
unsigned int line_count;
struct printk_record r;
+   unsigned long flags;
size_t l = 0;
bool ret =

[PATCH next v4 05/15] printk: refactor kmsg_dump_get_buffer()

2021-03-03 Thread John Ogness
kmsg_dump_get_buffer() requires nearly the same logic as
syslog_print_all(), but uses different variable names and
does not make use of the ringbuffer loop macros. Modify
kmsg_dump_get_buffer() so that the implementation is as similar
to syslog_print_all() as possible.

A follow-up commit will move this common logic into a
separate helper function.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 include/linux/kmsg_dump.h |  2 +-
 kernel/printk/printk.c| 62 +--
 2 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index ae38035f1dca..070c994ff19f 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -62,7 +62,7 @@ bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool 
syslog,
char *line, size_t size, size_t *len);
 
 bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
- char *buf, size_t size, size_t *len);
+ char *buf, size_t size, size_t *len_out);
 
 void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 77ae2704e979..ed678d84dc51 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3410,7 +3410,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_line);
  * @syslog: include the "<4>" prefixes
  * @buf: buffer to copy the line to
  * @size: maximum size of the buffer
- * @len: length of line placed into buffer
+ * @len_out: length of line placed into buffer
  *
  * Start at the end of the kmsg buffer and fill the provided buffer
  * with as many of the *youngest* kmsg records that fit into it.
@@ -3424,7 +3424,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_line);
  * read.
  */
 bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
- char *buf, size_t size, size_t *len)
+ char *buf, size_t size, size_t *len_out)
 {
struct printk_info info;
unsigned int line_count;
@@ -3432,12 +3432,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
unsigned long flags;
u64 seq;
u64 next_seq;
-   size_t l = 0;
+   size_t len = 0;
bool ret = false;
bool time = printk_time;
 
-   prb_rec_init_rd(&r, &info, buf, size);
-
if (!dumper->active || !buf || !size)
goto out;
 
@@ -3455,48 +3453,54 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
goto out;
}
 
-   /* calculate length of entire buffer */
-   seq = dumper->cur_seq;
-   while (prb_read_valid_info(prb, seq, &info, &line_count)) {
-   if (r.info->seq >= dumper->next_seq)
+   /*
+* Find first record that fits, including all following records,
+* into the user-provided buffer for this dump.
+*/
+
+   prb_for_each_info(dumper->cur_seq, prb, seq, &info, &line_count) {
+   if (info.seq >= dumper->next_seq)
break;
-   l += get_record_print_text_size(&info, line_count, syslog, time);
-   seq = r.info->seq + 1;
+   len += get_record_print_text_size(&info, line_count, syslog, time);
}
 
-   /* move first record forward until length fits into the buffer */
-   seq = dumper->cur_seq;
-   while (l >= size && prb_read_valid_info(prb, seq,
-   &info, &line_count)) {
-   if (r.info->seq >= dumper->next_seq)
+   /*
+* Move first record forward until length fits into the buffer. Ignore
+* newest messages that were not counted in the above cycle. Messages
+* might appear and get lost in the meantime. This is the best effort
+* that prevents an infinite loop.
+*/
+   prb_for_each_info(dumper->cur_seq, prb, seq, &info, &line_count) {
+   if (len < size || info.seq >= dumper->next_seq)
break;
-   l -= get_record_print_text_size(&info, line_count, syslog, time);
-   seq = r.info->seq + 1;
+   len -= get_record_print_text_size(&info, line_count, syslog, time);
}
 
-   /* last message in next interation */
+   /*
+* Next kmsg_dump_get_buffer() invocation will dump block of
+* older records stored right before this one.
+*/
next_seq = seq;
 
-   /* actually read text into the buffer now */
-   l = 0;
-   while (prb_read_valid(prb, seq, &r)) {
+   prb_rec_init_rd(&r, &info, buf, size);
+
+   len = 0;
+   prb_for_each_record(seq, prb, seq, &r) {
if (r.info->seq >= dumper->next_seq)
break;
 
-   l += record_print_text(, syslog, time);
+   len += record_print_text(, syslog, time);
 
-   /* adjust record t

Re: [PATCH next v3 07/15] printk: introduce CONSOLE_LOG_MAX for improved multi-line support

2021-03-02 Thread John Ogness
Hi Geert,

On 2021-03-02, Geert Uytterhoeven  wrote:
> On Tue, Mar 2, 2021 at 2:54 PM Geert Uytterhoeven  
> wrote:
>> On Thu, Feb 25, 2021 at 9:30 PM John Ogness  
>> wrote:
>>> Instead of using "LOG_LINE_MAX + PREFIX_MAX" for temporary buffer
>>> sizes, introduce CONSOLE_LOG_MAX. This represents the maximum size
>>> that is allowed to be printed to the console for a single record.
>>>
>>> Rather than setting CONSOLE_LOG_MAX to "LOG_LINE_MAX + PREFIX_MAX"
>>> (1024), increase it to 4096. With a larger buffer size, multi-line
>>> records that are nearly LOG_LINE_MAX in length will have a better
>>> chance of being fully printed. (When formatting a record for the
>>> console, each line of a multi-line record is prepended with a copy
>>> of the prefix.)
>>>
>>> Signed-off-by: John Ogness 
>>> Reviewed-by: Petr Mladek 
>>
>> Thanks for your patch!
>>
>> This increases kernel size by more than 3 KiB, which affects small
>> devices (e.g. SoCs with 10 MiB of SRAM inside).

Petr was concerned that this patch might raise issues for the small
devices.

>> Who is printing such long lines to the console?

Some printk users like to print large multi-line messages into a single
record. They can get pretty long. But since no one is complaining with
the current 1024, we can assume it is big enough.

For v4 I will return it back to 1024 bytes.

> BTW, printing a single line of 1024 characters to a serial console at
> 115200 bps takes almost 100 ms.

Yes. Although once we move to threaded printers, I don't think anyone
will care. Also, I think the netconsole will become quite attractive
when we move to threaded printers.
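
(As a rough cross-check, assuming standard 8N1 framing each character
costs 10 bits on the wire, so 1024 chars * 10 bits / 115200 bps is
roughly 89 ms -- consistent with the "almost 100 ms" figure above.)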

John Ogness


Re: [PATCH next v3 12/15] printk: introduce a kmsg_dump iterator

2021-03-02 Thread John Ogness
On 2021-03-01, Petr Mladek  wrote:
>> diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
>> index 532f22637783..5a64b24a91c2 100644
>> --- a/arch/powerpc/kernel/nvram_64.c
>> +++ b/arch/powerpc/kernel/nvram_64.c
>> @@ -72,8 +72,7 @@ static const char *nvram_os_partitions[] = {
>>  NULL
>>  };
>>  
>> -static void oops_to_nvram(struct kmsg_dumper *dumper,
>> -  enum kmsg_dump_reason reason);
>> +static void oops_to_nvram(enum kmsg_dump_reason reason);
>>  
>>  static struct kmsg_dumper nvram_kmsg_dumper = {
>>  .dump = oops_to_nvram
>> @@ -642,11 +641,11 @@ void __init nvram_init_oops_partition(int 
>> rtas_partition_exists)
>>   * that we think will compress sufficiently to fit in the lnx,oops-log
>>   * partition.  If that's too much, go back and capture uncompressed text.
>>   */
>> -static void oops_to_nvram(struct kmsg_dumper *dumper,
>> -  enum kmsg_dump_reason reason)
>> +static void oops_to_nvram(enum kmsg_dump_reason reason)
>>  {
>>  struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf;
>>  static unsigned int oops_count = 0;
>> +static struct kmsg_dump_iter iter;
>>  static bool panicking = false;
>>  static DEFINE_SPINLOCK(lock);
>>  unsigned long flags;
>> @@ -681,13 +680,14 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
>>  return;
>>  
>>  if (big_oops_buf) {
>> -kmsg_dump_get_buffer(dumper, false,
>> +kmsg_dump_rewind(&iter);
>
> It would be nice to get rid of the kmsg_dump_rewind() calls
> in all callers.
>
> A solution might be to create the following in include/linux/kmsg_dump.h
>
> #define KMSG_DUMP_ITER_INIT(iter) {   \
>   .cur_seq = 0,   \
>   .next_seq = U64_MAX,\
>   }
>
> #define DEFINE_KMSG_DUMP_ITER(iter)   \
>   struct kmsg_dump_iter iter = KMSG_DUMP_ITER_INIT(iter)

For this caller (arch/powerpc/kernel/nvram_64.c) and for
(kernel/debug/kdb/kdb_main.c), kmsg_dump_rewind() is called twice within
the dumper. So rewind will still be used there.

> Then we could do the following at the beginning of both
> kmsg_dump_get_buffer() and kmsg_dump_get_line():
>
>   u64 clear_seq = latched_seq_read_nolock(&clear_seq);
>
>   if (iter->cur_seq < clear_seq)
>   cur_seq = clear_seq;

I suppose we need to add this part anyway, if we want to enforce that
records before @clear_seq are not to be available for dumpers.

> I am not completely sure about next_seq:
>
>+ kmsg_dump_get_buffer() will set it for the next call anyway.
>  It reads the blocks of messages from the newest.
>
>+ kmsg_dump_get_line() wants to read the entire buffer anyway.
>  But there is a small risk of an infinite loop when new messages
>  are printed when dumping each line.
>
> It might be better to avoid the infinite loop. We could do the following:
>
> static void check_and_set_iter(struct kmsg_dump_iter)
> {
>   if (iter->cur_seq == 0 && iter->next_seq == U64_MAX) {
>   kmsg_dump_rewind(iter);
> }
>
> and call this at the beginning of both kmsg_dump_get_buffer()
> and kmsg_dump_get_line()
>
> What do you think?

On a technical level, it does not make any difference. It is purely
cosmetic.

Personally, I prefer the rewind directly before the kmsg_dump_get calls
because it puts the initializer directly next to the user.

As an example to illustrate my view, I prefer:

for (i = 0; i < n; i++)
...;

instead of:

int i = 0;

...

for (; i < n; i++)
    ...;

Also, I do not really like the special use of 0/U64_MAX to identify
special actions of the kmsg_dump_get functions.

> Note that I do not resist on it. But it might make the API easier to
> use from my POV.

Since you do not resist, I will keep the API the same for v4. But I will
add the @clear_seq check to the kmsg_dump_get functions.
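
A minimal sketch of that check, assuming the latched_seq helpers from
earlier in the series (placement and naming in v4 may differ):

	u64 min_seq = latched_seq_read_nolock(&clear_seq);

	/* Never hand out records older than the last 'clear' command. */
	if (iter->cur_seq < min_seq)
		iter->cur_seq = min_seq;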

John Ogness


Re: [PATCH next v3 02/15] mtd: mtdoops: synchronize kmsg_dumper

2021-03-02 Thread John Ogness
On 2021-03-01, Petr Mladek  wrote:
>> The kmsg_dumper can be called from any context and CPU, possibly
>> from multiple CPUs simultaneously. Since the writing of the buffer
>> can occur from a later scheduled work queue, the oops buffer must
>> be protected against simultaneous dumping.
>> 
>> Use an atomic bit to mark when the buffer is protected. Release the
>> protection in between setting the buffer and the actual writing in
>> order for a possible panic (immediate write) to be written during
>> the scheduling of a previous oops (delayed write).
>
> Just to be sure. You did not use spin lock to prevent problems
> with eventual double unlock in panic(). Do I get it correctly,
> please?

I do not understand what possible double unlock you are referring to.

I chose not to use spinlocks because I wanted something that does not
cause any scheduling or preemption side-effects for mtd. The mtd dumper
sometimes dumps directly, sometimes delayed (via scheduled work), and
they use different mtd callbacks in different contexts.

mtd_write() expects to be called in a non-atomic context. The callbacks
can take a mutex.
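
For reference, the pattern under discussion looks roughly like this in
the mtdoops dump path (a sketch; the kmsg_dump_get_buffer() signature
changes later in the series):

	/* Serialize use of the shared oops buffer without sleeping. */
	if (test_and_set_bit(0, &cxt->oops_buf_busy))
		return;

	kmsg_dump_get_buffer(dumper, true, cxt->oops_buf + MTDOOPS_HEADER_SIZE,
			     record_size - MTDOOPS_HEADER_SIZE, NULL);

	/* Release before the (possibly delayed) write so a panic can dump. */
	clear_bit(0, &cxt->oops_buf_busy);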

John Ogness


Re: [PATCH next v3 01/15] um: synchronize kmsg_dumper

2021-03-02 Thread John Ogness
On 2021-03-01, Petr Mladek  wrote:
>> > diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c
>> > index 6516ef1f8274..4869e2cc787c 100644
>> > --- a/arch/um/kernel/kmsg_dump.c
>> > +++ b/arch/um/kernel/kmsg_dump.c
>> > @@ -1,5 +1,6 @@
>> >  // SPDX-License-Identifier: GPL-2.0
>> >  #include 
>> > +#include 
>> >  #include 
>> >  #include 
>> >  #include 
>> > @@ -9,6 +10,7 @@
>> >  static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
>> >enum kmsg_dump_reason reason)
>> >  {
>> > +  static DEFINE_SPINLOCK(lock);
>> >static char line[1024];
>> >struct console *con;
>> >size_t len = 0;
>> > @@ -29,11 +31,16 @@ static void kmsg_dumper_stdout(struct kmsg_dumper 
>> > *dumper,
>> >if (con)
>> >return;
>> >  
>> > +  if (!spin_trylock(&lock))
>> 
>> I have almost missed this. It is wrong. The last version correctly
>> used
>> 
>>  if (!spin_trylock_irqsave(&lock, flags))
>> 
>> kmsg_dump(KMSG_DUMP_PANIC) is called in panic() with interrupts
>> disabled. We have to store the flags here.
>
> Ah, I get always confused with these things. spin_trylock() can
> actually get called in a context with IRQ disabled. So it is not
> as wrong as I thought.
>
> But still. panic() and kmsg_dump() can be called in IRQ context.
> So, this function might be called in IRQ context. So, it feels
> more correct to use the _irqsafe variant here.
>
> I know that there is the trylock so it probably does not matter much.
> Well, the disabled irq might help to serialize the two calls when
> one is in normal context and the other would happen in IRQ one.
>
> As I said, using _irqsafe variant looks better to me.

For the record, the reason I removed the _irqsave for v3 is because I
felt like it was misleading, appearing to be necessary when it is not.

I think anyone could argue both sides. But it really doesn't matter
(especially for arch/um). I will use the _irqsave variant for v4. I am
OK with that.
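
For reference, the _irqsave variant in the um dumper would look roughly
like this (a sketch, not the final v4 patch):

static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
			       enum kmsg_dump_reason reason)
{
	static DEFINE_SPINLOCK(lock);
	unsigned long flags;

	/* A failed trylock means another CPU is already dumping. */
	if (!spin_trylock_irqsave(&lock, flags))
		return;

	/* ... format and write the records to the stdout console ... */

	spin_unlock_irqrestore(&lock, flags);
}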

John Ogness


Re: [PATCH] printk: fix buffer overflow potential for print_text()

2021-02-26 Thread John Ogness
On 2021-02-26, Alexander Gordeev  wrote:
> I am seeing KASAN reporting incorrect 1-byte access in exactly
> same location Sven has identified before. In case there no
> fix for it yet, please see below what happens in case of pretty
> large buffer - WARN_ONCE() invocation in my case.

It looks like you have not applied the fix yet:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=08d60e5999540110576e7c1346d486220751b7f9

John Ogness


Re: synchronization model: was: Re: [PATCH printk-rework 09/14] printk: introduce a kmsg_dump iterator

2021-02-26 Thread John Ogness
On 2021-02-25, Petr Mladek  wrote:
> IMHO, a better design would be:
>
> 1. dumper->dump() callback should have only one parameter @reason.
>The callback should define its own iterator, buffer, and
>do the dump.

Unfortunately this won't work because drivers/mtd/mtdoops.c is using the
dumper parameter for container_of().

So we will need 2 parameters: dumper and reason.

Can we agree to proceed with 2 parameters in the callback?
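
For context, the mtdoops callback depends on that first parameter like
this (sketch):

static void mtdoops_do_dump(struct kmsg_dumper *dumper,
			    enum kmsg_dump_reason reason)
{
	struct mtdoops_context *cxt = container_of(dumper,
						   struct mtdoops_context, dump);

	/* cxt gives access to the driver's oops buffer and work item. */
}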

> 2. dumper->dump() callback should synchronize the entire operation
>using its own locks. Only the callback knows whether it is
>safe to do more dumps in parallel. Only the callback knows
>whether it is called only during panic() when no locks
>are needed.

Agreed. I implemented this part for the v3 series.

John Ogness


Re: [PATCH next v3 12/15] printk: introduce a kmsg_dump iterator

2021-02-26 Thread John Ogness
Hello,

Thank you kernel test robot!

Despite all of my efforts to carefully construct and test this series,
somehow I managed to miss a compile test with CONFIG_MTD_OOPS. That
kmsg_dumper does require the dumper parameter so that it can use
container_of().

I will discuss this with the printk team. But most likely we will just
re-instate the dumper parameter in the callback.

I apologize for the lack of care on my part.

John Ogness

On 2021-02-26, kernel test robot  wrote:
> Hi John,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on next-20210225]
>
> url:
> https://github.com/0day-ci/linux/commits/John-Ogness/printk-remove-logbuf_lock/20210226-043457
> base:7f206cf3ec2bee4621325cfacb2588e5085c07f5
> config: arm-randconfig-r024-20210225 (attached as .config)
> compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
> a921aaf789912d981cbb2036bdc91ad7289e1523)
> reproduce (this is a W=1 build):
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # install arm cross compiling tool for clang build
> # apt-get install binutils-arm-linux-gnueabi
> # 
> https://github.com/0day-ci/linux/commit/fc7f655cded40fc98ba5304c200e3a01e8291fb4
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review 
> John-Ogness/printk-remove-logbuf_lock/20210226-043457
> git checkout fc7f655cded40fc98ba5304c200e3a01e8291fb4
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
>
> All errors (new ones prefixed by >>):
>
>>> drivers/mtd/mtdoops.c:277:45: error: use of undeclared identifier 'dumper'
>struct mtdoops_context *cxt = container_of(dumper,
>   ^
>>> drivers/mtd/mtdoops.c:277:45: error: use of undeclared identifier 'dumper'
>>> drivers/mtd/mtdoops.c:277:45: error: use of undeclared identifier 'dumper'
>    3 errors generated.
>
>
> vim +/dumper +277 drivers/mtd/mtdoops.c
>
> 4b23aff083649e Richard Purdie 2007-05-29  274  
> fc7f655cded40f John Ogness2021-02-25  275  static void 
> mtdoops_do_dump(enum kmsg_dump_reason reason)
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  276  {
> 2e386e4bac9055 Simon Kagstrom 2009-11-03 @277 struct mtdoops_context 
> *cxt = container_of(dumper,
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  278 struct 
> mtdoops_context, dump);
> fc7f655cded40f John Ogness2021-02-25  279 struct kmsg_dump_iter 
> iter;
> fc2d557c74dc58 Seiji Aguchi   2011-01-12  280  
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  281 /* Only dump oopses if 
> dump_oops is set */
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  282 if (reason == 
> KMSG_DUMP_OOPS && !dump_oops)
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  283     return;
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  284  
> fc7f655cded40f John Ogness2021-02-25  285 kmsg_dump_rewind();
> fc7f655cded40f John Ogness2021-02-25  286  
> df92cad8a03e83 John Ogness2021-02-25  287 if (test_and_set_bit(0, &cxt->oops_buf_busy))
> df92cad8a03e83 John Ogness2021-02-25  288 return;
> fc7f655cded40f John Ogness    2021-02-25  289 kmsg_dump_get_buffer(&iter, true, cxt->oops_buf + MTDOOPS_HEADER_SIZE,
> e2ae715d66bf4b Kay Sievers2012-06-15  290  record_size - MTDOOPS_HEADER_SIZE, NULL);
> df92cad8a03e83 John Ogness2021-02-25  291 clear_bit(0, &cxt->oops_buf_busy);
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  292  
> c1cf1d57d14922 Mark Tomlinson 2020-09-03  293 if (reason != 
> KMSG_DUMP_OOPS) {
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  294 /* Panics must 
> be written immediately */
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  295 
> mtdoops_write(cxt, 1);
> c1cf1d57d14922 Mark Tomlinson 2020-09-03  296 } else {
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  297 /* For other 
> cases, schedule work to write it "nicely" */
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  298 schedule_work(&cxt->work_write);
> 2e386e4bac9055 Simon Kagstrom 2009-11-03  299 }
> c1cf1d57d14922 Mark Tomlinson 2020-09-03  300  }
> 4b23aff083649e Richard Purdie 2007-05-29  301  
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[PATCH v1] powerpc: low_i2c: change @lock to raw_spinlock_t

2021-02-25 Thread John Ogness
i2c transfers are occurring with local interrupts disabled:

smp_core99_give_timebase()
  local_irq_save();
  smp_core99_cypress_tb_freeze()
pmac_i2c_xfer()
  kw_i2c_xfer()
spin_lock_irqsave(&host->lock, flags)

This is a problem because with PREEMPT_RT a spinlock_t can sleep,
causing the system to hang. Convert the spinlock_t to the
non-sleeping raw_spinlock_t.

Signed-off-by: John Ogness 
---
 arch/powerpc/platforms/powermac/low_i2c.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/low_i2c.c 
b/arch/powerpc/platforms/powermac/low_i2c.c
index f77a59b5c2e1..ba89c95ef290 100644
--- a/arch/powerpc/platforms/powermac/low_i2c.c
+++ b/arch/powerpc/platforms/powermac/low_i2c.c
@@ -116,7 +116,7 @@ struct pmac_i2c_host_kw
int polled;
int result;
struct completion   complete;
-   spinlock_t  lock;
+   raw_spinlock_t  lock;
struct timer_list   timeout_timer;
 };
 
@@ -346,14 +346,14 @@ static irqreturn_t kw_i2c_irq(int irq, void *dev_id)
struct pmac_i2c_host_kw *host = dev_id;
unsigned long flags;
 
-   spin_lock_irqsave(&host->lock, flags);
+   raw_spin_lock_irqsave(&host->lock, flags);
del_timer(&host->timeout_timer);
kw_i2c_handle_interrupt(host, kw_read_reg(reg_isr));
if (host->state != state_idle) {
host->timeout_timer.expires = jiffies + KW_POLL_TIMEOUT;
add_timer(&host->timeout_timer);
}
-   spin_unlock_irqrestore(&host->lock, flags);
+   raw_spin_unlock_irqrestore(&host->lock, flags);
return IRQ_HANDLED;
 }
 
@@ -362,7 +362,7 @@ static void kw_i2c_timeout(struct timer_list *t)
struct pmac_i2c_host_kw *host = from_timer(host, t, timeout_timer);
unsigned long flags;
 
-   spin_lock_irqsave(&host->lock, flags);
+   raw_spin_lock_irqsave(&host->lock, flags);
 
/*
 * If the timer is pending, that means we raced with the
@@ -377,7 +377,7 @@ static void kw_i2c_timeout(struct timer_list *t)
add_timer(&host->timeout_timer);
}
  skip:
-   spin_unlock_irqrestore(&host->lock, flags);
+   raw_spin_unlock_irqrestore(&host->lock, flags);
 }
 
 static int kw_i2c_open(struct pmac_i2c_bus *bus)
@@ -470,9 +470,9 @@ static int kw_i2c_xfer(struct pmac_i2c_bus *bus, u8 
addrdir, int subsize,
unsigned long flags;
 
u8 isr = kw_i2c_wait_interrupt(host);
-   spin_lock_irqsave(&host->lock, flags);
+   raw_spin_lock_irqsave(&host->lock, flags);
kw_i2c_handle_interrupt(host, isr);
-   spin_unlock_irqrestore(&host->lock, flags);
+   raw_spin_unlock_irqrestore(&host->lock, flags);
}
}
 
@@ -508,7 +508,7 @@ static struct pmac_i2c_host_kw *__init 
kw_i2c_host_init(struct device_node *np)
}
mutex_init(&host->mutex);
init_completion(&host->complete);
-   spin_lock_init(&host->lock);
+   raw_spin_lock_init(&host->lock);
timer_setup(&host->timeout_timer, kw_i2c_timeout, 0);
 
psteps = of_get_property(np, "AAPL,address-step", NULL);
-- 
2.20.1



[PATCH next v3 13/15] printk: remove logbuf_lock

2021-02-25 Thread John Ogness
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock.

@console_seq, @exclusive_console_stop_seq, @console_dropped are
protected by @console_lock.

Signed-off-by: John Ogness 
---
 kernel/printk/internal.h|   4 +-
 kernel/printk/printk.c  | 112 
 kernel/printk/printk_safe.c |  27 +++--
 3 files changed, 46 insertions(+), 97 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 3a8fd491758c..e7acc2888c8e 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -12,8 +12,6 @@
 
 #define PRINTK_NMI_CONTEXT_OFFSET  0x01000
 
-extern raw_spinlock_t logbuf_lock;
-
 __printf(4, 0)
 int vprintk_store(int facility, int level,
  const struct dev_printk_info *dev_info,
@@ -59,7 +57,7 @@ void defer_console_output(void);
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; }
 
 /*
- * In !PRINTK builds we still export logbuf_lock spin_lock, console_sem
+ * In !PRINTK builds we still export console_sem
  * semaphore and some of console functions (console_unlock()/etc.), so
  * printk-safe must preserve the existing local IRQ guarantees.
  */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e58ccc368348..01385ea92e7c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -355,41 +355,6 @@ enum log_flags {
LOG_CONT= 8,/* text is a fragment of a continuation line */
 };
 
-/*
- * The logbuf_lock protects kmsg buffer, indices, counters.  This can be taken
- * within the scheduler's rq lock. It must be released before calling
- * console_unlock() or anything else that might wake up a process.
- */
-DEFINE_RAW_SPINLOCK(logbuf_lock);
-
-/*
- * Helper macros to lock/unlock logbuf_lock and switch between
- * printk-safe/unsafe modes.
- */
-#define logbuf_lock_irq()  \
-   do {\
-   printk_safe_enter_irq();\
-   raw_spin_lock(&logbuf_lock);\
-   } while (0)
-
-#define logbuf_unlock_irq()\
-   do {\
-   raw_spin_unlock(&logbuf_lock);  \
-   printk_safe_exit_irq(); \
-   } while (0)
-
-#define logbuf_lock_irqsave(flags) \
-   do {\
-   printk_safe_enter_irqsave(flags);   \
-   raw_spin_lock(&logbuf_lock);\
-   } while (0)
-
-#define logbuf_unlock_irqrestore(flags)\
-   do {\
-   raw_spin_unlock(&logbuf_lock);  \
-   printk_safe_exit_irqrestore(flags); \
-   } while (0)
-
 /* syslog_lock protects syslog_* variables and write access to clear_seq. */
 static DEFINE_RAW_SPINLOCK(syslog_lock);
 
@@ -401,6 +366,7 @@ static u64 syslog_seq;
 static size_t syslog_partial;
 static bool syslog_time;
 
+/* All 3 protected by @console_sem. */
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u64 exclusive_console_stop_seq;
@@ -766,27 +732,27 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
if (ret)
return ret;
 
-   logbuf_lock_irq();
+   printk_safe_enter_irq();
if (!prb_read_valid(prb, atomic64_read(&user->seq), r)) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
goto out;
}
 
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
ret = wait_event_interruptible(log_wait,
prb_read_valid(prb, atomic64_read(&user->seq), r));
if (ret)
goto out;
-   logbuf_lock_irq();
+   printk_safe_enter_irq();
}
 
if (r->info->seq != atomic64_read(&user->seq)) {
/* our last seen message is gone, return error and reset */
atomic64_set(&user->seq, r->info->seq);
ret = -EPIPE;
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
goto out;
}
 
@@ -796,7 +762,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
&r->info->dev_info);
 
atomic64_set(&user->seq, r->info->seq + 1);
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
 
if (len > count) {
ret = -EINVAL;
@@ -831,7 +797,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
if (offset)
return -ESPIPE;
 
-   logbuf_lock_irq();
+   printk_safe_enter_irq();

[PATCH next v3 06/15] printk: consolidate kmsg_dump_get_buffer/syslog_print_all code

2021-02-25 Thread John Ogness
The logic for finding records to fit into a buffer is the same for
kmsg_dump_get_buffer() and syslog_print_all(). Introduce a helper
function find_first_fitting_seq() to handle this logic.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 87 --
 1 file changed, 50 insertions(+), 37 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ed678d84dc51..9a5f9ccc46ea 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1421,6 +1421,50 @@ static size_t get_record_print_text_size(struct 
printk_info *info,
return ((prefix_len * line_count) + info->text_len + 1);
 }
 
+/*
+ * Beginning with @start_seq, find the first record where it and all following
+ * records up to (but not including) @max_seq fit into @size.
+ *
+ * @max_seq is simply an upper bound and does not need to exist. If the caller
+ * does not require an upper bound, -1 can be used for @max_seq.
+ */
+static u64 find_first_fitting_seq(u64 start_seq, u64 max_seq, size_t size,
+ bool syslog, bool time)
+{
+   struct printk_info info;
+   unsigned int line_count;
+   size_t len = 0;
+   u64 seq;
+
+   /* Determine the size of the records up to @max_seq. */
+   prb_for_each_info(start_seq, prb, seq, &info, &line_count) {
+   if (info.seq >= max_seq)
+   break;
+   len += get_record_print_text_size(&info, line_count, syslog, time);
+   }
+
+   /*
+* Adjust the upper bound for the next loop to avoid subtracting
+* lengths that were never added.
+*/
+   if (seq < max_seq)
+   max_seq = seq;
+
+   /*
+* Move first record forward until length fits into the buffer. Ignore
+* newest messages that were not counted in the above cycle. Messages
+* might appear and get lost in the meantime. This is a best effort
+* that prevents an infinite loop that could occur with a retry.
+*/
+   prb_for_each_info(start_seq, prb, seq, &info, &line_count) {
+   if (len <= size || info.seq >= max_seq)
+   break;
+   len -= get_record_print_text_size(&info, line_count, syslog, time);
+   }
+
+   return seq;
+}
+
 static int syslog_print(char __user *buf, int size)
 {
struct printk_info info;
@@ -1492,9 +1536,7 @@ static int syslog_print(char __user *buf, int size)
 static int syslog_print_all(char __user *buf, int size, bool clear)
 {
struct printk_info info;
-   unsigned int line_count;
struct printk_record r;
-   u64 max_seq;
char *text;
int len = 0;
u64 seq;
@@ -1510,21 +1552,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 * Find first record that fits, including all following records,
 * into the user-provided buffer for this dump.
 */
-   prb_for_each_info(clear_seq, prb, seq, &info, &line_count)
-   len += get_record_print_text_size(&info, line_count, true, time);
-
-   /*
-* Set an upper bound for the next loop to avoid subtracting lengths
-* that were never added.
-*/
-   max_seq = seq;
-
-   /* move first record forward until length fits into the buffer */
-   prb_for_each_info(clear_seq, prb, seq, &info, &line_count) {
-   if (len <= size || info.seq >= max_seq)
-   break;
-   len -= get_record_print_text_size(&info, line_count, true, time);
-   }
+   seq = find_first_fitting_seq(clear_seq, -1, size, true, time);
 
prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX);
 
@@ -3427,7 +3455,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
  char *buf, size_t size, size_t *len_out)
 {
struct printk_info info;
-   unsigned int line_count;
struct printk_record r;
unsigned long flags;
u64 seq;
@@ -3455,26 +3482,12 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
 
/*
 * Find first record that fits, including all following records,
-* into the user-provided buffer for this dump.
+* into the user-provided buffer for this dump. Pass in size-1
+* because this function (by way of record_print_text()) will
+* not write more than size-1 bytes of text into @buf.
 */
-
-   prb_for_each_info(dumper->cur_seq, prb, seq, &info, &line_count) {
-   if (info.seq >= dumper->next_seq)
-   break;
-   len += get_record_print_text_size(&info, line_count, syslog, time);
-   }
-
-   /*
-* Move first record forward until length fits into the buffer. Ignore
-* newest messages that were not counted in the above cycle. Messages
-* might appear and get lost in the meantime. This is the best effort
-* 

[PATCH next v3 10/15] printk: add syslog_lock

2021-02-25 Thread John Ogness
The global variables @syslog_seq, @syslog_partial, @syslog_time
and write access to @clear_seq are protected by @logbuf_lock.
Once @logbuf_lock is removed, these variables will need their
own synchronization method. Introduce @syslog_lock for this
purpose.

@syslog_lock is a raw_spin_lock for now. This simplifies the
transition to removing @logbuf_lock. Once @logbuf_lock and the
safe buffers are removed, @syslog_lock can change to spin_lock.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 41 +
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 82d89eec4aac..c2ed7db8930b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -390,8 +390,12 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
printk_safe_exit_irqrestore(flags); \
} while (0)
 
+/* syslog_lock protects syslog_* variables and write access to clear_seq. */
+static DEFINE_RAW_SPINLOCK(syslog_lock);
+
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
+/* All 3 protected by @syslog_lock. */
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static size_t syslog_partial;
@@ -410,7 +414,7 @@ struct latched_seq {
 /*
  * The next printk record to read after the last 'clear' command. There are
  * two copies (updated with seqcount_latch) so that reads can locklessly
- * access a valid value. Writers are synchronized by @logbuf_lock.
+ * access a valid value. Writers are synchronized by @syslog_lock.
  */
 static struct latched_seq clear_seq = {
.latch  = SEQCNT_LATCH_ZERO(clear_seq.latch),
@@ -470,7 +474,7 @@ bool printk_percpu_data_ready(void)
return __printk_percpu_data_ready;
 }
 
-/* Must be called under logbuf_lock. */
+/* Must be called under syslog_lock. */
 static void latched_seq_write(struct latched_seq *ls, u64 val)
 {
raw_write_seqcount_latch(>latch);
@@ -1529,7 +1533,9 @@ static int syslog_print(char __user *buf, int size)
size_t skip;
 
logbuf_lock_irq();
+   raw_spin_lock(&syslog_lock);
if (!prb_read_valid(prb, syslog_seq, &r)) {
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
break;
}
@@ -1559,6 +1565,7 @@ static int syslog_print(char __user *buf, int size)
syslog_partial += n;
} else
n = 0;
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
 
if (!n)
@@ -1625,8 +1632,11 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
break;
}
 
-   if (clear)
+   if (clear) {
+   raw_spin_lock(&syslog_lock);
latched_seq_write(&clear_seq, seq);
+   raw_spin_unlock(&syslog_lock);
+   }
logbuf_unlock_irq();
 
kfree(text);
@@ -1636,10 +1646,24 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 static void syslog_clear(void)
 {
logbuf_lock_irq();
+   raw_spin_lock(&syslog_lock);
latched_seq_write(&clear_seq, prb_next_seq(prb));
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
 }
 
+/* Return a consistent copy of @syslog_seq. */
+static u64 read_syslog_seq_irq(void)
+{
+   u64 seq;
+
+   raw_spin_lock_irq(&syslog_lock);
+   seq = syslog_seq;
+   raw_spin_unlock_irq(&syslog_lock);
+
+   return seq;
+}
+
 int do_syslog(int type, char __user *buf, int len, int source)
 {
struct printk_info info;
@@ -1663,8 +1687,9 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
return 0;
if (!access_ok(buf, len))
return -EFAULT;
+
error = wait_event_interruptible(log_wait,
-   prb_read_valid(prb, syslog_seq, NULL));
+   prb_read_valid(prb, read_syslog_seq_irq(), 
NULL));
if (error)
return error;
error = syslog_print(buf, len);
@@ -1713,8 +1738,10 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
logbuf_lock_irq();
+   raw_spin_lock(&syslog_lock);
if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
/* No unread messages. */
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
return 0;
}
@@ -1743,6 +1770,7 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
}
error -= syslog_partial;
}
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
break;
/* Size of the log buffer */
@@ -299

[PATCH next v3 07/15] printk: introduce CONSOLE_LOG_MAX for improved multi-line support

2021-02-25 Thread John Ogness
Instead of using "LOG_LINE_MAX + PREFIX_MAX" for temporary buffer
sizes, introduce CONSOLE_LOG_MAX. This represents the maximum size
that is allowed to be printed to the console for a single record.

Rather than setting CONSOLE_LOG_MAX to "LOG_LINE_MAX + PREFIX_MAX"
(1024), increase it to 4096. With a larger buffer size, multi-line
records that are nearly LOG_LINE_MAX in length will have a better
chance of being fully printed. (When formatting a record for the
console, each line of a multi-line record is prepended with a copy
of the prefix.)
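
As a rough worked example, assuming CONFIG_PRINTK_CALLER (PREFIX_MAX of
48 bytes): a record close to LOG_LINE_MAX (976 bytes) that wraps into
20 console lines formats to about 976 + 20 * 48 = 1936 bytes, which no
longer fits in a 1024-byte buffer but fits comfortably in 4096.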

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9a5f9ccc46ea..a60f709896dd 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -410,8 +410,13 @@ static u64 clear_seq;
 #else
 #define PREFIX_MAX 32
 #endif
+
+/* the maximum size allowed to be reserved for a record */
 #define LOG_LINE_MAX   (1024 - PREFIX_MAX)
 
+/* the maximum size of a formatted record (i.e. with prefix added per line) */
+#define CONSOLE_LOG_MAX4096
+
 #define LOG_LEVEL(v)   ((v) & 0x07)
 #define LOG_FACILITY(v)((v) >> 3 & 0xff)
 
@@ -1472,11 +1477,11 @@ static int syslog_print(char __user *buf, int size)
char *text;
int len = 0;
 
-   text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
+   text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
if (!text)
return -ENOMEM;
 
-   prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX);
+   prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
while (size > 0) {
size_t n;
@@ -1542,7 +1547,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
u64 seq;
bool time;
 
-   text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
+   text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
if (!text)
return -ENOMEM;
 
@@ -1554,7 +1559,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 */
seq = find_first_fitting_seq(clear_seq, -1, size, true, time);
 
-   prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX);
+   prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
len = 0;
prb_for_each_record(seq, prb, seq, &r) {
@@ -2187,8 +2192,7 @@ EXPORT_SYMBOL(printk);
 
 #else /* CONFIG_PRINTK */
 
-#define LOG_LINE_MAX   0
-#define PREFIX_MAX 0
+#define CONSOLE_LOG_MAX0
 #define printk_timefalse
 
 #define prb_read_valid(rb, seq, r) false
@@ -2506,7 +2510,7 @@ static inline int can_use_console(void)
 void console_unlock(void)
 {
static char ext_text[CONSOLE_EXT_LOG_MAX];
-   static char text[LOG_LINE_MAX + PREFIX_MAX];
+   static char text[CONSOLE_LOG_MAX];
unsigned long flags;
bool do_cond_resched, retry;
struct printk_info info;
-- 
2.20.1



[PATCH next v3 12/15] printk: introduce a kmsg_dump iterator

2021-02-25 Thread John Ogness
Rather than storing the iterator information in the registered
kmsg_dumper structure, create a separate iterator structure. The
kmsg_dump_iter structure can reside on the stack of the caller, thus
allowing lockless use of the kmsg_dump functions.

This change also means that the kmsg_dumper dump() callback no
longer needs to pass in the kmsg_dumper as an argument. If
kmsg_dumpers want to access the kernel logs, they can use the new
iterator.

Update the kmsg_dumper callback prototype. Update code that accesses
the kernel logs using the kmsg_dumper structure to use the new
kmsg_dump_iter structure. For kmsg_dumpers, this also means adding a
call to kmsg_dump_rewind() to initialize the iterator.

All this is in preparation for removal of @logbuf_lock.
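
A typical dumper then reads the log roughly like this (usage sketch
based on the converted callers below):

	struct kmsg_dump_iter iter;
	char buf[128];
	size_t len;

	/* Initialize the on-stack iterator to the available records. */
	kmsg_dump_rewind(&iter);

	while (kmsg_dump_get_line(&iter, false, buf, sizeof(buf), &len)) {
		buf[len] = '\0';
		/* consume one formatted log line in buf */
	}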

Signed-off-by: John Ogness 
---
 arch/powerpc/kernel/nvram_64.c | 14 +++---
 arch/powerpc/platforms/powernv/opal-kmsg.c |  3 +-
 arch/powerpc/xmon/xmon.c   |  6 +--
 arch/um/kernel/kmsg_dump.c |  8 +--
 drivers/hv/vmbus_drv.c |  7 +--
 drivers/mtd/mtdoops.c  |  8 +--
 fs/pstore/platform.c   |  8 +--
 include/linux/kmsg_dump.h  | 38 ---
 kernel/debug/kdb/kdb_main.c| 10 ++--
 kernel/printk/printk.c | 57 ++
 10 files changed, 81 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
index 532f22637783..5a64b24a91c2 100644
--- a/arch/powerpc/kernel/nvram_64.c
+++ b/arch/powerpc/kernel/nvram_64.c
@@ -72,8 +72,7 @@ static const char *nvram_os_partitions[] = {
NULL
 };
 
-static void oops_to_nvram(struct kmsg_dumper *dumper,
- enum kmsg_dump_reason reason);
+static void oops_to_nvram(enum kmsg_dump_reason reason);
 
 static struct kmsg_dumper nvram_kmsg_dumper = {
.dump = oops_to_nvram
@@ -642,11 +641,11 @@ void __init nvram_init_oops_partition(int 
rtas_partition_exists)
  * that we think will compress sufficiently to fit in the lnx,oops-log
  * partition.  If that's too much, go back and capture uncompressed text.
  */
-static void oops_to_nvram(struct kmsg_dumper *dumper,
- enum kmsg_dump_reason reason)
+static void oops_to_nvram(enum kmsg_dump_reason reason)
 {
struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf;
static unsigned int oops_count = 0;
+   static struct kmsg_dump_iter iter;
static bool panicking = false;
static DEFINE_SPINLOCK(lock);
unsigned long flags;
@@ -681,13 +680,14 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
return;
 
if (big_oops_buf) {
-   kmsg_dump_get_buffer(dumper, false,
+   kmsg_dump_rewind(&iter);
+   kmsg_dump_get_buffer(&iter, false,
 big_oops_buf, big_oops_buf_sz, _len);
rc = zip_oops(text_len);
}
if (rc != 0) {
-   kmsg_dump_rewind(dumper);
-   kmsg_dump_get_buffer(dumper, false,
+   kmsg_dump_rewind(&iter);
+   kmsg_dump_get_buffer(&iter, false,
 oops_data, oops_data_sz, _len);
err_type = ERR_TYPE_KERNEL_PANIC;
oops_hdr->version = cpu_to_be16(OOPS_HDR_VERSION);
diff --git a/arch/powerpc/platforms/powernv/opal-kmsg.c 
b/arch/powerpc/platforms/powernv/opal-kmsg.c
index 6c3bc4b4da98..a7bd6ac681f4 100644
--- a/arch/powerpc/platforms/powernv/opal-kmsg.c
+++ b/arch/powerpc/platforms/powernv/opal-kmsg.c
@@ -19,8 +19,7 @@
  * may not be completely printed.  This function does not actually dump the
  * message, it just ensures that OPAL completely flushes the console buffer.
  */
-static void kmsg_dump_opal_console_flush(struct kmsg_dumper *dumper,
-enum kmsg_dump_reason reason)
+static void kmsg_dump_opal_console_flush(enum kmsg_dump_reason reason)
 {
/*
 * Outside of a panic context the pollers will continue to run,
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 80ed3e1becf9..5978b90a885f 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3001,7 +3001,7 @@ print_address(unsigned long addr)
 static void
 dump_log_buf(void)
 {
-   struct kmsg_dumper dumper;
+   struct kmsg_dump_iter iter;
unsigned char buf[128];
size_t len;
 
@@ -3013,9 +3013,9 @@ dump_log_buf(void)
catch_memory_errors = 1;
sync();
 
-   kmsg_dump_rewind_nolock(&dumper);
+   kmsg_dump_rewind_nolock(&iter);
xmon_start_pagination();
-   while (kmsg_dump_get_line_nolock(&dumper, false, buf, sizeof(buf), &len)) {
+   while (kmsg_dump_get_line_nolock(&iter, false, buf, sizeof(buf), &len)) {
buf[len] = '\0';
printf("%s", buf);
}
diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c
index

[PATCH next v3 08/15] printk: use seqcount_latch for clear_seq

2021-02-25 Thread John Ogness
kmsg_dump_rewind_nolock() locklessly reads @clear_seq. However,
this is not done atomically. Since @clear_seq is 64-bit, this
cannot be an atomic operation for all platforms. Therefore, use
a seqcount_latch to allow readers to always read a consistent
value.
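
In short, the writer (still serialized by @logbuf_lock here) publishes
through the latch and lockless readers retry until they observe a
consistent copy (usage sketch of the helpers added below):

	/* writer side, caller provides serialization */
	latched_seq_write(&clear_seq, seq);

	/* reader side, any context, no locks taken */
	u64 cleared = latched_seq_read_nolock(&clear_seq);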

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 58 --
 1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a60f709896dd..b78b85947312 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -402,8 +402,21 @@ static u64 console_seq;
 static u64 exclusive_console_stop_seq;
 static unsigned long console_dropped;
 
-/* the next printk record to read after the last 'clear' command */
-static u64 clear_seq;
+struct latched_seq {
+   seqcount_latch_tlatch;
+   u64 val[2];
+};
+
+/*
+ * The next printk record to read after the last 'clear' command. There are
+ * two copies (updated with seqcount_latch) so that reads can locklessly
+ * access a valid value. Writers are synchronized by @logbuf_lock.
+ */
+static struct latched_seq clear_seq = {
+   .latch  = SEQCNT_LATCH_ZERO(clear_seq.latch),
+   .val[0] = 0,
+   .val[1] = 0,
+};
 
 #ifdef CONFIG_PRINTK_CALLER
 #define PREFIX_MAX 48
@@ -457,6 +470,31 @@ bool printk_percpu_data_ready(void)
return __printk_percpu_data_ready;
 }
 
+/* Must be called under logbuf_lock. */
+static void latched_seq_write(struct latched_seq *ls, u64 val)
+{
+   raw_write_seqcount_latch(&ls->latch);
+   ls->val[0] = val;
+   raw_write_seqcount_latch(&ls->latch);
+   ls->val[1] = val;
+}
+
+/* Can be called from any context. */
+static u64 latched_seq_read_nolock(struct latched_seq *ls)
+{
+   unsigned int seq;
+   unsigned int idx;
+   u64 val;
+
+   do {
+   seq = raw_read_seqcount_latch(&ls->latch);
+   idx = seq & 0x1;
+   val = ls->val[idx];
+   } while (read_seqcount_latch_retry(&ls->latch, seq));
+
+   return val;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -801,7 +839,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
 * changes no global state, and does not clear anything.
 */
-   user->seq = clear_seq;
+   user->seq = latched_seq_read_nolock(&clear_seq);
break;
case SEEK_END:
/* after the last record */
@@ -960,6 +998,9 @@ void log_buf_vmcoreinfo_setup(void)
 
VMCOREINFO_SIZE(atomic_long_t);
VMCOREINFO_TYPE_OFFSET(atomic_long_t, counter);
+
+   VMCOREINFO_STRUCT_SIZE(latched_seq);
+   VMCOREINFO_OFFSET(latched_seq, val);
 }
 #endif
 
@@ -1557,7 +1598,8 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 * Find first record that fits, including all following records,
 * into the user-provided buffer for this dump.
 */
-   seq = find_first_fitting_seq(clear_seq, -1, size, true, time);
+   seq = find_first_fitting_seq(latched_seq_read_nolock(&clear_seq), -1,
+size, true, time);
 
prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
 
@@ -1584,7 +1626,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
}
 
if (clear)
-   clear_seq = seq;
+   latched_seq_write(&clear_seq, seq);
logbuf_unlock_irq();
 
kfree(text);
@@ -1594,7 +1636,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 static void syslog_clear(void)
 {
logbuf_lock_irq();
-   clear_seq = prb_next_seq(prb);
+   latched_seq_write(&clear_seq, prb_next_seq(prb));
logbuf_unlock_irq();
 }
 
@@ -3336,7 +3378,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
dumper->active = true;
 
logbuf_lock_irqsave(flags);
-   dumper->cur_seq = clear_seq;
+   dumper->cur_seq = latched_seq_read_nolock(&clear_seq);
dumper->next_seq = prb_next_seq(prb);
logbuf_unlock_irqrestore(flags);
 
@@ -3534,7 +3576,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
  */
 void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
 {
-   dumper->cur_seq = clear_seq;
+   dumper->cur_seq = latched_seq_read_nolock(&clear_seq);
dumper->next_seq = prb_next_seq(prb);
 }
 
-- 
2.20.1



[PATCH next v3 14/15] printk: kmsg_dump: remove _nolock() variants

2021-02-25 Thread John Ogness
kmsg_dump_rewind() and kmsg_dump_get_line() are lockless, so there is
no need for _nolock() variants. Remove these functions and switch all
callers of the _nolock() variants.

The functions without _nolock() were chosen because they are already
exported to kernel modules.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 arch/powerpc/xmon/xmon.c|  4 +--
 include/linux/kmsg_dump.h   | 16 --
 kernel/debug/kdb/kdb_main.c |  8 ++---
 kernel/printk/printk.c  | 60 +
 4 files changed, 14 insertions(+), 74 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 5978b90a885f..bf7d69625a2e 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3013,9 +3013,9 @@ dump_log_buf(void)
catch_memory_errors = 1;
sync();
 
-   kmsg_dump_rewind_nolock(&iter);
+   kmsg_dump_rewind(&iter);
xmon_start_pagination();
-   while (kmsg_dump_get_line_nolock(&iter, false, buf, sizeof(buf), &len)) {
+   while (kmsg_dump_get_line(&iter, false, buf, sizeof(buf), &len)) {
buf[len] = '\0';
printf("%s", buf);
}
diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 5d3bf20f9f0a..532673b6570a 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -57,17 +57,12 @@ struct kmsg_dumper {
 #ifdef CONFIG_PRINTK
 void kmsg_dump(enum kmsg_dump_reason reason);
 
-bool kmsg_dump_get_line_nolock(struct kmsg_dump_iter *iter, bool syslog,
-  char *line, size_t size, size_t *len);
-
 bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
char *line, size_t size, size_t *len);
 
 bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool syslog,
  char *buf, size_t size, size_t *len_out);
 
-void kmsg_dump_rewind_nolock(struct kmsg_dump_iter *iter);
-
 void kmsg_dump_rewind(struct kmsg_dump_iter *iter);
 
 int kmsg_dump_register(struct kmsg_dumper *dumper);
@@ -80,13 +75,6 @@ static inline void kmsg_dump(enum kmsg_dump_reason reason)
 {
 }
 
-static inline bool kmsg_dump_get_line_nolock(struct kmsg_dump_iter *iter,
-bool syslog, const char *line,
-size_t size, size_t *len)
-{
-   return false;
-}
-
 static inline bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
const char *line, size_t size, size_t *len)
 {
@@ -99,10 +87,6 @@ static inline bool kmsg_dump_get_buffer(struct 
kmsg_dump_iter *iter, bool syslog
return false;
 }
 
-static inline void kmsg_dump_rewind_nolock(struct kmsg_dump_iter *iter)
-{
-}
-
 static inline void kmsg_dump_rewind(struct kmsg_dump_iter *iter)
 {
 }
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 8544d7a55a57..67d9f2403b52 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2126,8 +2126,8 @@ static int kdb_dmesg(int argc, const char **argv)
kdb_set(2, setargs);
}
 
-   kmsg_dump_rewind_nolock(&iter);
-   while (kmsg_dump_get_line_nolock(&iter, 1, NULL, 0, NULL))
+   kmsg_dump_rewind(&iter);
+   while (kmsg_dump_get_line(&iter, 1, NULL, 0, NULL))
n++;
 
if (lines < 0) {
@@ -2159,8 +2159,8 @@ static int kdb_dmesg(int argc, const char **argv)
if (skip >= n || skip < 0)
return 0;
 
-   kmsg_dump_rewind_nolock(&iter);
-   while (kmsg_dump_get_line_nolock(&iter, 1, buf, sizeof(buf), &len)) {
+   kmsg_dump_rewind(&iter);
+   while (kmsg_dump_get_line(&iter, 1, buf, sizeof(buf), &len)) {
if (skip) {
skip--;
continue;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 01385ea92e7c..15a9bc409e0a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3373,7 +3373,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 }
 
 /**
- * kmsg_dump_get_line_nolock - retrieve one kmsg log line (unlocked version)
+ * kmsg_dump_get_line - retrieve one kmsg log line
  * @iter: kmsg dump iterator
  * @syslog: include the "<4>" prefixes
  * @line: buffer to copy the line to
@@ -3388,18 +3388,18 @@ void kmsg_dump(enum kmsg_dump_reason reason)
  *
  * A return value of FALSE indicates that there are no more records to
  * read.
- *
- * The function is similar to kmsg_dump_get_line(), but grabs no locks.
  */
-bool kmsg_dump_get_line_nolock(struct kmsg_dump_iter *iter, bool syslog,
-  char *line, size_t size, size_t *len)
+bool kmsg_dump_get_line(struct kmsg_dump_iter *iter, bool syslog,
+   char *line, size_t size, size_t *len)
 {
struct printk_info info;
unsigned int line_count;
struct printk_record r;
+   unsigned long flags;
size_t l = 0;
bool ret = false;
 
+   printk_safe_enter_irqsave(flags);
   

[PATCH next v3 11/15] printk: kmsg_dumper: remove @active field

2021-02-25 Thread John Ogness
All 6 kmsg_dumpers do not benefit from the @active flag:

  (provide their own synchronization)
  - arch/powerpc/kernel/nvram_64.c
  - arch/um/kernel/kmsg_dump.c
  - drivers/mtd/mtdoops.c
  - fs/pstore/platform.c

  (only dump on KMSG_DUMP_PANIC, which does not require
  synchronization)
  - arch/powerpc/platforms/powernv/opal-kmsg.c
  - drivers/hv/vmbus_drv.c

The other 2 kmsg_dump users also do not rely on @active:

  (hard-code @active to always be true)
  - arch/powerpc/xmon/xmon.c
  - kernel/debug/kdb/kdb_main.c

Therefore, @active can be removed.

Signed-off-by: John Ogness 
---
 arch/powerpc/xmon/xmon.c|  2 +-
 include/linux/kmsg_dump.h   |  2 --
 kernel/debug/kdb/kdb_main.c |  2 +-
 kernel/printk/printk.c  | 10 +-
 4 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 3fe37495f63d..80ed3e1becf9 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3001,7 +3001,7 @@ print_address(unsigned long addr)
 static void
 dump_log_buf(void)
 {
-   struct kmsg_dumper dumper = { .active = 1 };
+   struct kmsg_dumper dumper;
unsigned char buf[128];
size_t len;
 
diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 070c994ff19f..84eaa2090efa 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -36,7 +36,6 @@ enum kmsg_dump_reason {
  * through the record iterator
  * @max_reason:filter for highest reason number that should be dumped
  * @registered:Flag that specifies if this is already registered
- * @active:Flag that specifies if this is currently dumping
  * @cur_seq:   Points to the oldest message to dump
  * @next_seq:  Points after the newest message to dump
  */
@@ -44,7 +43,6 @@ struct kmsg_dumper {
struct list_head list;
void (*dump)(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason);
enum kmsg_dump_reason max_reason;
-   bool active;
bool registered;
 
/* private state of the kmsg iterator */
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 930ac1b25ec7..315169d5e119 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2101,7 +2101,7 @@ static int kdb_dmesg(int argc, const char **argv)
int adjust = 0;
int n = 0;
int skip = 0;
-   struct kmsg_dumper dumper = { .active = 1 };
+   struct kmsg_dumper dumper;
size_t len;
char buf[201];
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c2ed7db8930b..45cb3e9c62c5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3408,8 +3408,6 @@ void kmsg_dump(enum kmsg_dump_reason reason)
continue;
 
/* initialize iterator with data about the stored records */
-   dumper->active = true;
-
logbuf_lock_irqsave(flags);
dumper->cur_seq = latched_seq_read_nolock(&clear_seq);
dumper->next_seq = prb_next_seq(prb);
@@ -3417,9 +3415,6 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 
/* invoke dumper which will iterate over records */
dumper->dump(dumper, reason);
-
-   /* reset iterator */
-   dumper->active = false;
}
rcu_read_unlock();
 }
@@ -3454,9 +3449,6 @@ bool kmsg_dump_get_line_nolock(struct kmsg_dumper 
*dumper, bool syslog,
 
prb_rec_init_rd(&r, &info, line, size);
 
-   if (!dumper->active)
-   goto out;
-
/* Read text or count text lines? */
if (line) {
if (!prb_read_valid(prb, dumper->cur_seq, &r))
@@ -3542,7 +3534,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
bool ret = false;
bool time = printk_time;
 
-   if (!dumper->active || !buf || !size)
+   if (!buf || !size)
goto out;
 
logbuf_lock_irqsave(flags);
-- 
2.20.1



[PATCH next v3 15/15] printk: console: remove unnecessary safe buffer usage

2021-02-25 Thread John Ogness
Upon registering a console, safe buffers are activated when setting
up the sequence number to replay the log. However, these are already
protected by @console_sem and @syslog_lock. Remove the unnecessary
safe buffer usage.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 15a9bc409e0a..27a748ed0bc7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2967,9 +2967,7 @@ void register_console(struct console *newcon)
/*
 * console_unlock(); will print out the buffered messages
 * for us.
-*/
-   printk_safe_enter_irqsave(flags);
-   /*
+*
 * We're about to replay the log buffer.  Only do this to the
 * just-registered console to avoid excessive message spam to
 * the already-registered consoles.
@@ -2982,11 +2980,9 @@ void register_console(struct console *newcon)
exclusive_console_stop_seq = console_seq;
 
/* Get a consistent copy of @syslog_seq. */
-   raw_spin_lock(&syslog_lock);
+   raw_spin_lock_irqsave(&syslog_lock, flags);
console_seq = syslog_seq;
-   raw_spin_unlock(&syslog_lock);
-
-   printk_safe_exit_irqrestore(flags);
+   raw_spin_unlock_irqrestore(&syslog_lock, flags);
}
console_unlock();
console_sysfs_notify();
-- 
2.20.1



[PATCH next v3 05/15] printk: refactor kmsg_dump_get_buffer()

2021-02-25 Thread John Ogness
kmsg_dump_get_buffer() requires nearly the same logic as
syslog_print_all(), but uses different variable names and
does not make use of the ringbuffer loop macros. Modify
kmsg_dump_get_buffer() so that the implementation is as similar
to syslog_print_all() as possible.

A follow-up commit will move this common logic into a
separate helper function.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 include/linux/kmsg_dump.h |  2 +-
 kernel/printk/printk.c| 62 +--
 2 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index ae38035f1dca..070c994ff19f 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -62,7 +62,7 @@ bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool 
syslog,
char *line, size_t size, size_t *len);
 
 bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
- char *buf, size_t size, size_t *len);
+ char *buf, size_t size, size_t *len_out);
 
 void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 77ae2704e979..ed678d84dc51 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3410,7 +3410,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_line);
  * @syslog: include the "<4>" prefixes
  * @buf: buffer to copy the line to
  * @size: maximum size of the buffer
- * @len: length of line placed into buffer
+ * @len_out: length of line placed into buffer
  *
  * Start at the end of the kmsg buffer and fill the provided buffer
  * with as many of the *youngest* kmsg records that fit into it.
@@ -3424,7 +3424,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_line);
  * read.
  */
 bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
- char *buf, size_t size, size_t *len)
+ char *buf, size_t size, size_t *len_out)
 {
struct printk_info info;
unsigned int line_count;
@@ -3432,12 +3432,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
unsigned long flags;
u64 seq;
u64 next_seq;
-   size_t l = 0;
+   size_t len = 0;
bool ret = false;
bool time = printk_time;
 
-   prb_rec_init_rd(&r, &info, buf, size);
-
if (!dumper->active || !buf || !size)
goto out;
 
@@ -3455,48 +3453,54 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
goto out;
}
 
-   /* calculate length of entire buffer */
-   seq = dumper->cur_seq;
-   while (prb_read_valid_info(prb, seq, &info, &line_count)) {
-   if (r.info->seq >= dumper->next_seq)
+   /*
+* Find first record that fits, including all following records,
+* into the user-provided buffer for this dump.
+*/
+
+   prb_for_each_info(dumper->cur_seq, prb, seq, &info, &line_count) {
+   if (info.seq >= dumper->next_seq)
break;
-   l += get_record_print_text_size(&info, line_count, syslog, time);
-   seq = r.info->seq + 1;
+   len += get_record_print_text_size(&info, line_count, syslog, time);
}
 
-   /* move first record forward until length fits into the buffer */
-   seq = dumper->cur_seq;
-   while (l >= size && prb_read_valid_info(prb, seq,
-   &info, &line_count)) {
-   if (r.info->seq >= dumper->next_seq)
+   /*
+* Move first record forward until length fits into the buffer. Ignore
+* newest messages that were not counted in the above cycle. Messages
+* might appear and get lost in the meantime. This is the best effort
+* that prevents an infinite loop.
+*/
+   prb_for_each_info(dumper->cur_seq, prb, seq, &info, &line_count) {
+   if (len < size || info.seq >= dumper->next_seq)
break;
-   l -= get_record_print_text_size(&info, line_count, syslog, time);
-   seq = r.info->seq + 1;
+   len -= get_record_print_text_size(&info, line_count, syslog, time);
}
 
-   /* last message in next interation */
+   /*
+* Next kmsg_dump_get_buffer() invocation will dump block of
+* older records stored right before this one.
+*/
next_seq = seq;
 
-   /* actually read text into the buffer now */
-   l = 0;
-   while (prb_read_valid(prb, seq, &r)) {
+   prb_rec_init_rd(&r, &info, buf, size);
+
+   len = 0;
+   prb_for_each_record(seq, prb, seq, &r) {
if (r.info->seq >= dumper->next_seq)
break;
 
-   l += record_print_text(&r, syslog, time);
+   len += record_print_text(&r, syslog, time);
 
-   /* adjust record to store to remaining buffer space */

[PATCH next v3 09/15] printk: use atomic64_t for devkmsg_user.seq

2021-02-25 Thread John Ogness
@user->seq is indirectly protected by @logbuf_lock. Once @logbuf_lock
is removed, @user->seq will be no longer safe from an atomicity point
of view.

In preparation for the removal of @logbuf_lock, change it to
atomic64_t to provide this safety.
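
(For background, a rough sketch of the problem and the fix -- illustrative
only, names invented, not part of the patch. On a 32-bit platform a plain
64-bit access may be split into two 32-bit accesses, so a concurrent
reader can observe a torn value; the atomic64_t accessors guarantee the
access happens as a single unit:

        u64 seq;
        u64 snap = seq;                   /* may be two 32-bit loads */

        atomic64_t aseq;
        u64 snap2 = atomic64_read(&aseq); /* single-copy atomic */
        atomic64_set(&aseq, snap2 + 1);
)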

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b78b85947312..82d89eec4aac 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -662,7 +662,7 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
-   u64 seq;
+   atomic64_t seq;
struct ratelimit_state rs;
struct mutex lock;
char buf[CONSOLE_EXT_LOG_MAX];
@@ -763,7 +763,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
return ret;
 
logbuf_lock_irq();
-   if (!prb_read_valid(prb, user->seq, r)) {
+   if (!prb_read_valid(prb, atomic64_read(&user->seq), r)) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
logbuf_unlock_irq();
@@ -772,15 +772,15 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
 
logbuf_unlock_irq();
ret = wait_event_interruptible(log_wait,
-   prb_read_valid(prb, user->seq, r));
+   prb_read_valid(prb, atomic64_read(&user->seq), r));
if (ret)
goto out;
logbuf_lock_irq();
}
 
-   if (r->info->seq != user->seq) {
+   if (r->info->seq != atomic64_read(&user->seq)) {
/* our last seen message is gone, return error and reset */
-   user->seq = r->info->seq;
+   atomic64_set(&user->seq, r->info->seq);
ret = -EPIPE;
logbuf_unlock_irq();
goto out;
@@ -791,7 +791,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
  &r->text_buf[0], r->info->text_len,
  &r->info->dev_info);
 
-   user->seq = r->info->seq + 1;
+   atomic64_set(&user->seq, r->info->seq + 1);
logbuf_unlock_irq();
 
if (len > count) {
@@ -831,7 +831,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
switch (whence) {
case SEEK_SET:
/* the first record */
-   user->seq = prb_first_valid_seq(prb);
+   atomic64_set(&user->seq, prb_first_valid_seq(prb));
break;
case SEEK_DATA:
/*
@@ -839,11 +839,11 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
 * changes no global state, and does not clear anything.
 */
-   user->seq = latched_seq_read_nolock(&clear_seq);
+   atomic64_set(&user->seq, latched_seq_read_nolock(&clear_seq));
break;
case SEEK_END:
/* after the last record */
-   user->seq = prb_next_seq(prb);
+   atomic64_set(&user->seq, prb_next_seq(prb));
break;
default:
ret = -EINVAL;
@@ -864,9 +864,9 @@ static __poll_t devkmsg_poll(struct file *file, poll_table 
*wait)
poll_wait(file, &log_wait, wait);
 
logbuf_lock_irq();
-   if (prb_read_valid_info(prb, user->seq, &info, NULL)) {
+   if (prb_read_valid_info(prb, atomic64_read(&user->seq), &info, NULL)) {
/* return error when data has vanished underneath us */
-   if (info.seq != user->seq)
+   if (info.seq != atomic64_read(&user->seq))
ret = EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
else
ret = EPOLLIN|EPOLLRDNORM;
@@ -905,7 +905,7 @@ static int devkmsg_open(struct inode *inode, struct file 
*file)
&user->text_buf[0], sizeof(user->text_buf));
 
logbuf_lock_irq();
-   user->seq = prb_first_valid_seq(prb);
+   atomic64_set(&user->seq, prb_first_valid_seq(prb));
logbuf_unlock_irq();
 
file->private_data = user;
-- 
2.20.1



[PATCH next v3 03/15] printk: limit second loop of syslog_print_all

2021-02-25 Thread John Ogness
The second loop of syslog_print_all() subtracts lengths that were
added in the first loop. With commit b031a684bfd0 ("printk: remove
logbuf_lock writer-protection of ringbuffer") it is possible that
records are (over)written during syslog_print_all(). This allows the
possibility of the second loop subtracting lengths that were never
added in the first loop.

This situation can result in syslog_print_all() filling the buffer
starting from a later record, even though there may have been room
to fit the earlier record(s) as well.
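
A hypothetical sequence of events (record numbers and sizes invented)
that the upper bound prevents:

        first loop:   sums the sizes of records 100..109, len = 3000
        meanwhile:    other CPUs append records 110..112
        second loop:  without the bound it would also visit 110..112
                      and subtract sizes that were never added, leaving
                      len too small and skipping older records that
                      would still have fit into the buffer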

Fixes: b031a684bfd0 ("printk: remove logbuf_lock writer-protection of 
ringbuffer")
Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 575a34b88936..77ae2704e979 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1494,6 +1494,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
struct printk_info info;
unsigned int line_count;
struct printk_record r;
+   u64 max_seq;
char *text;
int len = 0;
u64 seq;
@@ -1512,9 +1513,15 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
prb_for_each_info(clear_seq, prb, seq, &info, &line_count)
len += get_record_print_text_size(&info, line_count, true, time);
 
+   /*
+* Set an upper bound for the next loop to avoid subtracting lengths
+* that were never added.
+*/
+   max_seq = seq;
+
/* move first record forward until length fits into the buffer */
prb_for_each_info(clear_seq, prb, seq, &info, &line_count) {
-   if (len <= size)
+   if (len <= size || info.seq >= max_seq)
break;
len -= get_record_print_text_size(&info, line_count, true, time);
}
-- 
2.20.1



[PATCH next v3 00/15] printk: remove logbuf_lock

2021-02-25 Thread John Ogness
Hello,

Here is v3 of a series to remove @logbuf_lock, exposing the
ringbuffer locklessly to both readers and writers. v2 is here [0].

Since @logbuf_lock was protecting much more than just the
ringbuffer, this series clarifies and cleans up the various
protections using comments, lockless accessors, atomic types, and a
new finer-grained @syslog_lock.

Removing @logbuf_lock required changing the semantics of the
kmsg_dumper callback in order to work locklessly. Since this
involved touching all the kmsg_dump users, we also decided [1] to
use this opportunity to clean up and clarify the kmsg_dump semantics
in general.

This series is based on next-20210225.

Changes since v2:

- use get_maintainer.pl to get the full list of developers that
  should at least see the changes in their respective areas

- do not disable interrupts in arch/um kmsg_dumper (because there is
  no need to)

- protect the mtd/mtdoops kmsg_dumper buffer against concurrent
  dumps

- update kerneldoc for kmsg_dump_get_line() (@len_out)

- remove ksmg_dump's @active flag

- change kmsg_dumper callback to:
  void (*dump)(enum kmsg_dump_reason reason);

- rename kmsg_dumper_iter to kmsg_dump_iter

- update kmsg_dumpers to use their own kmsg_dump_iter (and
  initialize it with kmsg_dump_rewind() if necessary)

John Ogness

[0] https://lkml.kernel.org/r/20210218081817.28849-1-john.ogn...@linutronix.de
[1] https://lkml.kernel.org/r/YDeZAA08NKCHa4s%2F@alley

John Ogness (15):
  um: synchronize kmsg_dumper
  mtd: mtdoops: synchronize kmsg_dumper
  printk: limit second loop of syslog_print_all
  printk: kmsg_dump: remove unused fields
  printk: refactor kmsg_dump_get_buffer()
  printk: consolidate kmsg_dump_get_buffer/syslog_print_all code
  printk: introduce CONSOLE_LOG_MAX for improved multi-line support
  printk: use seqcount_latch for clear_seq
  printk: use atomic64_t for devkmsg_user.seq
  printk: add syslog_lock
  printk: kmsg_dumper: remove @active field
  printk: introduce a kmsg_dump iterator
  printk: remove logbuf_lock
  printk: kmsg_dump: remove _nolock() variants
  printk: console: remove unnecessary safe buffer usage

 arch/powerpc/kernel/nvram_64.c |  14 +-
 arch/powerpc/platforms/powernv/opal-kmsg.c |   3 +-
 arch/powerpc/xmon/xmon.c   |   6 +-
 arch/um/kernel/kmsg_dump.c |  15 +-
 drivers/hv/vmbus_drv.c |   7 +-
 drivers/mtd/mtdoops.c  |  20 +-
 fs/pstore/platform.c   |   8 +-
 include/linux/kmsg_dump.h  |  49 +--
 kernel/debug/kdb/kdb_main.c|  10 +-
 kernel/printk/internal.h   |   4 +-
 kernel/printk/printk.c | 456 ++---
 kernel/printk/printk_safe.c|  27 +-
 12 files changed, 309 insertions(+), 310 deletions(-)

-- 
2.20.1



[PATCH next v3 02/15] mtd: mtdoops: synchronize kmsg_dumper

2021-02-25 Thread John Ogness
The kmsg_dumper can be called from any context and CPU, possibly
from multiple CPUs simultaneously. Since the writing of the buffer
can occur from a later scheduled work queue, the oops buffer must
be protected against simultaneous dumping.

Use an atomic bit to mark when the buffer is protected. Release the
protection in between setting the buffer and the actual writing in
order for a possible panic (immediate write) to be written during
the scheduling of a previous oops (delayed write).
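
(The protection boils down to the following shape -- a sketch of the
pattern only, the actual hunks are below:

        if (test_and_set_bit(0, &cxt->oops_buf_busy))
                return;         /* another context owns oops_buf */
        /* ... fill or write out cxt->oops_buf ... */
        clear_bit(0, &cxt->oops_buf_busy);

Releasing the bit between filling the buffer and the scheduled write is
what allows a panic arriving in that window to still take the buffer and
write it out immediately.)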

Signed-off-by: John Ogness 
---
 drivers/mtd/mtdoops.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/mtdoops.c b/drivers/mtd/mtdoops.c
index 774970bfcf85..8bbfba40a554 100644
--- a/drivers/mtd/mtdoops.c
+++ b/drivers/mtd/mtdoops.c
@@ -52,6 +52,7 @@ static struct mtdoops_context {
int nextcount;
unsigned long *oops_page_used;
 
+   unsigned long oops_buf_busy;
void *oops_buf;
 } oops_cxt;
 
@@ -180,6 +181,9 @@ static void mtdoops_write(struct mtdoops_context *cxt, int 
panic)
u32 *hdr;
int ret;
 
+   if (test_and_set_bit(0, &cxt->oops_buf_busy))
+   return;
+
/* Add mtdoops header to the buffer */
hdr = cxt->oops_buf;
hdr[0] = cxt->nextcount;
@@ -190,7 +194,7 @@ static void mtdoops_write(struct mtdoops_context *cxt, int 
panic)
  record_size, &retlen, cxt->oops_buf);
if (ret == -EOPNOTSUPP) {
printk(KERN_ERR "mtdoops: Cannot write from panic 
without panic_write\n");
-   return;
+   goto out;
}
} else
ret = mtd_write(mtd, cxt->nextpage * record_size,
@@ -203,6 +207,8 @@ static void mtdoops_write(struct mtdoops_context *cxt, int 
panic)
memset(cxt->oops_buf, 0xff, record_size);
 
mtdoops_inc_counter(cxt);
+out:
+   clear_bit(0, &cxt->oops_buf_busy);
 }
 
 static void mtdoops_workfunc_write(struct work_struct *work)
@@ -276,8 +282,11 @@ static void mtdoops_do_dump(struct kmsg_dumper *dumper,
if (reason == KMSG_DUMP_OOPS && !dump_oops)
return;
 
+   if (test_and_set_bit(0, &cxt->oops_buf_busy))
+   return;
kmsg_dump_get_buffer(dumper, true, cxt->oops_buf + MTDOOPS_HEADER_SIZE,
 record_size - MTDOOPS_HEADER_SIZE, NULL);
+   clear_bit(0, &cxt->oops_buf_busy);
 
if (reason != KMSG_DUMP_OOPS) {
/* Panics must be written immediately */
@@ -394,6 +403,7 @@ static int __init mtdoops_init(void)
return -ENOMEM;
}
memset(cxt->oops_buf, 0xff, record_size);
+   cxt->oops_buf_busy = 0;
 
INIT_WORK(&cxt->work_erase, mtdoops_workfunc_erase);
INIT_WORK(&cxt->work_write, mtdoops_workfunc_write);
-- 
2.20.1



[PATCH next v3 04/15] printk: kmsg_dump: remove unused fields

2021-02-25 Thread John Ogness
struct kmsg_dumper still contains some fields that were used to
iterate the old ringbuffer. They are no longer used. Remove them
and update the struct documentation.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 include/linux/kmsg_dump.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 3378bcbe585e..ae38035f1dca 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -36,6 +36,9 @@ enum kmsg_dump_reason {
  * through the record iterator
  * @max_reason:filter for highest reason number that should be dumped
  * @registered:Flag that specifies if this is already registered
+ * @active:Flag that specifies if this is currently dumping
+ * @cur_seq:   Points to the oldest message to dump
+ * @next_seq:  Points after the newest message to dump
  */
 struct kmsg_dumper {
struct list_head list;
@@ -45,8 +48,6 @@ struct kmsg_dumper {
bool registered;
 
/* private state of the kmsg iterator */
-   u32 cur_idx;
-   u32 next_idx;
u64 cur_seq;
u64 next_seq;
 };
-- 
2.20.1



[PATCH next v3 01/15] um: synchronize kmsg_dumper

2021-02-25 Thread John Ogness
The kmsg_dumper can be called from any context and CPU, possibly
from multiple CPUs simultaneously. Since a static buffer is used
to retrieve the kernel logs, this buffer must be protected against
simultaneous dumping. Skip dumping if another context is already
dumping.
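
(The guard is a simple trylock; a sketch of the shape only, the real hunk
is below. A blocking lock would gain nothing here: if another CPU already
holds it, that CPU is dumping the very same records, so the loser can
just return -- important since this may run in a panic path:

        static DEFINE_SPINLOCK(lock);

        if (!spin_trylock(&lock))
                return;
        /* ... kmsg_dump_get_line() loop into the static buffer ... */
        spin_unlock(&lock);
)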

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 arch/um/kernel/kmsg_dump.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c
index 6516ef1f8274..4869e2cc787c 100644
--- a/arch/um/kernel/kmsg_dump.c
+++ b/arch/um/kernel/kmsg_dump.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -9,6 +10,7 @@
 static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
enum kmsg_dump_reason reason)
 {
+   static DEFINE_SPINLOCK(lock);
static char line[1024];
struct console *con;
size_t len = 0;
@@ -29,11 +31,16 @@ static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
if (con)
return;
 
+   if (!spin_trylock(&lock))
+   return;
+
printf("kmsg_dump:\n");
while (kmsg_dump_get_line(dumper, true, line, sizeof(line), &len)) {
line[len] = '\0';
printf("%s", line);
}
+
+   spin_unlock(&lock);
 }
 
 static struct kmsg_dumper kmsg_dumper = {
-- 
2.20.1



Re: synchronization model: was: Re: [PATCH printk-rework 09/14] printk: introduce a kmsg_dump iterator

2021-02-24 Thread John Ogness
On 2021-02-24, John Ogness  wrote:
> The @active flag is useless. It should be removed.

I would like to clarify my statement, because the @active flag _did_
protect the arch/um dumper until now. (Although it didn't actually
matter because arch/um does not have SMP or preemption support.)

In mainline we have 6 dumpers. They can be classified as follows:

1. Dumpers that provide their own synchronization to protect against
   parallel or nested dump() calls.

   - arch/powerpc/kernel/nvram_64.c
   - fs/pstore/platform.c
   - arch/um/kernel/kmsg_dump.c (after this series)

2. Dumpers that are safe because they only dump on KMSG_DUMP_PANIC,
   which (currently) can never happen in parallel or nested.

   - arch/powerpc/platforms/powernv/opal-kmsg.c
   - drivers/hv/vmbus_drv.c

3. Dumpers that are unsafe and even @active did not provide the needed
   synchronization.

   - drivers/mtd/mtdoops.c

In all 6 dumpers, @active does not provide any help. That is why it can
be removed.

But I am concerned about drivers/mtd/mtdoops.c, which does not have any
synchronization. Since my series is adding synchronization to
arch/um/kernel/kmsg_dump.c, I suppose it should add it to
drivers/mtd/mtdoops.c as well.

And rather than moving the useless @active from kmsg_dumper to
kmsg_dump_iter, I should just drop it.

Unless there are any objections, I will make these changes for my v3.

John Ogness


Re: synchronization model: was: Re: [PATCH printk-rework 09/14] printk: introduce a kmsg_dump iterator

2021-02-24 Thread John Ogness
On 2021-02-19, Petr Mladek  wrote:
> This is likely beyond the scope of this patchset.

It would be beyond the scope of this patchset because it is not related
to logbuf_lock removal.

> I am still scratching my head about the synchronization if these dumpers.
>
> There is the "active" flag. It has been introduced by the commit
> e2ae715d66bf4becfb ("kmsg - kmsg_dump() use iterator to receive log
> buffer content"). I do not see any explanation there.
>
> It might prevent some misuse of the API. But the synchronization
> model is not much clear:
>
>   + cur_seq and next_seq might be manipulated by
> kmsg_dump_rewind() even when the flag is not set.
>
>   + It is possible to use the same dumper more times in parallel.
> The API will fill the provided buffer of all callers
> as long as the active flag is set.
>
>   + The "active" flag does not synchronize other operations with
> the provided buffer. The "dump" callback is responsible
> to provide some synchronization on its own.
>
> In fact, it is not much clear how struct kmsg_dumper_iter, struct kmsg_dumper,
> and the used buffers are connected with each other and synchronized.

With this series applied, there is no connection between them. And
actually you have made me realize that the iterator should be named
"kmsg_dump_iter" instead. I will change that for v3.

> It might some sense to have the iterator in a separate structure.
> But the only safe scenario seems to be when all these three things
> (both structures and the buffer) are connected together and
> synchronized by the same lock. Also the "active" flag does not look
> much helpful and can be removed.

The @active flag is useless. It should be removed.

We have kmsg_dump_get_line(), kmsg_dump_get_buffer(), kmsg_dump_rewind()
as an in-kernel interface to allow retrieving the kernel buffer
contents. To use these interfaces, the caller only needs to have an
iterator that is initialized using kmsg_dump_rewind(). These functions
can be (and are) used, regardless if a dumper has been registered. And I
think that is OK.

The used buffers (like the iterator) are local to the caller. So there
is no need for the kmsg_dump_*() functions to be concerned about any
synchronization there.
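
To make that concrete, a minimal sketch of a caller (hypothetical
function name, arbitrary buffer size, using the kmsg_dump_iter naming
planned for v3):

        static void example_dump_log(void)
        {
                static char line[1024];
                struct kmsg_dump_iter iter;
                size_t len;

                kmsg_dump_rewind(&iter);
                while (kmsg_dump_get_line(&iter, true, line,
                                           sizeof(line), &len)) {
                        line[len] = '\0';
                        /* consume the formatted line */
                }
        }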

Then we have kmsg_dump_register() and kmsg_dump_unregister() to allow
for registration of a dump() callback, to be called when the kernel does
panic/oops/emergency/shutdown. Presumably the registered callback would
use the kmsg_dump_*() functions to access the kernel buffer. Again, no
need for kmsg_dump_*() functions to be concerned about synchronization
because the buffers are provided by the callbacks.
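
A sketch of that side as well (hypothetical names, and assuming the
reduced dump(reason) callback signature I am proposing for v3, where the
callback builds its own iterator as shown above):

        static void example_dump(enum kmsg_dump_reason reason)
        {
                /* kmsg_dump_rewind() a local iterator, then read records */
        }

        static struct kmsg_dumper example_dumper = {
                .dump = example_dump,
                .max_reason = KMSG_DUMP_OOPS,
        };

        static int __init example_init(void)
        {
                return kmsg_dump_register(&example_dumper);
        }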

> As I said, this is likely beyond this patchset. This patch does more
> or less just a refactoring and helps to understand the dependencies.

Aside from removing the useless @active flag, I am not sure what else
you would want to change. Perhaps just fixup the comments/documentation
to clarify these interfaces and what their purpose is.

John Ogness


Re: [PATCH printk-rework 08/14] printk: add syslog_lock

2021-02-22 Thread John Ogness
On 2021-02-22, Petr Mladek  wrote:
>>>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>>>> index 20c21a25143d..401df370832b 100644
>>>> --- a/kernel/printk/printk.c
>>>> +++ b/kernel/printk/printk.c
>>>> +/* Return a consistent copy of @syslog_seq. */
>>>> +static u64 read_syslog_seq_irq(void)
>>>> +{
>>>> +  u64 seq;
>>>> +
>>>> +  raw_spin_lock_irq(&syslog_lock);
>>>> +  seq = syslog_seq;
>>>> +  raw_spin_unlock_irq(&syslog_lock);
>>>
>>> Is there any particular reason to disable interrupts here?
>>>
>>> It would make sense only when the lock could be taken in IRQ
>>> context. Then we would need to always disable interrupts when
>>> the lock is taken. And if it is taken in IRQ context, we would
>>> need to safe flags.
>
> Note that console_lock was a spinlock in 2.3.15.pre1. I see it defined
> in kernel/printk.c as:
>
> spinlock_t console_lock = SPIN_LOCK_UNLOCKED;
>
> But it is a sleeping semaphore these days. As a result,
> register_console(), as it is now, must not be called in an interrupt
> context.

OK. So I will change read_syslog_seq_irq() to not disable interrupts. As
you suggested, we can fix the rest when we remove the safe buffers.
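
Presumably that ends up looking something like this (a sketch of the
intended change, not the final patch):

        /* Return a consistent copy of @syslog_seq. */
        static u64 read_syslog_seq(void)
        {
                u64 seq;

                raw_spin_lock(&syslog_lock);
                seq = syslog_seq;
                raw_spin_unlock(&syslog_lock);

                return seq;
        }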

John Ogness


Re: [PATCH printk-rework 08/14] printk: add syslog_lock

2021-02-19 Thread John Ogness
Added CC: linux-par...@vger.kernel.org

On 2021-02-19, John Ogness  wrote:
>>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>>> index 20c21a25143d..401df370832b 100644
>>> --- a/kernel/printk/printk.c
>>> +++ b/kernel/printk/printk.c
>>> +/* Return a consistent copy of @syslog_seq. */
>>> +static u64 read_syslog_seq_irq(void)
>>> +{
>>> +   u64 seq;
>>> +
>>> +   raw_spin_lock_irq(&syslog_lock);
>>> +   seq = syslog_seq;
>>> +   raw_spin_unlock_irq(&syslog_lock);
>>
>> Is there any particular reason to disable interrupts here?
>>
>> It would make sense only when the lock could be taken in IRQ
>> context. Then we would need to always disable interrupts when
>> the lock is taken. And if it is taken in IRQ context, we would
>> need to safe flags.
>
> All other instances of locking @syslog_lock are done with interrupts
> disabled. And we have:
>
> register_console()
>   logbuf_lock_irqsave()
> raw_spin_lock(&syslog_lock)
>
> I suppose I need to go through all the console drivers to see if any
> register in interrupt context. If not, that logbuf_lock_irqsave()
> should be replaced with logbuf_lock_irq(). And then locking
> @syslog_lock will not need to disable interrupts.

I found a possible call chain in interrupt context. From arch/parisc
there is the interrupt handler:

handle_interruption(code=1) /* High-priority machine check (HPMC) */
  pdc_console_restart()
pdc_console_init_force()
  register_console()

All other register_console() calls in the kernel are either during init
(within __init sections and probe functions) or are clearly not in
interrupt context (using mutex, kzalloc, spin_lock_irq, etc).

I am not familiar with parisc, but I am assuming handle_interruption()
is always called with interrupts disabled (unless the HPMC interrupt is
somehow an exception).

John Ogness


Re: [PATCH printk-rework 08/14] printk: add syslog_lock

2021-02-19 Thread John Ogness
On 2021-02-19, Petr Mladek  wrote:
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 20c21a25143d..401df370832b 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> +/* Return a consistent copy of @syslog_seq. */
>> +static u64 read_syslog_seq_irq(void)
>> +{
>> +u64 seq;
>> +
>> +raw_spin_lock_irq(&syslog_lock);
>> +seq = syslog_seq;
>> +raw_spin_unlock_irq(&syslog_lock);
>
> Is there any particular reason to disable interrupts here?
>
> It would make sense only when the lock could be taken in IRQ
> context. Then we would need to always disable interrupts when
> the lock is taken. And if it is taken in IRQ context, we would
> need to safe flags.

All other instances of locking @syslog_lock are done with interrupts
disabled. And we have:

register_console()
  logbuf_lock_irqsave()
raw_spin_lock(&syslog_lock)

Looking back through history, I found that locking of the "console lock"
in register_console() was changed from spin_lock_irq() to
spin_lock_irqsave() for 2.3.15pre1 [0]. The only reason I can find why
that was done is because sparc64 was regstering its console in a PROM
callback (the comments there: "Pretty sick eh?").

Today sparc64 is setting up the console in init code. I suppose I need
to go through all the console drivers to see if any register in
interrupt context. If not, that logbuf_lock_irqsave() should be replaced
with logbuf_lock_irq(). And then locking @syslog_lock will not need to
disable interrupts.

John Ogness

[0] 
https://github.com/schwabe/davej-history/commit/f91c3404ba16c88cdb33824bf0249c6263cd4465#diff-84036d1e27f4207c783a3b876aef4e45340d30f43b1319bca382f5775a9b14beL348


[PATCH printk-rework 09/14] printk: introduce a kmsg_dump iterator

2021-02-18 Thread John Ogness
Rather than store the iterator information into the registered
kmsg_dump structure, create a separate iterator structure. The
kmsg_dump_iter structure can reside on the stack of the caller,
thus allowing lockless use of the kmsg_dump functions.

This is in preparation for removal of @logbuf_lock.
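
(For illustration, the shape of a converted callback -- a sketch with a
hypothetical name, modeled on the arch/um conversion in this patch:

        static void example_dump(struct kmsg_dumper *dumper,
                                 enum kmsg_dump_reason reason,
                                 struct kmsg_dumper_iter *iter)
        {
                static char line[1024];
                size_t len;

                while (kmsg_dump_get_line(iter, true, line,
                                           sizeof(line), &len)) {
                        line[len] = '\0';
                        /* hand each formatted line to the backing store */
                }
        }
)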

Signed-off-by: John Ogness 
---
 arch/powerpc/kernel/nvram_64.c | 12 ++--
 arch/powerpc/platforms/powernv/opal-kmsg.c |  3 +-
 arch/powerpc/xmon/xmon.c   |  6 +-
 arch/um/kernel/kmsg_dump.c |  5 +-
 drivers/hv/vmbus_drv.c |  5 +-
 drivers/mtd/mtdoops.c  |  5 +-
 fs/pstore/platform.c   |  5 +-
 include/linux/kmsg_dump.h  | 43 +++---
 kernel/debug/kdb/kdb_main.c| 10 ++--
 kernel/printk/printk.c | 65 +++---
 10 files changed, 84 insertions(+), 75 deletions(-)

diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
index 532f22637783..1ef55f4b389a 100644
--- a/arch/powerpc/kernel/nvram_64.c
+++ b/arch/powerpc/kernel/nvram_64.c
@@ -73,7 +73,8 @@ static const char *nvram_os_partitions[] = {
 };
 
 static void oops_to_nvram(struct kmsg_dumper *dumper,
- enum kmsg_dump_reason reason);
+ enum kmsg_dump_reason reason,
+ struct kmsg_dumper_iter *iter);
 
 static struct kmsg_dumper nvram_kmsg_dumper = {
.dump = oops_to_nvram
@@ -643,7 +644,8 @@ void __init nvram_init_oops_partition(int 
rtas_partition_exists)
  * partition.  If that's too much, go back and capture uncompressed text.
  */
 static void oops_to_nvram(struct kmsg_dumper *dumper,
- enum kmsg_dump_reason reason)
+ enum kmsg_dump_reason reason,
+ struct kmsg_dumper_iter *iter)
 {
struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf;
static unsigned int oops_count = 0;
@@ -681,13 +683,13 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
return;
 
if (big_oops_buf) {
-   kmsg_dump_get_buffer(dumper, false,
+   kmsg_dump_get_buffer(iter, false,
 big_oops_buf, big_oops_buf_sz, _len);
rc = zip_oops(text_len);
}
if (rc != 0) {
-   kmsg_dump_rewind(dumper);
-   kmsg_dump_get_buffer(dumper, false,
+   kmsg_dump_rewind(iter);
+   kmsg_dump_get_buffer(iter, false,
 oops_data, oops_data_sz, _len);
err_type = ERR_TYPE_KERNEL_PANIC;
oops_hdr->version = cpu_to_be16(OOPS_HDR_VERSION);
diff --git a/arch/powerpc/platforms/powernv/opal-kmsg.c 
b/arch/powerpc/platforms/powernv/opal-kmsg.c
index 6c3bc4b4da98..ec862846bc82 100644
--- a/arch/powerpc/platforms/powernv/opal-kmsg.c
+++ b/arch/powerpc/platforms/powernv/opal-kmsg.c
@@ -20,7 +20,8 @@
  * message, it just ensures that OPAL completely flushes the console buffer.
  */
 static void kmsg_dump_opal_console_flush(struct kmsg_dumper *dumper,
-enum kmsg_dump_reason reason)
+enum kmsg_dump_reason reason,
+struct kmsg_dumper_iter *iter)
 {
/*
 * Outside of a panic context the pollers will continue to run,
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 55c43a6c9111..43162b885259 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3003,7 +3003,7 @@ print_address(unsigned long addr)
 static void
 dump_log_buf(void)
 {
-   struct kmsg_dumper dumper = { .active = 1 };
+   struct kmsg_dumper_iter iter = { .active = 1 };
unsigned char buf[128];
size_t len;
 
@@ -3015,9 +3015,9 @@ dump_log_buf(void)
catch_memory_errors = 1;
sync();
 
-   kmsg_dump_rewind_nolock();
+   kmsg_dump_rewind_nolock();
xmon_start_pagination();
-   while (kmsg_dump_get_line_nolock(, false, buf, sizeof(buf), 
)) {
+   while (kmsg_dump_get_line_nolock(, false, buf, sizeof(buf), )) 
{
buf[len] = '\0';
printf("%s", buf);
}
diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c
index e4abac6c9727..f38349ad00ea 100644
--- a/arch/um/kernel/kmsg_dump.c
+++ b/arch/um/kernel/kmsg_dump.c
@@ -6,7 +6,8 @@
 #include 
 
 static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
-   enum kmsg_dump_reason reason)
+   enum kmsg_dump_reason reason,
+   struct kmsg_dumper_iter *iter)
 {
static char line[1024];
struct console *con;
@@ -25,7 +26,7 @@ static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
return;
 
printf("kmsg_dump:\

[PATCH printk-rework 01/14] printk: limit second loop of syslog_print_all

2021-02-18 Thread John Ogness
The second loop of syslog_print_all() subtracts lengths that were
added in the first loop. With commit b031a684bfd0 ("printk: remove
logbuf_lock writer-protection of ringbuffer") it is possible that
records are (over)written during syslog_print_all(). This allows the
possibility of the second loop subtracting lengths that were never
added in the first loop.

This situation can result in syslog_print_all() filling the buffer
starting from a later record, even though there may have been room
to fit the earlier record(s) as well.

Fixes: b031a684bfd0 ("printk: remove logbuf_lock writer-protection of 
ringbuffer")
Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7239d169bbe..411787b900ac 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1495,6 +1495,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
struct printk_info info;
unsigned int line_count;
struct printk_record r;
+   u64 max_seq;
char *text;
int len = 0;
u64 seq;
@@ -1513,9 +1514,15 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
prb_for_each_info(clear_seq, prb, seq, , _count)
len += get_record_print_text_size(, line_count, true, 
time);
 
+   /*
+* Set an upper bound for the next loop to avoid subtracting lengths
+* that were never added.
+*/
+   max_seq = seq;
+
/* move first record forward until length fits into the buffer */
prb_for_each_info(clear_seq, prb, seq, , _count) {
-   if (len <= size)
+   if (len <= size || info.seq >= max_seq)
break;
len -= get_record_print_text_size(, line_count, true, 
time);
}
-- 
2.20.1



[PATCH printk-rework 11/14] printk: remove logbuf_lock

2021-02-18 Thread John Ogness
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock.

This means that printk_nmi_direct and printk_safe_flush_on_panic()
no longer need to acquire any lock to run.

@console_seq, @exclusive_console_stop_seq, @console_dropped are
protected by @console_lock.

Signed-off-by: John Ogness 
---
 kernel/printk/internal.h|   4 +-
 kernel/printk/printk.c  | 116 
 kernel/printk/printk_safe.c |  29 +++--
 3 files changed, 47 insertions(+), 102 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 3a8fd491758c..e7acc2888c8e 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -12,8 +12,6 @@
 
 #define PRINTK_NMI_CONTEXT_OFFSET  0x01000
 
-extern raw_spinlock_t logbuf_lock;
-
 __printf(4, 0)
 int vprintk_store(int facility, int level,
  const struct dev_printk_info *dev_info,
@@ -59,7 +57,7 @@ void defer_console_output(void);
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; }
 
 /*
- * In !PRINTK builds we still export logbuf_lock spin_lock, console_sem
+ * In !PRINTK builds we still export console_sem
  * semaphore and some of console functions (console_unlock()/etc.), so
  * printk-safe must preserve the existing local IRQ guarantees.
  */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3ad1f9bcaaa1..c5ea46ed88c7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -355,41 +355,6 @@ enum log_flags {
LOG_CONT= 8,/* text is a fragment of a continuation line */
 };
 
-/*
- * The logbuf_lock protects kmsg buffer, indices, counters.  This can be taken
- * within the scheduler's rq lock. It must be released before calling
- * console_unlock() or anything else that might wake up a process.
- */
-DEFINE_RAW_SPINLOCK(logbuf_lock);
-
-/*
- * Helper macros to lock/unlock logbuf_lock and switch between
- * printk-safe/unsafe modes.
- */
-#define logbuf_lock_irq()  \
-   do {\
-   printk_safe_enter_irq();\
-   raw_spin_lock(_lock);\
-   } while (0)
-
-#define logbuf_unlock_irq()\
-   do {\
-   raw_spin_unlock(_lock);  \
-   printk_safe_exit_irq(); \
-   } while (0)
-
-#define logbuf_lock_irqsave(flags) \
-   do {\
-   printk_safe_enter_irqsave(flags);   \
-   raw_spin_lock(_lock);\
-   } while (0)
-
-#define logbuf_unlock_irqrestore(flags)\
-   do {\
-   raw_spin_unlock(_lock);  \
-   printk_safe_exit_irqrestore(flags); \
-   } while (0)
-
 /* syslog_lock protects syslog_* variables and write access to clear_seq. */
 static DEFINE_RAW_SPINLOCK(syslog_lock);
 
@@ -401,6 +366,7 @@ static u64 syslog_seq;
 static size_t syslog_partial;
 static bool syslog_time;
 
+/* All 3 protected by @console_sem. */
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u64 exclusive_console_stop_seq;
@@ -767,27 +733,27 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
if (ret)
return ret;
 
-   logbuf_lock_irq();
+   printk_safe_enter_irq();
if (!prb_read_valid(prb, atomic64_read(>seq), r)) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
goto out;
}
 
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
ret = wait_event_interruptible(log_wait,
prb_read_valid(prb, atomic64_read(>seq), 
r));
if (ret)
goto out;
-   logbuf_lock_irq();
+   printk_safe_enter_irq();
}
 
if (r->info->seq != atomic64_read(>seq)) {
/* our last seen message is gone, return error and reset */
atomic64_set(>seq, r->info->seq);
ret = -EPIPE;
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
goto out;
}
 
@@ -797,7 +763,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
  >info->dev_info);
 
atomic64_set(>seq, r->info->seq + 1);
-   logbuf_unlock_irq();
+   printk_safe_exit_irq();
 
if (len > count) {
ret = -EINVAL;
@@ -832,7 +798,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int w

[PATCH printk-rework 14/14] printk: console: remove unnecessary safe buffer usage

2021-02-18 Thread John Ogness
Upon registering a console, safe buffers are activated when setting
up the sequence number to replay the log. However, these are already
protected by @console_sem and @syslog_lock. Remove the unnecessary
safe buffer usage.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 23d525e885e7..78eee6c553a5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2961,9 +2961,7 @@ void register_console(struct console *newcon)
/*
 * console_unlock(); will print out the buffered messages
 * for us.
-*/
-   printk_safe_enter_irqsave(flags);
-   /*
+*
 * We're about to replay the log buffer.  Only do this to the
 * just-registered console to avoid excessive message spam to
 * the already-registered consoles.
@@ -2976,11 +2974,9 @@ void register_console(struct console *newcon)
exclusive_console_stop_seq = console_seq;
 
/* Get a consistent copy of @syslog_seq. */
-   raw_spin_lock(_lock);
+   raw_spin_lock_irqsave(_lock, flags);
console_seq = syslog_seq;
-   raw_spin_unlock(_lock);
-
-   printk_safe_exit_irqrestore(flags);
+   raw_spin_unlock_irqrestore(_lock, flags);
}
console_unlock();
console_sysfs_notify();
-- 
2.20.1



[PATCH printk-rework 13/14] printk: kmsg_dump: use kmsg_dump_rewind

2021-02-18 Thread John Ogness
kmsg_dump() is open coding the kmsg_dump_rewind(). Call
kmsg_dump_rewind() instead.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 744b806d5457..23d525e885e7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3347,7 +3347,6 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 {
struct kmsg_dumper_iter iter;
struct kmsg_dumper *dumper;
-   unsigned long flags;
 
rcu_read_lock();
list_for_each_entry_rcu(dumper, _list, list) {
@@ -3366,10 +3365,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 
/* initialize iterator with data about the stored records */
iter.active = true;
-   printk_safe_enter_irqsave(flags);
-   iter.cur_seq = latched_seq_read_nolock(_seq);
-   iter.next_seq = prb_next_seq(prb);
-   printk_safe_exit_irqrestore(flags);
+   kmsg_dump_rewind();
 
/* invoke dumper which will iterate over records */
dumper->dump(dumper, reason, );
-- 
2.20.1



[PATCH printk-rework 07/14] printk: use atomic64_t for devkmsg_user.seq

2021-02-18 Thread John Ogness
@user->seq is indirectly protected by @logbuf_lock. Once @logbuf_lock
is removed, @user->seq will be no longer safe from an atomicity point
of view.

In preparation for the removal of @logbuf_lock, change it to
atomic64_t to provide this safety.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a71e0d41ccb5..20c21a25143d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -662,7 +662,7 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
-   u64 seq;
+   atomic64_t seq;
struct ratelimit_state rs;
struct mutex lock;
char buf[CONSOLE_EXT_LOG_MAX];
@@ -764,7 +764,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
return ret;
 
logbuf_lock_irq();
-   if (!prb_read_valid(prb, user->seq, r)) {
+   if (!prb_read_valid(prb, atomic64_read(>seq), r)) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
logbuf_unlock_irq();
@@ -773,15 +773,15 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
 
logbuf_unlock_irq();
ret = wait_event_interruptible(log_wait,
-   prb_read_valid(prb, user->seq, r));
+   prb_read_valid(prb, atomic64_read(>seq), 
r));
if (ret)
goto out;
logbuf_lock_irq();
}
 
-   if (r->info->seq != user->seq) {
+   if (r->info->seq != atomic64_read(>seq)) {
/* our last seen message is gone, return error and reset */
-   user->seq = r->info->seq;
+   atomic64_set(>seq, r->info->seq);
ret = -EPIPE;
logbuf_unlock_irq();
goto out;
@@ -792,7 +792,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
  >text_buf[0], r->info->text_len,
  >info->dev_info);
 
-   user->seq = r->info->seq + 1;
+   atomic64_set(>seq, r->info->seq + 1);
logbuf_unlock_irq();
 
if (len > count) {
@@ -832,7 +832,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
switch (whence) {
case SEEK_SET:
/* the first record */
-   user->seq = prb_first_valid_seq(prb);
+   atomic64_set(>seq, prb_first_valid_seq(prb));
break;
case SEEK_DATA:
/*
@@ -840,11 +840,11 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
 * changes no global state, and does not clear anything.
 */
-   user->seq = latched_seq_read_nolock(_seq);
+   atomic64_set(>seq, latched_seq_read_nolock(_seq));
break;
case SEEK_END:
/* after the last record */
-   user->seq = prb_next_seq(prb);
+   atomic64_set(>seq, prb_next_seq(prb));
break;
default:
ret = -EINVAL;
@@ -865,9 +865,9 @@ static __poll_t devkmsg_poll(struct file *file, poll_table 
*wait)
poll_wait(file, _wait, wait);
 
logbuf_lock_irq();
-   if (prb_read_valid_info(prb, user->seq, , NULL)) {
+   if (prb_read_valid(prb, atomic64_read(>seq), NULL)) {
/* return error when data has vanished underneath us */
-   if (info.seq != user->seq)
+   if (info.seq != atomic64_read(>seq))
ret = EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
else
ret = EPOLLIN|EPOLLRDNORM;
@@ -906,7 +906,7 @@ static int devkmsg_open(struct inode *inode, struct file 
*file)
>text_buf[0], sizeof(user->text_buf));
 
logbuf_lock_irq();
-   user->seq = prb_first_valid_seq(prb);
+   atomic64_set(>seq, prb_first_valid_seq(prb));
logbuf_unlock_irq();
 
file->private_data = user;
-- 
2.20.1



[PATCH printk-rework 06/14] printk: use seqcount_latch for clear_seq

2021-02-18 Thread John Ogness
kmsg_dump_rewind_nolock() locklessly reads @clear_seq. However,
this is not done atomically. Since @clear_seq is 64-bit, this
cannot be an atomic operation for all platforms. Therefore, use
a seqcount_latch to allow readers to always read a consistent
value.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 58 --
 1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f79e7515b5f1..a71e0d41ccb5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -402,8 +402,21 @@ static u64 console_seq;
 static u64 exclusive_console_stop_seq;
 static unsigned long console_dropped;
 
-/* the next printk record to read after the last 'clear' command */
-static u64 clear_seq;
+struct latched_seq {
+   seqcount_latch_tlatch;
+   u64 val[2];
+};
+
+/*
+ * The next printk record to read after the last 'clear' command. There are
+ * two copies (updated with seqcount_latch) so that reads can locklessly
+ * access a valid value. Writers are synchronized by @logbuf_lock.
+ */
+static struct latched_seq clear_seq = {
+   .latch  = SEQCNT_LATCH_ZERO(clear_seq.latch),
+   .val[0] = 0,
+   .val[1] = 0,
+};
 
 #ifdef CONFIG_PRINTK_CALLER
 #define PREFIX_MAX 48
@@ -457,6 +470,31 @@ bool printk_percpu_data_ready(void)
return __printk_percpu_data_ready;
 }
 
+/* Must be called under logbuf_lock. */
+static void latched_seq_write(struct latched_seq *ls, u64 val)
+{
+   raw_write_seqcount_latch(&ls->latch);
+   ls->val[0] = val;
+   raw_write_seqcount_latch(&ls->latch);
+   ls->val[1] = val;
+}
+
+/* Can be called from any context. */
+static u64 latched_seq_read_nolock(struct latched_seq *ls)
+{
+   unsigned int seq;
+   unsigned int idx;
+   u64 val;
+
+   do {
+   seq = raw_read_seqcount_latch(&ls->latch);
+   idx = seq & 0x1;
+   val = ls->val[idx];
+   } while (read_seqcount_latch_retry(&ls->latch, seq));
+
+   return val;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -802,7 +840,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
 * changes no global state, and does not clear anything.
 */
-   user->seq = clear_seq;
+   user->seq = latched_seq_read_nolock(_seq);
break;
case SEEK_END:
/* after the last record */
@@ -961,6 +999,9 @@ void log_buf_vmcoreinfo_setup(void)
 
VMCOREINFO_SIZE(atomic_long_t);
VMCOREINFO_TYPE_OFFSET(atomic_long_t, counter);
+
+   VMCOREINFO_STRUCT_SIZE(latched_seq);
+   VMCOREINFO_OFFSET(latched_seq, val);
 }
 #endif
 
@@ -1558,7 +1599,8 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 * Find first record that fits, including all following records,
 * into the user-provided buffer for this dump.
 */
-   seq = find_first_fitting_seq(clear_seq, -1, size, true, time);
+   seq = find_first_fitting_seq(latched_seq_read_nolock(_seq), -1,
+size, true, time);
 
prb_rec_init_rd(, , text, CONSOLE_LOG_MAX);
 
@@ -1585,7 +1627,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
}
 
if (clear)
-   clear_seq = seq;
+   latched_seq_write(_seq, seq);
logbuf_unlock_irq();
 
kfree(text);
@@ -1595,7 +1637,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 static void syslog_clear(void)
 {
logbuf_lock_irq();
-   clear_seq = prb_next_seq(prb);
+   latched_seq_write(_seq, prb_next_seq(prb));
logbuf_unlock_irq();
 }
 
@@ -3332,7 +3374,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
dumper->active = true;
 
logbuf_lock_irqsave(flags);
-   dumper->cur_seq = clear_seq;
+   dumper->cur_seq = latched_seq_read_nolock(_seq);
dumper->next_seq = prb_next_seq(prb);
logbuf_unlock_irqrestore(flags);
 
@@ -3530,7 +3572,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
  */
 void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
 {
-   dumper->cur_seq = clear_seq;
+   dumper->cur_seq = latched_seq_read_nolock(_seq);
dumper->next_seq = prb_next_seq(prb);
 }
 
-- 
2.20.1



[PATCH printk-rework 12/14] printk: kmsg_dump: remove _nolock() variants

2021-02-18 Thread John Ogness
kmsg_dump_rewind() and kmsg_dump_get_line() are lockless, so there is
no need for _nolock() variants. Remove these functions and switch all
callers of the _nolock() variants.

The functions without _nolock() were chosen because they are already
exported to kernel modules.

Signed-off-by: John Ogness 
---
 arch/powerpc/xmon/xmon.c|  4 +--
 include/linux/kmsg_dump.h   | 18 +--
 kernel/debug/kdb/kdb_main.c |  8 ++---
 kernel/printk/printk.c  | 60 +
 4 files changed, 15 insertions(+), 75 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 43162b885259..4cac114ba32d 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3015,9 +3015,9 @@ dump_log_buf(void)
catch_memory_errors = 1;
sync();
 
-   kmsg_dump_rewind_nolock();
+   kmsg_dump_rewind();
xmon_start_pagination();
-   while (kmsg_dump_get_line_nolock(, false, buf, sizeof(buf), )) 
{
+   while (kmsg_dump_get_line(, false, buf, sizeof(buf), )) {
buf[len] = '\0';
printf("%s", buf);
}
diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 2fdb10ab1799..86673930c8ea 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -60,18 +60,13 @@ struct kmsg_dumper {
 #ifdef CONFIG_PRINTK
 void kmsg_dump(enum kmsg_dump_reason reason);
 
-bool kmsg_dump_get_line_nolock(struct kmsg_dumper_iter *iter, bool syslog,
-  char *line, size_t size, size_t *len);
-
 bool kmsg_dump_get_line(struct kmsg_dumper_iter *iter, bool syslog,
char *line, size_t size, size_t *len);
 
 bool kmsg_dump_get_buffer(struct kmsg_dumper_iter *iter, bool syslog,
  char *buf, size_t size, size_t *len_out);
 
-void kmsg_dump_rewind_nolock(struct kmsg_dumper_iter *iter);
-
-void kmsg_dump_rewind(struct kmsg_dumper_iter *dumper_iter);
+void kmsg_dump_rewind(struct kmsg_dumper_iter *iter);
 
 int kmsg_dump_register(struct kmsg_dumper *dumper);
 
@@ -83,13 +78,6 @@ static inline void kmsg_dump(enum kmsg_dump_reason reason)
 {
 }
 
-static inline bool kmsg_dump_get_line_nolock(struct kmsg_dumper_iter *iter,
-bool syslog, const char *line,
-size_t size, size_t *len)
-{
-   return false;
-}
-
 static inline bool kmsg_dump_get_line(struct kmsg_dumper_iter *iter, bool 
syslog,
const char *line, size_t size, size_t *len)
 {
@@ -102,10 +90,6 @@ static inline bool kmsg_dump_get_buffer(struct 
kmsg_dumper_iter *iter, bool sysl
return false;
 }
 
-static inline void kmsg_dump_rewind_nolock(struct kmsg_dumper_iter *iter)
-{
-}
-
 static inline void kmsg_dump_rewind(struct kmsg_dumper_iter *iter)
 {
 }
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 7ae9da245e4b..dbf1d126ac5e 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2126,8 +2126,8 @@ static int kdb_dmesg(int argc, const char **argv)
kdb_set(2, setargs);
}
 
-   kmsg_dump_rewind_nolock();
-   while (kmsg_dump_get_line_nolock(, 1, NULL, 0, NULL))
+   kmsg_dump_rewind();
+   while (kmsg_dump_get_line(, 1, NULL, 0, NULL))
n++;
 
if (lines < 0) {
@@ -2159,8 +2159,8 @@ static int kdb_dmesg(int argc, const char **argv)
if (skip >= n || skip < 0)
return 0;
 
-   kmsg_dump_rewind_nolock();
-   while (kmsg_dump_get_line_nolock(, 1, buf, sizeof(buf), )) {
+   kmsg_dump_rewind();
+   while (kmsg_dump_get_line(, 1, buf, sizeof(buf), )) {
if (skip) {
skip--;
continue;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c5ea46ed88c7..744b806d5457 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3378,7 +3378,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 }
 
 /**
- * kmsg_dump_get_line_nolock - retrieve one kmsg log line (unlocked version)
+ * kmsg_dump_get_line - retrieve one kmsg log line
  * @iter: kmsg dumper iterator
  * @syslog: include the "<4>" prefixes
  * @line: buffer to copy the line to
@@ -3393,18 +3393,18 @@ void kmsg_dump(enum kmsg_dump_reason reason)
  *
  * A return value of FALSE indicates that there are no more records to
  * read.
- *
- * The function is similar to kmsg_dump_get_line(), but grabs no locks.
  */
-bool kmsg_dump_get_line_nolock(struct kmsg_dumper_iter *iter, bool syslog,
-  char *line, size_t size, size_t *len)
+bool kmsg_dump_get_line(struct kmsg_dumper_iter *iter, bool syslog,
+   char *line, size_t size, size_t *len)
 {
struct printk_info info;
unsigned int line_count;
struct printk_record r;
+   unsigned long flags;
siz

[PATCH printk-rework 02/14] printk: kmsg_dump: remove unused fields

2021-02-18 Thread John Ogness
struct kmsg_dumper still contains some fields that were used to
iterate the old ringbuffer. They are no longer used. Remove them
and update the struct documentation.

Signed-off-by: John Ogness 
---
 include/linux/kmsg_dump.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 3378bcbe585e..235c50982c2d 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -36,6 +36,9 @@ enum kmsg_dump_reason {
  * through the record iterator
  * @max_reason:filter for highest reason number that should be dumped
  * @registered:Flag that specifies if this is already registered
+ * @active:Flag that specifies if this is currently dumping
+ * @cur_seq:   Points to the oldest message to dump (private)
+ * @next_seq:  Points after the newest message to dump (private)
  */
 struct kmsg_dumper {
struct list_head list;
@@ -45,8 +48,6 @@ struct kmsg_dumper {
bool registered;
 
/* private state of the kmsg iterator */
-   u32 cur_idx;
-   u32 next_idx;
u64 cur_seq;
u64 next_seq;
 };
-- 
2.20.1



[PATCH printk-rework 10/14] um: synchronize kmsg_dumper

2021-02-18 Thread John Ogness
The kmsg_dumper can be called from any context and CPU, possibly
from multiple CPUs simultaneously. Since a static buffer is used
to retrieve the kernel logs, this buffer must be protected against
simultaneous dumping.

Cc: Richard Weinberger 
Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 arch/um/kernel/kmsg_dump.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c
index f38349ad00ea..173999422ed8 100644
--- a/arch/um/kernel/kmsg_dump.c
+++ b/arch/um/kernel/kmsg_dump.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -9,8 +10,10 @@ static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
enum kmsg_dump_reason reason,
struct kmsg_dumper_iter *iter)
 {
+   static DEFINE_SPINLOCK(lock);
static char line[1024];
struct console *con;
+   unsigned long flags;
size_t len = 0;
 
/* only dump kmsg when no console is available */
@@ -25,11 +28,16 @@ static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
if (con)
return;
 
+   if (!spin_trylock_irqsave(&lock, flags))
+   return;
+
printf("kmsg_dump:\n");
while (kmsg_dump_get_line(iter, true, line, sizeof(line), &len)) {
line[len] = '\0';
printf("%s", line);
}
+
+   spin_unlock_irqrestore(&lock, flags);
 }
 
 static struct kmsg_dumper kmsg_dumper = {
-- 
2.20.1



[PATCH printk-rework 03/14] printk: refactor kmsg_dump_get_buffer()

2021-02-18 Thread John Ogness
kmsg_dump_get_buffer() requires nearly the same logic as
syslog_print_all(), but uses different variable names and
does not make use of the ringbuffer loop macros. Modify
kmsg_dump_get_buffer() so that the implementation is as similar
to syslog_print_all() as possible.

A follow-up commit will move this common logic into a
separate helper function.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 include/linux/kmsg_dump.h |  2 +-
 kernel/printk/printk.c| 60 +--
 2 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 235c50982c2d..4095a34db0fa 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -62,7 +62,7 @@ bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool 
syslog,
char *line, size_t size, size_t *len);
 
 bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
- char *buf, size_t size, size_t *len);
+ char *buf, size_t size, size_t *len_out);
 
 void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 411787b900ac..b4f72b5f70b9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3420,7 +3420,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_line);
  * read.
  */
 bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
- char *buf, size_t size, size_t *len)
+ char *buf, size_t size, size_t *len_out)
 {
struct printk_info info;
unsigned int line_count;
@@ -3428,12 +3428,10 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
unsigned long flags;
u64 seq;
u64 next_seq;
-   size_t l = 0;
+   size_t len = 0;
bool ret = false;
bool time = printk_time;
 
-   prb_rec_init_rd(, , buf, size);
-
if (!dumper->active || !buf || !size)
goto out;
 
@@ -3451,48 +3449,54 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
goto out;
}
 
-   /* calculate length of entire buffer */
-   seq = dumper->cur_seq;
-   while (prb_read_valid_info(prb, seq, , _count)) {
-   if (r.info->seq >= dumper->next_seq)
+   /*
+* Find first record that fits, including all following records,
+* into the user-provided buffer for this dump.
+*/
+
+   prb_for_each_info(dumper->cur_seq, prb, seq, , _count) {
+   if (info.seq >= dumper->next_seq)
break;
-   l += get_record_print_text_size(, line_count, syslog, 
time);
-   seq = r.info->seq + 1;
+   len += get_record_print_text_size(, line_count, syslog, 
time);
}
 
-   /* move first record forward until length fits into the buffer */
-   seq = dumper->cur_seq;
-   while (l >= size && prb_read_valid_info(prb, seq,
-   , _count)) {
-   if (r.info->seq >= dumper->next_seq)
+   /*
+* Move first record forward until length fits into the buffer. Ignore
+* newest messages that were not counted in the above cycle. Messages
+* might appear and get lost in the meantime. This is the best effort
+* that prevents an infinite loop.
+*/
+   prb_for_each_info(dumper->cur_seq, prb, seq, , _count) {
+   if (len < size || info.seq >= dumper->next_seq)
break;
-   l -= get_record_print_text_size(, line_count, syslog, 
time);
-   seq = r.info->seq + 1;
+   len -= get_record_print_text_size(, line_count, syslog, 
time);
}
 
-   /* last message in next interation */
+   /*
+* Next kmsg_dump_get_buffer() invocation will dump block of
+* older records stored right before this one.
+*/
next_seq = seq;
 
-   /* actually read text into the buffer now */
-   l = 0;
-   while (prb_read_valid(prb, seq, )) {
+   prb_rec_init_rd(, , buf, size);
+
+   len = 0;
+   prb_for_each_record(seq, prb, seq, ) {
if (r.info->seq >= dumper->next_seq)
break;
 
-   l += record_print_text(, syslog, time);
-
-   /* adjust record to store to remaining buffer space */
-   prb_rec_init_rd(, , buf + l, size - l);
+   len += record_print_text(, syslog, time);
 
-   seq = r.info->seq + 1;
+   /* Adjust record to store to remaining buffer space. */
+   prb_rec_init_rd(, , buf + len, size - len);
}
 
dumper->next_seq = next_seq;
ret = true;
logbuf_unlock_irqrestore(flags);
 out:
-   if (len)
-   *

[PATCH printk-rework 05/14] printk: introduce CONSOLE_LOG_MAX for improved multi-line support

2021-02-18 Thread John Ogness
Instead of using "LOG_LINE_MAX + PREFIX_MAX" for temporary buffer
sizes, introduce CONSOLE_LOG_MAX. This represents the maximum size
that is allowed to be printed to the console for a single record.

Rather than setting CONSOLE_LOG_MAX to "LOG_LINE_MAX + PREFIX_MAX"
(1024), increase it to 4096. With a larger buffer size, multi-line
records that are nearly LOG_LINE_MAX in length will have a better
chance of being fully printed. (When formatting a record for the
console, each line of a multi-line record is prepended with a copy
of the prefix.)
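
(A rough worst case, with illustrative numbers, showing why the old
"LOG_LINE_MAX + PREFIX_MAX" buffer can truncate such records:

        with CONFIG_PRINTK_CALLER:  PREFIX_MAX = 48,
                                    LOG_LINE_MAX = 1024 - 48 = 976
        a 976-byte record made of ~60 short lines formats to roughly
                976 + 60 * 48 ~= 3.8 KiB
        which fits in the new 4096-byte buffer but not in 1024 bytes.
)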

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d6f93ebd7bd0..f79e7515b5f1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -410,8 +410,13 @@ static u64 clear_seq;
 #else
 #define PREFIX_MAX 32
 #endif
+
+/* the maximum size allowed to be reserved for a record */
 #define LOG_LINE_MAX   (1024 - PREFIX_MAX)
 
+/* the maximum size of a formatted record (i.e. with prefix added per line) */
+#define CONSOLE_LOG_MAX4096
+
 #define LOG_LEVEL(v)   ((v) & 0x07)
 #define LOG_FACILITY(v)((v) >> 3 & 0xff)
 
@@ -1473,11 +1478,11 @@ static int syslog_print(char __user *buf, int size)
char *text;
int len = 0;
 
-   text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
+   text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
if (!text)
return -ENOMEM;
 
-   prb_rec_init_rd(, , text, LOG_LINE_MAX + PREFIX_MAX);
+   prb_rec_init_rd(, , text, CONSOLE_LOG_MAX);
 
while (size > 0) {
size_t n;
@@ -1543,7 +1548,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
u64 seq;
bool time;
 
-   text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
+   text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
if (!text)
return -ENOMEM;
 
@@ -1555,7 +1560,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 */
seq = find_first_fitting_seq(clear_seq, -1, size, true, time);
 
-   prb_rec_init_rd(, , text, LOG_LINE_MAX + PREFIX_MAX);
+   prb_rec_init_rd(, , text, CONSOLE_LOG_MAX);
 
len = 0;
prb_for_each_record(seq, prb, seq, ) {
@@ -2188,8 +2193,7 @@ EXPORT_SYMBOL(printk);
 
 #else /* CONFIG_PRINTK */
 
-#define LOG_LINE_MAX   0
-#define PREFIX_MAX 0
+#define CONSOLE_LOG_MAX0
 #define printk_timefalse
 
 #define prb_read_valid(rb, seq, r) false
@@ -2500,7 +2504,7 @@ static inline int can_use_console(void)
 void console_unlock(void)
 {
static char ext_text[CONSOLE_EXT_LOG_MAX];
-   static char text[LOG_LINE_MAX + PREFIX_MAX];
+   static char text[CONSOLE_LOG_MAX];
unsigned long flags;
bool do_cond_resched, retry;
struct printk_info info;
-- 
2.20.1



[PATCH printk-rework 00/14] printk: remove logbuf_lock

2021-02-18 Thread John Ogness
Hello,

Here is v2 of a series to remove @logbuf_lock, exposing the
ringbuffer locklessly to both readers and writers. v1 is here [0].

Since @logbuf_lock was protecting much more than just the
ringbuffer, this series clarifies and cleans up the various
protections using comments, lockless accessors, atomic types, and a
new finer-grained @syslog_lock.

Changes since v1:

- handle the syslog_print_all() size calculation issue in a separate
  patch (patch 1)

- use a local printk_info for find_first_fitting_seq()

- define CONSOLE_LOG_MAX in printk.c instead of printk.h since it is
  not used outside of printk.c

- increase CONSOLE_LOG_MAX to 4096 to support long multi-line
  records

- add a wrapper function read_syslog_seq_irq() for getting a
  consistent @syslog_seq value (only used in do_syslog())

- drop the "hv: synchronize kmsg_dumper" patch

- in "remove logbuf_lock" only change to safe buffer usage

- fixup safe buffer usage and redundance in separate patches
  (patches 13 and 14)

- update comments and commit messages as requested

John Ogness

[0] https://lkml.kernel.org/r/20210126211551.26536-1-john.ogn...@linutronix.de

John Ogness (14):
  printk: limit second loop of syslog_print_all
  printk: kmsg_dump: remove unused fields
  printk: refactor kmsg_dump_get_buffer()
  printk: consolidate kmsg_dump_get_buffer/syslog_print_all code
  printk: introduce CONSOLE_LOG_MAX for improved multi-line support
  printk: use seqcount_latch for clear_seq
  printk: use atomic64_t for devkmsg_user.seq
  printk: add syslog_lock
  printk: introduce a kmsg_dump iterator
  um: synchronize kmsg_dumper
  printk: remove logbuf_lock
  printk: kmsg_dump: remove _nolock() variants
  printk: kmsg_dump: use kmsg_dump_rewind
  printk: console: remove unnecessary safe buffer usage

 arch/powerpc/kernel/nvram_64.c |  12 +-
 arch/powerpc/platforms/powernv/opal-kmsg.c |   3 +-
 arch/powerpc/xmon/xmon.c   |   6 +-
 arch/um/kernel/kmsg_dump.c |  13 +-
 drivers/hv/vmbus_drv.c |   5 +-
 drivers/mtd/mtdoops.c  |   5 +-
 fs/pstore/platform.c   |   5 +-
 include/linux/kmsg_dump.h  |  52 +--
 kernel/debug/kdb/kdb_main.c|  10 +-
 kernel/printk/internal.h   |   4 +-
 kernel/printk/printk.c | 454 +++--
 kernel/printk/printk_safe.c|  29 +-
 12 files changed, 298 insertions(+), 300 deletions(-)

-- 
2.20.1



[PATCH printk-rework 08/14] printk: add syslog_lock

2021-02-18 Thread John Ogness
The global variables @syslog_seq, @syslog_partial, @syslog_time
and write access to @clear_seq are protected by @logbuf_lock.
Once @logbuf_lock is removed, these variables will need their
own synchronization method. Introduce @syslog_lock for this
purpose.

@syslog_lock is a raw_spin_lock for now. This simplifies the
transition to removing @logbuf_lock. Once @logbuf_lock and the
safe buffers are removed, @syslog_lock can change to spin_lock.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 41 +
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 20c21a25143d..401df370832b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -390,8 +390,12 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
printk_safe_exit_irqrestore(flags); \
} while (0)
 
+/* syslog_lock protects syslog_* variables and write access to clear_seq. */
+static DEFINE_RAW_SPINLOCK(syslog_lock);
+
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
+/* All 3 protected by @syslog_lock. */
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static size_t syslog_partial;
@@ -410,7 +414,7 @@ struct latched_seq {
 /*
  * The next printk record to read after the last 'clear' command. There are
  * two copies (updated with seqcount_latch) so that reads can locklessly
- * access a valid value. Writers are synchronized by @logbuf_lock.
+ * access a valid value. Writers are synchronized by @syslog_lock.
  */
 static struct latched_seq clear_seq = {
.latch  = SEQCNT_LATCH_ZERO(clear_seq.latch),
@@ -470,7 +474,7 @@ bool printk_percpu_data_ready(void)
return __printk_percpu_data_ready;
 }
 
-/* Must be called under logbuf_lock. */
+/* Must be called under syslog_lock. */
 static void latched_seq_write(struct latched_seq *ls, u64 val)
 {
raw_write_seqcount_latch(&ls->latch);
@@ -1530,7 +1534,9 @@ static int syslog_print(char __user *buf, int size)
size_t skip;
 
logbuf_lock_irq();
+   raw_spin_lock(&syslog_lock);
if (!prb_read_valid(prb, syslog_seq, &r)) {
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
break;
}
@@ -1560,6 +1566,7 @@ static int syslog_print(char __user *buf, int size)
syslog_partial += n;
} else
n = 0;
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
 
if (!n)
@@ -1626,8 +1633,11 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
break;
}
 
-   if (clear)
+   if (clear) {
+   raw_spin_lock(&syslog_lock);
latched_seq_write(&clear_seq, seq);
+   raw_spin_unlock(&syslog_lock);
+   }
logbuf_unlock_irq();
 
kfree(text);
@@ -1637,10 +1647,24 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 static void syslog_clear(void)
 {
logbuf_lock_irq();
+   raw_spin_lock(&syslog_lock);
latched_seq_write(&clear_seq, prb_next_seq(prb));
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
 }
 
+/* Return a consistent copy of @syslog_seq. */
+static u64 read_syslog_seq_irq(void)
+{
+   u64 seq;
+
+   raw_spin_lock_irq(&syslog_lock);
+   seq = syslog_seq;
+   raw_spin_unlock_irq(&syslog_lock);
+
+   return seq;
+}
+
 int do_syslog(int type, char __user *buf, int len, int source)
 {
struct printk_info info;
@@ -1664,8 +1688,9 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
return 0;
if (!access_ok(buf, len))
return -EFAULT;
+
error = wait_event_interruptible(log_wait,
-   prb_read_valid(prb, syslog_seq, NULL));
+   prb_read_valid(prb, read_syslog_seq_irq(), 
NULL));
if (error)
return error;
error = syslog_print(buf, len);
@@ -1714,8 +1739,10 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
logbuf_lock_irq();
+   raw_spin_lock(&syslog_lock);
if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
/* No unread messages. */
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
return 0;
}
@@ -1744,6 +1771,7 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
}
error -= syslog_partial;
}
+   raw_spin_unlock(&syslog_lock);
logbuf_unlock_irq();
break;
/* Size of the log buffer */
@@ -298

[PATCH printk-rework 04/14] printk: consolidate kmsg_dump_get_buffer/syslog_print_all code

2021-02-18 Thread John Ogness
The logic for finding records to fit into a buffer is the same for
kmsg_dump_get_buffer() and syslog_print_all(). Introduce a helper
function find_first_fitting_seq() to handle this logic.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 87 --
 1 file changed, 50 insertions(+), 37 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b4f72b5f70b9..d6f93ebd7bd0 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1422,6 +1422,50 @@ static size_t get_record_print_text_size(struct 
printk_info *info,
return ((prefix_len * line_count) + info->text_len + 1);
 }
 
+/*
+ * Beginning with @start_seq, find the first record where it and all following
+ * records up to (but not including) @max_seq fit into @size.
+ *
+ * @max_seq is simply an upper bound and does not need to exist. If the caller
+ * does not require an upper bound, -1 can be used for @max_seq.
+ */
+static u64 find_first_fitting_seq(u64 start_seq, u64 max_seq, size_t size,
+ bool syslog, bool time)
+{
+   struct printk_info info;
+   unsigned int line_count;
+   size_t len = 0;
+   u64 seq;
+
+   /* Determine the size of the records up to @max_seq. */
+   prb_for_each_info(start_seq, prb, seq, &info, &line_count) {
+   if (info.seq >= max_seq)
+   break;
+   len += get_record_print_text_size(&info, line_count, syslog, time);
+   }
+
+   /*
+* Adjust the upper bound for the next loop to avoid subtracting
+* lengths that were never added.
+*/
+   if (seq < max_seq)
+   max_seq = seq;
+
+   /*
+* Move first record forward until length fits into the buffer. Ignore
+* newest messages that were not counted in the above cycle. Messages
+* might appear and get lost in the meantime. This is a best effort
+* that prevents an infinite loop that could occur with a retry.
+*/
+   prb_for_each_info(start_seq, prb, seq, &info, &line_count) {
+   if (len <= size || info.seq >= max_seq)
+   break;
+   len -= get_record_print_text_size(&info, line_count, syslog, time);
+   }
+
+   return seq;
+}
+
 static int syslog_print(char __user *buf, int size)
 {
struct printk_info info;
@@ -1493,9 +1537,7 @@ static int syslog_print(char __user *buf, int size)
 static int syslog_print_all(char __user *buf, int size, bool clear)
 {
struct printk_info info;
-   unsigned int line_count;
struct printk_record r;
-   u64 max_seq;
char *text;
int len = 0;
u64 seq;
@@ -1511,21 +1553,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
 * Find first record that fits, including all following records,
 * into the user-provided buffer for this dump.
 */
-   prb_for_each_info(clear_seq, prb, seq, &info, &line_count)
-   len += get_record_print_text_size(&info, line_count, true, time);
-
-   /*
-* Set an upper bound for the next loop to avoid subtracting lengths
-* that were never added.
-*/
-   max_seq = seq;
-
-   /* move first record forward until length fits into the buffer */
-   prb_for_each_info(clear_seq, prb, seq, &info, &line_count) {
-   if (len <= size || info.seq >= max_seq)
-   break;
-   len -= get_record_print_text_size(&info, line_count, true, time);
-   }
+   seq = find_first_fitting_seq(clear_seq, -1, size, true, time);
 
prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX);
 
@@ -3423,7 +3451,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
  char *buf, size_t size, size_t *len_out)
 {
struct printk_info info;
-   unsigned int line_count;
struct printk_record r;
unsigned long flags;
u64 seq;
@@ -3451,26 +3478,12 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
 
/*
 * Find first record that fits, including all following records,
-* into the user-provided buffer for this dump.
+* into the user-provided buffer for this dump. Pass in size-1
+* because this function (by way of record_print_text()) will
+* not write more than size-1 bytes of text into @buf.
 */
-
-   prb_for_each_info(dumper->cur_seq, prb, seq, &info, &line_count) {
-   if (info.seq >= dumper->next_seq)
-   break;
-   len += get_record_print_text_size(&info, line_count, syslog, time);
-   }
-
-   /*
-* Move first record forward until length fits into the buffer. Ignore
-* newest messages that were not counted in the above cycle. Messages
-* might appear and get lost in the meantime. This is the best effort
-* that prevents an infinite loop.
-  

Re: smpboot: CPU numbers printed as warning

2021-02-16 Thread John Ogness
On 2021-02-16, Borislav Petkov  wrote:
>> Also you should add '\n' into the previous string to make the behavior
>> clear. It will always be printed on a new line when pr_info()
>> is used.
>
> This was made to use pr_cont() on purpose so that the output is
> compact,

It is supported to provide loglevels for CONT messages. The loglevel is
then only used if the append fails:

pr_cont(KERN_INFO "message part");

I don't know if we want to go down that path. But it is supported.
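
For example, an explicit loglevel on each part would look like this
(illustrative snippet only, not from the patch under discussion):

	pr_info("extended features:");
	pr_cont(KERN_INFO " featA");
	pr_cont(KERN_INFO " featB\n");

If a part cannot be appended to the previous message, it is then
emitted as its own record at KERN_INFO.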

John Ogness


[PATCH v2] printk: avoid prb_first_valid_seq() where possible

2021-02-11 Thread John Ogness
If message sizes average larger than expected (more than 32
characters), the data_ring will wrap before the desc_ring. Once the
data_ring wraps, it will start invalidating descriptors. These
invalid descriptors hang around until they are eventually recycled
when the desc_ring wraps. Readers do not care about invalid
descriptors, but they still need to iterate past them. If the
average message size is much larger than 32 characters, then there
will be many invalid descriptors preceding the valid descriptors.

The function prb_first_valid_seq() always begins at the oldest
descriptor and searches for the first valid descriptor. This can
be rather expensive for the above scenario. And, in fact, because
of its heavy usage in /dev/kmsg, there have been reports of long
delays and even RCU stalls.

For code that does not need to search from the oldest record,
replace prb_first_valid_seq() usage with prb_read_valid_*()
functions, which provide a start sequence number to search from.

Fixes: 896fbe20b4e2333fb55 ("printk: use the lockless ringbuffer")
Reported-by: kernel test robot 
Reported-by: J. Avila 
Signed-off-by: John Ogness 
---
 patch against next-20210211

 v2: Abort and report no unread messages if SYSLOG_ACTION_SIZE_UNREAD
 fails to read the current or any newer record.

 kernel/printk/printk.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5a95c688621f..575a34b88936 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -735,9 +735,9 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
logbuf_lock_irq();
}
 
-   if (user->seq < prb_first_valid_seq(prb)) {
+   if (r->info->seq != user->seq) {
/* our last seen message is gone, return error and reset */
-   user->seq = prb_first_valid_seq(prb);
+   user->seq = r->info->seq;
ret = -EPIPE;
logbuf_unlock_irq();
goto out;
@@ -812,6 +812,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
 static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
 {
struct devkmsg_user *user = file->private_data;
+   struct printk_info info;
__poll_t ret = 0;
 
if (!user)
@@ -820,9 +821,9 @@ static __poll_t devkmsg_poll(struct file *file, poll_table 
*wait)
poll_wait(file, &log_wait, wait);
 
logbuf_lock_irq();
-   if (prb_read_valid(prb, user->seq, NULL)) {
+   if (prb_read_valid_info(prb, user->seq, &info, NULL)) {
/* return error when data has vanished underneath us */
-   if (user->seq < prb_first_valid_seq(prb))
+   if (info.seq != user->seq)
ret = EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
else
ret = EPOLLIN|EPOLLRDNORM;
@@ -1559,6 +1560,7 @@ static void syslog_clear(void)
 
 int do_syslog(int type, char __user *buf, int len, int source)
 {
+   struct printk_info info;
bool clear = false;
static int saved_console_loglevel = LOGLEVEL_DEFAULT;
int error;
@@ -1629,9 +1631,14 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
logbuf_lock_irq();
-   if (syslog_seq < prb_first_valid_seq(prb)) {
+   if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
+   /* No unread messages. */
+   logbuf_unlock_irq();
+   return 0;
+   }
+   if (info.seq != syslog_seq) {
/* messages are gone, move to first one */
-   syslog_seq = prb_first_valid_seq(prb);
+   syslog_seq = info.seq;
syslog_partial = 0;
}
if (source == SYSLOG_FROM_PROC) {
@@ -1643,7 +1650,6 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
error = prb_next_seq(prb) - syslog_seq;
} else {
bool time = syslog_partial ? syslog_time : printk_time;
-   struct printk_info info;
unsigned int line_count;
u64 seq;
 
@@ -3429,9 +3435,11 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
goto out;
 
logbuf_lock_irqsave(flags);
-   if (dumper->cur_seq < prb_first_valid_seq(prb)) {
-   /* messages are gone, move to first available one */
-   dumper->cur_seq = prb_first_valid_seq(prb);
+   if (prb_read_valid_info(prb, dumper->cur_seq, &info, NULL)) {
+   if (info.seq != dumper->cur_seq) {
+   /* messages are gone, move to 

Re: [PATCH] printk: avoid prb_first_valid_seq() where possible

2021-02-10 Thread John Ogness
On 2021-02-09, Petr Mladek  wrote:
>> @@ -1629,9 +1631,13 @@ int do_syslog(int type, char __user *buf, int len, 
>> int source)
>>  /* Number of chars in the log buffer */
>>  case SYSLOG_ACTION_SIZE_UNREAD:
>>  logbuf_lock_irq();
>> -if (syslog_seq < prb_first_valid_seq(prb)) {
>> -/* messages are gone, move to first one */
>> -syslog_seq = prb_first_valid_seq(prb);
>> +if (prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
>> +if (info.seq != syslog_seq) {
>> +/* messages are gone, move to first one */
>> +syslog_seq = info.seq;
>> +syslog_partial = 0;
>> +}
>> +} else {
>>  syslog_partial = 0;
>
> I am scratching my head when prb_read_valid_info(prb,
> syslog_seq, , NULL)) might fail.

It can fail because the descriptor has been invalidated/recycled by
writers and perhaps there is no valid record that has yet come after it.

> It might fail when syslog_seq points to the next message
> after the last valid one. In this case, we could return
> immediately (after releasing the lock) because there are
> zero unread messages.

Yes, we could just return 0 in this case. If we are returning and not
modifying @syslog_seq, then there is no need to reset
@syslog_partial. At some point a reader will notice that the record is
gone and reset @syslog_partial accordingly.

> Anyway, syslog_partial must be zero in this case. syslog_seq
> should stay when the last read was partial. And there should
> always be at least one valid message in the log buffer
> be design.

A record can be invalidated at any time. It is a normal case that a
re-read of a record (to get the rest of the partial) can lead to the
record no longer being available.
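
As a hypothetical sequence: a reader copies the first half of record N
and sets @syslog_partial; before the next read, writers wrap the
data_ring and invalidate record N; the following
prb_read_valid_info(prb, syslog_seq, ...) then either reports a newer
record (info.seq != syslog_seq) or nothing at all, and the reader
resets @syslog_partial accordingly.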

> IMHO, it would deserve a comment and maybe even a warning.

I don't think we need a warning. It is something that can happen and it
is not a problem.

> What about something like?
>
>   /* Number of chars in the log buffer */
>   case SYSLOG_ACTION_SIZE_UNREAD:
>   logbuf_lock_irq();
>   if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
>   /* No unread message */
>   if (syslog_partial) {
>   /* This should never happen. */
>   pr_err_once("Unable to read any message even 
> when the last syslog read was partial: %zu", syslog_partial);
>   syslog_partial = 0;
>   }
>   logbuf_unlock_irq();
>   return 0;
>   }

I recommend changing your suggestion to:

>   if (!prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
>   /*
>* No unread messages. No need to check/reset
>* syslog_partial. When a reader does read a new
>* message it will notice and appropriately update
>* syslog_seq and reset syslog_partial.
>*/
>   logbuf_unlock_irq();
>   return 0;
>   }
>   if (info.seq != syslog_seq) {
>   /* messages are gone, move to first one */
>   syslog_seq = info.seq;
>   syslog_partial = 0;
>   }

John Ogness


Re: [PATCH] printk: avoid prb_first_valid_seq() where possible

2021-02-10 Thread John Ogness
On 2021-02-08, Sergey Senozhatsky  wrote:
>> Can we please also ask the kernel test robot to test this patch?

Oliver Sang from LKP was able to verify that the RCU stall problem is
not seen anymore on their side. See his response below.

Thanks Oliver!

John Ogness

On 2021-02-10, Oliver Sang  wrote:
> On Mon, Feb 08, 2021 at 10:35:27AM +0106, John Ogness wrote:
>> Hello LKP Project,
>> 
>> Thank you for your valuable and excellent work!
>> 
>> You recently detected a problem:
>> 
>> https://lists.01.org/hyperkitty/list/l...@lists.01.org/thread/STZF3OODVA5KOG447JR2AJJXREWIPRXD/
>> 
>> We have posted a patch to fix the issue:
>> 
>> https://lkml.kernel.org/r/20210205141728.18117-1-john.ogn...@linutronix.de
>
> Hi John Ogness,
>
> by applying the patch upon below commit:
> commit: b031a684bfd01d633c79d281bd0cf11c2f834ada ("printk: remove logbuf_lock 
> writer-protection of ringbuffer")
>
> we didn't reproduce the previous INFO:rcu_tasks_detected_stalls_on_tasks
> issue in 30 runs:
>
> b031a684bfd01d63: ("printk: remove logbuf_lock writer-protection of 
> ringbuffer")
> 7e926a042bfad8b7: ("printk: avoid prb_first_valid_seq() where possible")
>
> b031a684bfd01d63  7e926a042bfad8b7334b4677d3
>   --
>fail:runs  %reproductionfail:runs
>| | |
>  10:21 -48%:30
> dmesg.INFO:rcu_tasks_detected_stalls_on_tasks
>  19:21 -90%:30last_state.is_incomplete_run
>   1:21  -5%:30last_state.post_run
>
>
>> 
>> Using a local lkp installation I can verify the problem is fixed. But we
>> would like to know if there possibilities to verify fixes using the LKP
>> test robot? Or is there any way to check that the test robot sees the
>> problem is fixed?
>> 
>> Thanks.
>> 
>> John Ogness


Re: [PATCH] printk: avoid prb_first_valid_seq() where possible

2021-02-08 Thread John Ogness
On 2021-02-08, Sergey Senozhatsky  wrote:
> Can we please also ask the kernel test robot to test this patch?

LKP is an automated service. The problem was reported for an older
commit. The new patch will not apply.

I will try to contact the LKP team and see how we can get some sort of
verification.

@Avila: Can you also verify that this patch fixes your issue [0]?

John Ogness

[0] https://lkml.kernel.org/r/20210122235238.655049-1-elav...@google.com


[PATCH] printk: avoid prb_first_valid_seq() where possible

2021-02-05 Thread John Ogness
If message sizes average larger than expected (more than 32
characters), the data_ring will wrap before the desc_ring. Once the
data_ring wraps, it will start invalidating descriptors. These
invalid descriptors hang around until they are eventually recycled
when the desc_ring wraps. Readers do not care about invalid
descriptors, but they still need to iterate past them. If the
average message size is much larger than 32 characters, then there
will be many invalid descriptors preceding the valid descriptors.

The function prb_first_valid_seq() always begins at the oldest
descriptor and searches for the first valid descriptor. This can
be rather expensive for the above scenario. And, in fact, because
of its heavy usage in /dev/kmsg, there have been reports of long
delays and even RCU stalls.

For code that does not need to search from the oldest record,
replace prb_first_valid_seq() usage with prb_read_valid_*()
functions, which provide a start sequence number to search from.

Fixes: 896fbe20b4e2333fb55 ("printk: use the lockless ringbuffer")
Reported-by: kernel test robot 
Reported-by: J. Avila 
Signed-off-by: John Ogness 
---
 patch against next-20210205

 kernel/printk/printk.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5a95c688621f..035aae771ea1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -735,9 +735,9 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
logbuf_lock_irq();
}
 
-   if (user->seq < prb_first_valid_seq(prb)) {
+   if (r->info->seq != user->seq) {
/* our last seen message is gone, return error and reset */
-   user->seq = prb_first_valid_seq(prb);
+   user->seq = r->info->seq;
ret = -EPIPE;
logbuf_unlock_irq();
goto out;
@@ -812,6 +812,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
 static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
 {
struct devkmsg_user *user = file->private_data;
+   struct printk_info info;
__poll_t ret = 0;
 
if (!user)
@@ -820,9 +821,9 @@ static __poll_t devkmsg_poll(struct file *file, poll_table 
*wait)
poll_wait(file, &log_wait, wait);
 
logbuf_lock_irq();
-   if (prb_read_valid(prb, user->seq, NULL)) {
+   if (prb_read_valid_info(prb, user->seq, &info, NULL)) {
/* return error when data has vanished underneath us */
-   if (user->seq < prb_first_valid_seq(prb))
+   if (info.seq != user->seq)
ret = EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
else
ret = EPOLLIN|EPOLLRDNORM;
@@ -1559,6 +1560,7 @@ static void syslog_clear(void)
 
 int do_syslog(int type, char __user *buf, int len, int source)
 {
+   struct printk_info info;
bool clear = false;
static int saved_console_loglevel = LOGLEVEL_DEFAULT;
int error;
@@ -1629,9 +1631,13 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
logbuf_lock_irq();
-   if (syslog_seq < prb_first_valid_seq(prb)) {
-   /* messages are gone, move to first one */
-   syslog_seq = prb_first_valid_seq(prb);
+   if (prb_read_valid_info(prb, syslog_seq, &info, NULL)) {
+   if (info.seq != syslog_seq) {
+   /* messages are gone, move to first one */
+   syslog_seq = info.seq;
+   syslog_partial = 0;
+   }
+   } else {
syslog_partial = 0;
}
if (source == SYSLOG_FROM_PROC) {
@@ -1643,7 +1649,6 @@ int do_syslog(int type, char __user *buf, int len, int 
source)
error = prb_next_seq(prb) - syslog_seq;
} else {
bool time = syslog_partial ? syslog_time : printk_time;
-   struct printk_info info;
unsigned int line_count;
u64 seq;
 
@@ -3429,9 +3434,11 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
goto out;
 
logbuf_lock_irqsave(flags);
-   if (dumper->cur_seq < prb_first_valid_seq(prb)) {
-   /* messages are gone, move to first available one */
-   dumper->cur_seq = prb_first_valid_seq(prb);
+   if (prb_read_valid_info(prb, dumper->cur_seq, &info, NULL)) {
+   if (info.seq != dumper->cur_seq) {
+   /* messages are gone, move to first available one */
+   dumper->cur_seq = info.seq;
+   }
}
 
/* last entry */
-- 
2.20.1



Re: [printk] b031a684bf: INFO:rcu_tasks_detected_stalls_on_tasks

2021-02-04 Thread John Ogness
On 2021-01-22, kernel test robot  wrote:
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: b031a684bfd01d633c79d281bd0cf11c2f834ada ("printk: remove logbuf_lock 
> writer-protection of ringbuffer")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

So I finally tracked down the problem. And yes, it is a problem with the
ringbuffer. And it turns out this is the same problem reported here [0].

If message sizes average larger than expected (more than 32 characters),
the data_ring will wrap before the desc_ring. Once the data_ring wraps,
it will start invalidating descriptors. These invalid descriptors hang
around until they are eventually recycled (when the desc_ring
wraps). Readers do not care about invalid descriptors, but they still
have to iterate past them. If the average message size is much larger
than 32 characters, then there will be many invalid descriptors
preceding the valid descriptors.

For this particular LKP report, the RCU stalls started happening as the
number of invalid descriptors approached 17000. The reason this causes a
problem is because of the function prb_first_valid_seq(). It starts at
the oldest descriptor and searches to find the oldest _valid_
descriptor. In this case, it had to iterate past 17000 descriptors every
time.
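
As a rough illustration (ring sizes assumed here, not taken from the
report): the descriptor ring is dimensioned for a 32-character average
record, so a 1 MiB data_ring comes with roughly 32k descriptors. If
records actually average around 256 characters, the data_ring only
holds about 4k of them at a time, leaving on the order of 28k
descriptors invalid, the same ballpark as the 17000 observed here.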

prb_first_valid_seq() is used in devkmsg_read() and in
devkmsg_poll(). And worse, it is called with local interrupts disabled
and logbuf_lock locked.

The solution is to avoid using prb_first_valid_seq() if possible. And
indeed, in both of these cases it is a simple change:

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c8847ee571f0..76e8df20fdf9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -736,9 +736,9 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
logbuf_lock_irq();
}
 
-   if (user->seq < prb_first_valid_seq(prb)) {
+   if (r->info->seq != user->seq) {
/* our last seen message is gone, return error and reset */
-   user->seq = prb_first_valid_seq(prb);
+   user->seq = r->info->seq;
ret = -EPIPE;
logbuf_unlock_irq();
goto out;
@@ -813,6 +813,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
 static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
 {
struct devkmsg_user *user = file->private_data;
+   struct printk_info info;
__poll_t ret = 0;
 
if (!user)
@@ -821,9 +822,9 @@ static __poll_t devkmsg_poll(struct file *file, poll_table 
*wait)
poll_wait(file, &log_wait, wait);
 
logbuf_lock_irq();
-   if (prb_read_valid(prb, user->seq, NULL)) {
+   if (prb_read_valid_info(prb, user->seq, &info, NULL)) {
/* return error when data has vanished underneath us */
-   if (user->seq < prb_first_valid_seq(prb))
+   if (info.seq != user->seq)
ret = EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
else
ret = EPOLLIN|EPOLLRDNORM;

Once logbuf_lock and safe buffer usage is removed, this efficiency
wouldn't matter to the kernel anyway. But I am glad we hit it while it
still mattered because we should not be carelessly wasting CPU cycles
for any task.

Interestingly enough, LTP reported a problem with this code back in July
2020. The "invalid descriptor issue" was clearly stated [1] and Petr
even made a suggestion [2] which is nearly identical to how I propose to
fix it here.

prb_first_valid_seq() is used unnecessarily in some syslog and devkmsg
locations as well. And prb_first_valid_seq() itself can also be slightly
improved.

I am preparing a patch against linux-next for this. And although the
current situation is not pretty, I do not think it needs to be rushed
for 5.11. It is an inefficiency that occurs if the average message size
greatly exceeds 32 bytes and the ringbuffer is being blasted by new
messages and userspace is reading the ringbuffer.

John Ogness

[0] https://lkml.kernel.org/r/20210122235238.655049-1-elav...@google.com
[1] https://lkml.kernel.org/r/874kqhm1v8@jogness.linutronix.de
[2] https://lkml.kernel.org/r/20200709105906.GC11164@alley


Re: [printk] b031a684bf: INFO:rcu_tasks_detected_stalls_on_tasks

2021-02-02 Thread John Ogness
9]  entry_INT80_compat+0x71/0x76
[  926.939680] RIP: 0023:0xf7f9da02
[  926.940193] RSP: 002b:ffdb2864 EFLAGS: 0246 ORIG_RAX: 
0003
[  926.941301] RAX: ffe0 RBX: 0003 RCX: 56659234
[  926.942307] RDX: 1fff RSI: 01e0 RDI: 56659234
[  926.943312] RBP:  R08:  R09: 
[  926.944313] R10:  R11:  R12: 
[  926.945314] R13:  R14:  R15: 

This pattern is _always_ the same (using either my simple change or with
the problematic commit applied). Obviously the removal of the spinlock
usage is not the issue. But I am concerned that the ringbuffer is
somehow involved. I have tried to reproduce this problem doing
non-ringbuffer activity, but have not had success.

Also, the problem disappears if a newer kernel is used. So maybe there
was something fixed in rcu or an rcu user. But still, it is very odd
that the ringbuffer is triggering it.

I will continue investigating this.

Also, I plan to send a patch to lkp so that the test script is not
doing:

dmesg > /dev/kmsg

Although this may be a great test for printk, for rcutorture it would be
more appropriate to do something like:

dmesg > /tmpfile
cat /tmpfile > /dev/kmsg

to avoid the endless read/feed cycle.

John Ogness


Re: [PATCH printk-rework 11/12] printk: remove logbuf_lock

2021-02-02 Thread John Ogness
On 2021-02-02, Petr Mladek  wrote:
> On Tue 2021-01-26 22:21:50, John Ogness wrote:
>> Since the ringbuffer is lockless, there is no need for it to be
>> protected by @logbuf_lock. Remove @logbuf_lock.
>> 
>> This means that printk_nmi_direct and printk_safe_flush_on_panic()
>> no longer need to acquire any lock to run.
>> 
>> @console_seq, @exclusive_console_stop_seq, @console_dropped are
>> protected by @console_lock.
>> 
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index d14a4afc5b72..b57dba7f077d 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -401,6 +366,7 @@ static u64 syslog_seq;
>>  static size_t syslog_partial;
>>  static bool syslog_time;
>>  
>> +/* All 3 protected by @console_sem. */
>>  /* the next printk record to write to the console */
>>  static u64 console_seq;
>>  static u64 exclusive_console_stop_seq;
>> @@ -762,27 +728,27 @@ static ssize_t devkmsg_read(struct file *file, char 
>> __user *buf,
>>  if (ret)
>>  return ret;
>>  
>> -logbuf_lock_irq();
>> +printk_safe_enter_irq();
>
> What is the exact reason to keep this, please?

As Sergey pointed out [0], logbuf_lock_irq() does 2 things: logbuf_lock
and safe buffers. This series is not trying to remove the safe buffers
(a later series will). The series is only removing logbuf_lock. So all
logbuf_lock_*() calls will turn into printk_safe_*() calls. There are a
few exceptions, which you noticed and I will respond to.

[0] https://lkml.kernel.org/r/20201208203539.gb1667...@google.com
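
For reference, the existing helpers couple the safe-buffer context with
the lock roughly like this (simplified sketch of the current mainline
macros, not part of this patch):

#define logbuf_lock_irq()				\
	do {						\
		printk_safe_enter_irq();		\
		raw_spin_lock(&logbuf_lock);		\
	} while (0)

#define logbuf_unlock_irq()				\
	do {						\
		raw_spin_unlock(&logbuf_lock);		\
		printk_safe_exit_irq();			\
	} while (0)

Removing only the raw_spin_lock()/raw_spin_unlock() pair while keeping
the printk_safe_*() calls is exactly the substitution this series makes.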

> 1. The primary function of the printk_safe context is to avoid deadlock
>caused by logbuf_lock. It might have happened with recursive or nested
>printk(). But logbuf_lock is gone now.

Agreed. Deadlock is not a concern anymore.

> 2. There are still some hidded locks that were guarded by this as
>well. For example, console_owner_lock, or spinlock inside
>console_sem, or scheduler locks taken when console_sem()
>wakes another waiting process. It might still make sense
>to somehow guard these.

This was not my motivation and I do not think it is an issue. I am not
aware of any technical need for the safe buffers to protect such
synchronization.

> 3. It kind of prevented infinite printk() recursion by using another
>code path. The other path was limited by the size of the per-cpu
>buffer. Well, recursion inside printk_safe code would likely
>hit end of the stack first.

Yes, this was my main motivation. The safe buffers carry this
responsibility in mainline. So until a replacement for recursion
protection is in place, the safe buffers should remain.

And even if we decide we do not need/want recursion protection, I still
do not think this series should be the one to remove it. I only wanted
to remove logbuf_lock for now.

If we later have regressions, it will be helpful to bisect if the safe
buffers (with their local_irq_disable()) or the logbuf_lock were
involved.

> IMHO, we do not need printk safe context here in devkmsg_read().
> It does not belong into any categoty that is described above.
> logbug_lock() is gone. devkmsg_read() is never called directly
> from printk().

No. But it is calling printk_ringbuffer functions that can trigger
WARN_ONs that can trigger printk's.

> The same is true for almost entire patch. There are only two or so
> exceptions, see below.
>
>
>>  if (!prb_read_valid(prb, atomic64_read(&user->seq), r)) {
>>  if (file->f_flags & O_NONBLOCK) {
>>  ret = -EAGAIN;
>> -logbuf_unlock_irq();
>> +printk_safe_exit_irq();
>>  goto out;
>>  }
>>  
>> -logbuf_unlock_irq();
>> +printk_safe_exit_irq();
>>  ret = wait_event_interruptible(log_wait,
>>  prb_read_valid(prb, atomic64_read(>seq), 
>> r));
>>  if (ret)
>>  goto out;
>> -logbuf_lock_irq();
>> +printk_safe_enter_irq();
>>  }
>>  
>>  if (atomic64_read(&user->seq) < prb_first_valid_seq(prb)) {
>>  /* our last seen message is gone, return error and reset */
>>  atomic64_set(&user->seq, prb_first_valid_seq(prb));
>>  ret = -EPIPE;
>> -logbuf_unlock_irq();
>> +printk_safe_exit_irq();
>>  goto out;
>>  }
>>  
>
>
>> @@ -2593,7 +2559,6 @@ void console_unlock(void)
>>  size_t len;
>>  
>>  printk_safe_enter_irqsave(f

Re: [PATCH 3/3] printk: move CONSOLE_EXT_LOG_MAX to kernel/printk/printk.c

2021-02-02 Thread John Ogness
On 2021-02-02, Masahiro Yamada  wrote:
> This macro is only used in kernel/printk/printk.c

I recently posted a patch [0] that added another macro CONSOLE_LOG_MAX
here. But it also is only used in printk.c. I see no reason why either
should be in the header. Neither my patch nor commit d43ff430f434
("printk: guard the amount written per line by devkmsg_read()") show any
motivation for using printk.h.

I am fine with moving them out. The only consequences could be
out-of-tree modules breaking, but do we care about that?

John Ogness

[0] https://lkml.kernel.org/r/20210126211551.26536-5-john.ogn...@linutronix.de


Re: [PATCH 1/3] printk: use CONFIG_CONSOLE_LOGLEVEL_* directly

2021-02-02 Thread John Ogness
On 2021-02-02, Masahiro Yamada  wrote:
> CONSOLE_LOGLEVEL_DEFAULT is nothing more than a shorthand of
> CONFIG_CONSOLE_LOGLEVEL_DEFAULT.
>
> When you change CONFIG_CONSOLE_LOGLEVEL_DEFAULT from Kconfig, almost
> all objects are rebuilt because CONFIG_CONSOLE_LOGLEVEL_DEFAULT is
> used in , which is included from most of source files.
>
> In fact, there are only 4 users of CONSOLE_LOGLEVEL_DEFAULT:
>
>   arch/x86/platform/uv/uv_nmi.c
>   drivers/firmware/efi/libstub/efi-stub-helper.c
>   drivers/tty/sysrq.c
>   kernel/printk/printk.c
>
> So, when you change CONFIG_CONSOLE_LOGLEVEL_DEFAULT and rebuild the
> kernel, it is enough to recompile those 4 files.
>
> Remove the CONSOLE_LOGLEVEL_DEFAULT definition from ,
> and use CONFIG_CONSOLE_LOGLEVEL_DEFAULT directly.

With commit a8fe19ebfbfd ("kernel/printk: use symbolic defines for
console loglevels") it can be seen that various drivers used to
hard-code their own values. The introduction of the macros in an
intuitive location (include/linux/printk.h) made it easier for authors
to find/use the various available printk settings and thresholds.

Technically there is no problem using Kconfig macros directly. But will
authors bother to hunt down available Kconfig settings? Or will they
only look in printk.h to see what is available?

IMHO if code wants to use settings from a foreign subsystem, it should
be taking those from headers of that subsystem, rather than using some
Kconfig settings from that subsystem. Headers exist to make information
available to external code. Kconfig (particularly for a subsystem) exist
to configure that subsystem.

But my feeling on this may be misguided. Is it generally accepted in the
kernel that any code can use Kconfig settings of any other code?

John Ogness


Re: [PATCH printk-rework 09/12] um: synchronize kmsg_dumper

2021-02-01 Thread John Ogness
Hi Richard,

On 2021-02-01, Richard Weinberger  wrote:
>>>> In preparation for removing printk's @logbuf_lock, dumpers that have
>>>> assumed to be protected against parallel calls must provide their own
>>>> synchronization. Add a locally static spinlock to synchronize the
>>>> kmsg_dump call and temporary buffer usage.
>>>> 
>>>> Signed-off-by: John Ogness 
>>>> ---
>>>>  arch/um/kernel/kmsg_dump.c | 8 
>>>>  1 file changed, 8 insertions(+)
>>>> 
>>>> diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c
>>>> index f38349ad00ea..173999422ed8 100644
>>>> --- a/arch/um/kernel/kmsg_dump.c
>>>> +++ b/arch/um/kernel/kmsg_dump.c
>>>> @@ -1,5 +1,6 @@
>>>>  // SPDX-License-Identifier: GPL-2.0
>>>>  #include 
>>>> +#include 
>>>>  #include 
>>>>  #include 
>>>>  #include 
>>>> @@ -9,8 +10,10 @@ static void kmsg_dumper_stdout(struct kmsg_dumper 
>>>> *dumper,
>>>>enum kmsg_dump_reason reason,
>>>>struct kmsg_dumper_iter *iter)
>>>>  {
>>>> +  static DEFINE_SPINLOCK(lock);
>>>>static char line[1024];
>>>>struct console *con;
>>>> +  unsigned long flags;
>>>>size_t len = 0;
>>>>  
>>>>/* only dump kmsg when no console is available */
>>>> @@ -25,11 +28,16 @@ static void kmsg_dumper_stdout(struct kmsg_dumper 
>>>> *dumper,
>>>>if (con)
>>>>return;
>>>>  
>>>> +  if (!spin_trylock_irqsave(&lock, flags))
>>>> +  return;
>>>> +
>>>>printf("kmsg_dump:\n");
>>>>while (kmsg_dump_get_line(iter, true, line, sizeof(line), &len)) {
>>>>line[len] = '\0';
>>>>printf("%s", line);
>>>>}
>>>> +
>>>> +  spin_unlock_irqrestore(&lock, flags);
>>>
>>> What exactly is synchronized here, please?
>>> Access to @line buffer or @iter or both?
>> 
>> @line is being synchronized. @iter does not require synchronization.
>> 
>>> It looks to me that the access to @line buffer was not synchronized
>>> before. kmsg_dump_get_line() used a lock internally but
>>> it was not synchronized with the later printf("%s", line);
>> 
>> The line was previously synchronized for the kmsg_dump_get_line()
>> call. But yes, it was not synchronized after the call, which is a bug if
>> the dump is triggered on multiple CPUs simultaneously. The commit
>> message should also mention that it is handling that bug.
>> 
>>> IMHO, this patch is not needed.
>> 
>> I am not familiar enough with ARCH=um to know if dumps can be triggered
>> on multiple CPUs simultaneously. Perhaps ThomasM or Richard can chime in
>> here.
>
> Well, uml has no SMP support, so no parallel dumps. :-)

When I grep through arch/um, I see many uses of spinlocks. This would
imply that uml at least has some sort of preemption model where they are
needed. Dumps can trigger from any context and from multiple paths.

If you are sure that this is no concern, then I will drop this patch
from my series.

John Ogness


Re: [PATCH printk-rework 09/12] um: synchronize kmsg_dumper

2021-02-01 Thread John Ogness
(Added CC: Thomas Meyer, Richard Weinberger)

On 2021-02-01, Petr Mladek  wrote:
>> In preparation for removing printk's @logbuf_lock, dumpers that have
>> assumed to be protected against parallel calls must provide their own
>> synchronization. Add a locally static spinlock to synchronize the
>> kmsg_dump call and temporary buffer usage.
>> 
>> Signed-off-by: John Ogness 
>> ---
>>  arch/um/kernel/kmsg_dump.c | 8 
>>  1 file changed, 8 insertions(+)
>> 
>> diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c
>> index f38349ad00ea..173999422ed8 100644
>> --- a/arch/um/kernel/kmsg_dump.c
>> +++ b/arch/um/kernel/kmsg_dump.c
>> @@ -1,5 +1,6 @@
>>  // SPDX-License-Identifier: GPL-2.0
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -9,8 +10,10 @@ static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,
>>  enum kmsg_dump_reason reason,
>>  struct kmsg_dumper_iter *iter)
>>  {
>> +static DEFINE_SPINLOCK(lock);
>>  static char line[1024];
>>  struct console *con;
>> +unsigned long flags;
>>  size_t len = 0;
>>  
>>  /* only dump kmsg when no console is available */
>> @@ -25,11 +28,16 @@ static void kmsg_dumper_stdout(struct kmsg_dumper 
>> *dumper,
>>  if (con)
>>  return;
>>  
>> +if (!spin_trylock_irqsave(&lock, flags))
>> +return;
>> +
>>  printf("kmsg_dump:\n");
>>  while (kmsg_dump_get_line(iter, true, line, sizeof(line), &len)) {
>>  line[len] = '\0';
>>  printf("%s", line);
>>  }
>> +
>> +spin_unlock_irqrestore(&lock, flags);
>
> What exactly is synchronized here, please?
> Access to @line buffer or @iter or both?

@line is being synchronized. @iter does not require synchronization.

> It looks to me that the access to @line buffer was not synchronized
> before. kmsg_dump_get_line() used a lock internally but
> it was not synchronized with the later printf("%s", line);

The line was previously synchronized for the kmsg_dump_get_line()
call. But yes, it was not synchronized after the call, which is a bug if
the dump is triggered on multiple CPUs simultaneously. The commit
message should also mention that it is handling that bug.

> IMHO, this patch is not needed.

I am not familiar enough with ARCH=um to know if dumps can be triggered
on multiple CPUs simultaneously. Perhaps ThomasM or Richard can chime in
here.

John Ogness


Re: [PATCH printk-rework 08/12] printk: introduce a kmsg_dump iterator

2021-02-01 Thread John Ogness
On 2021-02-01, Petr Mladek  wrote:
>> Rather than store the iterator information into the registered
>> kmsg_dump structure, create a separate iterator structure. The
>> kmsg_dump_iter structure can reside on the stack of the caller,
>> thus allowing lockless use of the kmsg_dump functions.
>> 
>> This is in preparation for removal of @logbuf_lock.
>
>> diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
>> index 76cc4122d08e..ecc98f549d93 100644
>> --- a/include/linux/kmsg_dump.h
>> +++ b/include/linux/kmsg_dump.h
>> @@ -29,6 +29,18 @@ enum kmsg_dump_reason {
>>  KMSG_DUMP_MAX
>>  };
>>  
>> +/**
>> + * struct kmsg_dumper_iter - iterator for kernel crash message dumper
>> + * @active: Flag that specifies if this is currently dumping
>> + * @cur_seq:The record to dump (private)
>> + * @next_seq:   The first record of the next block (private)
>
> Just to be sure. This description should get update if you agree with
> the alternative one in the 1st patch.

Yes, I assumed so and adjusted my preparation-v2 series accordingly.

John


Re: [PATCH printk-rework 07/12] printk: add syslog_lock

2021-02-01 Thread John Ogness
On 2021-02-01, Petr Mladek  wrote:
>> The global variables @syslog_seq, @syslog_partial, @syslog_time
>> and write access to @clear_seq are protected by @logbuf_lock.
>> Once @logbuf_lock is removed, these variables will need their
>> own synchronization method. Introduce @syslog_lock for this
>> purpose.
>
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -390,8 +390,12 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
>>  printk_safe_exit_irqrestore(flags); \
>>  } while (0)
>>  
>> +/* syslog_lock protects syslog_* variables and write access to clear_seq. */
>> +static DEFINE_RAW_SPINLOCK(syslog_lock);
>
> I am not expert on RT code but I think that it prefers the generic
> spinlocks. syslog_lock seems to be used in a normal context.
> IMHO, it does not need to be a raw spinlock.
>
> Note that using normal spinlock would require switching the locking
> order. logbuf_lock is a raw lock. Normal spinlock must not be taken
> under a raw spinlock.
>
> Or we could switch syslog_lock to the normal spinlock later
> after logbuf_lock is removed.

I was planning on this last option because I think it is the
simplest. There are places such as syslog_print_all() where the
printk_safe_enter() and logbuf_lock locking are not at the same place as
the syslog_lock locking (and syslog_lock is inside).

Once the safe buffers are removed, syslog_lock can transition to a
spinlock. (Spinlocks must not be taken under local_irq_save().)

>> +
>>  #ifdef CONFIG_PRINTK
>>  DECLARE_WAIT_QUEUE_HEAD(log_wait);
>> +/* All 3 protected by @syslog_lock. */
>>  /* the next printk record to read by syslog(READ) or /proc/kmsg */
>>  static u64 syslog_seq;
>>  static size_t syslog_partial;
>> @@ -1631,6 +1643,7 @@ int do_syslog(int type, char __user *buf, int len, int 
>> source)
>>  bool clear = false;
>>  static int saved_console_loglevel = LOGLEVEL_DEFAULT;
>>  int error;
>> +u64 seq;
>
> This allows to remove definition of the same temporary variable
> for case SYSLOG_ACTION_SIZE_UNREAD.

Right. I missed that.

>>  
>>  error = check_syslog_permissions(type, source);
>>  if (error)
>> @@ -1648,8 +1661,14 @@ int do_syslog(int type, char __user *buf, int len, 
>> int source)
>>  return 0;
>>  if (!access_ok(buf, len))
>>  return -EFAULT;
>> +
>> +/* Get a consistent copy of @syslog_seq. */
>> +raw_spin_lock_irq(_lock);
>> +seq = syslog_seq;
>> +raw_spin_unlock_irq(_lock);
>> +
>>  error = wait_event_interruptible(log_wait,
>> -prb_read_valid(prb, syslog_seq, NULL));
>> +prb_read_valid(prb, seq, NULL));
>
> Hmm, this will not detect when syslog_seq gets cleared in parallel.
> I hope that nobody rely on this behavior. But who knows?
>
> A solution might be to have also syslog_seq latched. But I am
> not sure if it is worth it.
>
> I am for taking the risk and use the patch as it is now. Let's keep
> the code for now. We could always use the latched variable when
> anyone complains. Just keep it in mind.

We could add a simple helper:

/* Get a consistent copy of @syslog_seq. */
static u64 read_syslog_seq(void)
{
	unsigned long flags;
	u64 seq;

	raw_spin_lock_irqsave(&syslog_lock, flags);
	seq = syslog_seq;
	raw_spin_unlock_irqrestore(&syslog_lock, flags);

	return seq;
}

Then change the code to:

error = wait_event_interruptible(log_wait,
prb_read_valid(prb, read_syslog_seq(), NULL));


register_console() could also make use of the function. (That is why I
am suggesting the flags variant.)

John Ogness

