Re: panic context: was: Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections

2023-10-16 Thread John Ogness
Hi Dave,

On 2023-10-16, Dave Young wrote:
>> > Does anyone really want explicit flushes in panic()?
>> 
>> So far you are the only one speaking against it. I expect as time
>> goes on it will get even more complex as it becomes tunable (also
>> something we talked about during the demo).
>
> Flushing consoles in the panic kexec case does not sound good, but I
> have no deep understanding of the atomic printk series, so I have added
> the kexec list and reviewers in Cc.

Currently every printk() message tries to flush immediately.

This series introduced a new method of first allowing a set of printk()
messages to be stored to the ringbuffer and then flushing the full
set. That is what this discussion was about.

The issue with allowing a set of printk() messages to be stored is that
you need to explicitly mark in code where the actual flushing should
occur. Petr's argument is that we do not want to insert "flush points"
into the panic() function and instead we should do as we do now: flush
each printk() message immediately.

In the end (for my upcoming v3 series) I agreed with Petr. We will
continue to keep things as they are now: flush each printk() message
immediately.

Currently consoles try to flush unsafely before kexec. With the atomic
printk series our goal is to only perform _safe_ flushing until all
other panic operations are complete. Only at the very end of panic()
would unsafe flushing be attempted.

John
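
As a rough illustration of the two approaches discussed in this mail
(flush each message immediately versus store a set of messages and flush
at an explicit point), here is a minimal user-space sketch. It is not
kernel code: struct log, log_store(), log_flush() and log_print() are
hypothetical names invented for this example, and the console write is
only counted, not performed.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_SLOTS 8	/* hypothetical capacity, for illustration only */

struct log {
	const char *slot[LOG_SLOTS];
	size_t stored;		/* records written to the buffer */
	size_t flushed;		/* records already pushed to the console */
};

/* Store a record without touching the console. */
static void log_store(struct log *l, const char *msg)
{
	assert(l->stored < LOG_SLOTS);
	l->slot[l->stored++] = msg;
}

/*
 * Explicit flush point: mark every pending record as printed and
 * return how many there were. A real console write would go where
 * the counter is advanced.
 */
static size_t log_flush(struct log *l)
{
	size_t n = 0;

	while (l->flushed < l->stored) {
		l->flushed++;
		n++;
	}
	return n;
}

/* Flush-immediately model: every message is stored, then flushed. */
static size_t log_print(struct log *l, const char *msg)
{
	log_store(l, msg);
	return log_flush(l);
}
```

With log_store() alone, nothing reaches the console until some code
path calls log_flush(), which is why the stored-set approach needs
flush points placed in panic(). With log_print() no such points are
needed.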

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] docs: gdbmacros: print newest record

2022-12-29 Thread John Ogness
@head_id points to the newest record, but the printing loop
exits when it increments to this value (before printing).

Exit the printing loop after the newest record has been printed.

The python-based function in scripts/gdb/linux/dmesg.py already
does this correctly.

Fixes: e60768311af8 ("scripts/gdb: update for lockless printk ringbuffer")
Cc: sta...@vger.kernel.org
Signed-off-by: John Ogness 
---
 Documentation/admin-guide/kdump/gdbmacros.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
index 82aecdcae8a6..030de95e3e6b 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -312,10 +312,10 @@ define dmesg
 			set var $prev_flags = $info->flags
 		end
 
-		set var $id = ($id + 1) & $id_mask
 		if ($id == $end_id)
 			loop_break
 		end
+		set var $id = ($id + 1) & $id_mask
 	end
 end
 document dmesg

base-commit: 1b929c02afd37871d5afb9d498426f83432e71c2
-- 
2.30.2
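
The off-by-one fixed by this patch can be shown outside of gdb. The
following is a small C sketch of the corrected loop order (handle the
record, test for the end, then advance), assuming a ring whose ID space
is a power of two. walk_ids() and its parameters are hypothetical names
used only for this illustration; they are not part of gdbmacros.txt.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk record IDs from @oldest to @newest, inclusive, wrapping with
 * @id_mask. Each visited ID is written to @out (at most @cap entries).
 * Returns the number of IDs visited.
 */
static size_t walk_ids(unsigned long oldest, unsigned long newest,
		       unsigned long id_mask,
		       unsigned long *out, size_t cap)
{
	unsigned long id = oldest & id_mask;
	size_t n = 0;

	/* The mask must be of the form 2^k - 1. */
	assert((id_mask & (id_mask + 1)) == 0);

	newest &= id_mask;
	while (n < cap) {
		out[n++] = id;			/* "print" the record first */
		if (id == newest)		/* then check for the end */
			break;
		id = (id + 1) & id_mask;	/* advance only if more remain */
	}
	return n;
}
```

If the increment came before the comparison, as in the old macro, the
loop would exit as soon as it reached the newest ID and that record
would never be visited.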




Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-11 Thread John Ogness
On 2022-05-10, Steven Rostedt wrote:
>> As already mentioned in the other reply, panic() sometimes stops the
>> other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
>> 
>> Another situation is when the CPU using the lock ends in some
>> infinite loop because something went wrong. The system is in
>> an unpredictable state during panic().
>> 
>> I am not sure if this is possible with the code under gsmi_dev.lock
>> but such things really happen during panic() in other subsystems.
>> Using trylock in the panic() code path is a good practice.
>
> I believe that Peter Zijlstra had a special spin lock for NMIs or
> early printk, where it would not block if the lock was held on the
> same CPU. That is, if an NMI happened and panicked while this lock was
> held on the same CPU, it would not deadlock. But it would block if the
> lock was held on another CPU.

Yes. And starting with 5.19 it will be carrying the name that _you_ came
up with (cpu_sync):

printk_cpu_sync_get_irqsave()
printk_cpu_sync_put_irqrestore()

John
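
The locking behaviour described above (re-entrant on the CPU that
already holds the lock, blocking for any other CPU) can be sketched in
user space with C11 atomics. This is only an illustration of the idea,
not the kernel implementation: struct cpu_sync, cpu_sync_try_get() and
cpu_sync_put() are hypothetical names, and the CPU number is passed in
by the caller rather than read from the hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)

struct cpu_sync {
	atomic_int owner;	/* CPU id of the holder, or NO_OWNER */
	int nested;		/* re-entry depth, touched only by the owner */
};

/* Try once. Returns true if @cpu holds the lock on return. */
static bool cpu_sync_try_get(struct cpu_sync *s, int cpu)
{
	int expected = NO_OWNER;

	if (atomic_compare_exchange_strong(&s->owner, &expected, cpu))
		return true;		/* first acquisition */
	if (expected == cpu) {
		s->nested++;		/* same CPU re-entering, e.g. from NMI */
		return true;
	}
	return false;			/* held by another CPU */
}

/* Undo one successful cpu_sync_try_get() by @cpu. */
static void cpu_sync_put(struct cpu_sync *s, int cpu)
{
	assert(atomic_load(&s->owner) == cpu);
	if (s->nested) {
		s->nested--;
		return;
	}
	atomic_store(&s->owner, NO_OWNER);
}
```

A blocking variant would simply retry cpu_sync_try_get() until it
succeeds. The point of the sketch is that a nested attempt from the
owning CPU never spins, so it cannot deadlock against itself.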



[PATCH printk v4 0/6] printk: remove safe buffers

2021-07-17 Thread John Ogness
Hi,

Here is v4 of a series to remove the safe buffers. v3 can be
found here [0]. The safe buffers are no longer needed because
messages can be stored directly into the log buffer from any
context.

However, the safe buffers also provided a form of recursion
protection. For that reason, explicit recursion protection is
implemented for this series.

The safe buffers also implicitly provided serialization
between multiple CPUs executing in NMI context. This was
particularly necessary for the nmi_backtrace() output. This
serialization is now preserved by using the printk cpulock.

With the removal of the safe buffers, there is no need for
extra NMI enter/exit tracking. So this is also removed
(which includes removing the config option CONFIG_PRINTK_NMI).

And finally, there are a few places in the kernel that need to
specify code blocks in which console printing is deferred for all
printk calls. Previously the NMI tracking API was being (mis)used
for this purpose. This series introduces an official and
explicit interface for such cases. (Note that all deferred
printing will be removed anyway, once printing kthreads are
introduced.)

Changes since v3:

- Remove safe context tracking in vprintk().

- Add safe context tracking for @console_owner usage since that
  is also a component of the printing code.

- Refactor syslog_print() so that it is easier to understand
  and follow the locking logic.

- Introduce printk_deferred_enter/exit functions to be used by
  code that needs to specify a code block in which console
  printing is deferred for all printk calls.

John Ogness

[0] https://lore.kernel.org/lkml/2021062448.5190-1-john.ogn...@linutronix.de

John Ogness (6):
  lib/nmi_backtrace: explicitly serialize banner and regs
  printk: track/limit recursion
  printk: remove safe buffers
  printk: remove NMI tracking
  printk: convert @syslog_lock to mutex
  printk: syslog: close window between wait and read

 arch/arm/kernel/smp.c  |   4 +-
 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 arch/powerpc/kexec/crash.c |   2 +-
 include/linux/hardirq.h|   2 -
 include/linux/printk.h |  41 ++--
 init/Kconfig   |   5 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |  25 ---
 kernel/printk/printk.c | 268 ++--
 kernel/printk/printk_safe.c| 364 +
 kernel/trace/trace.c   |   4 +-
 lib/nmi_backtrace.c|  13 +-
 14 files changed, 194 insertions(+), 544 deletions(-)


base-commit: 70333dec446292cd896cd051d2ebd6808b328949
-- 
2.20.1
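
The explicit recursion protection mentioned in the cover letter can be
pictured as a per-context depth counter that refuses further nesting
beyond a limit. The sketch below is a user-space approximation under
stated assumptions: it uses a thread-local counter in place of per-CPU
data, and MAX_RECURSION, log_enter() and log_exit() are hypothetical
names and values, not the ones used by the series.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RECURSION 3		/* hypothetical limit */

/* One counter per thread stands in for the per-CPU counter. */
static _Thread_local int recursion_depth;

/*
 * Enter the logging path. Returns false if the caller is already
 * nested MAX_RECURSION levels deep and must drop its message.
 */
static bool log_enter(void)
{
	if (recursion_depth >= MAX_RECURSION)
		return false;
	recursion_depth++;
	return true;
}

/* Leave the logging path after a successful log_enter(). */
static void log_exit(void)
{
	assert(recursion_depth > 0);
	recursion_depth--;
}
```

A logging function would call log_enter() first, skip storing the
message when it returns false, and call log_exit() on the way out.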




[PATCH printk v4 3/6] printk: remove safe buffers

2021-07-15 Thread John Ogness
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console and console_owner locks is left
in place. This is because the console and console_owner locks are
needed for the actual printing.

Signed-off-by: John Ogness 
---
 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h |  10 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |  17 --
 kernel/printk/printk.c | 120 +---
 kernel/printk/printk_safe.c| 335 +
 lib/nmi_backtrace.c|   6 -
 9 files changed, 48 insertions(+), 450 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index b4ab95c9e94a..2522800217d1 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -170,7 +170,6 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 	bust_spinlocks(0);
 	debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index c9a8f4781a10..dc17d8903d4f 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -183,11 +183,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
 	wd_smp_unlock();
 
-	printk_safe_flush();
-	/*
-	 * printk_safe_flush() seems to require another print
-	 * before anything actually goes out to console.
-	 */
 	if (sysctl_hardlockup_all_cpu_backtrace)
 		trigger_allbutself_cpu_backtrace();
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1790a5521fd9..664612f75dac 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -207,8 +207,6 @@ __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -272,14 +270,6 @@ static inline void show_regs_print_info(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
-
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
-{
-}
 #endif
 
 #ifdef CONFIG_SMP
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index f099baee3578..69c6e9b7761c 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -978,7 +978,6 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
-		printk_safe_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a72a58..1f0df42f8d0c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -247,7 +247,6 @@ void panic(const char *fmt, ...)
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
 	if (!_crash_kexec_post_notifiers) {
-		printk_safe_flush_on_panic();
 		__crash_kexec(NULL);
 
 		/*
@@ -271,8 +270,6 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-	/* Call flush even twice. It tries harder with a single online CPU */
-	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 51615c909b2f..6cc35c5de890 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,7 +22,6 @@ __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
-void printk_safe_init(void);
 bool printk_percpu_data_ready(void);
 
 #define printk_safe_enter_irqsave(flags)   \
@@ -37,18 +36,6 @@ bool printk_percpu_data_ready(void);
 		local_irq_restore(flags);	\
 	} while (0)
 
-#define printk_safe_enter_irq()	\
-	do {	\
-		local_irq_disable();	\
-		__printk_safe_enter();	\
-	} while (0

Re: [PATCH printk v3 3/6] printk: remove safe buffers

2021-06-24 Thread John Ogness
On 2021-06-24, Petr Mladek wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1852,7 +1839,7 @@ static int console_trylock_spinning(void)
>>  if (console_trylock())
>>  return 1;
>>  
>> -printk_safe_enter_irqsave(flags);
>> +local_irq_save(flags);
>>  
>>  raw_spin_lock(&console_owner_lock);
>
> This spin_lock is in the printk() path. We must make sure that
> it does not cause deadlock.
>
> printk_safe_enter_irqsave(flags) prevented the recursion because
> it deferred the console handling.
>
> One danger might be a lockdep report triggered by
> raw_spin_lock(&console_owner_lock) itself. But it should be safe.
> lockdep is checked before the lock is actually taken
> and lockdep should disable itself before printing anything.
>
> Another danger might be any printk() called under the lock.
> The code just compares and assigns values to some variables
> (static, on stack) so we should be on the safe side.
>
> Well, I would feel more comfortable if we add printk_safe_enter_irqsave()
> back around the sections guarded by this lock. It can be done
> in a separate patch. The code looks safe at the moment.

You are correct. printk_safe should also be wrapping @console_owner_lock
locking.

>> @@ -2716,19 +2700,22 @@ void console_unlock(void)
>>   * were to occur on another CPU, it may wait for this one to
>>   * finish. This task can not be preempted if there is a
>>   * waiter waiting to take over.
>> + *
>> + * Interrupts are disabled because the hand over to a waiter
>> + * must not be interrupted until the hand over is completed
>> + * (@console_waiter is cleared).
>>   */
>> +local_irq_save(flags);
>>  console_lock_spinning_enable();
>
> Same here. console_lock_spinning_enable() takes console_owner_lock.
> I would feel more comfortable if we added printk_safe_enter_irqsave(flags)
> inside console_lock_spinning_enable() around the locked code. The code
> is safe at the moment but...

Agreed.

>>  stop_critical_timings();/* don't trace print latency */
>>  call_console_drivers(ext_text, ext_len, text, len);
>>  start_critical_timings();
>>  
>> -if (console_lock_spinning_disable_and_check()) {
>> -printk_safe_exit_irqrestore(flags);
>> +handover = console_lock_spinning_disable_and_check();
>
> Same here. Also console_lock_spinning_disable_and_check() takes
> console_owner_lock. It looks safe at the moment but...

Agreed.

>> --- a/kernel/printk/printk_safe.c
>> +++ b/kernel/printk/printk_safe.c
>> @@ -369,7 +70,10 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>>   * Use the main logbuf even in NMI. But avoid calling console
>>   * drivers that might have their own locks.
>>   */
>> -if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
>> +if (this_cpu_read(printk_context) &
>> +(PRINTK_NMI_DIRECT_CONTEXT_MASK |
>> + PRINTK_NMI_CONTEXT_MASK |
>> + PRINTK_SAFE_CONTEXT_MASK)) {
>>  unsigned long flags;
>>  int len;
>>  
>
> There is the following code right below:
>
>   printk_safe_enter_irqsave(flags);
>   len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
>   printk_safe_exit_irqrestore(flags);
>   defer_console_output();
>   return len;
>
> printk_safe_enter_irqsave(flags) is not needed here. Any nested
> printk() ends here as well.

Ah, I missed that one. Good eye!

> Again, this can be done in a separate patch. Well, the commit message
> mentions that the printk_safe context is removed everywhere except
> for the code manipulating console lock. But it is just a detail.

I would prefer a v4 with these fixes:

- wrap @console_owner_lock with printk_safe usage

- remove unnecessary printk_safe usage from printk_safe.c

- update commit message to say that safe context tracking is left in
  place for both the console and console_owner locks

John Ogness



[PATCH printk v3 3/6] printk: remove safe buffers

2021-06-24 Thread John Ogness
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console lock is left in place. This is
because the console lock is needed for the actual printing.

Signed-off-by: John Ogness 
---
 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h |  10 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |  17 --
 kernel/printk/printk.c | 126 +
 kernel/printk/printk_safe.c| 332 +
 lib/nmi_backtrace.c|   6 -
 9 files changed, 51 insertions(+), 450 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a44a30b0688c..5828c83eaca6 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -171,7 +171,6 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 	bust_spinlocks(0);
 	debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index c9a8f4781a10..dc17d8903d4f 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -183,11 +183,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
 	wd_smp_unlock();
 
-	printk_safe_flush();
-	/*
-	 * printk_safe_flush() seems to require another print
-	 * before anything actually goes out to console.
-	 */
 	if (sysctl_hardlockup_all_cpu_backtrace)
 		trigger_allbutself_cpu_backtrace();
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1790a5521fd9..664612f75dac 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -207,8 +207,6 @@ __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -272,14 +270,6 @@ static inline void show_regs_print_info(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
-
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
-{
-}
 #endif
 
 #ifdef CONFIG_SMP
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index a0b6780740c8..480d5f77ef4f 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -977,7 +977,6 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
-		printk_safe_flush_on_panic();
 		__crash_kexec(regs);
 
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a72a58..1f0df42f8d0c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -247,7 +247,6 @@ void panic(const char *fmt, ...)
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
 	if (!_crash_kexec_post_notifiers) {
-		printk_safe_flush_on_panic();
 		__crash_kexec(NULL);
 
 		/*
@@ -271,8 +270,6 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-	/* Call flush even twice. It tries harder with a single online CPU */
-	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
 	/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 51615c909b2f..6cc35c5de890 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,7 +22,6 @@ __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
-void printk_safe_init(void);
 bool printk_percpu_data_ready(void);
 
 #define printk_safe_enter_irqsave(flags)   \
@@ -37,18 +36,6 @@ bool printk_percpu_data_ready(void);
 		local_irq_restore(flags);	\
 	} while (0)
 
-#define printk_safe_enter_irq()	\
-	do {	\
-		local_irq_disable();	\
-		__printk_safe_enter();	\
-	} while (0)
-
-#define printk_safe_exit_irq

[PATCH printk v3 0/6] printk: remove safe buffers

2021-06-24 Thread John Ogness
Hi,

Here is v3 of a series to remove the safe buffers. v2 can be
found here [0]. The safe buffers are no longer needed because
messages can be stored directly into the log buffer from any
context.

However, the safe buffers also provided a form of recursion
protection. For that reason, explicit recursion protection is
implemented for this series.

The safe buffers also implicitly provided serialization
between multiple CPUs executing in NMI context. This was
particularly necessary for the nmi_backtrace() output. This
serialization is now preserved by using the printk_cpu_lock.

And finally, with the removal of the safe buffers, there is no
need for extra NMI enter/exit tracking. So this is also removed
(which includes removing config option CONFIG_PRINTK_NMI).

Changes since v2:

- Move irq disabling/enabling out of the
  console_lock_spinning_*() functions to simplify the patches and
  keep the function prototypes simple.

- Change printk_enter_irqsave()/printk_exit_irqrestore() to
  macros to allow a more common calling convention for irq
  flags.

- Use the counter pointer from printk_enter_irqsave() in
  printk_exit_irqrestore() rather than fetching it again. This
  avoids any possible race conditions when printk's percpu
  flag is set.

- Use the printk_cpu_lock to serialize banner and regs with
  the stack dump in nmi_cpu_backtrace().

John Ogness

[0] https://lore.kernel.org/lkml/20210330153512.1182-1-john.ogn...@linutronix.de

John Ogness (6):
  lib/nmi_backtrace: explicitly serialize banner and regs
  printk: track/limit recursion
  printk: remove safe buffers
  printk: remove NMI tracking
  printk: convert @syslog_lock to mutex
  printk: syslog: close window between wait and read

 arch/arm/kernel/smp.c  |   2 -
 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 arch/powerpc/kexec/crash.c |   3 -
 include/linux/hardirq.h|   2 -
 include/linux/printk.h |  22 --
 init/Kconfig   |   5 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |  23 ---
 kernel/printk/printk.c | 273 +++--
 kernel/printk/printk_safe.c| 361 +
 kernel/trace/trace.c   |   2 -
 lib/nmi_backtrace.c|  13 +-
 14 files changed, 176 insertions(+), 540 deletions(-)


base-commit: 48e72544d6f06daedbf1d9b14610be89dba67526
-- 
2.20.1




Re: [PATCH printk v2 2/5] printk: remove safe buffers

2021-04-06 Thread John Ogness
On 2021-04-01, Petr Mladek wrote:
>> Caller-id solves this problem and is easy to sort for anyone with
>> `grep'. Yes, it is a shame that `dmesg' does not show it, but
>> directly using any of the printk interfaces does show it (kmsg_dump,
>> /dev/kmsg, syslog, console).
>
> True but frankly, the current situation is _far_ from convenient:
>
>+ consoles do not show it by default
>+ no userspace tool (dmesg, journalctl, crash) is able to show it
>+ grep is a nightmare, especially if you have more than a handful of CPUs
>
> Yes, everything is solvable but not easily.
>
>> > I get this with "echo l >/proc/sysrq-trigger" and this patchset:
>> 
>> Of course. Without caller-id, it is a mess. But this has nothing to do
>> with NMI. The same problem exists for WARN_ON() on multiple CPUs
>> simultaneously. If the user is not using caller-id, they are
>> lost. Caller-id is the current solution to the interlaced logs.
>
> Sure. But in reality, the risk of mixed WARN_ONs is small. While
> this patch makes backtraces from all CPUs always unusable without
> caller_id and non-trivial effort.

I would prefer we solve the situation for non-NMI as well, not just for
the sysrq "l" case.

>> For the long term, we should introduce a printk-context API that allows
>> callers to perfectly pack their multi-line output into a single
>> entry. We discussed [0][1] this back in August 2020.
>
> We need a "short" term solution. There are currently 3 solutions:
>
> 1. Keep nmi_safe() and all the hacks around.
>
> 2. Serialize nmi_cpu_backtrace() by a spin lock and later by
>the special lock used also by atomic consoles.
>
> 3. Tell complaining people how to sort the messed logs.

Or we look into the long term solution now. If caller-ids cannot be
used as the solution (because nobody turns it on, nobody knows about it,
and/or distros do not enable it), then we should look at how to make at
least the backtraces contiguous. I have a few ideas here.

John Ogness



[PATCH printk v2 0/5] printk: remove safe buffers

2021-04-02 Thread John Ogness
Hi,

Here is v2 of a series to remove the safe buffers. v1 can be
found here [0]. The safe buffers are no longer needed because
messages can be stored directly into the log buffer from any
context.

However, the safe buffers also provided a form of recursion
protection. For that reason, explicit recursion protection is
also implemented for this series.

And finally, with the removal of the safe buffers, there is no
need for extra NMI enter/exit tracking. So this is also removed
(which includes removing config option CONFIG_PRINTK_NMI).

This series is based on the printk-rework branch of
printk/linux.git:

commit acebb5597ff1 ("kernel/printk.c: Fixed mundane typos")

Changes since v1:

- remove the printk nmi enter/exit tracking

- remove CONFIG_PRINTK_NMI config option

- use in_nmi() to detect NMI context

- remove unused printk_safe_enter/exit macros

- after switching to the dynamic buffer, copy over NMI records
  that may have arrived during the switch window

- use local_irq_*() instead of printk_safe_*() for console
  spinning

- use separate variables rather than arrays for the per-cpu
  recursion tracking

- make @syslog_lock a mutex instead of a spin_lock

- close the wait-read window for SYSLOG_ACTION_READ

- adjust various comments and commit messages as requested

John Ogness

[0] https://lore.kernel.org/lkml/20210316233326.10778-1-john.ogn...@linutronix.de

John Ogness (5):
  printk: track/limit recursion
  printk: remove safe buffers
  printk: remove NMI tracking
  printk: convert @syslog_lock to mutex
  printk: syslog: close window between wait and read

 arch/arm/kernel/smp.c  |   2 -
 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 arch/powerpc/kexec/crash.c |   3 -
 include/linux/hardirq.h|   2 -
 include/linux/printk.h |  22 --
 init/Kconfig   |   5 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |  23 ---
 kernel/printk/printk.c | 281 +++--
 kernel/printk/printk_safe.c| 362 +
 kernel/trace/trace.c   |   2 -
 lib/nmi_backtrace.c|   6 -
 14 files changed, 171 insertions(+), 547 deletions(-)

-- 
2.20.1




Re: [PATCH printk v2 2/5] printk: remove safe buffers

2021-04-01 Thread John Ogness
On 2021-04-01, Petr Mladek wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
>>   new_descs, ilog2(new_descs_count),
>>   new_infos);
>>  
>> -printk_safe_enter_irqsave(flags);
>> +local_irq_save(flags);
>
> IMHO, we actually do not have to disable IRQ here. We already copy
> messages that might appear in the small race window in NMI. It would
> work the same way also for IRQ context.

We do not have to, but why open up this window? We are still in early
boot and interrupts have always been disabled here. I am not happy that
this window even exists. I really prefer to keep it NMI-only.

>> --- a/lib/nmi_backtrace.c
>> +++ b/lib/nmi_backtrace.c
>> @@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>>  touch_softlockup_watchdog();
>>  }
>>  
>> -/*
>> - * Force flush any remote buffers that might be stuck in IRQ context
>> - * and therefore could not run their irq_work.
>> - */
>> -printk_safe_flush();
>
> Sigh, this reminds me that the nmi_safe buffers serialized backtraces
> from all CPUs.
>
> I am afraid that we have to put back the spinlock into
> nmi_cpu_backtrace().

Please no. That spinlock is a disaster. It can cause deadlocks with
other cpu-locks (such as in kdb) and it will cause a major problem for
atomic consoles. We need to be very careful about introducing locks
where NMIs are waiting on other CPUs.

> It has been repeatedly added and removed depending
> on whether the backtrace was printed into the main log buffer
> or into the per-CPU buffers. Last time it was removed by
> the commit 03fc7f9c99c1e7ae2925d ("printk/nmi: Prevent deadlock
> when accessing the main log buffer in NMI").
>
> It should be safe because there should not be any other locks in the
> code path. Note that only one backtrace might be triggered at the same
> time, see @backtrace_flag in nmi_trigger_cpumask_backtrace().

It is adding a lock around a lockless ringbuffer. For me that is a step
backwards.

> We _must_ serialize it somehow[*]. The lock in nmi_cpu_backtrace()
> looks less evil than the nmi_safe machinery. nmi_safe() truncates
> overly long backtraces, loses timestamps, needs to be explicitly
> flushed here and there, and is non-trivial code.
>
> [*] Non-serialized backtraces are a real mess. Caller-id is visible
> only on consoles or via syslogd interface. And it is not very
> convenient.

Caller-id solves this problem and is easy to sort for anyone with
`grep'. Yes, it is a shame that `dmesg' does not show it, but directly
using any of the printk interfaces does show it (kmsg_dump, /dev/kmsg,
syslog, console).

> I get this with "echo l >/proc/sysrq-trigger" and this patchset:

Of course. Without caller-id, it is a mess. But this has nothing to do
with NMI. The same problem exists for WARN_ON() on multiple CPUs
simultaneously. If the user is not using caller-id, they are
lost. Caller-id is the current solution to the interlaced logs.

For the long term, we should introduce a printk-context API that allows
callers to perfectly pack their multi-line output into a single
entry. We discussed [0][1] this back in August 2020.

John Ogness

[0] https://lore.kernel.org/lkml/472f2e553805b52d9834d64e4056db965edee329.ca...@perches.com
[1] offlist message-id: 87d03k9ymz@jogness.linutronix.de
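
To make the "sort by caller-id" argument concrete, here is a small C
sketch that tests whether a log line belongs to a given caller. It
assumes lines of the form "[timestamp][ caller] text", with the caller
field left-padded with spaces and holding values such as "C3" or
"T1234". That layout is an assumption for this example, and
line_has_caller() is a hypothetical helper, not an existing tool or
kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if @line carries caller ID @id in its second bracketed
 * field, ignoring the space padding inside the brackets.
 */
static bool line_has_caller(const char *line, const char *id)
{
	const char *field, *end;
	size_t len;

	assert(line && id);
	len = strlen(id);

	field = strchr(line, ']');		/* end of the timestamp field */
	if (!field || field[1] != '[')
		return false;			/* no caller field present */
	field += 2;				/* first char inside the field */
	end = strchr(field, ']');
	if (!end)
		return false;
	while (field < end && *field == ' ')	/* skip left padding */
		field++;
	return (size_t)(end - field) == len && strncmp(field, id, len) == 0;
}
```

Filtering an interleaved log through such a check, once per caller,
would yield one contiguous backtrace per CPU, which is what a grep on
the caller field does by hand.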



Re: [PATCH printk v2 2/5] printk: remove safe buffers

2021-03-31 Thread John Ogness
On 2021-03-30, John Ogness wrote:
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index e971c0a9ec9e..f090d6a1b39e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1772,16 +1759,21 @@ static struct task_struct *console_owner;
>  static bool console_waiter;
>  
>  /**
> - * console_lock_spinning_enable - mark beginning of code where another
> + * console_lock_spinning_enable_irqsave - mark beginning of code where another
>   *   thread might safely busy wait
>   *
>   * This basically converts console_lock into a spinlock. This marks
>   * the section where the console_lock owner can not sleep, because
>   * there may be a waiter spinning (like a spinlock). Also it must be
>   * ready to hand over the lock at the end of the section.
> + *
> + * This disables interrupts because the hand over to a waiter must not be
> + * interrupted until the hand over is completed (@console_waiter is cleared).
>   */
> -static void console_lock_spinning_enable(void)
> +static void console_lock_spinning_enable_irqsave(unsigned long *flags)

I missed the prototype change for the !CONFIG_PRINTK case, resulting in:

linux/kernel/printk/printk.c:2707:3: error: implicit declaration of function ‘console_lock_spinning_enable_irqsave’; did you mean ‘console_lock_spinning_enable’? [-Werror=implicit-function-declaration]
   console_lock_spinning_enable_irqsave();
   ^~~~
   console_lock_spinning_enable

Will be fixed for v3.

(I have now officially added !CONFIG_PRINTK to my CI tests.)

John Ogness



[PATCH printk v2 2/5] printk: remove safe buffers

2021-03-30 Thread John Ogness
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console lock is left in place. This is
because the console lock is needed for the actual printing.

Signed-off-by: John Ogness 
---
 Note: The follow-up patch removes the NMI tracking.

 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h |  10 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |  17 --
 kernel/printk/printk.c | 137 +-
 kernel/printk/printk_safe.c| 333 +
 lib/nmi_backtrace.c|   6 -
 9 files changed, 56 insertions(+), 457 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 3ec7b443fe6b..7d2b339afcb0 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -170,7 +170,6 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-   printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(0);
debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index af3c15a1d41e..8ae46c5945d0 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -181,11 +181,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
wd_smp_unlock();
 
-   printk_safe_flush();
-   /*
-* printk_safe_flush() seems to require another print
-* before anything actually goes out to console.
-*/
if (sysctl_hardlockup_all_cpu_backtrace)
trigger_allbutself_cpu_backtrace();
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index fe7eb2351610..2476796c1150 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -207,8 +207,6 @@ __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -272,14 +270,6 @@ static inline void show_regs_print_info(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
-
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
-{
-}
 #endif
 
 extern int kptr_restrict;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index a0b6780740c8..480d5f77ef4f 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -977,7 +977,6 @@ void crash_kexec(struct pt_regs *regs)
old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
if (old_cpu == PANIC_CPU_INVALID) {
/* This is the 1st CPU which comes here, so go ahead. */
-   printk_safe_flush_on_panic();
__crash_kexec(regs);
 
/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a72a58..1f0df42f8d0c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -247,7 +247,6 @@ void panic(const char *fmt, ...)
 * Bypass the panic_cpu check and call __crash_kexec directly.
 */
if (!_crash_kexec_post_notifiers) {
-   printk_safe_flush_on_panic();
__crash_kexec(NULL);
 
/*
@@ -271,8 +270,6 @@ void panic(const char *fmt, ...)
 */
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-   /* Call flush even twice. It tries harder with a single online CPU */
-   printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
 
/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 51615c909b2f..6cc35c5de890 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,7 +22,6 @@ __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
-void printk_safe_init(void);
 bool printk_percpu_data_ready(void);
 
 #define printk_safe_enter_irqsave(flags)   \
@@ -37,18 +36,6 @@ bool printk_percpu_data_ready(void);
local_irq_restore(flags);   \
} while (0)
 
-#define printk_safe_enter_irq()\
-   do {\
-   local_irq_disable();\
-   __printk_safe_enter

Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-29 Thread John Ogness
On 2021-03-29, Petr Mladek  wrote:
> I wonder if some console drivers rely on the fact that the write()
> callback is called with interrupts disabled.
>
> IMHO, it would be a bug when any write() callback expects that
> callers disabled the interrupts.

Agreed.

> Do you plan to remove the console-spinning stuff after offloading
> consoles to the kthreads?

Yes. Although a similar concept will be introduced to allow the threaded
printers and the atomic consoles to compete.

> Will you call console write() callback with irq enabled from the
> kthread?

No. That defeats the fundamental purpose of this entire rework
exercise. ;-)

> Anyway, we should at least add a comment why the interrupts are
> disabled.

I decided to move the local_irq_save/restore inside the console-spinning
functions and added a comment for v2.

John Ogness



Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-29 Thread John Ogness
On 2021-03-29, John Ogness  wrote:
>> Will you call console write() callback with irq enabled from the
>> kthread?
>
> No. That defeats the fundamental purpose of this entire rework
> exercise. ;-)

Sorry, I misread your question. The answer is "yes". We want to avoid a
local_irq_save() when calling into console->write().

John Ogness



Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-26 Thread John Ogness
On 2021-03-23, Petr Mladek  wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1142,8 +1126,6 @@ void __init setup_log_buf(int early)
>>   new_descs, ilog2(new_descs_count),
>>   new_infos);
>>  
>> -printk_safe_enter_irqsave(flags);
>> -
>>  log_buf_len = new_log_buf_len;
>>  log_buf = new_log_buf;
>>  new_log_buf_len = 0;
>> @@ -1159,8 +1141,6 @@ void __init setup_log_buf(int early)
>>   */
>>  prb = &printk_rb_dynamic;
>>  
>> -printk_safe_exit_irqrestore(flags);
>
> This will allow to add new messages from the IRQ context when we
> are copying them to the new buffer. They might get lost in
> the small race window.
>
> Also the messages from NMI might get lost because they are not
> longer stored in the per-CPU buffer.
>
> A possible solution might be to do something like this:
>
>   prb_for_each_record(0, &printk_rb_static, seq, &r)
>   free -= add_to_rb(&printk_rb_dynamic, &r);
>
>   prb = &printk_rb_dynamic;
>
>   /*
>* Copy the remaining messages that might have appeared
>* from IRQ or NMI context after we ended copying and
>* before we switched the buffers. They must be finalized
>* because only one CPU is up at this stage.
>*/
>   prb_for_each_record(seq, &printk_rb_static, seq, &r)
>   free -= add_to_rb(&printk_rb_dynamic, &r);

OK. I'll probably rework it some and combine it with the "dropped" test
so that we can identify if messages were dropped during the transition
(because of static ringbuffer overrun).
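
As a rough illustration of the two-pass copy suggested above, combined with a "dropped" count, here is a user-space toy model. Everything in it (struct toy_rb, toy_rb_add(), toy_switch(), the late-writer callback that stands in for an IRQ/NMI message) is hypothetical and is not the printk ringbuffer API; the real ringbuffer is lockless and wraps, which this sketch ignores:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RB_SLOTS 8

/* Toy append-only buffer; no wraparound, no concurrency. */
struct toy_rb {
	int records[TOY_RB_SLOTS];
	size_t next_seq;	/* sequence number of the next record */
};

/* Append one record. Returns 1 on success, 0 if the buffer is full. */
int toy_rb_add(struct toy_rb *rb, int rec)
{
	assert(rb != NULL);
	if (rb->next_seq >= TOY_RB_SLOTS)
		return 0;
	rb->records[rb->next_seq++] = rec;
	return 1;
}

/* Copy records [seq, src->next_seq) into dst; return first seq NOT copied. */
size_t toy_rb_copy_from(struct toy_rb *dst, const struct toy_rb *src, size_t seq)
{
	while (seq < src->next_seq && toy_rb_add(dst, src->records[seq]))
		seq++;
	return seq;
}

/* Stand-in for a message arriving from IRQ/NMI context in the race window. */
void toy_late_message(struct toy_rb *rb)
{
	toy_rb_add(rb, 99);
}

/*
 * Two-pass switch: copy, switch the active pointer, then copy whatever
 * arrived in between. Returns the number of records that were dropped.
 */
size_t toy_switch(struct toy_rb **active, struct toy_rb *old_rb,
		  struct toy_rb *new_rb, void (*late_writer)(struct toy_rb *))
{
	size_t seq = toy_rb_copy_from(new_rb, old_rb, 0);	/* first pass */

	if (late_writer)
		late_writer(old_rb);				/* race window */

	*active = new_rb;					/* switch buffers */
	seq = toy_rb_copy_from(new_rb, old_rb, seq);		/* second pass */

	return old_rb->next_seq - seq;				/* dropped count */
}
```

With two records in the old buffer and one late message, the second pass picks up the late record and the dropped count is zero; a non-zero count would correspond to the "dropped %llu messages" report.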

>> -
>>  if (seq != prb_next_seq(&printk_rb_static)) {
>>  pr_err("dropped %llu messages\n",
>> prb_next_seq(&printk_rb_static) - seq);
>> @@ -2666,7 +2631,6 @@ void console_unlock(void)
>>  size_t ext_len = 0;
>>  size_t len;
>>  
>> -printk_safe_enter_irqsave(flags);
>>  skip:
>>  if (!prb_read_valid(prb, console_seq, &r))
>>  break;
>> @@ -2711,6 +2675,8 @@ void console_unlock(void)
>>  printk_time);
>>  console_seq++;
>>  
>> +printk_safe_enter_irqsave(flags);
>
> What is the purpose of the printk_safe context here, please?

console_lock_spinning_enable() needs to be called with interrupts
disabled. I should have just used local_irq_save().

I could add local_irq_save() to console_lock_spinning_enable() and
restore them at the end of console_lock_spinning_disable_and_check(),
but then I would need to add a @flags argument to both functions. I
think it is simpler to just do the disable/enable from the caller,
console_unlock().

BTW, I could not find any sane way of disabling interrupts via a
raw_spin_lock_irqsave() of @console_owner_lock because of how it is
used with lockdep. In particular for
console_lock_spinning_disable_and_check().

John Ogness



Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-22 Thread John Ogness
On 2021-03-22, Petr Mladek  wrote:
> On Mon 2021-03-22 12:16:15, John Ogness wrote:
>> On 2021-03-21, Sergey Senozhatsky  wrote:
>> >> @@ -369,7 +70,10 @@ __printf(1, 0) int vprintk_func(const char *fmt, 
>> >> va_list args)
>> >>* Use the main logbuf even in NMI. But avoid calling console
>> >>* drivers that might have their own locks.
>> >>*/
>> >> - if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
>> >> + if (this_cpu_read(printk_context) &
>> >> + (PRINTK_NMI_DIRECT_CONTEXT_MASK |
>> >> +  PRINTK_NMI_CONTEXT_MASK |
>> >> +  PRINTK_SAFE_CONTEXT_MASK)) {
>> >
>> > Do we need printk_nmi_direct_enter/exit() and
>> > PRINTK_NMI_DIRECT_CONTEXT_MASK?  Seems like all printk_safe() paths
>> > are now DIRECT - we store messages to the prb, but don't call console
>> > drivers.
>>
>> I was planning on waiting until the kthreads are introduced, in which
>> case printk_safe.c is completely removed.
>
> You want to keep printk_safe() context because it prevents calling
> consoles even in normal context. Namely, it prevents deadlock by
> recursively taking, for example, sem->lock in console_lock() or
> console_owner_lock in console_trylock_spinning(). Am I right?

Correct.

>> But I suppose I could switch
>> the 1 printk_nmi_direct_enter() user to printk_nmi_enter() so that
>> PRINTK_NMI_DIRECT_CONTEXT_MASK can be removed now. I would do this in a
>> 4th patch of the series.
>
> Yes, please unify the PRINTK_NMI_CONTEXT. One is enough.

Agreed. (But I'll go even further. See below.)

> I wonder if it would make sense to go even further at this stage.
> There will still be 4 contexts that modify the printk behavior after
> this patchset:
>
>   + printk_count set by printk_enter()/exit()
>   + prevents: infinite recursion
>   + context: any context
>   + action: skips entire printk at 3rd recursion level
>
>   + printk_context set by printk_safe_enter()/exit()
>   + prevents: dead lock caused by recursion into some
>   console code in any context
>   + context: any
>   + action: skips console call at 1st recursion level

Technically, at this point printk_safe_enter() behavior is identical to
printk_nmi_enter(). Namely, prevent any recursive printk calls from
calling into the console code.

>   + printk_context set by printk_nmi_enter()/exit()
>   + prevents: dead lock caused by any console lock recursion
>   + context: NMI
>   + action: skips console calls at 0th recursion level
>
>   + kdb_trap_printk
>   + redirects printk() to kdb_printk() in kdb context
>
>
> What is possible?
>
> 1. We could get rid of printk_nmi_enter()/exit() and
>PRINTK_NMI_CONTEXT completely already now. It is enough
>to check in_nmi() in printk_func().
>
>printk_nmi_enter() was added by the commit 42a0bb3f71383b457a7db362
>("printk/nmi: generic solution for safe printk in NMI"). It was
>really needed to modify @printk_func pointer.
>
>We did not remove it later when printk_function became a real
>function. The idea was to track all printk contexts in a single
>variable. But we never added kdb context.
>
>It might make sense to remove it now. Peter Zijlstra would be happy.
>There already were some churns with tracking printk_context in NMI.
>For example, see
>https://lore.kernel.org/r/20200219150744.428764...@infradead.org
>
>IMHO, it does not make sense to wait until the entire console-stuff
>rework is done in this case.

Agreed. in_nmi() within vprintk_emit() is enough to detect if the
console code should be skipped:

if (!in_sched && !in_nmi()) {
...
}

> 2. I thought about unifying printk_safe_enter()/exit() and
>printk_enter()/exit(). They both count recursion with
>IRQs disabled, have similar name. But they are used
>different way.
>
>But better might be to rename printk_safe_enter()/exit() to
>console_enter()/exit() or to printk_deferred_enter()/exit().
>It would make more clear what it does now. And it might help
>to better distinguish it from the new printk_enter()/exit().
>
>This patchset actually splits the original printk_safe()
>functionality into two:
>
>+ printk_count prevents infinite recursion
>+ printk_deferred_enter() defers console handling.
>
>I am not sure if it is worth it. But it might help people (even me)
>when digging into the printk history. Different name will help to
>understand the functionality at the given time.

I 

Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-22 Thread John Ogness
On 2021-03-21, Sergey Senozhatsky  wrote:
>> @@ -369,7 +70,10 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list 
>> args)
>>   * Use the main logbuf even in NMI. But avoid calling console
>>   * drivers that might have their own locks.
>>   */
>> -if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
>> +if (this_cpu_read(printk_context) &
>> +(PRINTK_NMI_DIRECT_CONTEXT_MASK |
>> + PRINTK_NMI_CONTEXT_MASK |
>> + PRINTK_SAFE_CONTEXT_MASK)) {
>
> Do we need printk_nmi_direct_enter/exit() and
> PRINTK_NMI_DIRECT_CONTEXT_MASK?  Seems like all printk_safe() paths
> are now DIRECT - we store messages to the prb, but don't call console
> drivers.

I was planning on waiting until the kthreads are introduced, in which
case printk_safe.c is completely removed. But I suppose I could switch
the 1 printk_nmi_direct_enter() user to printk_nmi_enter() so that
PRINTK_NMI_DIRECT_CONTEXT_MASK can be removed now. I would do this in a
4th patch of the series.

John Ogness



Re: [PATCH 3/3] printk: Use %zu to format size_t

2021-03-17 Thread John Ogness
On 2021-03-17, Geert Uytterhoeven  wrote:
> When compiling for 32-bit:
>
> util_lib/elf_info.c: In function ‘dump_dmesg_lockless’:
> util_lib/elf_info.c:1095:39: warning: format ‘%lu’ expects argument of 
> type ‘long unsigned int’, but argument 3 has type ‘size_t’ {aka ‘unsigned 
> int’} [-Wformat=]
>  1095 |   fprintf(stderr, "Failed to malloc %lu bytes for prb: %s\n",
> | ~~^
> |   |
> |   long unsigned int
> | %u
>  1096 |printk_ringbuffer_sz, strerror(errno));
> |
> ||
> |size_t {aka unsigned int}
> util_lib/elf_info.c:1101:49: warning: format ‘%lu’ expects
> argument of type ‘long unsigned int’, but argument 3 has type ‘size_t’
> {aka ‘unsigned int’} [-Wformat=]
>  1101 |   fprintf(stderr, "Failed to read prb of size %lu bytes: %s\n",
> |   ~~^
> | |
> | long unsigned int
> |   %u
>  1102 |printk_ringbuffer_sz, strerror(errno));
> |
> ||
> |size_t {aka unsigned int}
>
> Indeed, "size_t" is "unsigned int" on 32-bit platforms, and "unsigned
> long" on 64-bit platforms.
>
> Fix this by formatting using "%zu".
>
> Fixes: 4149df9005f2cdd2 ("printk: add support for lockless ringbuffer")
> Signed-off-by: Geert Uytterhoeven 

Reviewed-by: John Ogness 

> ---
>  util_lib/elf_info.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/util_lib/elf_info.c b/util_lib/elf_info.c
> index 7c0a2c345379a7ca..676926ca8c5f3766 100644
> --- a/util_lib/elf_info.c
> +++ b/util_lib/elf_info.c
> @@ -1092,13 +1092,13 @@ static void dump_dmesg_lockless(int fd, void 
> (*handler)(char*, unsigned int))
>   kaddr = read_file_pointer(fd, vaddr_to_offset(prb_vaddr));
>   m.prb = calloc(1, printk_ringbuffer_sz);
>   if (!m.prb) {
> - fprintf(stderr, "Failed to malloc %lu bytes for prb: %s\n",
> + fprintf(stderr, "Failed to malloc %zu bytes for prb: %s\n",
>   printk_ringbuffer_sz, strerror(errno));
>   exit(64);
>   }
>   ret = pread(fd, m.prb, printk_ringbuffer_sz, vaddr_to_offset(kaddr));
>   if (ret != printk_ringbuffer_sz) {
> - fprintf(stderr, "Failed to read prb of size %lu bytes: %s\n",
> + fprintf(stderr, "Failed to read prb of size %zu bytes: %s\n",
>   printk_ringbuffer_sz, strerror(errno));
>   exit(65);
>   }
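
For reference, a minimal standalone sketch of the fix. The helper name format_alloc_failure() is made up for illustration and does not exist in elf_info.c; only the "%zu" conversion is the point. "%zu" is the C99 conversion for size_t, so the same format string is correct whether size_t is "unsigned int" or "unsigned long":

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from elf_info.c): format an allocation
 * failure message into @buf. Returns the snprintf() result.
 */
int format_alloc_failure(char *buf, size_t buflen, size_t nbytes)
{
	assert(buf != NULL && buflen > 0);

	/* %zu matches size_t on both 32-bit and 64-bit targets. */
	return snprintf(buf, buflen, "Failed to malloc %zu bytes for prb",
			nbytes);
}
```

Casting the argument to "unsigned long" and keeping "%lu" would also silence the warning, but "%zu" avoids the cast.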



Re: [PATCH 2/3] printk: Use ULL suffix for 64-bit constants

2021-03-17 Thread John Ogness
On 2021-03-17, Geert Uytterhoeven  wrote:
> When compiling for 32-bit:
>
> util_lib/elf_info.c: In function ‘get_desc_state’:
> util_lib/elf_info.c:923:31: warning: left shift count >= width of type 
> [-Wshift-count-overflow]
>   923 | #define DESC_FLAGS_MASK  (3UL << DESC_FLAGS_SHIFT)
> |   ^~
> util_lib/elf_info.c:925:25: note: in expansion of macro ‘DESC_FLAGS_MASK’
>   925 | #define DESC_ID_MASK  (~DESC_FLAGS_MASK)
> | ^~~
> util_lib/elf_info.c:926:30: note: in expansion of macro ‘DESC_ID_MASK’
>   926 | #define DESC_ID(sv)  ((sv) & DESC_ID_MASK)
> |  ^~~~
> util_lib/elf_info.c:947:12: note: in expansion of macro ‘DESC_ID’
>   947 |  if (id != DESC_ID(state_val))
> |^~~
> util_lib/elf_info.c: In function ‘id_inc’:
> util_lib/elf_info.c:923:31: warning: left shift count >= width of type 
> [-Wshift-count-overflow]
>   923 | #define DESC_FLAGS_MASK  (3UL << DESC_FLAGS_SHIFT)
> |   ^~
> util_lib/elf_info.c:925:25: note: in expansion of macro ‘DESC_FLAGS_MASK’
>   925 | #define DESC_ID_MASK  (~DESC_FLAGS_MASK)
> | ^~~
> util_lib/elf_info.c:981:15: note: in expansion of macro ‘DESC_ID_MASK’
>   981 |  return (id & DESC_ID_MASK);
> |   ^~~~
>
> Indeed, "unsigned long" constants are 32-bit on 32-bit platforms, and
> 64-bit on 64-bit platforms.
>
> Fix this by using a "ULL" suffix instead.
>
> Fixes: 4149df9005f2cdd2 ("printk: add support for lockless ringbuffer")
> Signed-off-by: Geert Uytterhoeven 

Reviewed-by: John Ogness 

> ---
>  util_lib/elf_info.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/util_lib/elf_info.c b/util_lib/elf_info.c
> index 2f23a448da08ebdd..7c0a2c345379a7ca 100644
> --- a/util_lib/elf_info.c
> +++ b/util_lib/elf_info.c
> @@ -920,8 +920,8 @@ enum desc_state {
>  
>  #define DESC_SV_BITS (sizeof(uint64_t) * 8)
>  #define DESC_FLAGS_SHIFT (DESC_SV_BITS - 2)
> -#define DESC_FLAGS_MASK  (3UL << DESC_FLAGS_SHIFT)
> -#define DESC_STATE(sv)   (3UL & (sv >> DESC_FLAGS_SHIFT))
> +#define DESC_FLAGS_MASK  (3ULL << DESC_FLAGS_SHIFT)
> +#define DESC_STATE(sv)   (3ULL & (sv >> DESC_FLAGS_SHIFT))
>  #define DESC_ID_MASK (~DESC_FLAGS_MASK)
>  #define DESC_ID(sv)  ((sv) & DESC_ID_MASK)
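
A small standalone sketch of why the suffix matters, using the macros as they look after this patch. The helper make_state_val() is an assumption added for illustration (it is not in elf_info.c), and the macro argument in DESC_STATE() is parenthesized here for safety. With "3UL" the shift by 62 exceeds the width of a 32-bit "unsigned long"; with "3ULL" the constant is at least 64 bits wide on every platform:

```c
#include <assert.h>
#include <stdint.h>

#define DESC_SV_BITS		(sizeof(uint64_t) * 8)
#define DESC_FLAGS_SHIFT	(DESC_SV_BITS - 2)
#define DESC_FLAGS_MASK		(3ULL << DESC_FLAGS_SHIFT)
#define DESC_STATE(sv)		(3ULL & ((sv) >> DESC_FLAGS_SHIFT))
#define DESC_ID_MASK		(~DESC_FLAGS_MASK)
#define DESC_ID(sv)		((sv) & DESC_ID_MASK)

/*
 * Hypothetical helper: pack a descriptor ID and a 2-bit state into one
 * 64-bit state value (ID in the low 62 bits, state in the top 2 bits).
 */
uint64_t make_state_val(uint64_t id, unsigned int state)
{
	assert(state <= 3);
	return (id & DESC_ID_MASK) | ((uint64_t)state << DESC_FLAGS_SHIFT);
}
```

Packing ID 42 with state 2 and then applying DESC_STATE() and DESC_ID() returns 2 and 42 again, independent of the width of "unsigned long".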



[PATCH next v1 2/3] printk: remove safe buffers

2021-03-16 Thread John Ogness
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console lock is left in place. This is
because the console lock is needed for the actual printing.

Signed-off-by: John Ogness 
---
 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h |  10 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |   2 -
 kernel/printk/printk.c |  81 ++--
 kernel/printk/printk_safe.c| 332 +
 lib/nmi_backtrace.c|   6 -
 9 files changed, 18 insertions(+), 423 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a44a30b0688c..5828c83eaca6 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -171,7 +171,6 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-   printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(0);
debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index c9a8f4781a10..dc17d8903d4f 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -183,11 +183,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
wd_smp_unlock();
 
-   printk_safe_flush();
-   /*
-* printk_safe_flush() seems to require another print
-* before anything actually goes out to console.
-*/
if (sysctl_hardlockup_all_cpu_backtrace)
trigger_allbutself_cpu_backtrace();
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index fe7eb2351610..2476796c1150 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -207,8 +207,6 @@ __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -272,14 +270,6 @@ static inline void show_regs_print_info(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
-
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
-{
-}
 #endif
 
 extern int kptr_restrict;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index f04d04d1b855..64bf5d5cdd06 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -977,7 +977,6 @@ void crash_kexec(struct pt_regs *regs)
old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
if (old_cpu == PANIC_CPU_INVALID) {
/* This is the 1st CPU which comes here, so go ahead. */
-   printk_safe_flush_on_panic();
__crash_kexec(regs);
 
/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 332736a72a58..1f0df42f8d0c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -247,7 +247,6 @@ void panic(const char *fmt, ...)
 * Bypass the panic_cpu check and call __crash_kexec directly.
 */
if (!_crash_kexec_post_notifiers) {
-   printk_safe_flush_on_panic();
__crash_kexec(NULL);
 
/*
@@ -271,8 +270,6 @@ void panic(const char *fmt, ...)
 */
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-   /* Call flush even twice. It tries harder with a single online CPU */
-   printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
 
/*
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index e7acc2888c8e..e108b2ece8c7 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -23,7 +23,6 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
-void printk_safe_init(void);
 bool printk_percpu_data_ready(void);
 
 #define printk_safe_enter_irqsave(flags)   \
@@ -67,6 +66,5 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; }
 #define printk_safe_enter_irq() local_irq_disable()
 #define printk_safe_exit_irq() local_irq_enable()
 
-static inline void printk_safe_init(void) { }
 static inline bool printk_percpu_data_ready(void) { return false; }
 #endif /* CONFIG_PRINTK */
diff --git a/kernel/printk/printk.c b

[PATCH next v1 0/3] printk: remove safe buffers

2021-03-16 Thread John Ogness
Hello,

Here is v1 of a series to remove the safe buffers. They are no
longer needed because messages can be stored directly into the
log buffer from any context.

However, the safe buffers also provided a form of recursion
protection. For that reason, explicit recursion protection is
also implemented for this series.

This series falls in line with the printk-rework plan as
presented [0] at Linux Plumbers in Lisbon 2019.

This series is based on next-20210316.

John Ogness

[0] https://linuxplumbersconf.org/event/4/contributions/290/attachments/276/463/lpc2019_jogness_printk.pdf (slide 23)

John Ogness (3):
  printk: track/limit recursion
  printk: remove safe buffers
  printk: convert @syslog_lock to spin_lock

 arch/powerpc/kernel/traps.c|   1 -
 arch/powerpc/kernel/watchdog.c |   5 -
 include/linux/printk.h |  10 -
 kernel/kexec_core.c|   1 -
 kernel/panic.c |   3 -
 kernel/printk/internal.h   |   2 -
 kernel/printk/printk.c | 171 +
 kernel/printk/printk_safe.c| 332 +
 lib/nmi_backtrace.c|   6 -
 9 files changed, 100 insertions(+), 431 deletions(-)

-- 
2.20.1




Re: Issue in dmesg time with lockless ring buffer

2021-01-25 Thread John Ogness
On 2021-01-22, "J. Avila"  wrote:
> When doing some internal testing on a 5.10.4 kernel, we found that the
> time taken for dmesg seemed to increase from the order of milliseconds
> to the order of seconds when the dmesg size approached the ~1.2MB
> limit. After doing some digging, we found that by reverting all of the
> patches in printk/ up to and including
> 896fbe20b4e2333fb55cc9b9b783ebcc49eee7c7 ("use the lockless
> ringbuffer"), we were able to once more see normal dmesg times.
>
> This kernel had no meaningful diffs in the printk/ dir when compared
> to Linus' tree. This behavior was consistently reproducible using the
> following steps:
>
> 1) In one shell, run "time dmesg > /dev/null"
> 2) In another, constantly write to /dev/kmsg
>
> Within ~5 minutes, we saw that dmesg times increased to 1 second, only
> increasing further from there. Is this a known issue?

The last couple days I have tried to reproduce this issue with no
success.

Is your dmesg using /dev/kmsg or syslog() to read the buffer?

Are there any syslog daemons or systemd running? Perhaps you can run
your test within an initrd to see if this effect is still visible?

John Ogness



RE: [PATCH] makedumpfile: printk: add support for lockless ringbuffer

2020-11-29 Thread John Ogness
On 2020-11-24, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> After looking more closely, I see that your patch is still using the
>> old state flags. With the current version, there is now a value-based
>> state field.
>
> Thank you for pointing it out!  Could you submit a follow-up patch?

I have attached a follow-up patch. It is pretty much the exact same
patch as the one I sent for "crash".

John Ogness

From 58396867cb3bfd1ca060cf5eb3a910d7f8c192c2 Mon Sep 17 00:00:00 2001
From: John Ogness 
Date: Wed, 25 Nov 2020 10:10:31 +0106
Subject: [PATCH] printk: use committed/finalized state values

The ringbuffer entries use 2 state values (committed and finalized)
rather than a single flag to represent being available for reading.
Copy the definitions and state lookup function directly from the
kernel source and use the new states.

Signed-off-by: John Ogness 
---
 printk.c | 48 +---
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/printk.c b/printk.c
index 8e00901..9cecbd1 100644
--- a/printk.c
+++ b/printk.c
@@ -1,12 +1,6 @@
 #include "makedumpfile.h"
 #include 
 
-#define DESC_SV_BITS		(sizeof(unsigned long) * 8)
-#define DESC_COMMITTED_MASK	(1UL << (DESC_SV_BITS - 1))
-#define DESC_REUSE_MASK		(1UL << (DESC_SV_BITS - 2))
-#define DESC_FLAGS_MASK		(DESC_COMMITTED_MASK | DESC_REUSE_MASK)
-#define DESC_ID_MASK		(~DESC_FLAGS_MASK)
-
 /* convenience struct for passing many values to helper functions */
 struct prb_map {
 	char		*prb;
@@ -21,12 +15,51 @@ struct prb_map {
 	char		*text_data;
 };
 
+/*
+ * desc_state and DESC_* definitions taken from kernel source:
+ *
+ * kernel/printk/printk_ringbuffer.h
+ */
+
+/* The possible responses of a descriptor state-query. */
+enum desc_state {
+	desc_miss	=  -1,	/* ID mismatch (pseudo state) */
+	desc_reserved	= 0x0,	/* reserved, in use by writer */
+	desc_committed	= 0x1,	/* committed by writer, could get reopened */
+	desc_finalized	= 0x2,	/* committed, no further modification allowed */
+	desc_reusable	= 0x3,	/* free, not yet used by any writer */
+};
+
+#define DESC_SV_BITS		(sizeof(unsigned long) * 8)
+#define DESC_FLAGS_SHIFT	(DESC_SV_BITS - 2)
+#define DESC_FLAGS_MASK		(3UL << DESC_FLAGS_SHIFT)
+#define DESC_STATE(sv)		(3UL & (sv >> DESC_FLAGS_SHIFT))
+#define DESC_ID_MASK		(~DESC_FLAGS_MASK)
+#define DESC_ID(sv)		((sv) & DESC_ID_MASK)
+
+/*
+ * get_desc_state() taken from kernel source:
+ *
+ * kernel/printk/printk_ringbuffer.c
+ */
+
+/* Query the state of a descriptor. */
+static enum desc_state get_desc_state(unsigned long id,
+  unsigned long state_val)
+{
+	if (id != DESC_ID(state_val))
+		return desc_miss;
+
+	return DESC_STATE(state_val);
+}
+
 static void
 dump_record(struct prb_map *m, unsigned long id)
 {
 	unsigned long long ts_nsec;
 	unsigned long state_var;
 	unsigned short text_len;
+	enum desc_state state;
 	unsigned long begin;
 	unsigned long next;
 	char buf[BUFSIZE];
@@ -45,7 +78,8 @@ dump_record(struct prb_map *m, unsigned long id)
 
 	/* skip non-committed record */
 	state_var = ULONG(desc + OFFSET(prb_desc.state_var) + OFFSET(atomic_long_t.counter));
-	if ((state_var & DESC_FLAGS_MASK) != DESC_COMMITTED_MASK)
+	state = get_desc_state(id, state_var);
+	if (state != desc_committed && state != desc_finalized)
 		return;
 
 	begin = ULONG(desc + OFFSET(prb_desc.text_blk_lpos) + OFFSET(prb_data_blk_lpos.begin)) %
-- 
2.20.1
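
For readers who want to try the state lookup outside of makedumpfile, here is a standalone sketch built from the definitions in the patch above. The wrapper record_is_readable() is a made-up name that only restates the "committed or finalized" test from dump_record(); the macro argument in DESC_STATE() is parenthesized here, otherwise the definitions are as in the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* The possible responses of a descriptor state-query. */
enum desc_state {
	desc_miss	=  -1,	/* ID mismatch (pseudo state) */
	desc_reserved	= 0x0,	/* reserved, in use by writer */
	desc_committed	= 0x1,	/* committed by writer, could get reopened */
	desc_finalized	= 0x2,	/* committed, no further modification allowed */
	desc_reusable	= 0x3,	/* free, not yet used by any writer */
};

#define DESC_SV_BITS		(sizeof(unsigned long) * 8)
#define DESC_FLAGS_SHIFT	(DESC_SV_BITS - 2)
#define DESC_FLAGS_MASK		(3UL << DESC_FLAGS_SHIFT)
#define DESC_STATE(sv)		(3UL & ((sv) >> DESC_FLAGS_SHIFT))
#define DESC_ID_MASK		(~DESC_FLAGS_MASK)
#define DESC_ID(sv)		((sv) & DESC_ID_MASK)

/* Query the state of a descriptor. */
enum desc_state get_desc_state(unsigned long id, unsigned long state_val)
{
	if (id != DESC_ID(state_val))
		return desc_miss;

	return DESC_STATE(state_val);
}

/* Hypothetical wrapper: true if the record may be dumped. */
bool record_is_readable(unsigned long id, unsigned long state_val)
{
	enum desc_state state = get_desc_state(id, state_val);

	/* The lookup can only yield one of the five values above. */
	assert(state >= desc_miss && state <= desc_reusable);

	return state == desc_committed || state == desc_finalized;
}
```

A state value with ID 7 and the committed bits set is readable for ID 7, reports desc_miss for ID 8, and a reserved or reusable descriptor is skipped.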



[PATCH] printk: add support for lockless ringbuffer

2020-11-25 Thread John Ogness
Linux 5.10 moved to a new lockless ringbuffer. The new ringbuffer
is structured completely differently from the previous iterations.
Add support for retrieving the ringbuffer using vmcoreinfo. The
new ringbuffer is detected based on the availability of the
"prb" symbol.

Signed-off-by: John Ogness 
---
 util_lib/elf_info.c | 438 +++-
 1 file changed, 437 insertions(+), 1 deletion(-)

diff --git a/util_lib/elf_info.c b/util_lib/elf_info.c
index 7803a94..2f23a44 100644
--- a/util_lib/elf_info.c
+++ b/util_lib/elf_info.c
@@ -27,6 +27,32 @@ static int num_pt_loads;
 
 static char osrelease[4096];
 
+/* VMCOREINFO symbols for lockless printk ringbuffer */
+static loff_t prb_vaddr;
+static size_t printk_ringbuffer_sz;
+static size_t prb_desc_sz;
+static size_t printk_info_sz;
+static uint64_t printk_ringbuffer_desc_ring_offset;
+static uint64_t printk_ringbuffer_text_data_ring_offset;
+static uint64_t prb_desc_ring_count_bits_offset;
+static uint64_t prb_desc_ring_descs_offset;
+static uint64_t prb_desc_ring_infos_offset;
+static uint64_t prb_data_ring_size_bits_offset;
+static uint64_t prb_data_ring_data_offset;
+static uint64_t prb_desc_ring_head_id_offset;
+static uint64_t prb_desc_ring_tail_id_offset;
+static uint64_t atomic_long_t_counter_offset;
+static uint64_t prb_desc_state_var_offset;
+static uint64_t prb_desc_info_offset;
+static uint64_t prb_desc_text_blk_lpos_offset;
+static uint64_t prb_data_blk_lpos_begin_offset;
+static uint64_t prb_data_blk_lpos_next_offset;
+static uint64_t printk_info_seq_offset;
+static uint64_t printk_info_caller_id_offset;
+static uint64_t printk_info_ts_nsec_offset;
+static uint64_t printk_info_level_offset;
+static uint64_t printk_info_text_len_offset;
+
 static loff_t log_buf_vaddr;
 static loff_t log_end_vaddr;
 static loff_t log_buf_len_vaddr;
@@ -304,6 +330,7 @@ void scan_vmcoreinfo(char *start, size_t size)
size_t len;
loff_t *vaddr;
} symbol[] = {
+   SYMBOL(prb),
SYMBOL(log_buf),
SYMBOL(log_end),
SYMBOL(log_buf_len),
@@ -361,6 +388,119 @@ void scan_vmcoreinfo(char *start, size_t size)
*symbol[i].vaddr = vaddr;
}
 
+   str = "SIZE(printk_ringbuffer)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   printk_ringbuffer_sz = strtoull(pos + strlen(str),
+   NULL, 10);
+
+   str = "SIZE(prb_desc)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   prb_desc_sz = strtoull(pos + strlen(str), NULL, 10);
+
+   str = "SIZE(printk_info)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   printk_info_sz = strtoull(pos + strlen(str), NULL, 10);
+
+   str = "OFFSET(printk_ringbuffer.desc_ring)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   printk_ringbuffer_desc_ring_offset =
+   strtoull(pos + strlen(str), NULL, 10);
+
+   str = "OFFSET(printk_ringbuffer.text_data_ring)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   printk_ringbuffer_text_data_ring_offset =
+   strtoull(pos + strlen(str), NULL, 10);
+
+   str = "OFFSET(prb_desc_ring.count_bits)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   prb_desc_ring_count_bits_offset =
+   strtoull(pos + strlen(str), NULL, 10);
+
+   str = "OFFSET(prb_desc_ring.descs)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   prb_desc_ring_descs_offset =
+   strtoull(pos + strlen(str), NULL, 10);
+
+   str = "OFFSET(prb_desc_ring.infos)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   prb_desc_ring_infos_offset =
+   strtoull(pos + strlen(str), NULL, 10);
+
+   str = "OFFSET(prb_data_ring.size_bits)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   prb_data_ring_size_bits_offset =
+   strtoull(pos + strlen(str), NULL, 10);
+
+   str = "OFFSET(prb_data_ring.data)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   prb_data_ring_data_offset =
+   strtoull(pos + strlen(str), NULL, 10);
+
+   str = "OFFSET(prb_desc_ring.head_id)=";
+   if (memcmp(str, pos, strlen(str)) == 0)
+   prb_desc_ring_head_id_offset =
+   strtoull(pos + strlen(str), NULL, 10);
+
+   str = "OFFSET(prb_desc_
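
The hunk above repeats the same memcmp()/strtoull() pair for every SIZE() and OFFSET() key. As a minimal sketch only (it is not part of the patch), the same parsing could be written table-driven. The names vmci_key, vmci_keys and vmci_parse_line are hypothetical, the table lists just two of the keys, and strncmp() is used instead of memcmp() so that a short line is never read past its terminator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical destinations; the patch keeps these as file-scope statics. */
static uint64_t printk_ringbuffer_sz;
static uint64_t desc_ring_offset;

/* One table entry per VMCOREINFO key (hypothetical helper type). */
struct vmci_key {
	const char *prefix;	/* key text up to and including '=' */
	uint64_t *dest;		/* where the decimal value is stored */
};

static const struct vmci_key vmci_keys[] = {
	{ "SIZE(printk_ringbuffer)=", &printk_ringbuffer_sz },
	{ "OFFSET(printk_ringbuffer.desc_ring)=", &desc_ring_offset },
};

/* Returns 1 if the line matched a known key, 0 otherwise. */
static int vmci_parse_line(const char *line)
{
	size_t i;

	for (i = 0; i < sizeof(vmci_keys) / sizeof(vmci_keys[0]); i++) {
		size_t n = strlen(vmci_keys[i].prefix);

		if (strncmp(line, vmci_keys[i].prefix, n) == 0) {
			/* Values are decimal, as in the patch (base 10). */
			*vmci_keys[i].dest = strtoull(line + n, NULL, 10);
			return 1;
		}
	}
	return 0;
}
```

Adding a key then means adding one table row rather than another four-line block.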

RE: [PATCH] makedumpfile: printk: add support for lockless ringbuffer

2020-11-24 Thread John Ogness
Hi Kazu,

On 2020-11-20, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Thank you for confirming and testing.
> I will merge this after a few slight fixes and more tests.

After looking more closely, I see that your patch is still using the old
state flags. With the current version, there is now a value-based state
field. Both state values 1 (committed) and 2 (finalized) are valid for
printing. Should I submit a follow-up patch? Or are these the "slight
fixes" you are referring to?
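
To make the point concrete, here is a minimal sketch of the check I mean. Only the values 1 (committed) and 2 (finalized) come from the text above. The other enum values, the helper names and the assumption that the state sits in the top two bits of a 64-bit state variable are illustrative and should be checked against the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

enum desc_state_sketch {
	DESC_RESERVED  = 0,	/* assumption: not stated in this mail */
	DESC_COMMITTED = 1,	/* valid for printing */
	DESC_FINALIZED = 2,	/* valid for printing */
	DESC_REUSABLE  = 3,	/* assumption: not stated in this mail */
};

/* Assumption: the state occupies the top two bits of a 64-bit state_var. */
#define SKETCH_STATE_SHIFT 62

static unsigned int desc_state(uint64_t state_var)
{
	return (unsigned int)(state_var >> SKETCH_STATE_SHIFT) & 3u;
}

/* A record may be dumped if it is committed or finalized. */
static bool desc_is_printable(uint64_t state_var)
{
	unsigned int s = desc_state(state_var);

	return s == DESC_COMMITTED || s == DESC_FINALIZED;
}
```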

John Ogness



Re: [PATCH] makedumpfile: printk: add support for lockless ringbuffer

2020-11-19 Thread John Ogness
On 2020-11-19, HAGIO KAZUHITO(萩尾 一仁) wrote:
> From: John Ogness 
>
> Linux 5.10 introduces a new lockless ringbuffer.  The new ringbuffer
> is structured completely different to the previous iterations.
> Add support for retrieving the ringbuffer from debug information
> and/or using vmcoreinfo.  The new ringbuffer is detected based on
> the availability of the "prb" symbol.
>
> Signed-off-by: John Ogness 
> Signed-off-by: Kazuhito Hagio 
> ---
> I've updated John's RFC makedumpfile patch to match 5.10-rc4 kernel.
> Changes from the RFC patch:
> - followed the following kernel commit
> cfe2790b163a ("printk: move printk_info into separate array")
> - divided members of struct printk_log in offset_table into each structure
>   for readability
> - added some error handlings
> - also dump head record that was missed

I confirm that these changes are correct. Thanks for updating this,
adding the needed error handling, and catching that the head record was
missed!

I tested this by:

1. Boot kernel with: crashkernel=512M

2. Setup and trigger crash:

   kexec -p /boot/bzImage --initrd=/boot/rescue-initrd --append="console=ttyS0,115200"
   echo c > /proc/sysrq-trigger

3. From rescue environment, copy crashed vmcore to external machine:

   cp /proc/vmcore /remote/nfs/mount/

4. From external machine, extract kernel log using vmcoreinfo:

   makedumpfile -g ./vmcoreinfo -x ./vmlinux
   makedumpfile --dump-dmesg -i ./vmcoreinfo ./vmcore dmesg1.txt

5. From external machine, extract kernel log using debug symbols:

   makedumpfile --dump-dmesg -x ./vmlinux ./vmcore dmesg2.txt

6. Compare and inspect the kernel logs:

   diff dmesg1.txt dmesg2.txt
   cat dmesg1.txt

John Ogness



Re: [PATCH printk v5 6/6] printk: reimplement log_cont using record extension

2020-09-25 Thread John Ogness
On 2020-09-25, Marek Szyprowski  wrote:
> This patch landed recently in linux-next as commit f5f022e53b87 
> ("printk: reimplement log_cont using record extension"). I've noticed 
> that it causes a regression on my test system (ARM 32bit Samsung Exynos 
> 4412-based Trats2 board). The messages are printed correctly on the 
> serial console during boot, but then when I run 'dmesg' command, the log 
> is truncated.
>
> Here are the last lines of the dmesg log after this patch:
>
> [    6.649018] Waiting 2 sec before mounting root device...
> [    6.766423] dwc2 1248.hsotg: new device is high-speed
> [    6.845290] dwc2 1248.hsotg: new device is high-speed
> [    6.914217] dwc2 1248.hsotg: new address 51
> [    8.710351] RAMDISK: squashfs filesystem found at block 0
>
> The corresponding dmesg lines before applying this patch:
>
> [    8.864320] RAMDISK: squashfs filesystem found at block 0
> [    8.868410] RAMDISK: Loading 37692KiB [1 disk] into ram disk... /
> [    9.071670] /
> [    9.262498] /
> [    9.540711] /
> [    9.818031] done.

Ah. One of the more creative printk users...
init/do_mounts_rd.c:rd_load_image(). This is a set of LOG_CONT messages
that try to display a rotating line, complete with '\b' control
characters. The code is totally broken, but that is no excuse for printk
to break. It should be easy to reproduce on any architecture. I will
investigate it further. Thanks for reporting.

John Ogness



[PATCH printk v4 2/3] printk: move dictionary keys to dev_printk_info

2020-09-21 Thread John Ogness
Dictionaries are only used for SUBSYSTEM and DEVICE properties. The
current implementation stores the property names each time they are
used. This requires more space than otherwise necessary. Also,
because the dictionary entries are currently considered optional,
it cannot be relied upon that they are always available, even if the
writer wanted to store them. These issues will increase should new
dictionary properties be introduced.

Rather than storing the subsystem and device properties in the
dict ring, introduce a struct dev_printk_info with separate fields
to store only the property values. Embed this struct within the
struct printk_info to provide guaranteed availability.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 Sorry. v3 did not include Petr's fixup correctly. @size was wrong.
 Now it is correct.

 Documentation/admin-guide/kdump/gdbmacros.txt |  73 
 drivers/base/core.c   |  46 ++---
 include/linux/dev_printk.h|   8 +
 include/linux/printk.h|   6 +-
 kernel/printk/internal.h  |   4 +-
 kernel/printk/printk.c| 166 +-
 kernel/printk/printk_ringbuffer.h |   3 +
 kernel/printk/printk_safe.c   |   2 +-
 scripts/gdb/linux/dmesg.py|  16 +-
 9 files changed, 164 insertions(+), 160 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
index 94fabb165abf..82aecdcae8a6 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -172,13 +172,13 @@ end
 
 define dump_record
set var $desc = $arg0
-   if ($argc > 1)
-   set var $prev_flags = $arg1
+   set var $info = $arg1
+   if ($argc > 2)
+   set var $prev_flags = $arg2
else
set var $prev_flags = 0
end
 
-   set var $info = &$desc->info
set var $prefix = 1
set var $newline = 1
 
@@ -237,44 +237,36 @@ define dump_record
 
# handle dictionary data
 
-   set var $begin = $desc->dict_blk_lpos.begin % (1U << prb->dict_data_ring.size_bits)
-   set var $next = $desc->dict_blk_lpos.next % (1U << prb->dict_data_ring.size_bits)
-
-   # handle data-less record
-   if ($begin & 1)
-   set var $dict_len = 0
-   set var $dict = ""
-   else
-   # handle wrapping data block
-   if ($begin > $next)
-   set var $begin = 0
-   end
-
-   # skip over descriptor id
-   set var $begin = $begin + sizeof(long)
-
-   # handle truncated message
-   if ($next - $begin < $info->dict_len)
-   set var $dict_len = $next - $begin
-   else
-   set var $dict_len = $info->dict_len
+   set var $dict = &$info->dev_info.subsystem[0]
+   set var $dict_len = sizeof($info->dev_info.subsystem)
+   if ($dict[0] != '\0')
+   printf " SUBSYSTEM="
+   set var $idx = 0
+   while ($idx < $dict_len)
+   set var $c = $dict[$idx]
+   if ($c == '\0')
+   loop_break
+   else
+   if ($c < ' ' || $c >= 127 || $c == '\\')
+   printf "\\x%02x", $c
+   else
+   printf "%c", $c
+   end
+   end
+   set var $idx = $idx + 1
end
-
-   set var $dict = &prb->dict_data_ring.data[$begin]
+   printf "\n"
end
 
-   if ($dict_len > 0)
+   set var $dict = &$info->dev_info.device[0]
+   set var $dict_len = sizeof($info->dev_info.device)
+   if ($dict[0] != '\0')
+   printf " DEVICE="
set var $idx = 0
-   set var $line = 1
while ($idx < $dict_len)
-   if ($line)
-   printf " "
-   set var $line = 0
-   end
set var $c = $dict[$idx]
if ($c == '\0')
-   printf "\n"
-   set var $line = 1
+   loop_break
else
if ($c < ' ' || $c >= 127 || $c == '\\')
printf "\\x%02x", $c
@@ -288,10 +280,10 @@ define dump_record
end
 end
 document dump_record
-   Dump a single reco

[PATCH printk v3 2/3] printk: move dictionary keys to dev_printk_info

2020-09-21 Thread John Ogness
Dictionaries are only used for SUBSYSTEM and DEVICE properties. The
current implementation stores the property names each time they are
used. This requires more space than otherwise necessary. Also,
because the dictionary entries are currently considered optional,
it cannot be relied upon that they are always available, even if the
writer wanted to store them. These issues will increase should new
dictionary properties be introduced.

Rather than storing the subsystem and device properties in the
dict ring, introduce a struct dev_printk_info with separate fields
to store only the property values. Embed this struct within the
struct printk_info to provide guaranteed availability.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 Added Petr's fixup for msg_add_dict_text() to include the prefix
 whitespace for dictionary properties. Thanks!

 Documentation/admin-guide/kdump/gdbmacros.txt |  73 
 drivers/base/core.c   |  46 ++---
 include/linux/dev_printk.h|   8 +
 include/linux/printk.h|   6 +-
 kernel/printk/internal.h  |   4 +-
 kernel/printk/printk.c| 166 +-
 kernel/printk/printk_ringbuffer.h |   3 +
 kernel/printk/printk_safe.c   |   2 +-
 scripts/gdb/linux/dmesg.py|  16 +-
 9 files changed, 164 insertions(+), 160 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
index 94fabb165abf..82aecdcae8a6 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -172,13 +172,13 @@ end
 
 define dump_record
set var $desc = $arg0
-   if ($argc > 1)
-   set var $prev_flags = $arg1
+   set var $info = $arg1
+   if ($argc > 2)
+   set var $prev_flags = $arg2
else
set var $prev_flags = 0
end
 
-   set var $info = &$desc->info
set var $prefix = 1
set var $newline = 1
 
@@ -237,44 +237,36 @@ define dump_record
 
# handle dictionary data
 
-   set var $begin = $desc->dict_blk_lpos.begin % (1U << prb->dict_data_ring.size_bits)
-   set var $next = $desc->dict_blk_lpos.next % (1U << prb->dict_data_ring.size_bits)
-
-   # handle data-less record
-   if ($begin & 1)
-   set var $dict_len = 0
-   set var $dict = ""
-   else
-   # handle wrapping data block
-   if ($begin > $next)
-   set var $begin = 0
-   end
-
-   # skip over descriptor id
-   set var $begin = $begin + sizeof(long)
-
-   # handle truncated message
-   if ($next - $begin < $info->dict_len)
-   set var $dict_len = $next - $begin
-   else
-   set var $dict_len = $info->dict_len
+   set var $dict = &$info->dev_info.subsystem[0]
+   set var $dict_len = sizeof($info->dev_info.subsystem)
+   if ($dict[0] != '\0')
+   printf " SUBSYSTEM="
+   set var $idx = 0
+   while ($idx < $dict_len)
+   set var $c = $dict[$idx]
+   if ($c == '\0')
+   loop_break
+   else
+   if ($c < ' ' || $c >= 127 || $c == '\\')
+   printf "\\x%02x", $c
+   else
+   printf "%c", $c
+   end
+   end
+   set var $idx = $idx + 1
end
-
-   set var $dict = &prb->dict_data_ring.data[$begin]
+   printf "\n"
end
 
-   if ($dict_len > 0)
+   set var $dict = &$info->dev_info.device[0]
+   set var $dict_len = sizeof($info->dev_info.device)
+   if ($dict[0] != '\0')
+   printf " DEVICE="
set var $idx = 0
-   set var $line = 1
while ($idx < $dict_len)
-   if ($line)
-   printf " "
-   set var $line = 0
-   end
set var $c = $dict[$idx]
if ($c == '\0')
-   printf "\n"
-   set var $line = 1
+   loop_break
else
if ($c < ' ' || $c >= 127 || $c == '\\')
printf "\\x%02x", $c
@@ -288,10 +280,10 @@ define dump_record
end
 end
 document dump_reco

[PATCH printk v2 1/3] printk: move printk_info into separate array

2020-09-18 Thread John Ogness
The majority of the size of a descriptor is taken up by meta data,
which is often not of interest to the ringbuffer (for example,
when performing state checks). Since descriptors are often
temporarily stored on the stack, keeping their size minimal will
help reduce stack pressure.

Rather than embedding the printk_info into the descriptor, create
a separate printk_info array. The index of a descriptor in the
descriptor array corresponds to the printk_info with the same
index in the printk_info array. The rules for validity of a
printk_info match the existing rules for the data blocks: the
descriptor must be in a consistent state.
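
As an illustration of the index correspondence described above (a sketch with simplified stand-in structs, not the code in this patch): both arrays have the same number of entries, and a single index selects the descriptor and its printk_info. Deriving the index by masking an id with the ring size is an assumption of this sketch, as are all sketch_* names.

```c
#include <stdint.h>

/* Simplified stand-ins; the real structs carry more fields. */
struct sketch_desc { uint64_t state_var; };
struct sketch_info { uint64_t seq; uint64_t ts_nsec; };

struct sketch_desc_ring {
	unsigned int count_bits;	/* ring holds 1 << count_bits entries */
	struct sketch_desc *descs;
	struct sketch_info *infos;	/* same length and indexing as descs */
};

/* Assumption: an id maps to an array index by masking with the ring size. */
static uint64_t sketch_index(const struct sketch_desc_ring *r, uint64_t id)
{
	return id & ((UINT64_C(1) << r->count_bits) - 1);
}

static struct sketch_desc *sketch_to_desc(struct sketch_desc_ring *r, uint64_t id)
{
	return &r->descs[sketch_index(r, id)];
}

/* The meta data lives at the same index, but in the separate array. */
static struct sketch_info *sketch_to_info(struct sketch_desc_ring *r, uint64_t id)
{
	return &r->infos[sketch_index(r, id)];
}
```

With this layout a descriptor copied to the stack no longer drags the meta data along with it.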

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c|  30 +--
 kernel/printk/printk_ringbuffer.c | 145 +++---
 kernel/printk/printk_ringbuffer.h |  29 +++---
 3 files changed, 133 insertions(+), 71 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9a2e23191576..25cfe4fe48af 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -959,11 +959,11 @@ void log_buf_vmcoreinfo_setup(void)
VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
VMCOREINFO_OFFSET(prb_desc_ring, count_bits);
VMCOREINFO_OFFSET(prb_desc_ring, descs);
+   VMCOREINFO_OFFSET(prb_desc_ring, infos);
VMCOREINFO_OFFSET(prb_desc_ring, head_id);
VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
 
VMCOREINFO_STRUCT_SIZE(prb_desc);
-   VMCOREINFO_OFFSET(prb_desc, info);
VMCOREINFO_OFFSET(prb_desc, state_var);
VMCOREINFO_OFFSET(prb_desc, text_blk_lpos);
VMCOREINFO_OFFSET(prb_desc, dict_blk_lpos);
@@ -1097,11 +1097,13 @@ static char setup_dict_buf[CONSOLE_EXT_LOG_MAX] __initdata;
 
 void __init setup_log_buf(int early)
 {
+   struct printk_info *new_infos;
unsigned int new_descs_count;
struct prb_desc *new_descs;
struct printk_info info;
struct printk_record r;
size_t new_descs_size;
+   size_t new_infos_size;
unsigned long flags;
char *new_dict_buf;
char *new_log_buf;
@@ -1142,8 +1144,7 @@ void __init setup_log_buf(int early)
if (unlikely(!new_dict_buf)) {
pr_err("log_buf_len: %lu dict bytes not available\n",
   new_log_buf_len);
-   memblock_free(__pa(new_log_buf), new_log_buf_len);
-   return;
+   goto err_free_log_buf;
}
 
new_descs_size = new_descs_count * sizeof(struct prb_desc);
@@ -1151,9 +1152,15 @@ void __init setup_log_buf(int early)
if (unlikely(!new_descs)) {
pr_err("log_buf_len: %zu desc bytes not available\n",
   new_descs_size);
-   memblock_free(__pa(new_dict_buf), new_log_buf_len);
-   memblock_free(__pa(new_log_buf), new_log_buf_len);
-   return;
+   goto err_free_dict_buf;
+   }
+
+   new_infos_size = new_descs_count * sizeof(struct printk_info);
+   new_infos = memblock_alloc(new_infos_size, LOG_ALIGN);
+   if (unlikely(!new_infos)) {
+   pr_err("log_buf_len: %zu info bytes not available\n",
+  new_infos_size);
+   goto err_free_descs;
}
 
prb_rec_init_rd(&r, &info,
@@ -1163,7 +1170,8 @@ void __init setup_log_buf(int early)
prb_init(&printk_rb_dynamic,
 new_log_buf, ilog2(new_log_buf_len),
 new_dict_buf, ilog2(new_log_buf_len),
-new_descs, ilog2(new_descs_count));
+new_descs, ilog2(new_descs_count),
+new_infos);
 
logbuf_lock_irqsave(flags);
 
@@ -1192,6 +1200,14 @@ void __init setup_log_buf(int early)
pr_info("log_buf_len: %u bytes\n", log_buf_len);
pr_info("early log buf free: %u(%u%%)\n",
free, (free * 100) / __LOG_BUF_LEN);
+   return;
+
+err_free_descs:
+   memblock_free(__pa(new_descs), new_descs_size);
+err_free_dict_buf:
+   memblock_free(__pa(new_dict_buf), new_log_buf_len);
+err_free_log_buf:
+   memblock_free(__pa(new_log_buf), new_log_buf_len);
 }
 
 static bool __read_mostly ignore_loglevel;
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index f4e2e9890e0f..de4b10a98623 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -15,10 +15,10 @@
  * The printk_ringbuffer is made up of 3 internal ringbuffers:
  *
  *   desc_ring
- * A ring of descriptors. A descriptor contains all record meta data
- * (sequence number, timestamp, loglevel, etc.) as well as internal state
- * information about the record and logical positions specifying where in
- * the other ringbuffers the text and dictionary strings are located.
+ * A ring of descriptors and their meta data (such as sequence number,
+ * timestamp, loglevel, etc.) as well as internal state informat

[PATCH printk v2 3/3] printk: remove dict ring

2020-09-18 Thread John Ogness
Since there is no code that will ever store anything into the dict
ring, remove it. If any future dictionary properties are to be
added, these should be added to the struct printk_info.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c|  46 +++--
 kernel/printk/printk_ringbuffer.c | 155 +++---
 kernel/printk/printk_ringbuffer.h |  63 +++-
 3 files changed, 64 insertions(+), 200 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 269f0abd1ddf..77660354a7c5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -427,7 +427,6 @@ static u32 log_buf_len = __LOG_BUF_LEN;
  * Define the average message size. This only affects the number of
  * descriptors that will be available. Underestimating is better than
  * overestimating (too many available descriptors is better than not enough).
- * The dictionary buffer will be the same size as the text buffer.
  */
 #define PRB_AVGBITS 5  /* 32 character average length */
 
@@ -435,7 +434,7 @@ static u32 log_buf_len = __LOG_BUF_LEN;
 #error CONFIG_LOG_BUF_SHIFT value too small.
 #endif
 _DEFINE_PRINTKRB(printk_rb_static, CONFIG_LOG_BUF_SHIFT - PRB_AVGBITS,
-PRB_AVGBITS, PRB_AVGBITS, &__log_buf[0]);
+PRB_AVGBITS, &__log_buf[0]);
 
 static struct printk_ringbuffer printk_rb_dynamic;
 
@@ -502,12 +501,12 @@ static int log_store(u32 caller_id, int facility, int level,
struct printk_record r;
u16 trunc_msg_len = 0;
 
-   prb_rec_init_wr(&r, text_len, 0);
+   prb_rec_init_wr(&r, text_len);
 
if (!prb_reserve(&e, prb, &r)) {
/* truncate the message if it is too long for empty buffer */
truncate_msg(&text_len, &trunc_msg_len);
-   prb_rec_init_wr(&r, text_len + trunc_msg_len, 0);
+   prb_rec_init_wr(&r, text_len + trunc_msg_len);
/* survive when the log buffer is too small for trunc_msg */
if (!prb_reserve(&e, prb, &r))
return 0;
@@ -897,8 +896,7 @@ static int devkmsg_open(struct inode *inode, struct file *file)
mutex_init(&user->lock);
 
prb_rec_init_rd(&user->record, &user->info,
-   &user->text_buf[0], sizeof(user->text_buf),
-   NULL, 0);
+   &user->text_buf[0], sizeof(user->text_buf));
 
logbuf_lock_irq();
user->seq = prb_first_valid_seq(prb);
@@ -956,7 +954,6 @@ void log_buf_vmcoreinfo_setup(void)
VMCOREINFO_STRUCT_SIZE(printk_ringbuffer);
VMCOREINFO_OFFSET(printk_ringbuffer, desc_ring);
VMCOREINFO_OFFSET(printk_ringbuffer, text_data_ring);
-   VMCOREINFO_OFFSET(printk_ringbuffer, dict_data_ring);
VMCOREINFO_OFFSET(printk_ringbuffer, fail);
 
VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
@@ -969,7 +966,6 @@ void log_buf_vmcoreinfo_setup(void)
VMCOREINFO_STRUCT_SIZE(prb_desc);
VMCOREINFO_OFFSET(prb_desc, state_var);
VMCOREINFO_OFFSET(prb_desc, text_blk_lpos);
-   VMCOREINFO_OFFSET(prb_desc, dict_blk_lpos);
 
VMCOREINFO_STRUCT_SIZE(prb_data_blk_lpos);
VMCOREINFO_OFFSET(prb_data_blk_lpos, begin);
@@ -979,7 +975,6 @@ void log_buf_vmcoreinfo_setup(void)
VMCOREINFO_OFFSET(printk_info, seq);
VMCOREINFO_OFFSET(printk_info, ts_nsec);
VMCOREINFO_OFFSET(printk_info, text_len);
-   VMCOREINFO_OFFSET(printk_info, dict_len);
VMCOREINFO_OFFSET(printk_info, caller_id);
VMCOREINFO_OFFSET(printk_info, dev_info);
 
@@ -1080,7 +1075,7 @@ static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
struct prb_reserved_entry e;
struct printk_record dest_r;
 
-   prb_rec_init_wr(&dest_r, r->info->text_len, 0);
+   prb_rec_init_wr(&dest_r, r->info->text_len);
 
if (!prb_reserve(&e, rb, &dest_r))
return 0;
@@ -,7 +1106,6 @@ void __init setup_log_buf(int early)
size_t new_descs_size;
size_t new_infos_size;
unsigned long flags;
-   char *new_dict_buf;
char *new_log_buf;
unsigned int free;
u64 seq;
@@ -1146,19 +1140,12 @@ void __init setup_log_buf(int early)
return;
}
 
-   new_dict_buf = memblock_alloc(new_log_buf_len, LOG_ALIGN);
-   if (unlikely(!new_dict_buf)) {
-   pr_err("log_buf_len: %lu dict bytes not available\n",
-  new_log_buf_len);
-   goto err_free_log_buf;
-   }
-
new_descs_size = new_descs_count * sizeof(struct prb_desc);
new_descs = memblock_alloc(new_descs_size, LOG_ALIGN);
if (unlikely(!new_descs)) {
pr_err("log_buf_len: %zu desc bytes not available\n",
   new_descs_size);
-   goto err_free_dict_buf;
+   goto err_free_log_buf;
}
 
new_infos_size = new_descs_count * sizeof(struct printk_info);
@@ -1169,13 +1156

[PATCH printk v2 2/3] printk: move dictionary keys to dev_printk_info

2020-09-18 Thread John Ogness
Dictionaries are only used for SUBSYSTEM and DEVICE properties. The
current implementation stores the property names each time they are
used. This requires more space than otherwise necessary. Also,
because the dictionary entries are currently considered optional,
it cannot be relied upon that they are always available, even if the
writer wanted to store them. These issues will increase should new
dictionary properties be introduced.

Rather than storing the subsystem and device properties in the
dict ring, introduce a struct dev_printk_info with separate fields
to store only the property values. Embed this struct within the
struct printk_info to provide guaranteed availability.

Signed-off-by: John Ogness 
---
 Documentation/admin-guide/kdump/gdbmacros.txt |  73 
 drivers/base/core.c   |  46 ++---
 include/linux/dev_printk.h|   8 +
 include/linux/printk.h|   6 +-
 kernel/printk/internal.h  |   4 +-
 kernel/printk/printk.c| 165 +-
 kernel/printk/printk_ringbuffer.h |   3 +
 kernel/printk/printk_safe.c   |   2 +-
 scripts/gdb/linux/dmesg.py|  16 +-
 9 files changed, 163 insertions(+), 160 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
index 94fabb165abf..82aecdcae8a6 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -172,13 +172,13 @@ end
 
 define dump_record
set var $desc = $arg0
-   if ($argc > 1)
-   set var $prev_flags = $arg1
+   set var $info = $arg1
+   if ($argc > 2)
+   set var $prev_flags = $arg2
else
set var $prev_flags = 0
end
 
-   set var $info = &$desc->info
set var $prefix = 1
set var $newline = 1
 
@@ -237,44 +237,36 @@ define dump_record
 
# handle dictionary data
 
-   set var $begin = $desc->dict_blk_lpos.begin % (1U << prb->dict_data_ring.size_bits)
-   set var $next = $desc->dict_blk_lpos.next % (1U << prb->dict_data_ring.size_bits)
-
-   # handle data-less record
-   if ($begin & 1)
-   set var $dict_len = 0
-   set var $dict = ""
-   else
-   # handle wrapping data block
-   if ($begin > $next)
-   set var $begin = 0
-   end
-
-   # skip over descriptor id
-   set var $begin = $begin + sizeof(long)
-
-   # handle truncated message
-   if ($next - $begin < $info->dict_len)
-   set var $dict_len = $next - $begin
-   else
-   set var $dict_len = $info->dict_len
+   set var $dict = &$info->dev_info.subsystem[0]
+   set var $dict_len = sizeof($info->dev_info.subsystem)
+   if ($dict[0] != '\0')
+   printf " SUBSYSTEM="
+   set var $idx = 0
+   while ($idx < $dict_len)
+   set var $c = $dict[$idx]
+   if ($c == '\0')
+   loop_break
+   else
+   if ($c < ' ' || $c >= 127 || $c == '\\')
+   printf "\\x%02x", $c
+   else
+   printf "%c", $c
+   end
+   end
+   set var $idx = $idx + 1
end
-
-   set var $dict = &prb->dict_data_ring.data[$begin]
+   printf "\n"
end
 
-   if ($dict_len > 0)
+   set var $dict = &$info->dev_info.device[0]
+   set var $dict_len = sizeof($info->dev_info.device)
+   if ($dict[0] != '\0')
+   printf " DEVICE="
set var $idx = 0
-   set var $line = 1
while ($idx < $dict_len)
-   if ($line)
-   printf " "
-   set var $line = 0
-   end
set var $c = $dict[$idx]
if ($c == '\0')
-   printf "\n"
-   set var $line = 1
+   loop_break
else
if ($c < ' ' || $c >= 127 || $c == '\\')
printf "\\x%02x", $c
@@ -288,10 +280,10 @@ define dump_record
end
 end
 document dump_record
-   Dump a single record. The first parameter is the descriptor
-   sequence number, the second is optional and specifies the
-   p

[PATCH printk v2 0/3] printk: move dictionaries to meta data

2020-09-18 Thread John Ogness
Hello,

Here is v2 for a series to move all existing dictionary
properties (SUBSYSTEM and DEVICE) into the meta data of a
record, thus eliminating the need for the dict ring. This
change affects how the dictionaries are stored, but does not
affect how they are presented to userspace. (v1 is here [0]).

The main purpose of the change is to address concerns [1]
about the reliability of dictionary properties and to allow
the type and amount of available meta data to be expanded
efficiently [2].

This series is based heavily on the proof of concept [3] from
Petr Mladek. (Petr, feel free to add Co-developed-by tags.)

The series is based on the printk-rework branch of the printk
git tree:

f5f022e53b87 ("printk: reimplement log_cont using record extension")

The list of changes since v1:

drivers/base/core.c
===

- set_dev_info(): use strscpy() instead of snprintf() (thank
  you Rasmus Villemoes)

kernel/printk/printk.c
==

- setup_log_buf(): fix cleanup in error handling

- log_buf_vmcoreinfo_setup(): add VMCOREINFO for
  struct dev_printk_info array sizes so that crash tools
  do not need to rely on property value termination
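
To illustrate the two points above, here is a minimal userspace sketch. The field sizes and all sketch_* names are assumptions for illustration, and sketch_copy() is only a stand-in for the bounded, always-terminating behaviour of strscpy(). The reader helper shows why exporting the array sizes helps: a crash tool can stop at the array size even if a value is not terminated.

```c
#include <stddef.h>
#include <string.h>

/* Assumed field sizes, for illustration only. */
#define SKETCH_SUBSYSTEM_LEN 16
#define SKETCH_DEVICE_LEN    48

struct sketch_dev_info {
	char subsystem[SKETCH_SUBSYSTEM_LEN];
	char device[SKETCH_DEVICE_LEN];
};

/* Bounded copy that always terminates; dst_sz must be at least 1. */
static void sketch_copy(char *dst, size_t dst_sz, const char *src)
{
	size_t n = strlen(src);

	if (n >= dst_sz)
		n = dst_sz - 1;		/* truncate, leaving room for '\0' */
	memcpy(dst, src, n);
	dst[n] = '\0';
}

/* Reader side: stop at '\0' or at the array size, whichever comes first. */
static size_t sketch_value_len(const char *field, size_t field_sz)
{
	size_t i;

	for (i = 0; i < field_sz; i++) {
		if (field[i] == '\0')
			break;
	}
	return i;
}
```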

John Ogness

[0] https://lkml.kernel.org/r/20200917131644.25838-1-john.ogn...@linutronix.de
[1] https://lkml.kernel.org/r/20200904151336.GC20558@alley
[2] https://lkml.kernel.org/r/008801d684f9$43e1c140$cba543c0$@samsung.com
[3] https://lkml.kernel.org/r/20200911095035.GI3864@alley

John Ogness (3):
  printk: move printk_info into separate array
  printk: move dictionary keys to dev_printk_info
  printk: remove dict ring

 Documentation/admin-guide/kdump/gdbmacros.txt |  73 ++---
 drivers/base/core.c   |  46 +--
 include/linux/dev_printk.h|   8 +
 include/linux/printk.h|   6 +-
 kernel/printk/internal.h  |   4 +-
 kernel/printk/printk.c| 221 ++---
 kernel/printk/printk_ringbuffer.c | 292 --
 kernel/printk/printk_ringbuffer.h |  95 ++
 kernel/printk/printk_safe.c   |   2 +-
 scripts/gdb/linux/dmesg.py|  16 +-
 10 files changed, 346 insertions(+), 417 deletions(-)

-- 
2.20.1




Re: [PATCH printk 1/3] printk: move printk_info into separate array

2020-09-18 Thread John Ogness
On 2020-09-18, Petr Mladek  wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1097,6 +1097,7 @@ static char setup_dict_buf[CONSOLE_EXT_LOG_MAX] __initdata;
>>  
>>  void __init setup_log_buf(int early)
>>  {
>> +struct printk_info *new_infos;
>>  unsigned int new_descs_count;
>>  struct prb_desc *new_descs;
>>  struct printk_info info;
>> @@ -1156,6 +1157,17 @@ void __init setup_log_buf(int early)
>>  return;
>>  }
>>  
>> +new_descs_size = new_descs_count * sizeof(struct printk_info);
>
> Must be stored into new variable, e.g. new_infos_size.

Ack.

>> +new_infos = memblock_alloc(new_descs_size, LOG_ALIGN);
>> +if (unlikely(!new_infos)) {
>> +pr_err("log_buf_len: %zu info bytes not available\n",
>> +   new_descs_size);
>> +memblock_free(__pa(new_descs), new_log_buf_len);
>> +memblock_free(__pa(new_dict_buf), new_log_buf_len);
>
> The above two calls have wrong size.
>
> The same problem is there also in the error path when new_descs
> allocation fails. It might be better to handle this using some
> goto err_* targets.
>
> Please, fix the old problem in a separate patch.

The "old problem" didn't exist. The problem is introduced with this
series. I will fix it with appropriate goto err_* targets for v2.

>> --- a/kernel/printk/printk_ringbuffer.c
>> +++ b/kernel/printk/printk_ringbuffer.c
>> @@ -1726,12 +1762,12 @@ static bool copy_data(struct prb_data_ring *data_ring,
>>  /*
>>   * Actual cannot be less than expected. It can be more than expected
>>   * because of the trailing alignment padding.
>> + *
>> + * Note that invalid @len values can occur because the caller loads
>> + * the value during an allowed data race.
>
> I hope that this will not bite us in the future. The fact is that
> copying the entire struct printk_info in get_desc() is ugly and
> copy_data() has to be careful anyway.

It isn't an issue because the state is verified again at the end of
prb_read(). I added the comment because if all you are looking at is
copy_data(), you may not know that @len was read on a data-race. Whereas
inside of prb_read(), it is obvious that the memcpy() is a data-race.
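
The pattern can be sketched as follows. This is a simplified, single-threaded illustration with hypothetical names (sketch_record, sketch_read), not the ringbuffer code: a real lockless reader also needs the appropriate memory barriers, which are left out here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_TEXT_MAX 32

struct sketch_record {
	unsigned long state;	/* stand-in for the descriptor state */
	size_t len;		/* may be stale while a writer is active */
	char text[SKETCH_TEXT_MAX];
};

/* Copy out a record; succeed only if the state is still valid afterwards. */
static bool sketch_read(const struct sketch_record *rec,
			unsigned long printable_state,
			char *buf, size_t buf_sz, size_t *out_len)
{
	size_t len;

	if (rec->state != printable_state)
		return false;

	/* @len may be garbage if loaded during a race, so clamp before use. */
	len = rec->len;
	if (len > SKETCH_TEXT_MAX)
		len = SKETCH_TEXT_MAX;
	if (len > buf_sz)
		len = buf_sz;
	memcpy(buf, rec->text, len);

	/* Re-check: if the state changed, the copied data is discarded. */
	if (rec->state != printable_state)
		return false;

	*out_len = len;
	return true;
}
```

The copy step only has to be safe against bad lengths. Correctness of the data is decided by the final state check.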

John Ogness



[PATCH printk 3/3] printk: remove dict ring

2020-09-17 Thread John Ogness
Since there is no code that will ever store anything into the dict
ring, remove it. If any future dictionary properties are to be
added, these should be added to the struct printk_info.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c|  45 +++--
 kernel/printk/printk_ringbuffer.c | 155 +++---
 kernel/printk/printk_ringbuffer.h |  63 +++-
 3 files changed, 63 insertions(+), 200 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b2e2bdd37028..107c09744026 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -427,7 +427,6 @@ static u32 log_buf_len = __LOG_BUF_LEN;
  * Define the average message size. This only affects the number of
  * descriptors that will be available. Underestimating is better than
  * overestimating (too many available descriptors is better than not enough).
- * The dictionary buffer will be the same size as the text buffer.
  */
 #define PRB_AVGBITS 5  /* 32 character average length */
 
@@ -435,7 +434,7 @@ static u32 log_buf_len = __LOG_BUF_LEN;
 #error CONFIG_LOG_BUF_SHIFT value too small.
 #endif
 _DEFINE_PRINTKRB(printk_rb_static, CONFIG_LOG_BUF_SHIFT - PRB_AVGBITS,
-PRB_AVGBITS, PRB_AVGBITS, &__log_buf[0]);
+PRB_AVGBITS, &__log_buf[0]);
 
 static struct printk_ringbuffer printk_rb_dynamic;
 
@@ -502,12 +501,12 @@ static int log_store(u32 caller_id, int facility, int level,
struct printk_record r;
u16 trunc_msg_len = 0;
 
-   prb_rec_init_wr(&r, text_len, 0);
+   prb_rec_init_wr(&r, text_len);
 
if (!prb_reserve(&e, prb, &r)) {
/* truncate the message if it is too long for empty buffer */
truncate_msg(&text_len, &trunc_msg_len);
-   prb_rec_init_wr(&r, text_len + trunc_msg_len, 0);
+   prb_rec_init_wr(&r, text_len + trunc_msg_len);
/* survive when the log buffer is too small for trunc_msg */
if (!prb_reserve(&e, prb, &r))
return 0;
@@ -897,8 +896,7 @@ static int devkmsg_open(struct inode *inode, struct file *file)
mutex_init(&user->lock);
 
prb_rec_init_rd(&user->record, &user->info,
-   &user->text_buf[0], sizeof(user->text_buf),
-   NULL, 0);
+   &user->text_buf[0], sizeof(user->text_buf));
 
logbuf_lock_irq();
user->seq = prb_first_valid_seq(prb);
@@ -954,7 +952,6 @@ void log_buf_vmcoreinfo_setup(void)
VMCOREINFO_STRUCT_SIZE(printk_ringbuffer);
VMCOREINFO_OFFSET(printk_ringbuffer, desc_ring);
VMCOREINFO_OFFSET(printk_ringbuffer, text_data_ring);
-   VMCOREINFO_OFFSET(printk_ringbuffer, dict_data_ring);
VMCOREINFO_OFFSET(printk_ringbuffer, fail);
 
VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
@@ -967,7 +964,6 @@ void log_buf_vmcoreinfo_setup(void)
VMCOREINFO_STRUCT_SIZE(prb_desc);
VMCOREINFO_OFFSET(prb_desc, state_var);
VMCOREINFO_OFFSET(prb_desc, text_blk_lpos);
-   VMCOREINFO_OFFSET(prb_desc, dict_blk_lpos);
 
VMCOREINFO_STRUCT_SIZE(prb_data_blk_lpos);
VMCOREINFO_OFFSET(prb_data_blk_lpos, begin);
@@ -977,7 +973,6 @@ void log_buf_vmcoreinfo_setup(void)
VMCOREINFO_OFFSET(printk_info, seq);
VMCOREINFO_OFFSET(printk_info, ts_nsec);
VMCOREINFO_OFFSET(printk_info, text_len);
-   VMCOREINFO_OFFSET(printk_info, dict_len);
VMCOREINFO_OFFSET(printk_info, caller_id);
VMCOREINFO_OFFSET(printk_info, dev_info);
 
@@ -1076,7 +1071,7 @@ static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
struct prb_reserved_entry e;
struct printk_record dest_r;
 
-   prb_rec_init_wr(&dest_r, r->info->text_len, 0);
+   prb_rec_init_wr(&dest_r, r->info->text_len);
 
if (!prb_reserve(&e, rb, &dest_r))
return 0;
@@ -1106,7 +1101,6 @@ void __init setup_log_buf(int early)
struct printk_record r;
size_t new_descs_size;
unsigned long flags;
-   char *new_dict_buf;
char *new_log_buf;
unsigned int free;
u64 seq;
@@ -1141,20 +1135,11 @@ void __init setup_log_buf(int early)
return;
}
 
-   new_dict_buf = memblock_alloc(new_log_buf_len, LOG_ALIGN);
-   if (unlikely(!new_dict_buf)) {
-   pr_err("log_buf_len: %lu dict bytes not available\n",
-  new_log_buf_len);
-   memblock_free(__pa(new_log_buf), new_log_buf_len);
-   return;
-   }
-
new_descs_size = new_descs_count * sizeof(struct prb_desc);
new_descs = memblock_alloc(new_descs_size, LOG_ALIGN);
if (unlikely(!new_descs)) {
pr_err("log_buf_len: %zu desc bytes not available\n",
   new_descs_size);
-   memblock_free(__pa(new_dict_buf), new_log_buf_len);
memblock_free(__pa(new_log_buf), new_log_buf_len)

[PATCH printk 2/3] printk: move dictionary keys to dev_printk_info

2020-09-17 Thread John Ogness
Dictionaries are only used for SUBSYSTEM and DEVICE properties. The
current implementation stores the property names each time they are
used. This requires more space than otherwise necessary. Also,
because the dictionary entries are currently considered optional,
it cannot be relied upon that they are always available, even if the
writer wanted to store them. These issues will increase should new
dictionary properties be introduced.

Rather than storing the subsystem and device properties in the
dict ring, introduce a struct dev_printk_info with separate fields
to store only the property values. Embed this struct within the
struct printk_info to provide guaranteed availability.
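
As a rough illustration of the idea (fixed-size value fields, with the "KEY=" names added back only at read time), a user-space sketch could look like this. The struct name, the field sizes, and the helper functions are assumptions made for the example and are not taken from the patch.

```c
#include <stdio.h>
#include <string.h>

/* Field sizes are illustrative assumptions, not taken from the patch. */
#define SUBSYSTEM_LEN	16
#define DEVICE_LEN	48

struct dev_info_sketch {
	char subsystem[SUBSYSTEM_LEN];	/* value only, e.g. "pci" */
	char device[DEVICE_LEN];	/* value only, e.g. "+pci:0000:00:1f.2" */
};

/* Store a value, truncating if needed and always NUL-terminating. */
static void set_field(char *dst, size_t dst_size, const char *val)
{
	snprintf(dst, dst_size, "%s", val);
}

/* Append " KEY=value\n" at offset @len; returns the new length. */
static size_t append_prop(char *buf, size_t size, size_t len,
			  const char *key, const char *val)
{
	int n;

	if (len >= size)
		return len;
	n = snprintf(buf + len, size - len, " %s=%s\n", key, val);
	if (n < 0)
		return len;
	if ((size_t)n >= size - len)
		return size - 1;	/* output was truncated */
	return len + (size_t)n;
}

/*
 * Rebuild the property lines at read time. A field whose first byte is
 * NUL was never set and produces no output.
 */
static size_t format_dict(const struct dev_info_sketch *info,
			  char *buf, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return 0;
	buf[0] = '\0';
	if (info->subsystem[0])
		len = append_prop(buf, size, len, "SUBSYSTEM", info->subsystem);
	if (info->device[0])
		len = append_prop(buf, size, len, "DEVICE", info->device);
	return len;
}
```

Because the key names are no longer stored per record, each record pays only for the values, and the space for them is always present.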

Signed-off-by: John Ogness 
---
 Documentation/admin-guide/kdump/gdbmacros.txt |  73 
 drivers/base/core.c   |  46 ++---
 include/linux/dev_printk.h|   8 +
 include/linux/printk.h|   6 +-
 kernel/printk/internal.h  |   4 +-
 kernel/printk/printk.c| 161 +-
 kernel/printk/printk_ringbuffer.h |   3 +
 kernel/printk/printk_safe.c   |   2 +-
 scripts/gdb/linux/dmesg.py|  16 +-
 9 files changed, 159 insertions(+), 160 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt 
b/Documentation/admin-guide/kdump/gdbmacros.txt
index 94fabb165abf..82aecdcae8a6 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -172,13 +172,13 @@ end
 
 define dump_record
set var $desc = $arg0
-   if ($argc > 1)
-   set var $prev_flags = $arg1
+   set var $info = $arg1
+   if ($argc > 2)
+   set var $prev_flags = $arg2
else
set var $prev_flags = 0
end
 
-   set var $info = &$desc->info
set var $prefix = 1
set var $newline = 1
 
@@ -237,44 +237,36 @@ define dump_record
 
# handle dictionary data
 
-   set var $begin = $desc->dict_blk_lpos.begin % (1U << 
prb->dict_data_ring.size_bits)
-   set var $next = $desc->dict_blk_lpos.next % (1U << 
prb->dict_data_ring.size_bits)
-
-   # handle data-less record
-   if ($begin & 1)
-   set var $dict_len = 0
-   set var $dict = ""
-   else
-   # handle wrapping data block
-   if ($begin > $next)
-   set var $begin = 0
-   end
-
-   # skip over descriptor id
-   set var $begin = $begin + sizeof(long)
-
-   # handle truncated message
-   if ($next - $begin < $info->dict_len)
-   set var $dict_len = $next - $begin
-   else
-   set var $dict_len = $info->dict_len
+   set var $dict = &$info->dev_info.subsystem[0]
+   set var $dict_len = sizeof($info->dev_info.subsystem)
+   if ($dict[0] != '\0')
+   printf " SUBSYSTEM="
+   set var $idx = 0
+   while ($idx < $dict_len)
+   set var $c = $dict[$idx]
+   if ($c == '\0')
+   loop_break
+   else
+   if ($c < ' ' || $c >= 127 || $c == '\\')
+   printf "\\x%02x", $c
+   else
+   printf "%c", $c
+   end
+   end
+   set var $idx = $idx + 1
end
-
-   set var $dict = &prb->dict_data_ring.data[$begin]
+   printf "\n"
end
 
-   if ($dict_len > 0)
+   set var $dict = &$info->dev_info.device[0]
+   set var $dict_len = sizeof($info->dev_info.device)
+   if ($dict[0] != '\0')
+   printf " DEVICE="
set var $idx = 0
-   set var $line = 1
while ($idx < $dict_len)
-   if ($line)
-   printf " "
-   set var $line = 0
-   end
set var $c = $dict[$idx]
if ($c == '\0')
-   printf "\n"
-   set var $line = 1
+   loop_break
else
if ($c < ' ' || $c >= 127 || $c == '\\')
printf "\\x%02x", $c
@@ -288,10 +280,10 @@ define dump_record
end
 end
 document dump_record
-   Dump a single record. The first parameter is the descriptor
-   sequence number, the second is optional and specifies the
-   p

[PATCH printk 0/3] printk: move dictionaries to meta data

2020-09-17 Thread John Ogness
Hello,

Here is a series to move dictionary properties (currently only
SUBSYSTEM and DEVICE exist) into the meta data of a record,
thus eliminating the need for the dict ring. This change
affects how the dictionaries are stored, but does not affect
how they are presented to userspace.

The main purpose of the change is to address concerns [0]
about the reliability of dictionary properties and to allow
the type and number of available properties to be expanded
efficiently [1].

This series is based heavily on the proof of concept [2] from
Petr Mladek. (Petr, feel free to add Co-developed-by tags.)

The series is based on the printk-rework branch of the printk git
tree:

f5f022e53b87 ("printk: reimplement log_cont using record extension")

John Ogness

[0] https://lkml.kernel.org/r/20200904151336.GC20558@alley
[1] https://lkml.kernel.org/r/008801d684f9$43e1c140$cba543c0$@samsung.com
[2] https://lkml.kernel.org/r/20200911095035.GI3864@alley

John Ogness (3):
  printk: move printk_info into separate array
  printk: move dictionary keys to dev_printk_info
  printk: remove dict ring

 Documentation/admin-guide/kdump/gdbmacros.txt |  73 ++---
 drivers/base/core.c   |  46 +--
 include/linux/dev_printk.h|   8 +
 include/linux/printk.h|   6 +-
 kernel/printk/internal.h  |   4 +-
 kernel/printk/printk.c| 209 ++---
 kernel/printk/printk_ringbuffer.c | 292 --
 kernel/printk/printk_ringbuffer.h |  95 ++
 kernel/printk/printk_safe.c   |   2 +-
 scripts/gdb/linux/dmesg.py|  16 +-
 10 files changed, 336 insertions(+), 415 deletions(-)

-- 
2.20.1



[PATCH printk 1/3] printk: move printk_info into separate array

2020-09-17 Thread John Ogness
The majority of the size of a descriptor is taken up by meta data,
which is often not of interest to the ringbuffer (for example,
when performing state checks). Since descriptors are often
temporarily stored on the stack, keeping their size minimal will
help reduce stack pressure.

Rather than embedding the printk_info into the descriptor, create
a separate printk_info array. The index of a descriptor in the
descriptor array corresponds to the printk_info with the same
index in the printk_info array. The rules for validity of a
printk_info match the existing rules for the data blocks: the
descriptor must be in a consistent state.
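
The index correspondence between the two arrays can be sketched as below. The struct and macro names are stand-ins invented for this example, and the field sets are reduced; only the idea (one ID selects the same slot in both arrays) comes from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for the kernel structs; field sets are illustrative. */
struct desc_sketch {
	unsigned long state_var;	/* ID and state combined */
};

struct info_sketch {
	uint64_t seq;
	uint64_t ts_nsec;
	uint16_t text_len;
};

struct desc_ring_sketch {
	unsigned int count_bits;	/* ring holds 1 << count_bits entries */
	struct desc_sketch *descs;
	struct info_sketch *infos;	/* same number of entries as @descs */
};

#define RING_COUNT(r)		(1UL << (r)->count_bits)
#define RING_INDEX(r, id)	((id) & (RING_COUNT(r) - 1))

/* Both lookups use the same index, so desc N always pairs with info N. */
static struct desc_sketch *to_desc(struct desc_ring_sketch *r,
				   unsigned long id)
{
	return &r->descs[RING_INDEX(r, id)];
}

static struct info_sketch *to_info(struct desc_ring_sketch *r,
				   unsigned long id)
{
	return &r->infos[RING_INDEX(r, id)];
}
```

Code that only needs to check state can copy the small descriptor to the stack without dragging the meta data along.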

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c|  17 +++-
 kernel/printk/printk_ringbuffer.c | 145 +++---
 kernel/printk/printk_ringbuffer.h |  29 +++---
 3 files changed, 125 insertions(+), 66 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9a2e23191576..7ad45d897277 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -959,11 +959,11 @@ void log_buf_vmcoreinfo_setup(void)
VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
VMCOREINFO_OFFSET(prb_desc_ring, count_bits);
VMCOREINFO_OFFSET(prb_desc_ring, descs);
+   VMCOREINFO_OFFSET(prb_desc_ring, infos);
VMCOREINFO_OFFSET(prb_desc_ring, head_id);
VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
 
VMCOREINFO_STRUCT_SIZE(prb_desc);
-   VMCOREINFO_OFFSET(prb_desc, info);
VMCOREINFO_OFFSET(prb_desc, state_var);
VMCOREINFO_OFFSET(prb_desc, text_blk_lpos);
VMCOREINFO_OFFSET(prb_desc, dict_blk_lpos);
@@ -1097,6 +1097,7 @@ static char setup_dict_buf[CONSOLE_EXT_LOG_MAX] 
__initdata;
 
 void __init setup_log_buf(int early)
 {
+   struct printk_info *new_infos;
unsigned int new_descs_count;
struct prb_desc *new_descs;
struct printk_info info;
@@ -1156,6 +1157,17 @@ void __init setup_log_buf(int early)
return;
}
 
+   new_descs_size = new_descs_count * sizeof(struct printk_info);
+   new_infos = memblock_alloc(new_descs_size, LOG_ALIGN);
+   if (unlikely(!new_infos)) {
+   pr_err("log_buf_len: %zu info bytes not available\n",
+  new_descs_size);
+   memblock_free(__pa(new_descs), new_log_buf_len);
+   memblock_free(__pa(new_dict_buf), new_log_buf_len);
+   memblock_free(__pa(new_log_buf), new_log_buf_len);
+   return;
+   }
+
prb_rec_init_rd(&r, &info,
&setup_text_buf[0], sizeof(setup_text_buf),
&setup_dict_buf[0], sizeof(setup_dict_buf));
@@ -1163,7 +1175,8 @@ void __init setup_log_buf(int early)
prb_init(&printk_rb_dynamic,
 new_log_buf, ilog2(new_log_buf_len),
 new_dict_buf, ilog2(new_log_buf_len),
-new_descs, ilog2(new_descs_count));
+new_descs, ilog2(new_descs_count),
+new_infos);
 
logbuf_lock_irqsave(flags);
 
diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index f4e2e9890e0f..de4b10a98623 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -15,10 +15,10 @@
  * The printk_ringbuffer is made up of 3 internal ringbuffers:
  *
  *   desc_ring
- * A ring of descriptors. A descriptor contains all record meta data
- * (sequence number, timestamp, loglevel, etc.) as well as internal state
- * information about the record and logical positions specifying where in
- * the other ringbuffers the text and dictionary strings are located.
+ * A ring of descriptors and their meta data (such as sequence number,
+ * timestamp, loglevel, etc.) as well as internal state information about
+ * the record and logical positions specifying where in the other
+ * ringbuffers the text and dictionary strings are located.
  *
  *   text_data_ring
  * A ring of data blocks. A data block consists of an unsigned long
@@ -38,13 +38,14 @@
  *
  * Descriptor Ring
  * ~~~~~~~~~~~~~~~
- * The descriptor ring is an array of descriptors. A descriptor contains all
- * the meta data of a printk record as well as blk_lpos structs pointing to
- * associated text and dictionary data blocks (see "Data Rings" below). Each
- * descriptor is assigned an ID that maps directly to index values of the
- * descriptor array and has a state. The ID and the state are bitwise combined
- * into a single descriptor field named @state_var, allowing ID and state to
- * be synchronously and atomically updated.
+ * The descriptor ring is an array of descriptors. A descriptor contains
+ * essential meta data to track the data of a printk record using
+ * blk_lpos structs pointing to associated text and dictionary data blocks
+ * (see "Data Rings" below). Each descriptor is assigned an ID that maps
+ * directly to index values of the de

[PATCH printk v5 0/6] printk: reimplement LOG_CONT handling

2020-09-14 Thread John Ogness
Hello,

Here is v5 for the second series to rework the printk subsystem.
(The v4 is here [0].) This series implements a new ringbuffer
feature that allows the last record to be extended. Petr Mladek
provided the initial proof of concept [1] for this.

Using the record extension feature, LOG_CONT is re-implemented
in a way that exactly preserves its behavior, but avoids the
need for an extra buffer. In particular, it avoids the need for
any synchronization that such a buffer requires.

This series deviates from the agreements [2] made at the meeting
during LPC2019 in Lisbon. The test results of the v1 series,
which implemented LOG_CONT as agreed upon, showed that the
effects on existing userspace tools using /dev/kmsg (journalctl,
dmesg) were not acceptable [3].

Patch 5 introduces *four* new memory barrier pairs. Two of them
are insignificant additions (data_realloc:A/desc_read:D and
data_realloc:A/data_push_tail:B) because they are alternate path
memory barriers that exactly match the purpose and context of
the two existing memory barrier pairs they provide an alternate
path for. The other two new memory barrier pairs are significant
additions:

desc_reopen_last:A / _prb_commit:B - When reopening a descriptor,
ensure the state transitions back to desc_reserved before
fully trusting the descriptor data.

_prb_commit:B / desc_reserve:D - When committing a descriptor,
ensure the state transitions to desc_committed before checking
the head ID to see if the descriptor needs to be finalized.

The test module used to test the ringbuffer is available
here [4].

The series is based on the printk-rework branch of the printk git
tree:

e60768311af8 ("scripts/gdb: update for lockless printk ringbuffer")

The list of changes since v4:

printk_ringbuffer
=================

- desc_read(): revert setting @state_var when inconsistent (a
  separate series [5] is addressing this bug)

- desc_reserve(): use DESC_SV() when setting reserved

- data_realloc(): also do nothing if the size is the same

- prb_reserve_in_last(): adjust dataless checks/warnings to match
  the non-dataless case

- prb_reserve_in_last(): fix length modifier in warnings

- change comments about "state flags" to just talk about "states"

John Ogness

[0] https://lkml.kernel.org/r/20200908202859.2736-1-john.ogn...@linutronix.de
[1] https://lkml.kernel.org/r/20200812163908.GH12903@alley
[2] https://lkml.kernel.org/r/87k1acz5rx@linutronix.de
[3] https://lkml.kernel.org/r/20200811160551.GC12903@alley
[4] https://github.com/Linutronix/prb-test.git
[5] https://lkml.kernel.org/r/20200914094803.27365-1-john.ogn...@linutronix.de

John Ogness (6):
  printk: ringbuffer: relocate get_data()
  printk: ringbuffer: add BLK_DATALESS() macro
  printk: ringbuffer: clear initial reserved fields
  printk: ringbuffer: change representation of states
  printk: ringbuffer: add finalization/extension support
  printk: reimplement log_cont using record extension

 Documentation/admin-guide/kdump/gdbmacros.txt |  13 +-
 kernel/printk/printk.c| 110 +--
 kernel/printk/printk_ringbuffer.c | 683 ++
 kernel/printk/printk_ringbuffer.h |  35 +-
 scripts/gdb/linux/dmesg.py|  12 +-
 5 files changed, 615 insertions(+), 238 deletions(-)

-- 
2.20.1



[PATCH printk v5 4/6] printk: ringbuffer: change representation of states

2020-09-14 Thread John Ogness
Rather than deriving the state by evaluating bits within the flags
area of the state variable, assign the states explicit values and
set those values in the flags area. Introduce macros to make it
simple to read and write state values for the state variable.

Although the functionality is preserved, the binary representation
for the states is changed.
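
A minimal user-space sketch of such macros is shown below. The macro names DESC_SV(), DESC_STATE() and DESC_ID() appear in the diff, and the shift/mask arithmetic mirrors the gdb macro changes in this patch; the macro bodies and the value chosen for desc_reusable are assumptions made for illustration.

```c
#include <limits.h>

/* Explicit state values stored in the top two bits of the state variable. */
enum desc_state_sketch {
	desc_reserved	= 0x0,	/* matches the old "id | 0" */
	desc_committed	= 0x1,	/* matches $desc_committed in gdbmacros.txt */
	desc_reusable	= 0x3,	/* assumed value for this sketch */
};

#define DESC_SV_BITS		(sizeof(unsigned long) * CHAR_BIT)
#define DESC_FLAGS_SHIFT	(DESC_SV_BITS - 2)
#define DESC_FLAGS_MASK		(3UL << DESC_FLAGS_SHIFT)
#define DESC_ID_MASK		(~DESC_FLAGS_MASK)

/* Extract the ID or the state from a state variable value. */
#define DESC_ID(sv)		((sv) & DESC_ID_MASK)
#define DESC_STATE(sv)		(3UL & ((sv) >> DESC_FLAGS_SHIFT))

/* Build a state variable value from an ID and a state. */
#define DESC_SV(id, state) \
	(((unsigned long)(state) << DESC_FLAGS_SHIFT) | (id))
```

With explicit values, a state query becomes a shift and a mask instead of a chain of bit tests.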

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 Documentation/admin-guide/kdump/gdbmacros.txt | 12 ---
 kernel/printk/printk_ringbuffer.c | 28 +
 kernel/printk/printk_ringbuffer.h | 31 ---
 scripts/gdb/linux/dmesg.py| 11 ---
 4 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt 
b/Documentation/admin-guide/kdump/gdbmacros.txt
index 7adece30237e..8f533b751c46 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -295,9 +295,12 @@ document dump_record
 end
 
 define dmesg
-   set var $desc_committed = 1UL << ((sizeof(long) * 8) - 1)
-   set var $flags_mask = 3UL << ((sizeof(long) * 8) - 2)
-   set var $id_mask = ~$flags_mask
+   # definitions from kernel/printk/printk_ringbuffer.h
+   set var $desc_committed = 1
+   set var $desc_sv_bits = sizeof(long) * 8
+   set var $desc_flags_shift = $desc_sv_bits - 2
+   set var $desc_flags_mask = 3 << $desc_flags_shift
+   set var $id_mask = ~$desc_flags_mask
 
set var $desc_count = 1U << prb->desc_ring.count_bits
set var $prev_flags = 0
@@ -309,7 +312,8 @@ define dmesg
set var $desc = &prb->desc_ring.descs[$id % $desc_count]
 
# skip non-committed record
-   if (($desc->state_var.counter & $flags_mask) == $desc_committed)
+   set var $state = 3 & ($desc->state_var.counter >> 
$desc_flags_shift)
+   if ($state == $desc_committed)
dump_record $desc $prev_flags
set var $prev_flags = $desc->info.flags
end
diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 82347abb22a5..911fbe150e9a 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -348,14 +348,6 @@ static bool data_check_size(struct prb_data_ring 
*data_ring, unsigned int size)
return true;
 }
 
-/* The possible responses of a descriptor state-query. */
-enum desc_state {
-   desc_miss,  /* ID mismatch */
-   desc_reserved,  /* reserved, in use by writer */
-   desc_committed, /* committed, writer is done */
-   desc_reusable,  /* free, not yet used by any writer */
-};
-
 /* Query the state of a descriptor. */
 static enum desc_state get_desc_state(unsigned long id,
  unsigned long state_val)
@@ -363,13 +355,7 @@ static enum desc_state get_desc_state(unsigned long id,
if (id != DESC_ID(state_val))
return desc_miss;
 
-   if (state_val & DESC_REUSE_MASK)
-   return desc_reusable;
-
-   if (state_val & DESC_COMMITTED_MASK)
-   return desc_committed;
-
-   return desc_reserved;
+   return DESC_STATE(state_val);
 }
 
 /*
@@ -467,8 +453,8 @@ static enum desc_state desc_read(struct prb_desc_ring 
*desc_ring,
 static void desc_make_reusable(struct prb_desc_ring *desc_ring,
   unsigned long id)
 {
-   unsigned long val_committed = id | DESC_COMMITTED_MASK;
-   unsigned long val_reusable = val_committed | DESC_REUSE_MASK;
+   unsigned long val_committed = DESC_SV(id, desc_committed);
+   unsigned long val_reusable = DESC_SV(id, desc_reusable);
struct prb_desc *desc = to_desc(desc_ring, id);
atomic_long_t *state_var = &desc->state_var;
 
@@ -904,7 +890,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
 */
prev_state_val = atomic_long_read(&desc->state_var); /* LMM(desc_reserve:E) */
if (prev_state_val &&
-   prev_state_val != (id_prev_wrap | DESC_COMMITTED_MASK | 
DESC_REUSE_MASK)) {
+   get_desc_state(id_prev_wrap, prev_state_val) != desc_reusable) {
WARN_ON_ONCE(1);
return false;
}
@@ -918,7 +904,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
 * This pairs with desc_read:D.
 */
if (!atomic_long_try_cmpxchg(&desc->state_var, &prev_state_val,
-id | 0)) { /* LMM(desc_reserve:F) */
+   DESC_SV(id, desc_reserved))) { /* LMM(desc_reserve:F) */
WARN_ON_ONCE(1);
return false;
}
@@ -1237,7 +1223,7 @@ void prb_commit(struct prb_reserved_entry *e)
 {
struct prb_desc_ring *desc_ring = &e->rb->desc_ring;

[PATCH printk v5 2/6] printk: ringbuffer: add BLK_DATALESS() macro

2020-09-14 Thread John Ogness
Rather than continually needing to explicitly check @begin and @next
to identify a dataless block, introduce and use a BLK_DATALESS()
macro.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk_ringbuffer.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index aa6e31a27601..6ee5ebce1450 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -266,6 +266,8 @@
 
 /* Determine if a logical position refers to a data-less block. */
 #define LPOS_DATALESS(lpos)((lpos) & 1UL)
+#define BLK_DATALESS(blk)  (LPOS_DATALESS((blk)->begin) && \
+LPOS_DATALESS((blk)->next))
 
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
@@ -1021,7 +1023,7 @@ static unsigned int space_used(struct prb_data_ring 
*data_ring,
   struct prb_data_blk_lpos *blk_lpos)
 {
/* Data-less blocks take no space. */
-   if (LPOS_DATALESS(blk_lpos->begin))
+   if (BLK_DATALESS(blk_lpos))
return 0;
 
if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next)) {
@@ -1054,7 +1056,7 @@ static const char *get_data(struct prb_data_ring 
*data_ring,
struct prb_data_block *db;
 
/* Data-less data block description. */
-   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+   if (BLK_DATALESS(blk_lpos)) {
if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
*data_size = 0;
return "";
-- 
2.20.1



[PATCH printk v5 1/6] printk: ringbuffer: relocate get_data()

2020-09-14 Thread John Ogness
Move the internal get_data() function as-is above prb_reserve() so
that a later change can make use of the static function.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk_ringbuffer.c | 116 +++---
 1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 0659b50872b5..aa6e31a27601 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1038,6 +1038,64 @@ static unsigned int space_used(struct prb_data_ring 
*data_ring,
DATA_SIZE(data_ring) - DATA_INDEX(data_ring, blk_lpos->begin));
 }
 
+/*
+ * Given @blk_lpos, return a pointer to the writer data from the data block
+ * and calculate the size of the data part. A NULL pointer is returned if
+ * @blk_lpos specifies values that could never be legal.
+ *
+ * This function (used by readers) performs strict validation on the lpos
+ * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
+ * triggered if an internal error is detected.
+ */
+static const char *get_data(struct prb_data_ring *data_ring,
+   struct prb_data_blk_lpos *blk_lpos,
+   unsigned int *data_size)
+{
+   struct prb_data_block *db;
+
+   /* Data-less data block description. */
+   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+   if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
+   *data_size = 0;
+   return "";
+   }
+   return NULL;
+   }
+
+   /* Regular data block: @begin less than @next and in same wrap. */
+   if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next) &&
+   blk_lpos->begin < blk_lpos->next) {
+   db = to_block(data_ring, blk_lpos->begin);
+   *data_size = blk_lpos->next - blk_lpos->begin;
+
+   /* Wrapping data block: @begin is one wrap behind @next. */
+   } else if (DATA_WRAPS(data_ring, blk_lpos->begin + 
DATA_SIZE(data_ring)) ==
+  DATA_WRAPS(data_ring, blk_lpos->next)) {
+   db = to_block(data_ring, 0);
+   *data_size = DATA_INDEX(data_ring, blk_lpos->next);
+
+   /* Illegal block description. */
+   } else {
+   WARN_ON_ONCE(1);
+   return NULL;
+   }
+
+   /* A valid data block will always be aligned to the ID size. */
+   if (WARN_ON_ONCE(blk_lpos->begin != ALIGN(blk_lpos->begin, sizeof(db->id))) ||
+   WARN_ON_ONCE(blk_lpos->next != ALIGN(blk_lpos->next, sizeof(db->id)))) {
+   return NULL;
+   }
+
+   /* A valid data block will always have at least an ID. */
+   if (WARN_ON_ONCE(*data_size < sizeof(db->id)))
+   return NULL;
+
+   /* Subtract block ID space from size to reflect data size. */
+   *data_size -= sizeof(db->id);
+
+   return &db->data[0];
+}
+
 /**
  * prb_reserve() - Reserve space in the ringbuffer.
  *
@@ -1192,64 +1250,6 @@ void prb_commit(struct prb_reserved_entry *e)
local_irq_restore(e->irqflags);
 }
 
-/*
- * Given @blk_lpos, return a pointer to the writer data from the data block
- * and calculate the size of the data part. A NULL pointer is returned if
- * @blk_lpos specifies values that could never be legal.
- *
- * This function (used by readers) performs strict validation on the lpos
- * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
- * triggered if an internal error is detected.
- */
-static const char *get_data(struct prb_data_ring *data_ring,
-   struct prb_data_blk_lpos *blk_lpos,
-   unsigned int *data_size)
-{
-   struct prb_data_block *db;
-
-   /* Data-less data block description. */
-   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
-   if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
-   *data_size = 0;
-   return "";
-   }
-   return NULL;
-   }
-
-   /* Regular data block: @begin less than @next and in same wrap. */
-   if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next) &&
-   blk_lpos->begin < blk_lpos->next) {
-   db = to_block(data_ring, blk_lpos->begin);
-   *data_size = blk_lpos->next - blk_lpos->begin;
-
-   /* Wrapping data block: @begin is one wrap behind @next. */
-   } else if (DATA_WRAPS(data_ring, blk_lpos->begin + 
DATA_SIZE(data_ring)) ==
-  DATA_WRAPS(data_ring, blk_lpos->next)) {
-   db = to_block(data_ring, 0

[PATCH printk v5 6/6] printk: reimplement log_cont using record extension

2020-09-14 Thread John Ogness
Use the record extending feature of the ringbuffer to implement
continuous messages. This preserves the existing continuous message
behavior.
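
The effect can be modelled in a small user-space sketch: a continuation from the same caller extends the newest record while it is still open, and a trailing newline (or the start of a new record) closes it. The flag values, struct layouts, and function names below are assumptions for this example; only the keep-open condition is taken from the diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Flag values are assumptions for this sketch. */
enum { LOG_NEWLINE = 2, LOG_CONT = 8 };

#define MAX_RECS 8

struct rec_sketch {
	char text[64];
	size_t len;
	unsigned int caller_id;
	bool open;		/* true: committed but still extendable */
};

struct rb_sketch {
	struct rec_sketch recs[MAX_RECS];
	size_t count;
};

/* Same condition as in log_store(): keep open unless a full line ended. */
static bool stays_open(unsigned int flags)
{
	return (flags & LOG_CONT) || !(flags & LOG_NEWLINE);
}

static bool rb_emit(struct rb_sketch *rb, unsigned int caller_id,
		    unsigned int flags, const char *text)
{
	size_t len = strlen(text);
	struct rec_sketch *last = rb->count ? &rb->recs[rb->count - 1] : NULL;
	struct rec_sketch *r;

	/* Continuation: try to extend the newest record in place. */
	if ((flags & LOG_CONT) && last && last->open &&
	    last->caller_id == caller_id &&
	    last->len + len < sizeof(last->text)) {
		memcpy(&last->text[last->len], text, len + 1);
		last->len += len;
		if (flags & LOG_NEWLINE)
			last->open = false;	/* final commit */
		return true;
	}

	/* Otherwise start a new record, which closes the previous one. */
	if (rb->count == MAX_RECS || len >= sizeof(r->text))
		return false;
	if (last)
		last->open = false;
	r = &rb->recs[rb->count++];
	memcpy(r->text, text, len + 1);
	r->len = len;
	r->caller_id = caller_id;
	r->open = stays_open(flags);
	return true;
}
```

No separate continuation buffer is needed in this model: the partial line lives in the record itself until it is closed.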

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 98 +-
 1 file changed, 20 insertions(+), 78 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 964b5701688f..9a2e23191576 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -535,7 +535,10 @@ static int log_store(u32 caller_id, int facility, int 
level,
r.info->caller_id = caller_id;
 
/* insert message */
-   prb_commit(&e);
+   if ((flags & LOG_CONT) || !(flags & LOG_NEWLINE))
+   prb_commit(&e);
+   else
+   prb_final_commit(&e);
 
return (text_len + trunc_msg_len);
 }
@@ -1084,7 +1087,7 @@ static unsigned int __init add_to_rb(struct 
printk_ringbuffer *rb,
dest_r.info->ts_nsec = r->info->ts_nsec;
dest_r.info->caller_id = r->info->caller_id;
 
-   prb_commit(&e);
+   prb_final_commit(&e);
 
return prb_record_text_space(&e);
 }
@@ -1884,87 +1887,26 @@ static inline u32 printk_caller_id(void)
0x80000000 + raw_smp_processor_id();
 }
 
-/*
- * Continuation lines are buffered, and not committed to the record buffer
- * until the line is complete, or a race forces it. The line fragments
- * though, are printed immediately to the consoles to ensure everything has
- * reached the console in case of a kernel crash.
- */
-static struct cont {
-   char buf[LOG_LINE_MAX];
-   size_t len; /* length == 0 means unused buffer */
-   u32 caller_id;  /* printk_caller_id() of first print */
-   u64 ts_nsec;/* time of first print */
-   u8 level;   /* log level of first message */
-   u8 facility;/* log facility of first message */
-   enum log_flags flags;   /* prefix, newline flags */
-} cont;
-
-static void cont_flush(void)
-{
-   if (cont.len == 0)
-   return;
-
-   log_store(cont.caller_id, cont.facility, cont.level, cont.flags,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-   cont.len = 0;
-}
-
-static bool cont_add(u32 caller_id, int facility, int level,
-enum log_flags flags, const char *text, size_t len)
-{
-   /* If the line gets too long, split it up in separate records. */
-   if (cont.len + len > sizeof(cont.buf)) {
-   cont_flush();
-   return false;
-   }
-
-   if (!cont.len) {
-   cont.facility = facility;
-   cont.level = level;
-   cont.caller_id = caller_id;
-   cont.ts_nsec = local_clock();
-   cont.flags = flags;
-   }
-
-   memcpy(cont.buf + cont.len, text, len);
-   cont.len += len;
-
-   // The original flags come from the first line,
-   // but later continuations can add a newline.
-   if (flags & LOG_NEWLINE) {
-   cont.flags |= LOG_NEWLINE;
-   cont_flush();
-   }
-
-   return true;
-}
-
 static size_t log_output(int facility, int level, enum log_flags lflags, const 
char *dict, size_t dictlen, char *text, size_t text_len)
 {
const u32 caller_id = printk_caller_id();
 
-   /*
-* If an earlier line was buffered, and we're a continuation
-* write from the same context, try to add it to the buffer.
-*/
-   if (cont.len) {
-   if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
-   if (cont_add(caller_id, facility, level, lflags, text, 
text_len))
-   return text_len;
-   }
-   /* Otherwise, make sure it's flushed */
-   cont_flush();
-   }
-
-   /* Skip empty continuation lines that couldn't be added - they just 
flush */
-   if (!text_len && (lflags & LOG_CONT))
-   return 0;
-
-   /* If it doesn't end in a newline, try to buffer the current line */
-   if (!(lflags & LOG_NEWLINE)) {
-   if (cont_add(caller_id, facility, level, lflags, text, 
text_len))
+   if (lflags & LOG_CONT) {
+   struct prb_reserved_entry e;
+   struct printk_record r;
+
+   prb_rec_init_wr(&r, text_len, 0);
+   if (prb_reserve_in_last(&e, prb, &r, caller_id)) {
+   memcpy(&r.text_buf[r.info->text_len], text, text_len);
+   r.info->text_len += text_len;
+   if (lflags & LOG_NEWLINE) {
+   r.info->flags |= LOG_NEWLINE;
+   prb_final_commit(&e);
+   } else {
+   prb_commit(&e);
+   }
  

[PATCH printk v5 5/6] printk: ringbuffer: add finalization/extension support

2020-09-14 Thread John Ogness
Add support for extending the newest data block. For this, introduce
a new finalization state (desc_finalized) denoting a committed
descriptor that cannot be extended.

Until a record is finalized, a writer can reopen that record to
append new data. Reopening a record means transitioning from the
desc_committed state back to the desc_reserved state.

A writer can explicitly finalize a record if there is no intention
of extending it. Also, records are automatically finalized when a
new record is reserved. This relieves writers of needing to
explicitly finalize while also making such records available to
readers sooner. (Readers can only traverse finalized records.)
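
The state transitions described so far can be sketched with C11 atomics as follows. This is a simplified model, not the ringbuffer code: the descriptor ID is left out, the function names are invented for the example, and the numeric values for the reserved and reusable states are assumptions (the committed and finalized values match the gdb macro definitions in this patch).

```c
#include <stdatomic.h>
#include <stdbool.h>

enum {
	ST_RESERVED	= 0,	/* assumed value */
	ST_COMMITTED	= 1,
	ST_FINALIZED	= 2,
	ST_REUSABLE	= 3,	/* assumed value */
};

/* Attempt one state transition; fails if the state is not @from. */
static bool transition(atomic_int *state, int from, int to)
{
	return atomic_compare_exchange_strong(state, &from, to);
}

/* Writer is done for now, but the record may still be extended. */
static bool rec_commit(atomic_int *state)
{
	return transition(state, ST_RESERVED, ST_COMMITTED);
}

/* Reopen for appending; only possible while not yet finalized. */
static bool rec_reopen(atomic_int *state)
{
	return transition(state, ST_COMMITTED, ST_RESERVED);
}

/* Explicit finalization, or implicit when a newer record is reserved. */
static bool rec_finalize(atomic_int *state)
{
	return transition(state, ST_COMMITTED, ST_FINALIZED);
}

/* Readers only accept finalized records. */
static bool rec_readable(atomic_int *state)
{
	return atomic_load(state) == ST_FINALIZED;
}
```

Because every transition is a compare-and-exchange from one specific state, a writer that loses the race (for example, trying to reopen a record that was just finalized) simply fails and falls back to reserving a new record.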

Four new memory barrier pairs are introduced. Two of them are
insignificant additions (data_realloc:A/desc_read:D and
data_realloc:A/data_push_tail:B) because they are alternate path
memory barriers that exactly match the purpose, pairing, and
context of the two existing memory barrier pairs they provide an
alternate path for. The other two new memory barrier pairs are
significant additions:

desc_reopen_last:A / _prb_commit:B - When reopening a descriptor,
ensure the state transitions back to desc_reserved before
fully trusting the descriptor data.

_prb_commit:B / desc_reserve:D - When committing a descriptor,
ensure the state transitions to desc_committed before checking
the head ID to see if the descriptor needs to be finalized.

Signed-off-by: John Ogness 
---
 Documentation/admin-guide/kdump/gdbmacros.txt |   3 +-
 kernel/printk/printk_ringbuffer.c | 525 --
 kernel/printk/printk_ringbuffer.h |   6 +-
 scripts/gdb/linux/dmesg.py|   3 +-
 4 files changed, 480 insertions(+), 57 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt 
b/Documentation/admin-guide/kdump/gdbmacros.txt
index 8f533b751c46..94fabb165abf 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -297,6 +297,7 @@ end
 define dmesg
# definitions from kernel/printk/printk_ringbuffer.h
set var $desc_committed = 1
+   set var $desc_finalized = 2
set var $desc_sv_bits = sizeof(long) * 8
set var $desc_flags_shift = $desc_sv_bits - 2
set var $desc_flags_mask = 3 << $desc_flags_shift
@@ -313,7 +314,7 @@ define dmesg
 
# skip non-committed record
set var $state = 3 & ($desc->state_var.counter >> 
$desc_flags_shift)
-   if ($state == $desc_committed)
+   if ($state == $desc_committed || $state == $desc_finalized)
dump_record $desc $prev_flags
set var $prev_flags = $desc->info.flags
end
diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 911fbe150e9a..4e526c79f89c 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -46,20 +46,26 @@
  * into a single descriptor field named @state_var, allowing ID and state to
  * be synchronously and atomically updated.
  *
- * Descriptors have three states:
+ * Descriptors have four states:
  *
  *   reserved
  * A writer is modifying the record.
  *
  *   committed
- * The record and all its data are complete and available for reading.
+ * The record and all its data are written. A writer can reopen the
+ * descriptor (transitioning it back to reserved), but in the committed
+ * state the data is consistent.
+ *
+ *   finalized
+ * The record and all its data are complete and available for reading. A
+ * writer cannot reopen the descriptor.
  *
  *   reusable
  * The record exists, but its text and/or dictionary data may no longer
  * be available.
  *
  * Querying the @state_var of a record requires providing the ID of the
- * descriptor to query. This can yield a possible fourth (pseudo) state:
+ * descriptor to query. This can yield a possible fifth (pseudo) state:
  *
  *   miss
  * The descriptor being queried has an unexpected ID.
@@ -79,6 +85,28 @@
  * committed or reusable queried state. This makes it possible that a valid
  * sequence number of the tail is always available.
  *
+ * Descriptor Finalization
+ * ~~~~~~~~~~~~~~~~~~~~~~~
+ * When a writer calls the commit function prb_commit(), record data is
+ * fully stored and is consistent within the ringbuffer. However, a writer can
+ * reopen that record, claiming exclusive access (as with prb_reserve()), and
+ * modify that record. When finished, the writer must again commit the record.
+ *
+ * In order for a record to be made available to readers (and also become
+ * recyclable for writers), it must be finalized. A finalized record cannot be
+ * reopened and can never become "unfinalized". Record finalization can occur
+ * in three different scenarios:
+ *
+ *   1) A writer can simultaneously commit and finalize its record by c

[PATCH printk v5 3/6] printk: ringbuffer: clear initial reserved fields

2020-09-14 Thread John Ogness
prb_reserve() will set some meta data values and leave others
uninitialized (or rather, containing the values of the previous
wrap). Simplify the API by always clearing out all the fields.
Only the sequence number is filled in. The caller is now
responsible for filling in the rest of the meta data fields.
In particular, for correctly filling in text and dict lengths.
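
As a rough illustration of the new contract (a sketch, not the patch
itself): the reserve step wipes the meta data and fills in only the
sequence number, and the writer records the lengths it actually wrote.
The sk_ names and the reduced struct are hypothetical, and the
bootstrap special case for the first wrap is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for the record meta data. */
struct sk_info {
	uint64_t seq;
	uint16_t text_len;
	uint16_t dict_len;
	uint8_t level;
};

/*
 * Model of the reserve step: clear everything left over from the
 * previous wrap, then fill in only the new sequence number, which is
 * @desc_count records later than the old one.
 */
static void sk_reserve(struct sk_info *info, uint64_t desc_count)
{
	uint64_t seq = info->seq;	/* save before clearing */

	memset(info, 0, sizeof(*info));
	info->seq = seq + desc_count;
}

/* The writer, not the reserve step, records how much text it stored. */
static void sk_write_text(struct sk_info *info, char *buf, size_t buf_size,
			  const char *text)
{
	size_t len = strlen(text);

	if (len > buf_size)
		len = buf_size;
	memcpy(buf, text, len);
	info->text_len = (uint16_t)len;
}
```

A writer that forgets to set the length leaves it at 0, so a reader
would see an empty record rather than stale data from the previous wrap.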

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c| 12 
 kernel/printk/printk_ringbuffer.c | 30 ++
 2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fec71229169e..964b5701688f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -520,8 +520,11 @@ static int log_store(u32 caller_id, int facility, int 
level,
memcpy(&r.text_buf[0], text, text_len);
if (trunc_msg_len)
memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len);
-   if (r.dict_buf)
+   r.info->text_len = text_len + trunc_msg_len;
+   if (r.dict_buf) {
memcpy(&r.dict_buf[0], dict, dict_len);
+   r.info->dict_len = dict_len;
+   }
r.info->facility = facility;
r.info->level = level & 7;
r.info->flags = flags & 0x1f;
@@ -1069,10 +1072,11 @@ static unsigned int __init add_to_rb(struct 
printk_ringbuffer *rb,
if (!prb_reserve(&e, rb, &dest_r))
return 0;
 
-   memcpy(&dest_r.text_buf[0], &r->text_buf[0], dest_r.text_buf_size);
+   memcpy(&dest_r.text_buf[0], &r->text_buf[0], r->info->text_len);
+   dest_r.info->text_len = r->info->text_len;
if (dest_r.dict_buf) {
-   memcpy(&dest_r.dict_buf[0], &r->dict_buf[0],
-  dest_r.dict_buf_size);
+   memcpy(&dest_r.dict_buf[0], &r->dict_buf[0], r->info->dict_len);
+   dest_r.info->dict_len = r->info->dict_len;
}
dest_r.info->facility = r->info->facility;
dest_r.info->level = r->info->level;
diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 6ee5ebce1450..82347abb22a5 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -146,10 +146,13 @@
  *
  * if (prb_reserve(&e, &test_rb, &r)) {
  * snprintf(r.text_buf, r.text_buf_size, "%s", textstr);
+ * r.info->text_len = strlen(textstr);
  *
  * // dictionary allocation may have failed
- * if (r.dict_buf)
+ * if (r.dict_buf) {
  * snprintf(r.dict_buf, r.dict_buf_size, "%s", dictstr);
+ * r.info->dict_len = strlen(dictstr);
+ * }
  *
  * r.info->ts_nsec = local_clock();
  *
@@ -1125,9 +1128,9 @@ static const char *get_data(struct prb_data_ring 
*data_ring,
  * @dict_buf_size is set to 0. Writers must check this before writing to
  * dictionary space.
  *
- * @info->text_len and @info->dict_len will already be set to @text_buf_size
- * and @dict_buf_size, respectively. If dictionary space reservation fails,
- * @info->dict_len is set to 0.
+ * Important: @info->text_len and @info->dict_len need to be set correctly by
+ *the writer in order for data to be readable and/or extended.
+ *Their values are initialized to 0.
  */
 bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 struct printk_record *r)
@@ -1135,6 +1138,7 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
printk_ringbuffer *rb,
struct prb_desc_ring *desc_ring = &rb->desc_ring;
struct prb_desc *d;
unsigned long id;
+   u64 seq;
 
if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
goto fail;
@@ -1159,6 +1163,14 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
printk_ringbuffer *rb,
 
d = to_desc(desc_ring, id);
 
+   /*
+* All @info fields (except @seq) are cleared and must be filled in
+* by the writer. Save @seq before clearing because it is used to
+* determine the new sequence number.
+*/
+   seq = d->info.seq;
+   memset(&d->info, 0, sizeof(d->info));
+
/*
 * Set the @e fields here so that prb_commit() can be used if
 * text data allocation fails.
@@ -1177,17 +1189,15 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
printk_ringbuffer *rb,
 * See the "Bootstrap" comment block in printk_ringbuffer.h for
 * details about how the initializer bootstraps the descriptors.
 */
-   if (d->info.seq == 0 && DESC_INDEX(desc_ring, id) != 0)
+   if (seq == 0 && DESC_INDEX(desc_ring, id) != 0)
d->info.seq = DESC_INDEX(desc_ring, id);
else
-   d->info.seq += DESCS_COUNT(desc_ring);
+

[PATCH printk v4 2/6] printk: ringbuffer: add BLK_DATALESS() macro

2020-09-08 Thread John Ogness
Rather than continually needing to explicitly check @begin and @next
to identify a dataless block, introduce and use a BLK_DATALESS()
macro.
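
For readers unfamiliar with the convention, here is a small standalone
sketch of the check. The two macros are copied from the diff below; the
struct and the helper function are hypothetical stand-ins so that the
example compiles outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct prb_data_blk_lpos. */
struct sk_blk_lpos {
	unsigned long begin;
	unsigned long next;
};

/* A logical position with bit 0 set does not refer to real data. */
#define LPOS_DATALESS(lpos)	((lpos) & 1UL)

/* A block is data-less only if both of its positions are data-less. */
#define BLK_DATALESS(blk)	(LPOS_DATALESS((blk)->begin) && \
				 LPOS_DATALESS((blk)->next))

/* Function wrapper so the macro can be called like a predicate. */
static bool sk_blk_is_dataless(const struct sk_blk_lpos *blk)
{
	return BLK_DATALESS(blk);
}
```

Note that the macro tests both @begin and @next, which is slightly
stricter than the single @begin test it replaces in space_used().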

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk_ringbuffer.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index aa6e31a27601..6ee5ebce1450 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -266,6 +266,8 @@
 
 /* Determine if a logical position refers to a data-less block. */
 #define LPOS_DATALESS(lpos)((lpos) & 1UL)
+#define BLK_DATALESS(blk)  (LPOS_DATALESS((blk)->begin) && \
+LPOS_DATALESS((blk)->next))
 
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
@@ -1021,7 +1023,7 @@ static unsigned int space_used(struct prb_data_ring 
*data_ring,
   struct prb_data_blk_lpos *blk_lpos)
 {
/* Data-less blocks take no space. */
-   if (LPOS_DATALESS(blk_lpos->begin))
+   if (BLK_DATALESS(blk_lpos))
return 0;
 
if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next)) {
@@ -1054,7 +1056,7 @@ static const char *get_data(struct prb_data_ring 
*data_ring,
struct prb_data_block *db;
 
/* Data-less data block description. */
-   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+   if (BLK_DATALESS(blk_lpos)) {
if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
*data_size = 0;
return "";
-- 
2.20.1



[PATCH printk v4 6/6] printk: reimplement log_cont using record extension

2020-09-08 Thread John Ogness
Use the record extending feature of the ringbuffer to implement
continuous messages. This preserves the existing continuous message
behavior.
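
The idea can be sketched with a toy model of "extend the last record if
it is still open and belongs to the same caller". This is not the
ringbuffer API: struct sk_record and sk_extend_last() are hypothetical,
and the real code reserves space through prb_reserve_in_last().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the newest record in the ringbuffer. */
struct sk_record {
	unsigned int caller_id;
	bool finalized;		/* set once a newline ends the message */
	size_t text_len;
	char text[64];
};

/*
 * Try to append @len bytes of @text to @last. This fails if the record
 * is already finalized, belongs to another caller, or has no room left.
 * On failure the caller is expected to store a new record instead.
 */
static bool sk_extend_last(struct sk_record *last, unsigned int caller_id,
			   const char *text, size_t len, bool newline)
{
	if (last->finalized || last->caller_id != caller_id)
		return false;
	if (len > sizeof(last->text) - last->text_len)
		return false;

	memcpy(&last->text[last->text_len], text, len);
	last->text_len += len;

	/* A trailing newline ends the message: no further extension. */
	if (newline)
		last->finalized = true;
	return true;
}
```

Because the fragments are appended in place, no separate continuation
buffer (and no locking around one) is needed in this model.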

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 98 +-
 1 file changed, 20 insertions(+), 78 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 964b5701688f..9a2e23191576 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -535,7 +535,10 @@ static int log_store(u32 caller_id, int facility, int 
level,
r.info->caller_id = caller_id;
 
/* insert message */
-   prb_commit(&e);
+   if ((flags & LOG_CONT) || !(flags & LOG_NEWLINE))
+   prb_commit(&e);
+   else
+   prb_final_commit(&e);
 
return (text_len + trunc_msg_len);
 }
@@ -1084,7 +1087,7 @@ static unsigned int __init add_to_rb(struct 
printk_ringbuffer *rb,
dest_r.info->ts_nsec = r->info->ts_nsec;
dest_r.info->caller_id = r->info->caller_id;
 
-   prb_commit(&e);
+   prb_final_commit(&e);
 
return prb_record_text_space(&e);
 }
@@ -1884,87 +1887,26 @@ static inline u32 printk_caller_id(void)
0x8000 + raw_smp_processor_id();
 }
 
-/*
- * Continuation lines are buffered, and not committed to the record buffer
- * until the line is complete, or a race forces it. The line fragments
- * though, are printed immediately to the consoles to ensure everything has
- * reached the console in case of a kernel crash.
- */
-static struct cont {
-   char buf[LOG_LINE_MAX];
-   size_t len; /* length == 0 means unused buffer */
-   u32 caller_id;  /* printk_caller_id() of first print */
-   u64 ts_nsec;/* time of first print */
-   u8 level;   /* log level of first message */
-   u8 facility;/* log facility of first message */
-   enum log_flags flags;   /* prefix, newline flags */
-} cont;
-
-static void cont_flush(void)
-{
-   if (cont.len == 0)
-   return;
-
-   log_store(cont.caller_id, cont.facility, cont.level, cont.flags,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-   cont.len = 0;
-}
-
-static bool cont_add(u32 caller_id, int facility, int level,
-enum log_flags flags, const char *text, size_t len)
-{
-   /* If the line gets too long, split it up in separate records. */
-   if (cont.len + len > sizeof(cont.buf)) {
-   cont_flush();
-   return false;
-   }
-
-   if (!cont.len) {
-   cont.facility = facility;
-   cont.level = level;
-   cont.caller_id = caller_id;
-   cont.ts_nsec = local_clock();
-   cont.flags = flags;
-   }
-
-   memcpy(cont.buf + cont.len, text, len);
-   cont.len += len;
-
-   // The original flags come from the first line,
-   // but later continuations can add a newline.
-   if (flags & LOG_NEWLINE) {
-   cont.flags |= LOG_NEWLINE;
-   cont_flush();
-   }
-
-   return true;
-}
-
 static size_t log_output(int facility, int level, enum log_flags lflags, const 
char *dict, size_t dictlen, char *text, size_t text_len)
 {
const u32 caller_id = printk_caller_id();
 
-   /*
-* If an earlier line was buffered, and we're a continuation
-* write from the same context, try to add it to the buffer.
-*/
-   if (cont.len) {
-   if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
-   if (cont_add(caller_id, facility, level, lflags, text, 
text_len))
-   return text_len;
-   }
-   /* Otherwise, make sure it's flushed */
-   cont_flush();
-   }
-
-   /* Skip empty continuation lines that couldn't be added - they just 
flush */
-   if (!text_len && (lflags & LOG_CONT))
-   return 0;
-
-   /* If it doesn't end in a newline, try to buffer the current line */
-   if (!(lflags & LOG_NEWLINE)) {
-   if (cont_add(caller_id, facility, level, lflags, text, 
text_len))
+   if (lflags & LOG_CONT) {
+   struct prb_reserved_entry e;
+   struct printk_record r;
+
+   prb_rec_init_wr(&r, text_len, 0);
+   if (prb_reserve_in_last(&e, prb, &r, caller_id)) {
+   memcpy(&r.text_buf[r.info->text_len], text, text_len);
+   r.info->text_len += text_len;
+   if (lflags & LOG_NEWLINE) {
+   r.info->flags |= LOG_NEWLINE;
+   prb_final_commit(&e);
+   } else {
+   prb_commit(&e);
+   }
  

[PATCH printk v4 1/6] printk: ringbuffer: relocate get_data()

2020-09-08 Thread John Ogness
Move the internal get_data() function as-is above prb_reserve() so
that a later change can make use of the static function.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk_ringbuffer.c | 116 +++---
 1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 0659b50872b5..aa6e31a27601 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1038,6 +1038,64 @@ static unsigned int space_used(struct prb_data_ring 
*data_ring,
DATA_SIZE(data_ring) - DATA_INDEX(data_ring, blk_lpos->begin));
 }
 
+/*
+ * Given @blk_lpos, return a pointer to the writer data from the data block
+ * and calculate the size of the data part. A NULL pointer is returned if
+ * @blk_lpos specifies values that could never be legal.
+ *
+ * This function (used by readers) performs strict validation on the lpos
+ * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
+ * triggered if an internal error is detected.
+ */
+static const char *get_data(struct prb_data_ring *data_ring,
+   struct prb_data_blk_lpos *blk_lpos,
+   unsigned int *data_size)
+{
+   struct prb_data_block *db;
+
+   /* Data-less data block description. */
+   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+   if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
+   *data_size = 0;
+   return "";
+   }
+   return NULL;
+   }
+
+   /* Regular data block: @begin less than @next and in same wrap. */
+   if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next) &&
+   blk_lpos->begin < blk_lpos->next) {
+   db = to_block(data_ring, blk_lpos->begin);
+   *data_size = blk_lpos->next - blk_lpos->begin;
+
+   /* Wrapping data block: @begin is one wrap behind @next. */
+   } else if (DATA_WRAPS(data_ring, blk_lpos->begin + 
DATA_SIZE(data_ring)) ==
+  DATA_WRAPS(data_ring, blk_lpos->next)) {
+   db = to_block(data_ring, 0);
+   *data_size = DATA_INDEX(data_ring, blk_lpos->next);
+
+   /* Illegal block description. */
+   } else {
+   WARN_ON_ONCE(1);
+   return NULL;
+   }
+
+   /* A valid data block will always be aligned to the ID size. */
+   if (WARN_ON_ONCE(blk_lpos->begin != ALIGN(blk_lpos->begin, 
sizeof(db->id))) ||
+   WARN_ON_ONCE(blk_lpos->next != ALIGN(blk_lpos->next, 
sizeof(db->id)))) {
+   return NULL;
+   }
+
+   /* A valid data block will always have at least an ID. */
+   if (WARN_ON_ONCE(*data_size < sizeof(db->id)))
+   return NULL;
+
+   /* Subtract block ID space from size to reflect data size. */
+   *data_size -= sizeof(db->id);
+
+   return &db->data[0];
+}
+
 /**
  * prb_reserve() - Reserve space in the ringbuffer.
  *
@@ -1192,64 +1250,6 @@ void prb_commit(struct prb_reserved_entry *e)
local_irq_restore(e->irqflags);
 }
 
-/*
- * Given @blk_lpos, return a pointer to the writer data from the data block
- * and calculate the size of the data part. A NULL pointer is returned if
- * @blk_lpos specifies values that could never be legal.
- *
- * This function (used by readers) performs strict validation on the lpos
- * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
- * triggered if an internal error is detected.
- */
-static const char *get_data(struct prb_data_ring *data_ring,
-   struct prb_data_blk_lpos *blk_lpos,
-   unsigned int *data_size)
-{
-   struct prb_data_block *db;
-
-   /* Data-less data block description. */
-   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
-   if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
-   *data_size = 0;
-   return "";
-   }
-   return NULL;
-   }
-
-   /* Regular data block: @begin less than @next and in same wrap. */
-   if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next) &&
-   blk_lpos->begin < blk_lpos->next) {
-   db = to_block(data_ring, blk_lpos->begin);
-   *data_size = blk_lpos->next - blk_lpos->begin;
-
-   /* Wrapping data block: @begin is one wrap behind @next. */
-   } else if (DATA_WRAPS(data_ring, blk_lpos->begin + 
DATA_SIZE(data_ring)) ==
-  DATA_WRAPS(data_ring, blk_lpos->next)) {
-   db = to_block(data_ring, 0

[PATCH printk v4 5/6] printk: ringbuffer: add finalization/extension support

2020-09-08 Thread John Ogness
Add support for extending the newest data block. For this, introduce
a new finalization state (desc_finalized) denoting a committed
descriptor that cannot be extended.

Until a record is finalized, a writer can reopen that record to
append new data. Reopening a record means transitioning from the
desc_committed state back to the desc_reserved state.

A writer can explicitly finalize a record if there is no intention
of extending it. Also, records are automatically finalized when a
new record is reserved. This relieves writers of needing to
explicitly finalize while also making such records available to
readers sooner. (Readers can only traverse finalized records.)

Four new memory barrier pairs are introduced. Two of them are
insignificant additions (data_realloc:A/desc_read:D and
data_realloc:A/data_push_tail:B) because they are alternate path
memory barriers that exactly match the purpose, pairing, and
context of the two existing memory barrier pairs they provide an
alternate path for. The other two new memory barrier pairs are
significant additions:

desc_reopen_last:A / _prb_commit:B - When reopening a descriptor,
ensure the state transitions back to desc_reserved before
fully trusting the descriptor data.

_prb_commit:B / desc_reserve:D - When committing a descriptor,
ensure the state transitions to desc_committed before checking
the head ID to see if the descriptor needs to be finalized.

Signed-off-by: John Ogness 
---
 Documentation/admin-guide/kdump/gdbmacros.txt |   3 +-
 kernel/printk/printk_ringbuffer.c | 541 --
 kernel/printk/printk_ringbuffer.h |   6 +-
 scripts/gdb/linux/dmesg.py|   3 +-
 4 files changed, 491 insertions(+), 62 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt 
b/Documentation/admin-guide/kdump/gdbmacros.txt
index 8f533b751c46..94fabb165abf 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -297,6 +297,7 @@ end
 define dmesg
# definitions from kernel/printk/printk_ringbuffer.h
set var $desc_committed = 1
+   set var $desc_finalized = 2
set var $desc_sv_bits = sizeof(long) * 8
set var $desc_flags_shift = $desc_sv_bits - 2
set var $desc_flags_mask = 3 << $desc_flags_shift
@@ -313,7 +314,7 @@ define dmesg
 
# skip non-committed record
set var $state = 3 & ($desc->state_var.counter >> 
$desc_flags_shift)
-   if ($state == $desc_committed)
+   if ($state == $desc_committed || $state == $desc_finalized)
dump_record $desc $prev_flags
set var $prev_flags = $desc->info.flags
end
diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 911fbe150e9a..f1fab8c82819 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -46,20 +46,26 @@
  * into a single descriptor field named @state_var, allowing ID and state to
  * be synchronously and atomically updated.
  *
- * Descriptors have three states:
+ * Descriptors have four states:
  *
  *   reserved
  * A writer is modifying the record.
  *
  *   committed
- * The record and all its data are complete and available for reading.
+ * The record and all its data are written. A writer can reopen the
+ * descriptor (transitioning it back to reserved), but in the committed
+ * state the data is consistent.
+ *
+ *   finalized
+ * The record and all its data are complete and available for reading. A
+ * writer cannot reopen the descriptor.
  *
  *   reusable
  * The record exists, but its text and/or dictionary data may no longer
  * be available.
  *
  * Querying the @state_var of a record requires providing the ID of the
- * descriptor to query. This can yield a possible fourth (pseudo) state:
+ * descriptor to query. This can yield a possible fifth (pseudo) state:
  *
  *   miss
  * The descriptor being queried has an unexpected ID.
@@ -79,6 +85,28 @@
  * committed or reusable queried state. This makes it possible that a valid
  * sequence number of the tail is always available.
  *
+ * Descriptor Finalization
+ * ~~~~~~~~~~~~~~~~~~~~~~~
+ * When a writer calls the commit function prb_commit(), record data is
+ * fully stored and is consistent within the ringbuffer. However, a writer can
+ * reopen that record, claiming exclusive access (as with prb_reserve()), and
+ * modify that record. When finished, the writer must again commit the record.
+ *
+ * In order for a record to be made available to readers (and also become
+ * recyclable for writers), it must be finalized. A finalized record cannot be
+ * reopened and can never become "unfinalized". Record finalization can occur
+ * in three different scenarios:
+ *
+ *   1) A writer can simultaneously commit and finalize its record by c

[PATCH printk v4 4/6] printk: ringbuffer: change representation of states

2020-09-08 Thread John Ogness
Rather than deriving the state by evaluating bits within the flags
area of the state variable, assign the states explicit values and
set those values in the flags area. Introduce macros to make it
simple to read and write state values for the state variable.

Although the functionality is preserved, the binary representation
for the states is changed.
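
A minimal sketch of such an encoding, assuming the layout visible in the
gdb macros below (the state lives in the top two bits of the state
variable, the ID in the rest). The SK_ macros are hypothetical stand-ins
for DESC_SV() and DESC_STATE(), and apart from committed = 1 the numeric
state values are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits in the state variable and position of the state bits. */
#define SK_SV_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define SK_FLAGS_SHIFT	(SK_SV_BITS - 2)
#define SK_FLAGS_MASK	(3UL << SK_FLAGS_SHIFT)
#define SK_ID_MASK	(~SK_FLAGS_MASK)

/* Explicit state values; only committed = 1 is taken from the patch. */
enum sk_state {
	sk_reserved = 0,
	sk_committed = 1,
	sk_reusable = 3,
};

/* Build a state variable value from an ID and a state. */
#define SK_SV(id, state) \
	(((unsigned long)(state) << SK_FLAGS_SHIFT) | ((id) & SK_ID_MASK))

/* Extract the state and the ID from a state variable value. */
#define SK_STATE(sv)	((enum sk_state)(3UL & ((sv) >> SK_FLAGS_SHIFT)))
#define SK_ID(sv)	((sv) & SK_ID_MASK)
```

With explicit values, a state query becomes a shift and a mask instead
of a chain of flag tests, which is what the simplified get_desc_state()
in the diff below relies on.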

Signed-off-by: John Ogness 
---
 Documentation/admin-guide/kdump/gdbmacros.txt | 12 ---
 kernel/printk/printk_ringbuffer.c | 28 +
 kernel/printk/printk_ringbuffer.h | 31 ---
 scripts/gdb/linux/dmesg.py| 11 ---
 4 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt 
b/Documentation/admin-guide/kdump/gdbmacros.txt
index 7adece30237e..8f533b751c46 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -295,9 +295,12 @@ document dump_record
 end
 
 define dmesg
-   set var $desc_committed = 1UL << ((sizeof(long) * 8) - 1)
-   set var $flags_mask = 3UL << ((sizeof(long) * 8) - 2)
-   set var $id_mask = ~$flags_mask
+   # definitions from kernel/printk/printk_ringbuffer.h
+   set var $desc_committed = 1
+   set var $desc_sv_bits = sizeof(long) * 8
+   set var $desc_flags_shift = $desc_sv_bits - 2
+   set var $desc_flags_mask = 3 << $desc_flags_shift
+   set var $id_mask = ~$desc_flags_mask
 
set var $desc_count = 1U << prb->desc_ring.count_bits
set var $prev_flags = 0
@@ -309,7 +312,8 @@ define dmesg
set var $desc = &prb->desc_ring.descs[$id % $desc_count]
 
# skip non-committed record
-   if (($desc->state_var.counter & $flags_mask) == $desc_committed)
+   set var $state = 3 & ($desc->state_var.counter >> 
$desc_flags_shift)
+   if ($state == $desc_committed)
dump_record $desc $prev_flags
set var $prev_flags = $desc->info.flags
end
diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 82347abb22a5..911fbe150e9a 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -348,14 +348,6 @@ static bool data_check_size(struct prb_data_ring 
*data_ring, unsigned int size)
return true;
 }
 
-/* The possible responses of a descriptor state-query. */
-enum desc_state {
-   desc_miss,  /* ID mismatch */
-   desc_reserved,  /* reserved, in use by writer */
-   desc_committed, /* committed, writer is done */
-   desc_reusable,  /* free, not yet used by any writer */
-};
-
 /* Query the state of a descriptor. */
 static enum desc_state get_desc_state(unsigned long id,
  unsigned long state_val)
@@ -363,13 +355,7 @@ static enum desc_state get_desc_state(unsigned long id,
if (id != DESC_ID(state_val))
return desc_miss;
 
-   if (state_val & DESC_REUSE_MASK)
-   return desc_reusable;
-
-   if (state_val & DESC_COMMITTED_MASK)
-   return desc_committed;
-
-   return desc_reserved;
+   return DESC_STATE(state_val);
 }
 
 /*
@@ -467,8 +453,8 @@ static enum desc_state desc_read(struct prb_desc_ring 
*desc_ring,
 static void desc_make_reusable(struct prb_desc_ring *desc_ring,
   unsigned long id)
 {
-   unsigned long val_committed = id | DESC_COMMITTED_MASK;
-   unsigned long val_reusable = val_committed | DESC_REUSE_MASK;
+   unsigned long val_committed = DESC_SV(id, desc_committed);
+   unsigned long val_reusable = DESC_SV(id, desc_reusable);
struct prb_desc *desc = to_desc(desc_ring, id);
atomic_long_t *state_var = &desc->state_var;
 
@@ -904,7 +890,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
 */
prev_state_val = atomic_long_read(&desc->state_var); /* 
LMM(desc_reserve:E) */
if (prev_state_val &&
-   prev_state_val != (id_prev_wrap | DESC_COMMITTED_MASK | 
DESC_REUSE_MASK)) {
+   get_desc_state(id_prev_wrap, prev_state_val) != desc_reusable) {
WARN_ON_ONCE(1);
return false;
}
@@ -918,7 +904,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
 * This pairs with desc_read:D.
 */
if (!atomic_long_try_cmpxchg(&desc->state_var, &prev_state_val,
-id | 0)) { /* LMM(desc_reserve:F) */
+   DESC_SV(id, desc_reserved))) { /* LMM(desc_reserve:F) */
WARN_ON_ONCE(1);
return false;
}
@@ -1237,7 +1223,7 @@ void prb_commit(struct prb_reserved_entry *e)
 {
struct prb_desc_ring *desc_ring = &e->rb->desc_ring;
struct prb_d

[PATCH printk v4 3/6] printk: ringbuffer: clear initial reserved fields

2020-09-08 Thread John Ogness
prb_reserve() will set some meta data values and leave others
uninitialized (or rather, containing the values of the previous
wrap). Simplify the API by always clearing out all the fields.
Only the sequence number is filled in. The caller is now
responsible for filling in the rest of the meta data fields.
In particular, for correctly filling in text and dict lengths.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c| 12 
 kernel/printk/printk_ringbuffer.c | 30 ++
 2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fec71229169e..964b5701688f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -520,8 +520,11 @@ static int log_store(u32 caller_id, int facility, int 
level,
memcpy(&r.text_buf[0], text, text_len);
if (trunc_msg_len)
memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len);
-   if (r.dict_buf)
+   r.info->text_len = text_len + trunc_msg_len;
+   if (r.dict_buf) {
memcpy(&r.dict_buf[0], dict, dict_len);
+   r.info->dict_len = dict_len;
+   }
r.info->facility = facility;
r.info->level = level & 7;
r.info->flags = flags & 0x1f;
@@ -1069,10 +1072,11 @@ static unsigned int __init add_to_rb(struct 
printk_ringbuffer *rb,
if (!prb_reserve(&e, rb, &dest_r))
return 0;
 
-   memcpy(&dest_r.text_buf[0], &r->text_buf[0], dest_r.text_buf_size);
+   memcpy(&dest_r.text_buf[0], &r->text_buf[0], r->info->text_len);
+   dest_r.info->text_len = r->info->text_len;
if (dest_r.dict_buf) {
-   memcpy(&dest_r.dict_buf[0], &r->dict_buf[0],
-  dest_r.dict_buf_size);
+   memcpy(&dest_r.dict_buf[0], &r->dict_buf[0], r->info->dict_len);
+   dest_r.info->dict_len = r->info->dict_len;
}
dest_r.info->facility = r->info->facility;
dest_r.info->level = r->info->level;
diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 6ee5ebce1450..82347abb22a5 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -146,10 +146,13 @@
  *
  * if (prb_reserve(&e, &test_rb, &r)) {
  * snprintf(r.text_buf, r.text_buf_size, "%s", textstr);
+ * r.info->text_len = strlen(textstr);
  *
  * // dictionary allocation may have failed
- * if (r.dict_buf)
+ * if (r.dict_buf) {
  * snprintf(r.dict_buf, r.dict_buf_size, "%s", dictstr);
+ * r.info->dict_len = strlen(dictstr);
+ * }
  *
  * r.info->ts_nsec = local_clock();
  *
@@ -1125,9 +1128,9 @@ static const char *get_data(struct prb_data_ring 
*data_ring,
  * @dict_buf_size is set to 0. Writers must check this before writing to
  * dictionary space.
  *
- * @info->text_len and @info->dict_len will already be set to @text_buf_size
- * and @dict_buf_size, respectively. If dictionary space reservation fails,
- * @info->dict_len is set to 0.
+ * Important: @info->text_len and @info->dict_len need to be set correctly by
+ *the writer in order for data to be readable and/or extended.
+ *Their values are initialized to 0.
  */
 bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 struct printk_record *r)
@@ -1135,6 +1138,7 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
printk_ringbuffer *rb,
struct prb_desc_ring *desc_ring = &rb->desc_ring;
struct prb_desc *d;
unsigned long id;
+   u64 seq;
 
if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
goto fail;
@@ -1159,6 +1163,14 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
printk_ringbuffer *rb,
 
d = to_desc(desc_ring, id);
 
+   /*
+* All @info fields (except @seq) are cleared and must be filled in
+* by the writer. Save @seq before clearing because it is used to
+* determine the new sequence number.
+*/
+   seq = d->info.seq;
+   memset(&d->info, 0, sizeof(d->info));
+
/*
 * Set the @e fields here so that prb_commit() can be used if
 * text data allocation fails.
@@ -1177,17 +1189,15 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
printk_ringbuffer *rb,
 * See the "Bootstrap" comment block in printk_ringbuffer.h for
 * details about how the initializer bootstraps the descriptors.
 */
-   if (d->info.seq == 0 && DESC_INDEX(desc_ring, id) != 0)
+   if (seq == 0 && DESC_INDEX(desc_ring, id) != 0)
d->info.seq = DESC_INDEX(desc_ring, id);
else
-   d->info.seq += DESCS_COUNT(desc_ring);
+   d->info.seq 

[PATCH printk v4 0/6] printk: reimplement LOG_CONT handling

2020-09-08 Thread John Ogness
Hello,

Here is v4 for the second series to rework the printk subsystem.
(The v3 is here [0].) This series implements a new ringbuffer
feature that allows the last record to be extended. Petr Mladek
provided the initial proof of concept [1] for this.

Using the record extension feature, LOG_CONT is re-implemented
in a way that exactly preserves its behavior, but avoids the
need for an extra buffer. In particular, it avoids the need for
any synchronization that such a buffer requires.

This series deviates from the agreements [2] made at the meeting
during LPC2019 in Lisbon. The test results of the v1 series,
which implemented LOG_CONT as agreed upon, showed that the
effects on existing userspace tools using /dev/kmsg (journalctl,
dmesg) were not acceptable [3].

Patch 5 introduces *four* new memory barrier pairs. Two of them
are insignificant additions (data_realloc:A/desc_read:D and
data_realloc:A/data_push_tail:B) because they are alternate path
memory barriers that exactly match the purpose and context of
the two existing memory barrier pairs they provide an alternate
path for. The other two new memory barrier pairs are significant
additions:

desc_reopen_last:A / _prb_commit:B - When reopening a descriptor,
ensure the state transitions back to desc_reserved before
fully trusting the descriptor data.

_prb_commit:B / desc_reserve:D - When committing a descriptor,
ensure the state transitions to desc_committed before checking
the head ID to see if the descriptor needs to be finalized.

The test module used to test the ringbuffer is available
here [4].

The series is based on the printk-rework branch of the printk git
tree:
e60768311af8 ("scripts/gdb: update for lockless printk ringbuffer")

The list of changes since v3:

printk_ringbuffer
=================

- move enum desc_state definition to printk_ringbuffer.h

- change enum desc_state to define the exact state values used
  in the state variable

- add DESC_STATE() macro to retrieve the state from the state
  variable

- add DESC_SV() macro to build a state variable value given an
  ID and state

- get_desc_state(): simply return the state value rather than
  processing state flags

- desc_finalized is now a queried state instead of a state flag

- desc_read(): always return a set @state_var, even if the
  descriptor is in an inconsistent state (desc_reopen_last()
  relies on this)

- change state logic that tested for desc_committed to now test
  for desc_finalized, since this is the new state directly
  preceding desc_reusable

- data_realloc(): add a check if the data block should shrink
  (and in that case, do not modify the data block, i.e. data
  blocks will never shrink)

- prb_reserve_in_last(): add WARN_ON for unexpected @text_len
  value

- prb_reserve(): save a copy of @seq and use memset() to
  clear @info

- desc_read_committed_seq(): rename function to
  desc_read_finalized_seq() since desc_finalized is the desired
  state for readers

- documentation: update state and finalization descriptions

printk.c
========

- use @text_len and @dict_len for memcpy() size

gdb scripts
===========

- update to use new state representation

John Ogness

[0] https://lkml.kernel.org/r/20200831011058.6286-1-john.ogn...@linutronix.de
[1] https://lkml.kernel.org/r/20200812163908.GH12903@alley
[2] https://lkml.kernel.org/r/87k1acz5rx@linutronix.de
[3] https://lkml.kernel.org/r/20200811160551.GC12903@alley
[4] https://github.com/Linutronix/prb-test.git


John Ogness (6):
  printk: ringbuffer: relocate get_data()
  printk: ringbuffer: add BLK_DATALESS() macro
  printk: ringbuffer: clear initial reserved fields
  printk: ringbuffer: change representation of states
  printk: ringbuffer: add finalization/extension support
  printk: reimplement log_cont using record extension

 Documentation/admin-guide/kdump/gdbmacros.txt |  13 +-
 kernel/printk/printk.c| 110 +--
 kernel/printk/printk_ringbuffer.c | 695 ++
 kernel/printk/printk_ringbuffer.h |  35 +-
 scripts/gdb/linux/dmesg.py|  12 +-
 5 files changed, 624 insertions(+), 241 deletions(-)

-- 
2.20.1




Re: state names: vas: Re: [PATCH next v3 6/8] printk: ringbuffer: add finalization/extension support

2020-09-02 Thread John Ogness
On 2020-09-02, Petr Mladek  wrote:
>> +static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
>> + u32 caller_id, unsigned long *id_out)
>> +{
>> +unsigned long prev_state_val;
>> +enum desc_state d_state;
>> +struct prb_desc desc;
>> +struct prb_desc *d;
>> +unsigned long id;
>> +
>> +id = atomic_long_read(&desc_ring->head_id);
>> +
>> +/*
>> + * To minimize unnecessarily reopening a descriptor, first check the
>> + * descriptor is in the correct state and has a matching caller ID.
>> + */
>> +d_state = desc_read(desc_ring, id, &desc);
>> +if (d_state != desc_reserved ||
>> +!(atomic_long_read(&desc.state_var) & DESC_COMMIT_MASK) ||
>
> This looks like a hack. And similar extra check of the bit is needed
> also in desc_read(), see
> https://lore.kernel.org/r/878sdvq8kd@jogness.linutronix.de

Agreed.

> I has been actually getting less and less happy with the inconsistency
> between names of the bits and states.
>
> ...
>
> First, define 5 desc_states, something like:
>
> enum desc_state {
>   desc_miss = -1, /* ID mismatch */
>   desc_modified =  0x0,   /* reserved, being modified by writer */

I prefer the "desc_reserved" name. It may or may not have been modified yet.

>   desc_committed = 0x1,   /* committed by writer, could get reopened */
>   desc_finalized = 0x2,   /* committed, could not longer get modified */
>   desc_reusable =  0x3,   /* free, not yet used by any writer */
> };
>
> Second, only 4 variants of the 3 state bits are currently used.
> It means that two bits are enough and they might use exactly
> the above names:
>
> I mean to do something like:
>
> #define DESC_SV_BITS  (sizeof(unsigned long) * 8)
> #define DESC_SV(desc_state)   ((unsigned long)desc_state << (DESC_SV_BITS - 2))
> #define DESC_ST(state_val)((unsigned long)state_val >> (DESC_SV_BITS - 2))

This makes sense and will get us back the bit we lost because of
finalization.
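
To make the bit accounting concrete, here is a small Python sketch of
the proposed two-bit encoding. It assumes a 64-bit unsigned long, and
the lowercase helper names (desc_sv, desc_state, desc_id) are
hypothetical stand-ins for the macros discussed above. It is only a
model of the layout, not the kernel implementation.

```python
DESC_SV_BITS = 64  # assumption: 64-bit unsigned long

# State values as proposed in the thread (with "reserved" rather than
# "modified" as the name for state 0).
DESC_RESERVED = 0x0
DESC_COMMITTED = 0x1
DESC_FINALIZED = 0x2
DESC_REUSABLE = 0x3

# Everything below the top two bits is available for the descriptor ID.
DESC_ID_MASK = (1 << (DESC_SV_BITS - 2)) - 1


def desc_sv(desc_id, state):
    """Build a state variable value from an ID and a state."""
    return (state << (DESC_SV_BITS - 2)) | (desc_id & DESC_ID_MASK)


def desc_state(sv):
    """Extract the state from a state variable value."""
    return sv >> (DESC_SV_BITS - 2)


def desc_id(sv):
    """Extract the descriptor ID from a state variable value."""
    return sv & DESC_ID_MASK


sv = desc_sv(1234, DESC_FINALIZED)
```

With three separate flag bits the ID would have 61 bits on such a
system. With two state bits it has 62, which is the bit that is gained
back.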

> I am sorry that I did not came up with this earlier. I know how
> painful it is to rework bigger patchsets. But it affects format
> of the ring buffer, so we should do it early.

Agreed.

I am wondering if VMCOREINFO should include a DESC_FLAGS_MASK so that
crash tools could at least successfully iterate the ID's, even if they
didn't know what all the flag values mean (in the case that more bits
are added later).

> PS: I am still middle of review. It looks good so far. I wanted to
> send this early and separately because it is a bigger change.

Thanks for the heads up.

John Ogness



Re: [PATCH next v3 6/8] printk: ringbuffer: add finalization/extension support

2020-08-31 Thread John Ogness
This critical piece was missing from patch 6...


>From 0b745d507f0c38e6d1612ed9468aa52845ca025b Mon Sep 17 00:00:00 2001
From: John Ogness 
Date: Mon, 31 Aug 2020 14:45:40 +0206
Subject: [PATCH] printk: ringbuffer: allow reading consistent descriptors

desc_read() will fail to read if a descriptor is in the desc_reserved
queried state because such data would be inconsistent. However, since
("printk: ringbuffer: add finalization/extension support") the
desc_reserved state can have the DESC_COMMIT_MASK flag set, in which
case it _is_ consistent. And indeed, desc_reopen_last() is expecting
a read in this case.

Allow desc_read() to read desc_reserved descriptors if the
DESC_COMMIT_MASK flag is set.
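
The intended condition can be sketched in Python as follows. This is
only a model of the check, assuming a 64-bit state variable and the
flag layout used by the v3 series (commit, final and reuse as the top
three bits). The function names get_desc_state() and can_read() here
are Python stand-ins, and treating any other flag combination as
"reserved" is an assumption of the sketch.

```python
SV_BITS = 64  # assumption: 64-bit unsigned long
DESC_COMMIT_MASK = 1 << (SV_BITS - 1)
DESC_FINAL_MASK = 1 << (SV_BITS - 2)
DESC_REUSE_MASK = 1 << (SV_BITS - 3)
DESC_FLAGS_MASK = DESC_COMMIT_MASK | DESC_FINAL_MASK | DESC_REUSE_MASK
DESC_ID_MASK = ((1 << SV_BITS) - 1) & ~DESC_FLAGS_MASK


def get_desc_state(desc_id, state_val):
    """Return the queried state for @desc_id given a state variable value."""
    if desc_id != (state_val & DESC_ID_MASK):
        return "miss"
    flags = state_val & DESC_FLAGS_MASK
    if flags == DESC_REUSE_MASK:
        return "reusable"
    if flags == (DESC_COMMIT_MASK | DESC_FINAL_MASK):
        return "committed"
    return "reserved"


def can_read(desc_id, state_val):
    """Model of the fixed check: reserved is readable if the commit flag is set."""
    state = get_desc_state(desc_id, state_val)
    if state == "miss":
        return False
    if state == "reserved" and not (state_val & DESC_COMMIT_MASK):
        return False
    return True
```

A descriptor with only the commit flag set is still in the reserved
queried state, but its data is consistent, so can_read() accepts it.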

Signed-off-by: John Ogness 
Reported-by: Andy Lavr 
Fixes: ("printk: ringbuffer: add finalization/extension support")
---
 kernel/printk/printk_ringbuffer.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 0731d5e2..6ba7d3fc96f1 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -446,8 +446,10 @@ static enum desc_state desc_read(struct prb_desc_ring 
*desc_ring,
/* Check the descriptor state. */
state_val = atomic_long_read(state_var); /* LMM(desc_read:A) */
d_state = get_desc_state(id, state_val);
-   if (d_state != desc_committed && d_state != desc_reusable)
+   if (d_state == desc_miss ||
+   (d_state == desc_reserved && !(state_val & DESC_COMMIT_MASK))) {
return d_state;
+   }
 
/*
 * Guarantee the state is loaded before copying the descriptor
-- 
2.20.1



[PATCH next v3 7/8] printk: reimplement log_cont using record extension

2020-08-30 Thread John Ogness
Use the record extending feature of the ringbuffer to implement
continuous messages. This preserves the existing continuous message
behavior.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 98 +-
 1 file changed, 20 insertions(+), 78 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7e7d596c8878..d0b2bea1fd81 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -535,7 +535,10 @@ static int log_store(u32 caller_id, int facility, int 
level,
r.info->caller_id = caller_id;
 
/* insert message */
-   prb_commit(&e);
+   if ((flags & LOG_CONT) || !(flags & LOG_NEWLINE))
+           prb_commit(&e);
+   else
+           prb_final_commit(&e);
 
return (text_len + trunc_msg_len);
 }
@@ -1093,7 +1096,7 @@ static unsigned int __init add_to_rb(struct 
printk_ringbuffer *rb,
dest_r.info->ts_nsec = r->info->ts_nsec;
dest_r.info->caller_id = r->info->caller_id;
 
-   prb_commit(&e);
+   prb_final_commit(&e);
 
return prb_record_text_space(&e);
 }
@@ -1893,87 +1896,26 @@ static inline u32 printk_caller_id(void)
0x8000 + raw_smp_processor_id();
 }
 
-/*
- * Continuation lines are buffered, and not committed to the record buffer
- * until the line is complete, or a race forces it. The line fragments
- * though, are printed immediately to the consoles to ensure everything has
- * reached the console in case of a kernel crash.
- */
-static struct cont {
-   char buf[LOG_LINE_MAX];
-   size_t len; /* length == 0 means unused buffer */
-   u32 caller_id;  /* printk_caller_id() of first print */
-   u64 ts_nsec;/* time of first print */
-   u8 level;   /* log level of first message */
-   u8 facility;/* log facility of first message */
-   enum log_flags flags;   /* prefix, newline flags */
-} cont;
-
-static void cont_flush(void)
-{
-   if (cont.len == 0)
-   return;
-
-   log_store(cont.caller_id, cont.facility, cont.level, cont.flags,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-   cont.len = 0;
-}
-
-static bool cont_add(u32 caller_id, int facility, int level,
-enum log_flags flags, const char *text, size_t len)
-{
-   /* If the line gets too long, split it up in separate records. */
-   if (cont.len + len > sizeof(cont.buf)) {
-   cont_flush();
-   return false;
-   }
-
-   if (!cont.len) {
-   cont.facility = facility;
-   cont.level = level;
-   cont.caller_id = caller_id;
-   cont.ts_nsec = local_clock();
-   cont.flags = flags;
-   }
-
-   memcpy(cont.buf + cont.len, text, len);
-   cont.len += len;
-
-   // The original flags come from the first line,
-   // but later continuations can add a newline.
-   if (flags & LOG_NEWLINE) {
-   cont.flags |= LOG_NEWLINE;
-   cont_flush();
-   }
-
-   return true;
-}
-
 static size_t log_output(int facility, int level, enum log_flags lflags, const 
char *dict, size_t dictlen, char *text, size_t text_len)
 {
const u32 caller_id = printk_caller_id();
 
-   /*
-* If an earlier line was buffered, and we're a continuation
-* write from the same context, try to add it to the buffer.
-*/
-   if (cont.len) {
-   if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
-   if (cont_add(caller_id, facility, level, lflags, text, 
text_len))
-   return text_len;
-   }
-   /* Otherwise, make sure it's flushed */
-   cont_flush();
-   }
-
-   /* Skip empty continuation lines that couldn't be added - they just 
flush */
-   if (!text_len && (lflags & LOG_CONT))
-   return 0;
-
-   /* If it doesn't end in a newline, try to buffer the current line */
-   if (!(lflags & LOG_NEWLINE)) {
-   if (cont_add(caller_id, facility, level, lflags, text, 
text_len))
+   if (lflags & LOG_CONT) {
+   struct prb_reserved_entry e;
+   struct printk_record r;
+
+   prb_rec_init_wr(&r, text_len, 0);
+   if (prb_reserve_in_last(&e, prb, &r, caller_id)) {
+   memcpy(&r.text_buf[r.info->text_len], text, text_len);
+   r.info->text_len += text_len;
+   if (lflags & LOG_NEWLINE) {
+   r.info->flags |= LOG_NEWLINE;
+   prb_final_commit(&e);
+   } else {
+   prb_commit(&e);
+   }
return text_len;

[PATCH next v3 6/8] printk: ringbuffer: add finalization/extension support

2020-08-30 Thread John Ogness
Add support for extending the newest data block. For this, introduce
a new finalization state flag (DESC_FINAL_MASK) that denotes when a
descriptor may not be extended, i.e. is finalized.

The DESC_COMMIT_MASK is still set when the record data is in a
consistent state, i.e. the writer is no longer modifying the record.
However, the record remains in the desc_reserved queried state until
it is finalized, in which case it transitions to the desc_committed
queried state.

Until a record is finalized, a writer can reopen that record to
append new data. Reopening a record means clearing the
DESC_COMMIT_MASK flag.

A writer can explicitly finalize a record if there is no intention
of extending it. Also, records are automatically finalized when a
new record is reserved. This relieves writers of needing to
explicitly finalize while also making such records available to
readers sooner. (Readers can only traverse finalized records.)

Three new memory barrier pairs are introduced. Two of them are not
significant because they are alternate path memory barriers that
exactly correspond to existing memory barriers.

But the third (_prb_commit:B / desc_reserve:D) is new and guarantees
that descriptors will always be finalized, either because a
descriptor setting DESC_COMMIT_MASK sees that there is a newer
descriptor and so finalizes itself or because a new descriptor being
reserved sees that the previous descriptor has DESC_COMMIT_MASK set
and finalizes that descriptor.
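
The finalization rules described above can be modelled with a short
single-threaded Python sketch. The class ToyRing and the small flag
values are hypothetical and exist only for illustration. The real code
works on atomic state variables with the memory barrier pairs named
above, none of which is modelled here.

```python
COMMIT = 0x1  # stands in for DESC_COMMIT_MASK
FINAL = 0x2   # stands in for DESC_FINAL_MASK


class ToyRing:
    """Flags of descriptors in reservation order; the newest is last."""

    def __init__(self):
        self.flags = []

    def reserve(self):
        # A new reservation finalizes a committed predecessor.
        if self.flags and self.flags[-1] == COMMIT:
            self.flags[-1] |= FINAL
        self.flags.append(0)
        return len(self.flags) - 1

    def commit(self, idx):
        self.flags[idx] |= COMMIT
        # If a newer descriptor already exists, finalize ourselves.
        if idx != len(self.flags) - 1:
            self.flags[idx] |= FINAL

    def final_commit(self, idx):
        self.flags[idx] |= COMMIT | FINAL

    def reopen_last(self):
        """Reopen the newest descriptor if committed but not finalized."""
        if self.flags and self.flags[-1] == COMMIT:
            self.flags[-1] = 0  # reopening clears the commit flag
            return len(self.flags) - 1
        return None


ring = ToyRing()
a = ring.reserve()
ring.commit(a)                 # committed, still extendable
reopened = ring.reopen_last()  # writer extends the record
ring.commit(a)
b = ring.reserve()             # finalizes a
c = ring.reserve()             # b is still reserved, left alone
ring.commit(b)                 # b is no longer the newest: finalized
```

Either the committing side or the reserving side performs the
finalization, so in this model no committed descriptor is left
unfinalized once a newer one exists.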

Signed-off-by: John Ogness 
---
 kernel/printk/printk_ringbuffer.c | 467 --
 kernel/printk/printk_ringbuffer.h |   8 +-
 2 files changed, 443 insertions(+), 32 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index da54d4fadf96..0731d5e2 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -49,14 +49,16 @@
  * Descriptors have three states:
  *
  *   reserved
- * A writer is modifying the record.
+ * A writer is modifying the record. Internally represented as either "0"
+ * or "DESC_COMMIT_MASK".
  *
  *   committed
  * The record and all its data are complete and available for reading.
+ * Internally represented as "DESC_COMMIT_MASK | DESC_FINAL_MASK".
  *
  *   reusable
  * The record exists, but its text and/or dictionary data may no longer
- * be available.
+ * be available. Internally represented as "DESC_REUSE_MASK".
  *
  * Querying the @state_var of a record requires providing the ID of the
  * descriptor to query. This can yield a possible fourth (pseudo) state:
@@ -79,6 +81,25 @@
  * committed or reusable queried state. This makes it possible that a valid
  * sequence number of the tail is always available.
  *
+ * Descriptor Finalization
+ * ~~~
+ * When a writer calls the commit function prb_commit(), the record may still
+ * continue to be in the reserved queried state. In order for that record to
+ * enter into the committed queried state, that record also must be finalized.
+ * A record can be finalized by three different scenarios:
+ *
+ *   1) A writer can finalize its record immediately by calling
+ *  prb_final_commit() instead of prb_commit().
+ *
+ *   2) When a new record is reserved and the previous record has been
+ *  committed via prb_commit(), that previous record is finalized.
+ *
+ *   3) When a record is committed via prb_commit() and a newer record
+ *  already exists, the record being committed is finalized.
+ *
+ * Until a record is finalized (represented by "DESC_FINAL_MASK"), a writer
+ * may "reopen" that record and extend it with more data.
+ *
  * Data Rings
  * ~~
  * The two data rings (text and dictionary) function identically. They exist
@@ -156,9 +177,38 @@
  *
  * r.info->ts_nsec = local_clock();
  *
+ * prb_final_commit(&e);
+ * }
+ *
+ * Note that additional writer functions are available to extend a record
+ * after it has been committed but not yet finalized. This can be done as
+ * long as no new records have been reserved and the caller is the same.
+ *
+ * Sample writer code (record extending)::
+ *
+ * // alternate rest of previous example
+ * r.info->ts_nsec = local_clock();
+ * r.info->text_len = strlen(textstr);
+ * r.info->caller_id = printk_caller_id();
+ *
+ * // commit the record (but do not finalize yet)
  * prb_commit(&e);
  * }
  *
+ * ...
+ *
+ * // specify additional 5 bytes text space to extend
+ * prb_rec_init_wr(&r, 5, 0);
+ *
+ * if (prb_reserve_in_last(&e, &test_rb, &r, printk_caller_id())) {
+ * snprintf(&r.text_buf[r.info->text_len],
+ *  r.text_buf_size - r.info->text_len, "hello");
+ *
+ * r.info->text_len += 5;
+ *
+ * prb_final_commit(&e);
+ *

[PATCH next v3 3/8] printk: ringbuffer: relocate get_data()

2020-08-30 Thread John Ogness
Move the internal get_data() function as-is above prb_reserve() so
that a later change can make use of the static function.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk_ringbuffer.c | 116 +++---
 1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index d339ff7647da..86af38c2cf77 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1038,6 +1038,64 @@ static unsigned int space_used(struct prb_data_ring 
*data_ring,
DATA_SIZE(data_ring) - DATA_INDEX(data_ring, blk_lpos->begin));
 }
 
+/*
+ * Given @blk_lpos, return a pointer to the writer data from the data block
+ * and calculate the size of the data part. A NULL pointer is returned if
+ * @blk_lpos specifies values that could never be legal.
+ *
+ * This function (used by readers) performs strict validation on the lpos
+ * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
+ * triggered if an internal error is detected.
+ */
+static const char *get_data(struct prb_data_ring *data_ring,
+   struct prb_data_blk_lpos *blk_lpos,
+   unsigned int *data_size)
+{
+   struct prb_data_block *db;
+
+   /* Data-less data block description. */
+   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+   if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
+   *data_size = 0;
+   return "";
+   }
+   return NULL;
+   }
+
+   /* Regular data block: @begin less than @next and in same wrap. */
+   if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next) &&
+   blk_lpos->begin < blk_lpos->next) {
+   db = to_block(data_ring, blk_lpos->begin);
+   *data_size = blk_lpos->next - blk_lpos->begin;
+
+   /* Wrapping data block: @begin is one wrap behind @next. */
+   } else if (DATA_WRAPS(data_ring, blk_lpos->begin + 
DATA_SIZE(data_ring)) ==
+  DATA_WRAPS(data_ring, blk_lpos->next)) {
+   db = to_block(data_ring, 0);
+   *data_size = DATA_INDEX(data_ring, blk_lpos->next);
+
+   /* Illegal block description. */
+   } else {
+   WARN_ON_ONCE(1);
+   return NULL;
+   }
+
+   /* A valid data block will always be aligned to the ID size. */
+   if (WARN_ON_ONCE(blk_lpos->begin != ALIGN(blk_lpos->begin, sizeof(db->id))) ||
+   WARN_ON_ONCE(blk_lpos->next != ALIGN(blk_lpos->next, sizeof(db->id)))) {
+   return NULL;
+   }
+
+   /* A valid data block will always have at least an ID. */
+   if (WARN_ON_ONCE(*data_size < sizeof(db->id)))
+   return NULL;
+
+   /* Subtract block ID space from size to reflect data size. */
+   *data_size -= sizeof(db->id);
+
+   return &db->data[0];
+}
+
 /**
  * prb_reserve() - Reserve space in the ringbuffer.
  *
@@ -1192,64 +1250,6 @@ void prb_commit(struct prb_reserved_entry *e)
local_irq_restore(e->irqflags);
 }
 
-/*
- * Given @blk_lpos, return a pointer to the writer data from the data block
- * and calculate the size of the data part. A NULL pointer is returned if
- * @blk_lpos specifies values that could never be legal.
- *
- * This function (used by readers) performs strict validation on the lpos
- * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
- * triggered if an internal error is detected.
- */
-static const char *get_data(struct prb_data_ring *data_ring,
-   struct prb_data_blk_lpos *blk_lpos,
-   unsigned int *data_size)
-{
-   struct prb_data_block *db;
-
-   /* Data-less data block description. */
-   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
-   if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
-   *data_size = 0;
-   return "";
-   }
-   return NULL;
-   }
-
-   /* Regular data block: @begin less than @next and in same wrap. */
-   if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next) &&
-   blk_lpos->begin < blk_lpos->next) {
-   db = to_block(data_ring, blk_lpos->begin);
-   *data_size = blk_lpos->next - blk_lpos->begin;
-
-   /* Wrapping data block: @begin is one wrap behind @next. */
-   } else if (DATA_WRAPS(data_ring, blk_lpos->begin + 
DATA_SIZE(data_ring)) ==
-  DATA_WRAPS(data_ring, blk_lpos->next)) {
-   db = to_block(data_ring, 0

[PATCH next v3 0/8] printk: reimplement LOG_CONT handling

2020-08-30 Thread John Ogness
Hello,

Here is v3 for the second series to rework the printk subsystem.
(The v2 is here [0].) This series implements a new ringbuffer
feature that allows the last record to be extended. Petr Mladek
provided the initial proof of concept [1] for this.

Using the record extension feature, LOG_CONT is re-implemented
in a way that exactly preserves its behavior, but avoids the
need for an extra buffer. In particular, it avoids the need for
any synchronization that such a buffer requires.

This series deviates from the agreements [2] made at the meeting
during LPC2019 in Lisbon. The test results of the v1 series,
which implemented LOG_CONT as agreed upon, showed that the
effects on existing userspace tools using /dev/kmsg (journalctl,
dmesg) were not acceptable [3].

The main difference to v2 is the implementation of the new
descriptor finalization. For v3 the implementation closely
follows the example [4] from Petr Mladek.

Patch 6 introduces *four* new memory barrier pairs. Two of them
are insignificant additions (data_realloc:A/desc_read:D and
data_realloc:A/data_push_tail:B) because they are alternate path
memory barriers that exactly match the purpose and context of
the two existing memory barrier pairs they provide an alternate
path for. The other two new memory barrier pairs are significant
additions:

desc_reopen_last:A/_prb_commit:B - When reopening a descriptor,
ensure the commit flag is removed before fully trusting the
descriptor data.

_prb_commit:B / desc_reserve:D - When committing a descriptor,
ensure the commit flag is set before checking the head ID
to see if the finalize flag should be set.

Patch 8 assumes the gdb script series [5] for the new printk
ringbuffer has been applied.

The test module used to test the ringbuffer is available
here [6].

The series is based on next-20200828.

The list of changes since v2:

printk_ringbuffer
=================

- prb_commit(): finalize self if no longer the head

- prb_reserve(): clear @info fields on success

- prb_reserve(): do not finalize the -1 placeholder descriptor

- desc_make_final(): renamed from desc_finalize()

- desc_make_final(): remove loop, change to single shot attempt

- prb_reserve_in_last(): renamed from prb_reserve_last()

- prb_reserve_in_last(): add new fail goto target

- prb_reserve_in_last(): fix logic for calculating
  @text_buf_size and add size check

- desc_reopen_last(): add extra caller ID check before reopening

- desc_reopen_last(): change cmpxchg() to full memory barrier

- get_desc_state(): remove unneeded @is_final argument

- documentation: update finalization, sample code, and memory
  barrier list

printk.c
========

- set @text_len and @dict_len as required by prb_reserve() change

John Ogness

[0] https://lkml.kernel.org/r/20200824103538.31446-1-john.ogn...@linutronix.de
[1] https://lkml.kernel.org/r/20200812163908.GH12903@alley
[2] https://lkml.kernel.org/r/87k1acz5rx@linutronix.de
[3] https://lkml.kernel.org/r/20200811160551.GC12903@alley
[4] https://lkml.kernel.org/r/20200827151710.GB4928@alley
[5] https://lkml.kernel.org/r/CAHk-=wj_b6Bh=d-wwh0xyqoqbhhkyeexhszkpxdra6gjtvk...@mail.gmail.com
[6] https://lkml.kernel.org/r/20200814212525.6118-1-john.ogn...@linutronix.de
[7] https://github.com/Linutronix/prb-test.git

John Ogness (8):
  printk: ringbuffer: rename DESC_COMMITTED_MASK flag
  printk: ringbuffer: change representation of reusable
  printk: ringbuffer: relocate get_data()
  printk: ringbuffer: add BLK_DATALESS() macro
  printk: ringbuffer: clear initial reserved fields
  printk: ringbuffer: add finalization/extension support
  printk: reimplement log_cont using record extension
  scripts/gdb: support printk finalized records

 Documentation/admin-guide/kdump/gdbmacros.txt |  10 +-
 kernel/printk/printk.c| 105 +--
 kernel/printk/printk_ringbuffer.c | 604 +++---
 kernel/printk/printk_ringbuffer.h |  12 +-
 scripts/gdb/linux/dmesg.py|  10 +-
 5 files changed, 558 insertions(+), 183 deletions(-)

-- 
2.20.1




[PATCH next v3 1/8] printk: ringbuffer: rename DESC_COMMITTED_MASK flag

2020-08-30 Thread John Ogness
Upcoming ringbuffer support for continuous lines will allow records
with DESC_COMMITTED_MASK set to be reopened. As a result, the flag
will no longer describe the final committed state. Rename it to
DESC_COMMIT_MASK as a preparation step.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk_ringbuffer.c | 8 
 kernel/printk/printk_ringbuffer.h | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 0659b50872b5..76248c82d557 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -361,7 +361,7 @@ static enum desc_state get_desc_state(unsigned long id,
if (state_val & DESC_REUSE_MASK)
return desc_reusable;
 
-   if (state_val & DESC_COMMITTED_MASK)
+   if (state_val & DESC_COMMIT_MASK)
return desc_committed;
 
return desc_reserved;
@@ -462,7 +462,7 @@ static enum desc_state desc_read(struct prb_desc_ring 
*desc_ring,
 static void desc_make_reusable(struct prb_desc_ring *desc_ring,
   unsigned long id)
 {
-   unsigned long val_committed = id | DESC_COMMITTED_MASK;
+   unsigned long val_committed = id | DESC_COMMIT_MASK;
unsigned long val_reusable = val_committed | DESC_REUSE_MASK;
struct prb_desc *desc = to_desc(desc_ring, id);
atomic_long_t *state_var = &desc->state_var;
@@ -899,7 +899,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
 */
prev_state_val = atomic_long_read(&desc->state_var); /* LMM(desc_reserve:E) */
if (prev_state_val &&
-   prev_state_val != (id_prev_wrap | DESC_COMMITTED_MASK | 
DESC_REUSE_MASK)) {
+   prev_state_val != (id_prev_wrap | DESC_COMMIT_MASK | 
DESC_REUSE_MASK)) {
WARN_ON_ONCE(1);
return false;
}
@@ -1184,7 +1184,7 @@ void prb_commit(struct prb_reserved_entry *e)
 * this. This pairs with desc_read:B.
 */
if (!atomic_long_try_cmpxchg(&d->state_var, &prev_state_val,
-e->id | DESC_COMMITTED_MASK)) { /* LMM(prb_commit:B) */
+e->id | DESC_COMMIT_MASK)) { /* LMM(prb_commit:B) */
WARN_ON_ONCE(1);
}
 
diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index e6302da041f9..dcda5e9b4676 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -115,9 +115,9 @@ struct prb_reserved_entry {
 #define _DATA_SIZE(sz_bits)    (1UL << (sz_bits))
 #define _DESCS_COUNT(ct_bits)  (1U << (ct_bits))
 #define DESC_SV_BITS           (sizeof(unsigned long) * 8)
-#define DESC_COMMITTED_MASK    (1UL << (DESC_SV_BITS - 1))
+#define DESC_COMMIT_MASK       (1UL << (DESC_SV_BITS - 1))
 #define DESC_REUSE_MASK        (1UL << (DESC_SV_BITS - 2))
-#define DESC_FLAGS_MASK        (DESC_COMMITTED_MASK | DESC_REUSE_MASK)
+#define DESC_FLAGS_MASK        (DESC_COMMIT_MASK | DESC_REUSE_MASK)
 #define DESC_ID_MASK           (~DESC_FLAGS_MASK)
 #define DESC_ID(sv)            ((sv) & DESC_ID_MASK)
 #define FAILED_LPOS            0x1
@@ -213,7 +213,7 @@ struct prb_reserved_entry {
  */
 #define BLK0_LPOS(sz_bits) (-(_DATA_SIZE(sz_bits)))
 #define DESC0_ID(ct_bits)  DESC_ID(-(_DESCS_COUNT(ct_bits) + 1))
-#define DESC0_SV(ct_bits)  (DESC_COMMITTED_MASK | DESC_REUSE_MASK | DESC0_ID(ct_bits))
+#define DESC0_SV(ct_bits)  (DESC_COMMIT_MASK | DESC_REUSE_MASK | DESC0_ID(ct_bits))
 
 /*
  * Define a ringbuffer with an external text data buffer. The same as
-- 
2.20.1




[PATCH next v3 8/8] scripts/gdb: support printk finalized records

2020-08-30 Thread John Ogness
With commit ("printk: ringbuffer: add finalization/extension support")
a new state bit for finalized records was added. This not only changed
the bit representation of committed records, but also reduced the size
for record IDs.

Update the gdb scripts to correctly interpret the state variable.
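
For reference, the interpretation the scripts need can be sketched as
a standalone Python function. It uses the same bit positions as the
definitions added to dmesg.py in this patch. The function names
decode_state_var() and is_displayed() are hypothetical, and the default
of 64 bits is an assumption about the target's long type.

```python
def decode_state_var(sv, sv_bits=64):
    """Split a raw state variable value into its ID and flag bits."""
    commit = 1 << (sv_bits - 1)
    final = 1 << (sv_bits - 2)
    reuse = 1 << (sv_bits - 3)
    flags_mask = commit | final | reuse
    # Mask to sv_bits because Python integers are unbounded.
    id_mask = ((1 << sv_bits) - 1) & ~flags_mask
    return {
        "id": sv & id_mask,
        "commit": bool(sv & commit),
        "final": bool(sv & final),
        "reuse": bool(sv & reuse),
    }


def is_displayed(sv, sv_bits=64):
    """Records with the commit bit set are shown, finalized or not."""
    return decode_state_var(sv, sv_bits)["commit"]


committed_only = (1 << 63) | 42
finalized = (1 << 63) | (1 << 62) | 42
reusable = (1 << 61) | 42
```

Because three bits are now used for flags, the ID occupies only the
low 61 bits on a 64-bit system.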

Signed-off-by: John Ogness 
---
 Documentation/admin-guide/kdump/gdbmacros.txt | 10 +++---
 scripts/gdb/linux/dmesg.py| 10 ++
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
index 7adece30237e..bcb78368b381 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -295,8 +295,11 @@ document dump_record
 end
 
 define dmesg
-   set var $desc_committed = 1UL << ((sizeof(long) * 8) - 1)
-   set var $flags_mask = 3UL << ((sizeof(long) * 8) - 2)
+   # definitions from kernel/printk/printk_ringbuffer.h
+   set var $desc_commit = 1UL << ((sizeof(long) * 8) - 1)
+   set var $desc_final = 1UL << ((sizeof(long) * 8) - 2)
+   set var $desc_reuse = 1UL << ((sizeof(long) * 8) - 3)
+   set var $flags_mask = $desc_commit | $desc_final | $desc_reuse
set var $id_mask = ~$flags_mask
 
set var $desc_count = 1U << prb->desc_ring.count_bits
@@ -309,7 +312,8 @@ define dmesg
set var $desc = &prb->desc_ring.descs[$id % $desc_count]
 
# skip non-committed record
-   if (($desc->state_var.counter & $flags_mask) == $desc_committed)
+   # (note that commit+!final records will be displayed)
+   if (($desc->state_var.counter & $desc_commit) == $desc_commit)
dump_record $desc $prev_flags
set var $prev_flags = $desc->info.flags
end
diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
index 6c6022012ea8..367523c5c270 100644
--- a/scripts/gdb/linux/dmesg.py
+++ b/scripts/gdb/linux/dmesg.py
@@ -79,9 +79,10 @@ class LxDmesg(gdb.Command):
 
 # definitions from kernel/printk/printk_ringbuffer.h
 desc_sv_bits = utils.get_long_type().sizeof * 8
-desc_committed_mask = 1 << (desc_sv_bits - 1)
-desc_reuse_mask = 1 << (desc_sv_bits - 2)
-desc_flags_mask = desc_committed_mask | desc_reuse_mask
+desc_commit_mask = 1 << (desc_sv_bits - 1)
+desc_final_mask = 1 << (desc_sv_bits - 2)
+desc_reuse_mask = 1 << (desc_sv_bits - 3)
+desc_flags_mask = desc_commit_mask | desc_final_mask | desc_reuse_mask
 desc_id_mask = ~desc_flags_mask
 
 # read in tail and head descriptor ids
@@ -96,8 +97,9 @@ class LxDmesg(gdb.Command):
 desc_off = desc_sz * ind
 
 # skip non-committed record
+# (note that commit+!final records will be displayed)
 state = utils.read_u64(descs, desc_off + sv_off + counter_off) & desc_flags_mask
-if state != desc_committed_mask:
+if state & desc_commit_mask != desc_commit_mask:
 if did == head_id:
 break
 did = (did + 1) & desc_id_mask
-- 
2.20.1




[PATCH next v3 2/8] printk: ringbuffer: change representation of reusable

2020-08-30 Thread John Ogness
The reusable queried state is represented by the combined flags:

DESC_COMMIT_MASK | DESC_REUSE_MASK

There is no reason for the DESC_COMMIT_MASK to be part of that
representation. In particular, this will add confusion when more
state flags are available.

Change the representation of the reusable queried state to just
the DESC_REUSE_MASK flag.
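
As a rough illustration of the new representation, the following Python model decodes a state value into a queried state. It is only a sketch: the bit positions follow the two-flag layout shown in the scripts/gdb/linux/dmesg.py hunk (commit in the top bit, reuse in the next), a 64-bit word is assumed, and the Python get_desc_state() here is a hypothetical stand-in for the kernel helper of the same name, not a copy of it.

```python
# Illustrative model only; the real logic lives in kernel/printk/printk_ringbuffer.c.
SV_BITS = 64  # assumption: 64-bit unsigned long
SV_MASK = (1 << SV_BITS) - 1

DESC_COMMIT_MASK = 1 << (SV_BITS - 1)
DESC_REUSE_MASK = 1 << (SV_BITS - 2)
DESC_FLAGS_MASK = DESC_COMMIT_MASK | DESC_REUSE_MASK
DESC_ID_MASK = ~DESC_FLAGS_MASK & SV_MASK


def make_reusable(desc_id):
    """New representation: reusable is the ID plus the reuse flag only."""
    return desc_id | DESC_REUSE_MASK


def get_desc_state(desc_id, state_val):
    """Hypothetical decoder: classify state_val relative to the expected ID."""
    if (state_val & DESC_ID_MASK) != desc_id:
        return "miss"
    if state_val & DESC_REUSE_MASK:
        return "reusable"
    if state_val & DESC_COMMIT_MASK:
        return "committed"
    return "reserved"
```

With this layout, callers such as desc_reserve() can compare against the decoded state instead of open-coding a combination of flag bits.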

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk_ringbuffer.c | 4 ++--
 kernel/printk/printk_ringbuffer.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 76248c82d557..d339ff7647da 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -463,7 +463,7 @@ static void desc_make_reusable(struct prb_desc_ring 
*desc_ring,
   unsigned long id)
 {
unsigned long val_committed = id | DESC_COMMIT_MASK;
-   unsigned long val_reusable = val_committed | DESC_REUSE_MASK;
+   unsigned long val_reusable = id | DESC_REUSE_MASK;
struct prb_desc *desc = to_desc(desc_ring, id);
atomic_long_t *state_var = >state_var;
 
@@ -899,7 +899,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
 */
prev_state_val = atomic_long_read(>state_var); /* 
LMM(desc_reserve:E) */
if (prev_state_val &&
-   prev_state_val != (id_prev_wrap | DESC_COMMIT_MASK | 
DESC_REUSE_MASK)) {
+   get_desc_state(id_prev_wrap, prev_state_val) != desc_reusable) {
WARN_ON_ONCE(1);
return false;
}
diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index dcda5e9b4676..96ef997d7bd6 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -213,7 +213,7 @@ struct prb_reserved_entry {
  */
 #define BLK0_LPOS(sz_bits) (-(_DATA_SIZE(sz_bits)))
 #define DESC0_ID(ct_bits)  DESC_ID(-(_DESCS_COUNT(ct_bits) + 1))
-#define DESC0_SV(ct_bits)  (DESC_COMMIT_MASK | DESC_REUSE_MASK | 
DESC0_ID(ct_bits))
+#define DESC0_SV(ct_bits)  (DESC_REUSE_MASK | DESC0_ID(ct_bits))
 
 /*
  * Define a ringbuffer with an external text data buffer. The same as
-- 
2.20.1




[PATCH next v3 5/8] printk: ringbuffer: clear initial reserved fields

2020-08-30 Thread John Ogness
prb_reserve() will set some meta data values and leave others
uninitialized (or rather, containing the values of the previous
wrap). Simplify the API by always clearing out all the fields.
Only the sequence number is filled in. The caller is now
responsible for filling in the rest of the meta data fields, in
particular the correct text and dict lengths.
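
The changed contract can be modelled in a few lines of Python. This is only a sketch: the Info class is a hypothetical stand-in for the record's @info fields (the field names are taken from the hunk below), and clear_reserved_info() and write_text() are illustrative helpers, not kernel functions.

```python
from dataclasses import dataclass


@dataclass
class Info:
    """Hypothetical stand-in for the record meta data (@info)."""
    seq: int = 0
    ts_nsec: int = 0
    text_len: int = 0
    dict_len: int = 0
    facility: int = 0
    flags: int = 0
    level: int = 0
    caller_id: int = 0


def clear_reserved_info(info):
    """Reset every field except seq, as the reserve path does after this change."""
    seq = info.seq
    for name in info.__dataclass_fields__:
        setattr(info, name, 0)
    info.seq = seq


def write_text(info, buf, text):
    """Writer side: copy the text and record its length explicitly."""
    data = text.encode()
    buf[:len(data)] = data
    info.text_len = len(data)
```

A writer that forgets the last step leaves text_len at 0, so the data would not be readable.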

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c|  7 ++-
 kernel/printk/printk_ringbuffer.c | 29 +++--
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ad8d1dfe5fbe..7e7d596c8878 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -520,8 +520,11 @@ static int log_store(u32 caller_id, int facility, int 
level,
memcpy(_buf[0], text, text_len);
if (trunc_msg_len)
memcpy(_buf[text_len], trunc_msg, trunc_msg_len);
-   if (r.dict_buf)
+   r.info->text_len = text_len + trunc_msg_len;
+   if (r.dict_buf) {
memcpy(_buf[0], dict, dict_len);
+   r.info->dict_len = dict_len;
+   }
r.info->facility = facility;
r.info->level = level & 7;
r.info->flags = flags & 0x1f;
@@ -1078,9 +1081,11 @@ static unsigned int __init add_to_rb(struct 
printk_ringbuffer *rb,
return 0;
 
memcpy(_r.text_buf[0], >text_buf[0], dest_r.text_buf_size);
+   dest_r.info->text_len = r->info->text_len;
if (dest_r.dict_buf) {
memcpy(_r.dict_buf[0], >dict_buf[0],
   dest_r.dict_buf_size);
+   dest_r.info->dict_len = r->info->dict_len;
}
dest_r.info->facility = r->info->facility;
dest_r.info->level = r->info->level;
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index d66718e74aae..da54d4fadf96 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -146,10 +146,13 @@
  *
  * if (prb_reserve(, _rb, )) {
  * snprintf(r.text_buf, r.text_buf_size, "%s", textstr);
+ * r.info->text_len = strlen(textstr);
  *
  * // dictionary allocation may have failed
- * if (r.dict_buf)
+ * if (r.dict_buf) {
  * snprintf(r.dict_buf, r.dict_buf_size, "%s", dictstr);
+ * r.info->dict_len = strlen(dictstr);
+ * }
  *
  * r.info->ts_nsec = local_clock();
  *
@@ -1125,9 +1128,9 @@ static const char *get_data(struct prb_data_ring 
*data_ring,
  * @dict_buf_size is set to 0. Writers must check this before writing to
  * dictionary space.
  *
- * @info->text_len and @info->dict_len will already be set to @text_buf_size
- * and @dict_buf_size, respectively. If dictionary space reservation fails,
- * @info->dict_len is set to 0.
+ * Important: @info->text_len and @info->dict_len need to be set correctly by
+ *the writer in order for data to be readable and/or extended.
+ *Their values are initialized to 0.
  */
 bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 struct printk_record *r)
@@ -1159,6 +1162,18 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
printk_ringbuffer *rb,
 
d = to_desc(desc_ring, id);
 
+   /*
+* Clear all @info fields except for @seq, which is used to determine
+* the new sequence number. The writer must fill in new values.
+*/
+   d->info.ts_nsec = 0;
+   d->info.text_len = 0;
+   d->info.dict_len = 0;
+   d->info.facility = 0;
+   d->info.flags = 0;
+   d->info.level = 0;
+   d->info.caller_id = 0;
+
/*
 * Set the @e fields here so that prb_commit() can be used if
 * text data allocation fails.
@@ -1186,8 +1201,6 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
printk_ringbuffer *rb,
 >text_blk_lpos, id);
/* If text data allocation fails, a data-less record is committed. */
if (r->text_buf_size && !r->text_buf) {
-   d->info.text_len = 0;
-   d->info.dict_len = 0;
prb_commit(e);
/* prb_commit() re-enabled interrupts. */
goto fail;
@@ -1204,10 +1217,6 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
printk_ringbuffer *rb,
 
r->info = >info;
 
-   /* Set default values for the sizes. */
-   d->info.text_len = r->text_buf_size;
-   d->info.dict_len = r->dict_buf_size;
-
/* Record full text space used by record. */
e->text_space = space_used(>text_data_ring, >text_blk_lpos);
 
-- 
2.20.1




[PATCH next v3 4/8] printk: ringbuffer: add BLK_DATALESS() macro

2020-08-30 Thread John Ogness
Rather than continually needing to explicitly check @begin and @next
to identify a dataless block, introduce and use a BLK_DATALESS()
macro.
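
For readers more comfortable with the gdb Python helpers, the two macros can be mirrored as follows. This is a sketch only: BlkLpos is a hypothetical stand-in for struct prb_data_blk_lpos, and the function names are illustrative.

```python
from collections import namedtuple

# Hypothetical stand-in for struct prb_data_blk_lpos.
BlkLpos = namedtuple("BlkLpos", ["begin", "next"])


def lpos_dataless(lpos):
    """Mirror of LPOS_DATALESS(): the low bit marks a data-less position."""
    return bool(lpos & 1)


def blk_dataless(blk):
    """Mirror of BLK_DATALESS(): both positions must be data-less."""
    return lpos_dataless(blk.begin) and lpos_dataless(blk.next)
```

Note that the macro is stricter than the old check in space_used(), which only looked at @begin.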

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk_ringbuffer.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 86af38c2cf77..d66718e74aae 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -266,6 +266,8 @@
 
 /* Determine if a logical position refers to a data-less block. */
 #define LPOS_DATALESS(lpos)((lpos) & 1UL)
+#define BLK_DATALESS(blk)  (LPOS_DATALESS((blk)->begin) && \
+LPOS_DATALESS((blk)->next))
 
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
@@ -1021,7 +1023,7 @@ static unsigned int space_used(struct prb_data_ring 
*data_ring,
   struct prb_data_blk_lpos *blk_lpos)
 {
/* Data-less blocks take no space. */
-   if (LPOS_DATALESS(blk_lpos->begin))
+   if (BLK_DATALESS(blk_lpos))
return 0;
 
if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next)) {
@@ -1054,7 +1056,7 @@ static const char *get_data(struct prb_data_ring 
*data_ring,
struct prb_data_block *db;
 
/* Data-less data block description. */
-   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+   if (BLK_DATALESS(blk_lpos)) {
if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
*data_size = 0;
return "";
-- 
2.20.1




Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-28 Thread John Ogness
On 2020-08-28, Petr Mladek  wrote:
>> Below is a patch against this series that adds support for finalizing
>> all 4 queried states. It passes all my tests. Note that the code handles
>> 2 corner cases:
>> 
>> 1. When seq is 0, there is no previous descriptor to finalize. This
>>exception is important because we don't want to finalize the -1
>>placeholder. Otherwise, upon the first wrap, a descriptor will be
>>prematurely finalized.
>> 
>> 2. When a previous descriptor is being reserved for the first time, it
>>might have a state_var value of 0 because the writer is still in
>>prb_reserve() and has not set the initial value yet. I added
>>considerable comments on this special case.
>> 
>> I am comfortable with adding this new code, although it clearly adds
>> complexity.
>> 
>> John Ogness
>> 
>> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
>> index 90d48973ac9e..1ed1e9eb930f 100644
>> --- a/kernel/printk/printk_ringbuffer.c
>> +++ b/kernel/printk/printk_ringbuffer.c
>> @@ -860,9 +860,11 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
>> unsigned long *id_out)
>>  struct prb_desc_ring *desc_ring = >desc_ring;
>>  unsigned long prev_state_val;
>>  unsigned long id_prev_wrap;
>> +unsigned long state_val;
>>  struct prb_desc *desc;
>>  unsigned long head_id;
>>  unsigned long id;
>> +bool is_final;
>>  
>>  head_id = atomic_long_read(_ring->head_id); /* LMM(desc_reserve:A) 
>> */
>>  
>> @@ -953,10 +955,17 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
>> unsigned long *id_out)
>>   * See "ABA Issues" about why this verification is performed.
>>   */
>>  prev_state_val = atomic_long_read(>state_var); /* 
>> LMM(desc_reserve:E) */
>> -if (prev_state_val &&
>> -get_desc_state(id_prev_wrap, prev_state_val, NULL) != 
>> desc_reusable) {
>> -WARN_ON_ONCE(1);
>> -return false;
>> +if (get_desc_state(id_prev_wrap, prev_state_val, _final) != 
>> desc_reusable) {
>> +/*
>> + * If this descriptor has never been used, @prev_state_val
>> + * will be 0. However, even though it may have never been
>> + * used, it may have been finalized. So that flag must be
>> + * ignored.
>> + */
>> +if ((prev_state_val & ~DESC_FINAL_MASK)) {
>> +WARN_ON_ONCE(1);
>> +return false;
>> +}
>>  }
>>  
>>  /*
>> @@ -967,10 +976,25 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
>> unsigned long *id_out)
>>   * any other changes. A write memory barrier is sufficient for this.
>>   * This pairs with desc_read:D.
>>   */
>> -if (!atomic_long_try_cmpxchg(>state_var, _state_val,
>> - id | 0)) { /* LMM(desc_reserve:F) */
>> -WARN_ON_ONCE(1);
>> -return false;
>> +if (is_final)
>> +state_val = id | 0 | DESC_FINAL_MASK;
>
> The state from the previous wrap always has to have DESC_FINAL_MASK set.
> Am I missing something?

Important: FINAL is not a _state_. It is a _flag_ that marks a
descriptor as non-reopenable. This was a simple change because it does
not affect any state logic. The number of states and possible
transitions have not changed.

When a descriptor transitions to reusable, the FINAL flag is cleared. It
has reached the end of its lifecycle. See desc_make_reusable().

(In order to have transitioned to reusable, the FINAL and COMMIT flags
must have been set.)

In the case of desc_reserve(), a reusable descriptor is transitioning to
reserved. When this transition happens, there may already be a later
descriptor that has been reserved and finalized this descriptor. If the
FINAL flag is set here, it means that the FINAL flag is set for the
_new_ descriptor being reserved.

In summary, the FINAL flag can be set in _any_ state. Once set, it is
preserved for all further state transitions. And it is cleared when that
descriptor becomes reusable.
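
To make the flag semantics concrete, here is a small Python model of the transitions described above. It is a sketch under assumptions: a 64-bit state word, the three-flag bit layout from the scripts/gdb/linux/dmesg.py hunk, and hypothetical helper names that do not exist in the kernel.

```python
SV_BITS = 64  # assumption: 64-bit unsigned long
DESC_COMMIT_MASK = 1 << (SV_BITS - 1)
DESC_FINAL_MASK = 1 << (SV_BITS - 2)
DESC_REUSE_MASK = 1 << (SV_BITS - 3)


def finalize(state_val):
    """FINAL may be set in any state; it never changes the state itself."""
    return state_val | DESC_FINAL_MASK


def commit(state_val):
    """Setting COMMIT preserves a FINAL flag that is already present."""
    return state_val | DESC_COMMIT_MASK


def make_reusable(desc_id):
    """Becoming reusable drops COMMIT and FINAL: end of the lifecycle."""
    return desc_id | DESC_REUSE_MASK


def reserve(new_id, prev_state_val):
    """Reusable -> reserved under new_id; an existing FINAL flag carries over."""
    return new_id | (prev_state_val & DESC_FINAL_MASK)
```

The last helper shows the case discussed here: if a later writer already finalized the slot while it was still reusable, the freshly reserved descriptor starts out with FINAL set.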

>> +else
>> +state_val = id | 0;
>> +if (atomic_long_cmpxchg(>state_var, prev_state_val,
>> +state_val) != prev_state_val) { /* 
>> LMM(desc_reserve:F) */
>> +/*
>> + * This reusable descriptor must have been finalized already.
>> + * Retry with a reusable+final 

Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-27 Thread John Ogness
ead:B.
>>   */
>>  if (!atomic_long_try_cmpxchg(>state_var, _state_val,
>> - e->id | DESC_COMMIT_MASK)) { /* 
>> LMM(prb_commit:B) */
>> -WARN_ON_ONCE(1);
>> + e->id | DESC_COMMIT_MASK |
>> + final_mask)) { /* 
>> LMM(_prb_commit:B) */
>> +/*
>> + * This reserved descriptor must have been finalized already.
>> + * Retry with a reserved+final expected value.
>> + */
>> +prev_state_val = e->id | 0 | DESC_FINAL_MASK;
>
> This does not make sense to me. The state "e->id | 0 | DESC_FINAL_MASK"
> must never happen. It would mean that someone finalized
> record that is still being modified.

Correct. Setting the FINAL flag means the descriptor cannot be
_reopened_. It has nothing to do with the current state of the
descriptor. Once the FINAL flag is set, it remains set for the remaining
lifetime of that record.

> Or we both have different understanding of the logic.

Yes.

> Well, there are actually two approaches:
>
>+ I originally expected that FINAL bit could be set only when
>  COMMIT bit is set. But this brings the problems that prb_commit()
>  would need to set FINAL when it is no longer the last descriptor.

My first attempt was to implement this. It turned out complex because it
involves descriptors finalizing themselves _and_ descriptors finalizing
their predecessor. This required two new memory barrier pairs:

  - between a writer committing and re-checking the head_id that another
writer may have modified

  - between a writer setting the state and another writer checking that
state

After re-evaluating the purpose of the FINAL flag, I decided that it
would be simpler to implement the 2nd approach (below) and would not
require any new memory barrier pairs.

>+ Another approach is that FINAL bit could be set even when the
>  COMMIT is not set. It would always be set by the next
>  prb_reserve(). But it causes that there are more possible
>  combinations of COMMIT and FINAL bits. As a result, the caller
>  would need to try more variants of the cmpxchg() calls. And
>  it creates other races/cycles, ...

It does not cause more races. And I don't see where it will cause more
cmpxchg() calls. It probably _does_ lead to more cmpxchg() _code_. But
those are fallbacks for when the common case fails.
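
The fallback pattern can be sketched in Python as follows. This is a single-threaded model, not kernel code: AtomicLong is a hypothetical stand-in for atomic_long_t, a 64-bit word and the bit positions from the dmesg.py hunk are assumed, and commit() only mirrors the shape of the quoted _prb_commit() hunk.

```python
SV_BITS = 64  # assumption: 64-bit unsigned long
DESC_COMMIT_MASK = 1 << (SV_BITS - 1)
DESC_FINAL_MASK = 1 << (SV_BITS - 2)


class AtomicLong:
    """Single-threaded stand-in for atomic_long_t; not actually atomic."""

    def __init__(self, value):
        self.value = value

    def try_cmpxchg(self, expected, new):
        """Store new only if the current value equals expected."""
        if self.value != expected:
            return False
        self.value = new
        return True


def commit(state_var, desc_id):
    """Set COMMIT; fall back to a reserved+final expected value on failure."""
    # Common case: the descriptor is reserved and not yet finalized.
    if state_var.try_cmpxchg(desc_id, desc_id | DESC_COMMIT_MASK):
        return True
    # Fallback: another writer finalized this descriptor in the meantime.
    return state_var.try_cmpxchg(desc_id | DESC_FINAL_MASK,
                                 desc_id | DESC_COMMIT_MASK | DESC_FINAL_MASK)
```

In the common case only the first cmpxchg runs; the second one exists purely for the already-finalized descriptor.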

> I guess that you wanted to implement the 2nd approach and ended in
> many troubles. I wonder if the 1st approach might be easier.

Well, the "many troubles" were due to my naive assumption about the
previous descriptor state. Once I realized that, the missing piece was
obvious.

I will reconsider the first approach. Perhaps adding memory barriers is
preferable if it reduces lines of code.

And we will need to clarify partial continuous line reading because
right now that will not work.

John Ogness

[0] https://lkml.kernel.org/r/875z9nvvl2@jogness.linutronix.de



Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-27 Thread John Ogness
On 2020-08-26, Petr Mladek  wrote:
>> This series makes a very naive assumption that the previous
>> descriptor is either in the reserved or committed queried states. The
>> fact is, it can be in any of the 4 queried states. Adding support for
>> finalization of all the states then gets quite complex, since any
>> state transition (cmpxchg) may have to deal with an unexpected FINAL
>> flag.
>
> It has to be done in two steps to avoid a race:
>
> prb_commit()
>
>+ set PRB_COMMIT_MASK
>+ check if it is still the last descriptor in the array
>+ set PRB_FINAL_MASK when it is not the last descriptor
>
> It should work because prb_reserve() finalizes the previous
> descriptor after the new one is reserved. As a result:
>
>+ prb_reserve() should either see PRB_COMMIT_MASK in the previous
>  descriptor and be able to finalize it.
>
>+ or prb_commit() will see that the head moved and it is no
>  longer the last reserved one.

I do not like the idea of relying on descriptors to finalize
themselves. I worry that there might be some hole there. Failing to
finalize basically disables printk, so that is pretty serious.

Below is a patch against this series that adds support for finalizing
all 4 queried states. It passes all my tests. Note that the code handles
2 corner cases:

1. When seq is 0, there is no previous descriptor to finalize. This
   exception is important because we don't want to finalize the -1
   placeholder. Otherwise, upon the first wrap, a descriptor will be
   prematurely finalized.

2. When a previous descriptor is being reserved for the first time, it
   might have a state_var value of 0 because the writer is still in
   prb_reserve() and has not set the initial value yet. I added
   considerable comments on this special case.

I am comfortable with adding this new code, although it clearly adds
complexity.

John Ogness

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 90d48973ac9e..1ed1e9eb930f 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -860,9 +860,11 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
struct prb_desc_ring *desc_ring = >desc_ring;
unsigned long prev_state_val;
unsigned long id_prev_wrap;
+   unsigned long state_val;
struct prb_desc *desc;
unsigned long head_id;
unsigned long id;
+   bool is_final;
 
head_id = atomic_long_read(_ring->head_id); /* LMM(desc_reserve:A) 
*/
 
@@ -953,10 +955,17 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
 * See "ABA Issues" about why this verification is performed.
 */
prev_state_val = atomic_long_read(>state_var); /* 
LMM(desc_reserve:E) */
-   if (prev_state_val &&
-   get_desc_state(id_prev_wrap, prev_state_val, NULL) != 
desc_reusable) {
-   WARN_ON_ONCE(1);
-   return false;
+   if (get_desc_state(id_prev_wrap, prev_state_val, _final) != 
desc_reusable) {
+   /*
+* If this descriptor has never been used, @prev_state_val
+* will be 0. However, even though it may have never been
+* used, it may have been finalized. So that flag must be
+* ignored.
+*/
+   if ((prev_state_val & ~DESC_FINAL_MASK)) {
+   WARN_ON_ONCE(1);
+   return false;
+   }
}
 
/*
@@ -967,10 +976,25 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
 * any other changes. A write memory barrier is sufficient for this.
 * This pairs with desc_read:D.
 */
-   if (!atomic_long_try_cmpxchg(>state_var, _state_val,
-id | 0)) { /* LMM(desc_reserve:F) */
-   WARN_ON_ONCE(1);
-   return false;
+   if (is_final)
+   state_val = id | 0 | DESC_FINAL_MASK;
+   else
+   state_val = id | 0;
+   if (atomic_long_cmpxchg(>state_var, prev_state_val,
+   state_val) != prev_state_val) { /* 
LMM(desc_reserve:F) */
+   /*
+* This reusable descriptor must have been finalized already.
+* Retry with a reusable+final expected value.
+*/
+   prev_state_val |= DESC_FINAL_MASK;
+   state_val |= DESC_FINAL_MASK;
+
+   if (!atomic_long_try_cmpxchg(>state_var, _state_val,
+state_val)) { /* 
LMM(desc_reserve:FIXME) */
+
+   WARN_ON_ONCE(1);
+   return false;
+   }
}
 
/* Now data in @desc can be modified: LMM(desc_r

Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-26 Thread John Ogness
On 2020-08-26, Sergey Senozhatsky  wrote:
>>> @@ -1157,6 +1431,14 @@ bool prb_reserve(struct prb_reserved_entry *e, 
>>> struct printk_ringbuffer *rb,
>>> goto fail;
>>> }
>>>  
>>> +   /*
>>> +* New data is about to be reserved. Once that happens, previous
>>> +* descriptors are no longer able to be extended. Finalize the
>>> +* previous descriptor now so that it can be made available to
>>> +* readers (when committed).
>>> +*/
>>> +   desc_finalize(desc_ring, DESC_ID(id - 1));
>>> +
>>> d = to_desc(desc_ring, id);
>>>  
>>> /*
>> 
>> Apparently this is not enough to guarantee that past descriptors are
>> finalized. I am able to reproduce a scenario where the finalization
>> of a certain descriptor never happens. That leaves the descriptor
>> permanently in the reserved queried state, which prevents any new
>> records from being created. I am investigating.
>
> Good to know. I also run into problems:
> - broken dmesg (and broken journalctl -f /dev/kmsg poll) and broken
>   syslog read
>
> $ strace dmesg
>
> ...
> openat(AT_FDCWD, "/dev/kmsg", O_RDONLY|O_NONBLOCK) = 3
> lseek(3, 0, SEEK_DATA)  = 0
> read(3, 0x55dda8c240a8, 8191)   = -1 EAGAIN (Resource temporarily 
> unavailable)
> close(3)= 0
> syslog(10 /* SYSLOG_ACTION_SIZE_BUFFER */) = 524288
> mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
> 0x7f43ea847000
> syslog(3 /* SYSLOG_ACTION_READ_ALL */, "", 524296) = 0

Yes, this is a consequence of the problem. The tail is in the reserved
queried state, so readers will not advance beyond it.

This series makes a very naive assumption that the previous descriptor
is either in the reserved or committed queried states. The fact is, it
can be in any of the 4 queried states. Adding support for finalization
of all the states then gets quite complex, since any state transition
(cmpxchg) may have to deal with an unexpected FINAL flag.

The ringbuffer was designed so that descriptors are completely
self-contained. So adding logic where an action on one descriptor should
affect another descriptor is far more complex than I initially expected.

Keep in mind the finalization concept satisfies 3 things:

- denote if a record can be extended (i.e. transition back to reserved)
- denote if a reader may read the record
- denote if a writer may recycle a record
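
One possible reading of those three points, expressed as predicates over the flag bits. This is only a sketch: it assumes a 64-bit state word, the three-flag layout from the dmesg.py hunk, and the flag combinations documented in the v2 5/7 patch (committed is COMMIT plus FINAL; a record that is committed but not finalized may be reopened). The function names are hypothetical.

```python
SV_BITS = 64  # assumption: 64-bit unsigned long
DESC_COMMIT_MASK = 1 << (SV_BITS - 1)
DESC_FINAL_MASK = 1 << (SV_BITS - 2)
DESC_REUSE_MASK = 1 << (SV_BITS - 3)
DESC_FLAGS_MASK = DESC_COMMIT_MASK | DESC_FINAL_MASK | DESC_REUSE_MASK


def flags(state_val):
    """Strip the descriptor ID, leaving only the flag bits."""
    return state_val & DESC_FLAGS_MASK


def can_extend(state_val):
    """Committed by its writer but not finalized: may be reopened."""
    return flags(state_val) == DESC_COMMIT_MASK


def can_read(state_val):
    """Readers only see records that are both committed and finalized."""
    return flags(state_val) == (DESC_COMMIT_MASK | DESC_FINAL_MASK)


def can_recycle(state_val):
    """Assumed here: only a committed+final record may be made reusable."""
    return flags(state_val) == (DESC_COMMIT_MASK | DESC_FINAL_MASK)
```

Under this reading, a descriptor that is never finalized can be neither read nor recycled, which matches the stuck tail seen in the strace output above.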

I have not yet given up on the idea of finalization (particularly
because it allows mainline LOG_CONT behavior to be preserved locklessly),
but I am no longer sure if this is the direction we want to take.

John Ogness



Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-26 Thread John Ogness
On 2020-08-24, John Ogness  wrote:
> @@ -1157,6 +1431,14 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
> printk_ringbuffer *rb,
>   goto fail;
>   }
>  
> + /*
> +  * New data is about to be reserved. Once that happens, previous
> +  * descriptors are no longer able to be extended. Finalize the
> +  * previous descriptor now so that it can be made available to
> +  * readers (when committed).
> +  */
> + desc_finalize(desc_ring, DESC_ID(id - 1));
> +
>   d = to_desc(desc_ring, id);
>  
>   /*

Apparently this is not enough to guarantee that past descriptors are
finalized. I am able to reproduce a scenario where the finalization of a
certain descriptor never happens. That leaves the descriptor permanently
in the reserved queried state, which prevents any new records from being
created. I am investigating.

John Ogness



[PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-24 Thread John Ogness
Add support for extending the last data block. For this, introduce a new
finalization state flag that identifies if a descriptor may be extended.

When a writer calls the commit function prb_commit(), the record may still
continue to be in the reserved queried state. In order for that record to
enter into the committed queried state, that record also must be finalized.
Finalization can occur anytime while the record is in the reserved queried
state, even before the writer has called prb_commit().

Until a record is finalized (represented by "DESC_FINAL_MASK"), a writer
may "reopen" that record and extend it with more data.

Note that existing descriptors are automatically finalized whenever new
descriptors are created. A record can never be "unfinalized".

Two new memory barrier pairs are introduced, but these are really just
alternate path barriers that exactly correspond to existing memory
barriers.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c|   4 +-
 kernel/printk/printk_ringbuffer.c | 386 +++---
 kernel/printk/printk_ringbuffer.h |   8 +-
 3 files changed, 364 insertions(+), 34 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ad8d1dfe5fbe..e063edd8adc2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -532,7 +532,7 @@ static int log_store(u32 caller_id, int facility, int level,
r.info->caller_id = caller_id;
 
/* insert message */
-   prb_commit();
+   prb_commit_finalize();
 
return (text_len + trunc_msg_len);
 }
@@ -1088,7 +1088,7 @@ static unsigned int __init add_to_rb(struct 
printk_ringbuffer *rb,
dest_r.info->ts_nsec = r->info->ts_nsec;
dest_r.info->caller_id = r->info->caller_id;
 
-   prb_commit();
+   prb_commit_finalize();
 
return prb_record_text_space();
 }
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index d66718e74aae..90d48973ac9e 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -49,14 +49,16 @@
  * Descriptors have three states:
  *
  *   reserved
- * A writer is modifying the record.
+ * A writer is modifying the record. Internally represented as either "0"
+ * or "DESC_FINAL_MASK" or "DESC_COMMIT_MASK".
  *
  *   committed
  * The record and all its data are complete and available for reading.
+ * Internally represented as "DESC_COMMIT_MASK | DESC_FINAL_MASK".
  *
  *   reusable
  * The record exists, but its text and/or dictionary data may no longer
- * be available.
+ * be available. Internally represented as "DESC_REUSE_MASK".
  *
  * Querying the @state_var of a record requires providing the ID of the
  * descriptor to query. This can yield a possible fourth (pseudo) state:
@@ -79,6 +81,20 @@
  * committed or reusable queried state. This makes it possible that a valid
  * sequence number of the tail is always available.
  *
+ * Descriptor Finalization
+ * ~~~
+ * When a writer calls the commit function prb_commit(), the record may still
+ * continue to be in the reserved queried state. In order for that record to
+ * enter into the committed queried state, that record also must be finalized.
+ * Finalization can occur anytime while the record is in the reserved queried
+ * state, even before the writer has called prb_commit().
+ *
+ * Until a record is finalized (represented by "DESC_FINAL_MASK"), a writer
+ * may "reopen" that record and extend it with more data.
+ *
+ * Note that existing descriptors are automatically finalized whenever new
+ * descriptors are created. A record can never be "unfinalized".
+ *
  * Data Rings
  * ~~
  * The two data rings (text and dictionary) function identically. They exist
@@ -153,9 +169,38 @@
  *
  * r.info->ts_nsec = local_clock();
  *
+ * prb_commit_finalize();
+ * }
+ *
+ * Note that additional writer functions are available to extend a record
+ * after it has been committed but not yet finalized. This can be done as
+ * long as no new records have been reserved and the caller is the same.
+ *
+ * Sample writer code (record extending)::
+ *
+ * // alternate rest of previous example
+ * r.info->ts_nsec = local_clock();
+ * r.info->text_len = strlen(textstr);
+ * r.info->caller_id = printk_caller_id();
+ *
+ * // commit the record (but do not finalize yet)
  * prb_commit();
  * }
  *
+ * ...
+ *
+ * // specify additional 5 bytes text space to extend
+ * prb_rec_init_wr(, 5, 0);
+ *
+ * if (prb_reserve_last(, _rb, , printk_caller_id())) {
+ * snprintf(_buf[r.info->text_len],
+ *  r.text_buf_size - r.info->text_len, "hello");
+ *
+ *  

[PATCH v2 2/7][next] printk: ringbuffer: change representation of reusable

2020-08-24 Thread John Ogness
The reusable queried state is represented by the combined flags:

DESC_COMMIT_MASK | DESC_REUSE_MASK

There is no reason for the DESC_COMMIT_MASK to be part of that
representation. In particular, this will add confusion when more
state flags are available.

Change the representation of the reusable queried state to just
the DESC_REUSE_MASK flag.

Signed-off-by: John Ogness 
---
 kernel/printk/printk_ringbuffer.c | 4 ++--
 kernel/printk/printk_ringbuffer.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 76248c82d557..d339ff7647da 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -463,7 +463,7 @@ static void desc_make_reusable(struct prb_desc_ring 
*desc_ring,
   unsigned long id)
 {
unsigned long val_committed = id | DESC_COMMIT_MASK;
-   unsigned long val_reusable = val_committed | DESC_REUSE_MASK;
+   unsigned long val_reusable = id | DESC_REUSE_MASK;
struct prb_desc *desc = to_desc(desc_ring, id);
atomic_long_t *state_var = >state_var;
 
@@ -899,7 +899,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
 */
prev_state_val = atomic_long_read(>state_var); /* 
LMM(desc_reserve:E) */
if (prev_state_val &&
-   prev_state_val != (id_prev_wrap | DESC_COMMIT_MASK | 
DESC_REUSE_MASK)) {
+   get_desc_state(id_prev_wrap, prev_state_val) != desc_reusable) {
WARN_ON_ONCE(1);
return false;
}
diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index dcda5e9b4676..96ef997d7bd6 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -213,7 +213,7 @@ struct prb_reserved_entry {
  */
 #define BLK0_LPOS(sz_bits) (-(_DATA_SIZE(sz_bits)))
 #define DESC0_ID(ct_bits)  DESC_ID(-(_DESCS_COUNT(ct_bits) + 1))
-#define DESC0_SV(ct_bits)  (DESC_COMMIT_MASK | DESC_REUSE_MASK | 
DESC0_ID(ct_bits))
+#define DESC0_SV(ct_bits)  (DESC_REUSE_MASK | DESC0_ID(ct_bits))
 
 /*
  * Define a ringbuffer with an external text data buffer. The same as
-- 
2.20.1




[PATCH v2 3/7][next] printk: ringbuffer: relocate get_data()

2020-08-24 Thread John Ogness
Move the internal get_data() function as-is above prb_reserve() so
that a later change can make use of the static function.

Signed-off-by: John Ogness 
---
 kernel/printk/printk_ringbuffer.c | 116 +++---
 1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index d339ff7647da..86af38c2cf77 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1038,6 +1038,64 @@ static unsigned int space_used(struct prb_data_ring 
*data_ring,
DATA_SIZE(data_ring) - DATA_INDEX(data_ring, blk_lpos->begin));
 }
 
+/*
+ * Given @blk_lpos, return a pointer to the writer data from the data block
+ * and calculate the size of the data part. A NULL pointer is returned if
+ * @blk_lpos specifies values that could never be legal.
+ *
+ * This function (used by readers) performs strict validation on the lpos
+ * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
+ * triggered if an internal error is detected.
+ */
+static const char *get_data(struct prb_data_ring *data_ring,
+   struct prb_data_blk_lpos *blk_lpos,
+   unsigned int *data_size)
+{
+   struct prb_data_block *db;
+
+   /* Data-less data block description. */
+   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+   if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
+   *data_size = 0;
+   return "";
+   }
+   return NULL;
+   }
+
+   /* Regular data block: @begin less than @next and in same wrap. */
+   if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next) &&
+   blk_lpos->begin < blk_lpos->next) {
+   db = to_block(data_ring, blk_lpos->begin);
+   *data_size = blk_lpos->next - blk_lpos->begin;
+
+   /* Wrapping data block: @begin is one wrap behind @next. */
+   } else if (DATA_WRAPS(data_ring, blk_lpos->begin + 
DATA_SIZE(data_ring)) ==
+  DATA_WRAPS(data_ring, blk_lpos->next)) {
+   db = to_block(data_ring, 0);
+   *data_size = DATA_INDEX(data_ring, blk_lpos->next);
+
+   /* Illegal block description. */
+   } else {
+   WARN_ON_ONCE(1);
+   return NULL;
+   }
+
+   /* A valid data block will always be aligned to the ID size. */
+   if (WARN_ON_ONCE(blk_lpos->begin != ALIGN(blk_lpos->begin, sizeof(db->id))) ||
+   WARN_ON_ONCE(blk_lpos->next != ALIGN(blk_lpos->next, sizeof(db->id)))) {
+   return NULL;
+   }
+
+   /* A valid data block will always have at least an ID. */
+   if (WARN_ON_ONCE(*data_size < sizeof(db->id)))
+   return NULL;
+
+   /* Subtract block ID space from size to reflect data size. */
+   *data_size -= sizeof(db->id);
+
+   return &db->data[0];
+}
+
 /**
  * prb_reserve() - Reserve space in the ringbuffer.
  *
@@ -1192,64 +1250,6 @@ void prb_commit(struct prb_reserved_entry *e)
local_irq_restore(e->irqflags);
 }
 
-/*
- * Given @blk_lpos, return a pointer to the writer data from the data block
- * and calculate the size of the data part. A NULL pointer is returned if
- * @blk_lpos specifies values that could never be legal.
- *
- * This function (used by readers) performs strict validation on the lpos
- * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
- * triggered if an internal error is detected.
- */
-static const char *get_data(struct prb_data_ring *data_ring,
-   struct prb_data_blk_lpos *blk_lpos,
-   unsigned int *data_size)
-{
-   struct prb_data_block *db;
-
-   /* Data-less data block description. */
-   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
-   if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
-   *data_size = 0;
-   return "";
-   }
-   return NULL;
-   }
-
-   /* Regular data block: @begin less than @next and in same wrap. */
-   if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next) &&
-   blk_lpos->begin < blk_lpos->next) {
-   db = to_block(data_ring, blk_lpos->begin);
-   *data_size = blk_lpos->next - blk_lpos->begin;
-
-   /* Wrapping data block: @begin is one wrap behind @next. */
-   } else if (DATA_WRAPS(data_ring, blk_lpos->begin + 
DATA_SIZE(data_ring)) ==
-  DATA_WRAPS(data_ring, blk_lpos->next)) {
-   db = to_block(data_ring, 0);
-   *data_size = 

[PATCH v2 7/7][next] scripts/gdb: support printk finalized records

2020-08-24 Thread John Ogness
With commit ("printk: ringbuffer: add finalization/extension support")
a new state bit for finalized records was added. This not only changed
the bit representation of committed records, but also reduced the size
for record IDs.

Update the gdb scripts to correctly interpret the state variable.
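
As an illustration of the new layout, here is a small standalone Python
sketch of how a state value splits into the three flag bits and the
descriptor ID. It mirrors the masks used in the diff below, but it is not
part of the patch: the 64-bit word size and the decode() helper are
assumptions made for the example.

```python
# Sketch of decoding a descriptor state value with the three flag bits.
SV_BITS = 64                          # assumed: 64-bit unsigned long
COMMIT_MASK = 1 << (SV_BITS - 1)
FINAL_MASK = 1 << (SV_BITS - 2)
REUSE_MASK = 1 << (SV_BITS - 3)
FLAGS_MASK = COMMIT_MASK | FINAL_MASK | REUSE_MASK
# Python integers are unbounded, so clamp the inverted mask to SV_BITS.
ID_MASK = ~FLAGS_MASK & ((1 << SV_BITS) - 1)


def decode(state_val):
    """Split a raw state value into (descriptor id, is_committed)."""
    desc_id = state_val & ID_MASK
    # Only the commit bit is tested, so commit+!final records count too.
    committed = (state_val & COMMIT_MASK) == COMMIT_MASK
    return desc_id, committed
```

Because three bits are now reserved for flags, the ID occupies one bit
less than before, which is why the ID mask has to be derived from all
three flag masks.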

Signed-off-by: John Ogness 
---
 Documentation/admin-guide/kdump/gdbmacros.txt | 10 +++---
 scripts/gdb/linux/dmesg.py| 10 ++
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt 
b/Documentation/admin-guide/kdump/gdbmacros.txt
index 6025534c6c14..1ccc811c82ad 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -295,8 +295,11 @@ document dump_record
 end
 
 define dmesg
-   set var $desc_committed = 1UL << ((sizeof(long) * 8) - 1)
-   set var $flags_mask = 3UL << ((sizeof(long) * 8) - 2)
+   # definitions from kernel/printk/printk_ringbuffer.h
+   set var $desc_commit = 1UL << ((sizeof(long) * 8) - 1)
+   set var $desc_final = 1UL << ((sizeof(long) * 8) - 2)
+   set var $desc_reuse = 1UL << ((sizeof(long) * 8) - 3)
+   set var $flags_mask = $desc_commit | $desc_final | $desc_reuse
set var $id_mask = ~$flags_mask
 
set var $desc_count = 1U << prb->desc_ring.count_bits
@@ -309,7 +312,8 @@ define dmesg
set var $desc = &prb->desc_ring.descs[$id % $desc_count]
 
# skip non-committed record
-   if (($desc->state_var.counter & $flags_mask) == $desc_committed)
+   # (note that commit+!final records will be displayed)
+   if (($desc->state_var.counter & $desc_commit) == $desc_commit)
dump_record $desc $prev_flags
set var $prev_flags = $desc->info.flags
end
diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
index 6c6022012ea8..367523c5c270 100644
--- a/scripts/gdb/linux/dmesg.py
+++ b/scripts/gdb/linux/dmesg.py
@@ -79,9 +79,10 @@ class LxDmesg(gdb.Command):
 
 # definitions from kernel/printk/printk_ringbuffer.h
 desc_sv_bits = utils.get_long_type().sizeof * 8
-desc_committed_mask = 1 << (desc_sv_bits - 1)
-desc_reuse_mask = 1 << (desc_sv_bits - 2)
-desc_flags_mask = desc_committed_mask | desc_reuse_mask
+desc_commit_mask = 1 << (desc_sv_bits - 1)
+desc_final_mask = 1 << (desc_sv_bits - 2)
+desc_reuse_mask = 1 << (desc_sv_bits - 3)
+desc_flags_mask = desc_commit_mask | desc_final_mask | desc_reuse_mask
 desc_id_mask = ~desc_flags_mask
 
 # read in tail and head descriptor ids
@@ -96,8 +97,9 @@ class LxDmesg(gdb.Command):
 desc_off = desc_sz * ind
 
 # skip non-committed record
+# (note that commit+!final records will be displayed)
 state = utils.read_u64(descs, desc_off + sv_off + counter_off) & 
desc_flags_mask
-if state != desc_committed_mask:
+if state & desc_commit_mask != desc_commit_mask:
 if did == head_id:
 break
 did = (did + 1) & desc_id_mask
-- 
2.20.1


[PATCH v2 6/7][next] printk: reimplement log_cont using record extension

2020-08-24 Thread John Ogness
Use the record extending feature of the ringbuffer to implement
continuous messages. This preserves the existing continuous message
behavior.
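
To make the idea concrete, the following Python toy model shows how a
continuation can be appended to the newest record as long as that record
is still open and belongs to the same caller. It is a sketch only: the
Log class, its record layout and the rule that storing a new record
finalizes the previous one are simplifying assumptions, and none of this
is the ringbuffer API.

```python
# Toy model of LOG_CONT via record extension; not the kernel API.
LOG_NEWLINE = 2   # flag values chosen for this sketch
LOG_CONT = 8


class Log:
    def __init__(self):
        self.records = []   # each record: dict with caller, text, final

    def _extend_last(self, caller, text):
        """Append to the newest record if it is open and from the same caller."""
        if not self.records:
            return None
        last = self.records[-1]
        if last["final"] or last["caller"] != caller:
            return None
        last["text"] += text
        return last

    def output(self, caller, flags, text):
        rec = None
        if flags & LOG_CONT:
            rec = self._extend_last(caller, text)
        if rec is None:
            # Assumption: storing a new record finalizes the previous one.
            if self.records:
                self.records[-1]["final"] = True
            rec = {"caller": caller, "text": text, "final": False}
            self.records.append(rec)
        if flags & LOG_NEWLINE:
            rec["final"] = True
```

In this model an interrupting message from another caller closes the open
record, so a later continuation from the first caller starts a new record
rather than being appended to unrelated text.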

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 96 +-
 1 file changed, 19 insertions(+), 77 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e063edd8adc2..80afee3cfec7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -532,7 +532,10 @@ static int log_store(u32 caller_id, int facility, int 
level,
r.info->caller_id = caller_id;
 
/* insert message */
-   prb_commit_finalize(&e);
+   if ((flags & LOG_CONT) || !(flags & LOG_NEWLINE))
+   prb_commit(&e);
+   else
+   prb_commit_finalize(&e);
 
return (text_len + trunc_msg_len);
 }
@@ -1888,87 +1891,26 @@ static inline u32 printk_caller_id(void)
0x80000000 + raw_smp_processor_id();
 }
 
-/*
- * Continuation lines are buffered, and not committed to the record buffer
- * until the line is complete, or a race forces it. The line fragments
- * though, are printed immediately to the consoles to ensure everything has
- * reached the console in case of a kernel crash.
- */
-static struct cont {
-   char buf[LOG_LINE_MAX];
-   size_t len; /* length == 0 means unused buffer */
-   u32 caller_id;  /* printk_caller_id() of first print */
-   u64 ts_nsec;/* time of first print */
-   u8 level;   /* log level of first message */
-   u8 facility;/* log facility of first message */
-   enum log_flags flags;   /* prefix, newline flags */
-} cont;
-
-static void cont_flush(void)
-{
-   if (cont.len == 0)
-   return;
-
-   log_store(cont.caller_id, cont.facility, cont.level, cont.flags,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-   cont.len = 0;
-}
-
-static bool cont_add(u32 caller_id, int facility, int level,
-enum log_flags flags, const char *text, size_t len)
-{
-   /* If the line gets too long, split it up in separate records. */
-   if (cont.len + len > sizeof(cont.buf)) {
-   cont_flush();
-   return false;
-   }
-
-   if (!cont.len) {
-   cont.facility = facility;
-   cont.level = level;
-   cont.caller_id = caller_id;
-   cont.ts_nsec = local_clock();
-   cont.flags = flags;
-   }
-
-   memcpy(cont.buf + cont.len, text, len);
-   cont.len += len;
-
-   // The original flags come from the first line,
-   // but later continuations can add a newline.
-   if (flags & LOG_NEWLINE) {
-   cont.flags |= LOG_NEWLINE;
-   cont_flush();
-   }
-
-   return true;
-}
-
 static size_t log_output(int facility, int level, enum log_flags lflags, const 
char *dict, size_t dictlen, char *text, size_t text_len)
 {
const u32 caller_id = printk_caller_id();
 
-   /*
-* If an earlier line was buffered, and we're a continuation
-* write from the same context, try to add it to the buffer.
-*/
-   if (cont.len) {
-   if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
-   if (cont_add(caller_id, facility, level, lflags, text, 
text_len))
-   return text_len;
-   }
-   /* Otherwise, make sure it's flushed */
-   cont_flush();
-   }
-
-   /* Skip empty continuation lines that couldn't be added - they just 
flush */
-   if (!text_len && (lflags & LOG_CONT))
-   return 0;
-
-   /* If it doesn't end in a newline, try to buffer the current line */
-   if (!(lflags & LOG_NEWLINE)) {
-   if (cont_add(caller_id, facility, level, lflags, text, 
text_len))
+   if (lflags & LOG_CONT) {
+   struct prb_reserved_entry e;
+   struct printk_record r;
+
+   prb_rec_init_wr(&r, text_len, 0);
+   if (prb_reserve_last(&e, prb, &r, caller_id)) {
+   memcpy(&r.text_buf[r.info->text_len], text, text_len);
+   r.info->text_len += text_len;
+   if (lflags & LOG_NEWLINE) {
+   r.info->flags |= LOG_NEWLINE;
+   prb_commit_finalize(&e);
+   } else {
+   prb_commit(&e);
+   }
+   }
return text_len;
+   }
}
 
/* Store it in the record log */
-- 
2.20.1


[PATCH v2 0/7][next] printk: reimplement LOG_CONT handling

2020-08-24 Thread John Ogness
Hello,

Here is v2 for the second series to rework the printk subsystem.
(The v1 is here [0].) This series implements a new ringbuffer
feature that allows the last record to be extended. Petr Mladek
provided the initial proof of concept [1] for this.

Using the record extension feature, LOG_CONT is re-implemented
in a way that exactly preserves its behavior, but avoids the
need for an extra buffer. In particular, it avoids the need for
any synchronization that such a buffer requires.

This series deviates from the agreements [2] made at the meeting
during LPC2019 in Lisbon. The test results of the v1 series
showed that the effects on existing userspace tools using
/dev/kmsg (journalctl, dmesg) were not acceptable [3]. That is
why a new decision [4] was made to preserve the current LOG_CONT
behavior.

Patch 5 introduces two new memory barriers. However, both are
alternate path memory barriers. They exactly match the purpose
and context of the two existing memory barriers that they
provide an alternate path for. For this reason, I do not
believe that a new memory barrier review is necessary.
Nevertheless, I have included the memory barrier experts CC.

Patch 6 assumes that the gdb script series [5] for the new
printk ringbuffer has been applied.

John Ogness

[0] https://lkml.kernel.org/r/20200717234818.8622-1-john.ogn...@linutronix.de
[1] https://lkml.kernel.org/r/20200812163908.GH12903@alley
[2] https://lkml.kernel.org/r/87k1acz5rx@linutronix.de
[3] https://lkml.kernel.org/r/20200811160551.GC12903@alley
[4] https://lkml.kernel.org/r/CAHk-=wj_b6Bh=d-wwh0xyqoqbhhkyeexhszkpxdra6gjtvk...@mail.gmail.com
[5] https://lkml.kernel.org/r/20200814212525.6118-1-john.ogn...@linutronix.de

John Ogness (7):
  printk: ringbuffer: rename DESC_COMMITTED_MASK flag
  printk: ringbuffer: change representation of reusable
  printk: ringbuffer: relocate get_data()
  printk: ringbuffer: add BLK_DATALESS() macro
  printk: ringbuffer: add finalization/extension support
  printk: reimplement log_cont using record extension
  scripts/gdb: support printk finalized records

 Documentation/admin-guide/kdump/gdbmacros.txt |  10 +-
 kernel/printk/printk.c|  98 +---
 kernel/printk/printk_ringbuffer.c | 496 +++---
 kernel/printk/printk_ringbuffer.h |  12 +-
 scripts/gdb/linux/dmesg.py|  10 +-
 5 files changed, 453 insertions(+), 173 deletions(-)

-- 
2.20.1


[PATCH v2 1/7][next] printk: ringbuffer: rename DESC_COMMITTED_MASK flag

2020-08-24 Thread John Ogness
The flag DESC_COMMITTED_MASK has a much longer name compared to the
other state flags and also is in past tense form, rather than in
command form. Rename the flag to DESC_COMMIT_MASK in order to match
the other state flags.

Signed-off-by: John Ogness 
---
 kernel/printk/printk_ringbuffer.c | 8 
 kernel/printk/printk_ringbuffer.h | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 0659b50872b5..76248c82d557 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -361,7 +361,7 @@ static enum desc_state get_desc_state(unsigned long id,
if (state_val & DESC_REUSE_MASK)
return desc_reusable;
 
-   if (state_val & DESC_COMMITTED_MASK)
+   if (state_val & DESC_COMMIT_MASK)
return desc_committed;
 
return desc_reserved;
@@ -462,7 +462,7 @@ static enum desc_state desc_read(struct prb_desc_ring 
*desc_ring,
 static void desc_make_reusable(struct prb_desc_ring *desc_ring,
   unsigned long id)
 {
-   unsigned long val_committed = id | DESC_COMMITTED_MASK;
+   unsigned long val_committed = id | DESC_COMMIT_MASK;
unsigned long val_reusable = val_committed | DESC_REUSE_MASK;
struct prb_desc *desc = to_desc(desc_ring, id);
atomic_long_t *state_var = &desc->state_var;
@@ -899,7 +899,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, 
unsigned long *id_out)
 */
prev_state_val = atomic_long_read(&desc->state_var); /* LMM(desc_reserve:E) */
if (prev_state_val &&
-   prev_state_val != (id_prev_wrap | DESC_COMMITTED_MASK | 
DESC_REUSE_MASK)) {
+   prev_state_val != (id_prev_wrap | DESC_COMMIT_MASK | 
DESC_REUSE_MASK)) {
WARN_ON_ONCE(1);
return false;
}
@@ -1184,7 +1184,7 @@ void prb_commit(struct prb_reserved_entry *e)
 * this. This pairs with desc_read:B.
 */
if (!atomic_long_try_cmpxchg(&d->state_var, &prev_state_val,
-e->id | DESC_COMMITTED_MASK)) { /* 
LMM(prb_commit:B) */
+e->id | DESC_COMMIT_MASK)) { /* 
LMM(prb_commit:B) */
WARN_ON_ONCE(1);
}
 
diff --git a/kernel/printk/printk_ringbuffer.h 
b/kernel/printk/printk_ringbuffer.h
index e6302da041f9..dcda5e9b4676 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -115,9 +115,9 @@ struct prb_reserved_entry {
 #define _DATA_SIZE(sz_bits)	(1UL << (sz_bits))
 #define _DESCS_COUNT(ct_bits)	(1U << (ct_bits))
 #define DESC_SV_BITS		(sizeof(unsigned long) * 8)
-#define DESC_COMMITTED_MASK	(1UL << (DESC_SV_BITS - 1))
+#define DESC_COMMIT_MASK	(1UL << (DESC_SV_BITS - 1))
 #define DESC_REUSE_MASK		(1UL << (DESC_SV_BITS - 2))
-#define DESC_FLAGS_MASK		(DESC_COMMITTED_MASK | DESC_REUSE_MASK)
+#define DESC_FLAGS_MASK		(DESC_COMMIT_MASK | DESC_REUSE_MASK)
 #define DESC_ID_MASK		(~DESC_FLAGS_MASK)
 #define DESC_ID(sv)		((sv) & DESC_ID_MASK)
 #define FAILED_LPOS		0x1
@@ -213,7 +213,7 @@ struct prb_reserved_entry {
  */
 #define BLK0_LPOS(sz_bits)	(-(_DATA_SIZE(sz_bits)))
 #define DESC0_ID(ct_bits)	DESC_ID(-(_DESCS_COUNT(ct_bits) + 1))
-#define DESC0_SV(ct_bits)	(DESC_COMMITTED_MASK | DESC_REUSE_MASK | DESC0_ID(ct_bits))
+#define DESC0_SV(ct_bits)	(DESC_COMMIT_MASK | DESC_REUSE_MASK | DESC0_ID(ct_bits))
 
 /*
  * Define a ringbuffer with an external text data buffer. The same as
-- 
2.20.1


[PATCH v2 4/7][next] printk: ringbuffer: add BLK_DATALESS() macro

2020-08-24 Thread John Ogness
Rather than continually needing to explicitly check @begin and @next
to identify a dataless block, introduce and use a BLK_DATALESS()
macro.

Signed-off-by: John Ogness 
---
 kernel/printk/printk_ringbuffer.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 86af38c2cf77..d66718e74aae 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -266,6 +266,8 @@
 
 /* Determine if a logical position refers to a data-less block. */
 #define LPOS_DATALESS(lpos)	((lpos) & 1UL)
+#define BLK_DATALESS(blk)	(LPOS_DATALESS((blk)->begin) && \
+				 LPOS_DATALESS((blk)->next))
 
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
@@ -1021,7 +1023,7 @@ static unsigned int space_used(struct prb_data_ring 
*data_ring,
   struct prb_data_blk_lpos *blk_lpos)
 {
/* Data-less blocks take no space. */
-   if (LPOS_DATALESS(blk_lpos->begin))
+   if (BLK_DATALESS(blk_lpos))
return 0;
 
if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next)) {
@@ -1054,7 +1056,7 @@ static const char *get_data(struct prb_data_ring 
*data_ring,
struct prb_data_block *db;
 
/* Data-less data block description. */
-   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+   if (BLK_DATALESS(blk_lpos)) {
if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
*data_size = 0;
return "";
-- 
2.20.1


[PATCH][next] docs: vmcoreinfo: add lockless printk ringbuffer vmcoreinfo

2020-08-14 Thread John Ogness
With the introduction of the lockless printk ringbuffer, the
VMCOREINFO relating to the kernel log buffer was changed. Update the
documentation to match those changes.

Fixes: ("printk: use the lockless ringbuffer")
Signed-off-by: John Ogness 
Reported-by: Nick Desaulniers 
---
 based on next-20200814

 .../admin-guide/kdump/vmcoreinfo.rst  | 131 ++
 1 file changed, 102 insertions(+), 29 deletions(-)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst 
b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 2baad0bfb09d..eb116905c31c 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -189,50 +189,123 @@ from this.
 Free areas descriptor. User-space tools use this value to iterate the
 free_area ranges. MAX_ORDER is used by the zone buddy allocator.
 
-log_first_idx
+prb
+---
+
+A pointer to the printk ringbuffer (struct printk_ringbuffer). This
+may be pointing to the static boot ringbuffer or the dynamically
+allocated ringbuffer, depending on when the core dump occurred.
+Used by user-space tools to read the active kernel log buffer.
+
+printk_rb_static
+----------------
+
+A pointer to the static boot printk ringbuffer. If @prb has a
+different value, this is useful for viewing the initial boot messages,
+which may have been overwritten in the dynamically allocated
+ringbuffer.
+
+clear_seq
+---------
+
+The sequence number of the printk() record after the last clear
+command. It indicates the first record after the last
+SYSLOG_ACTION_CLEAR, like issued by 'dmesg -c'. Used by user-space
+tools to dump a subset of the dmesg log.
+
+printk_ringbuffer
+-----------------
+
+The size of a printk_ringbuffer structure. This structure contains all
+information required for accessing the various components of the
+kernel log buffer.
+
+(printk_ringbuffer, desc_ring|text_data_ring|dict_data_ring|fail)
+-----------------------------------------------------------------
+
+Offsets for the various components of the printk ringbuffer. Used by
+user-space tools to view the kernel log buffer without requiring the
+declaration of the structure.
+
+prb_desc_ring
 -------------
 
-Index of the first record stored in the buffer log_buf. Used by
-user-space tools to read the strings in the log_buf.
+The size of the prb_desc_ring structure. This structure contains
+information about the set of record descriptors.
 
-log_buf
--------
+(prb_desc_ring, count_bits|descs|head_id|tail_id)
+-------------------------------------------------
+
+Offsets for the fields describing the set of record descriptors. Used
+by user-space tools to be able to traverse the descriptors without
+requiring the declaration of the structure.
+
+prb_desc
+--------
+
+The size of the prb_desc structure. This structure contains
+information about a single record descriptor.
+
+(prb_desc, info|state_var|text_blk_lpos|dict_blk_lpos)
+------------------------------------------------------
+
+Offsets for the fields describing a record descriptor. Used by
+user-space tools to be able to read descriptors without requiring
+the declaration of the structure.
+
+prb_data_blk_lpos
+-----------------
+
+The size of the prb_data_blk_lpos structure. This structure contains
+information about where the text or dictionary data (data block) is
+located within the respective data ring.
+
+(prb_data_blk_lpos, begin|next)
+-------------------------------
 
-Console output is written to the ring buffer log_buf at index
-log_first_idx. Used to get the kernel log.
+Offsets for the fields describing the location of a data block. Used
+by user-space tools to be able to locate data blocks without
+requiring the declaration of the structure.
 
-log_buf_len
+printk_info
 -----------
 
-log_buf's length.
+The size of the printk_info structure. This structure contains all
+the meta-data for a record.
 
-clear_idx
-----------
+(printk_info, seq|ts_nsec|text_len|dict_len|caller_id)
+------------------------------------------------------
 
-The index that the next printk() record to read after the last clear
-command. It indicates the first record after the last SYSLOG_ACTION
-_CLEAR, like issued by 'dmesg -c'. Used by user-space tools to dump
-the dmesg log.
+Offsets for the fields providing the meta-data for a record. Used by
+user-space tools to be able to read the information without requiring
+the declaration of the structure.
 
-log_next_idx
--------------
+prb_data_ring
+-------------
 
-The index of the next record to store in the buffer log_buf. Used to
-compute the index of the current buffer position.
+The size of the prb_data_ring structure. This structure contains
+information about a set of data blocks.
 
-printk_log
------------
+(prb_data_ring, size_bits|data|head_lpos|tail_lpos)
+---------------------------------------------------
 
-The size of a structure printk_log. Used to compute the size of
-messages, and extract dmesg log. It encapsulates header i

Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling

2020-08-14 Thread John Ogness
On 2020-08-14, Sergey Senozhatsky  wrote:
> One thing that we need to handle here, I believe, is that the context
> which crashes the kernel should flush its cont buffer, because the
> information there is relevant to the crash:
>
>   pr_cont_alloc_info(&c);
>   pr_cont(&c, "1");
>   pr_cont(&c, "2");
>   >>
>  oops
> panic()
>   <<
>   pr_cont_flush(&c);
>
> We better flush that context's pr_cont buffer during panic().

I am not convinced of the general usefulness of partial messages, but as
long as we have an API that includes registration, usage, and
deregistration of some sort of handle, then we leave the window open for
such implementations.

John Ogness

Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling

2020-08-13 Thread John Ogness
On 2020-08-13, Petr Mladek  wrote:
> On Thu 2020-08-13 09:50:25, John Ogness wrote:
>> On 2020-08-13, Sergey Senozhatsky  wrote:
>> > This is not an unseen pattern, I'm afraid. And the problem here can
>> > be more general:
>> >
>> >pr_info("text");
>> >pr_cont("1");
>> >exception/IRQ/NMI
>> >pr_alert("text\n");
>> >pr_cont("2");
>> >pr_cont("\n");
>> >
>> > I guess the solution would be to store "last log_level" in task_struct
>> > and get current (new) timestamp for broken cont line?
>> 
>> (Warning: new ideas ahead)
>> 
>> The fundamental problem is that there is no real association between
>> the cont parts. So any interruption results in a broken record. If we
>> really want to do this correctly, we need real association.

I believe I failed to recognize the fundamental problem. The fundamental
problem is that the pr_cont() semantics are very poor. I now strongly
believe that we need to fix those semantics by having the pr_cont() user
take responsibility for buffering the message. Patching the ~2000
pr_cont() users will be far easier than continuing to twist ourselves
around this madness.

Here is an example for a new pr_cont() API:

struct pr_cont c;

pr_cont_alloc_info(&c);
   (or alternatively)
dev_cont_alloc_info(dev, &c);

pr_cont(&c, "1");
pr_cont(&c, "2");

pr_cont_flush(&c);

Using macro magic, there can be the usual dbg, warn, err, etc. variants
of the alloc functions.

The alloc function would need to work for any context, but that would
not be an issue. If the cont message started to get too large, pr_cont()
could do its own flushing in between, while still holding on to the
context information. If for some reason the alloc function could not
allocate a buffer, all the pr_cont() calls could fallback to logging the
individual cont parts.

I believe this would solve all cont-related problems while also allowing
the new ringbuffer to remain as it already is in linux-next.
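
To show the intended semantics without kernel details, here is a rough
Python model of such a caller-owned buffer. Everything in it is
hypothetical (the ContBuffer class, the emit callback, the max_len
limit); it only demonstrates buffering by the caller, automatic flushing
when the message grows too large, and the explicit final flush.

```python
# Sketch of a caller-owned continuation buffer; all names are hypothetical.
class ContBuffer:
    def __init__(self, emit, level="info", max_len=64):
        self.emit = emit          # callback that stores one complete record
        self.level = level        # context captured once, at allocation
        self.max_len = max_len
        self.parts = []
        self.length = 0

    def cont(self, text):
        """Buffer a message part, flushing first if it would grow too large."""
        if self.length + len(text) > self.max_len:
            self.flush()
        self.parts.append(text)
        self.length += len(text)

    def flush(self):
        """Emit the buffered parts as a single record and reset the buffer."""
        if self.parts:
            self.emit(self.level, "".join(self.parts))
        self.parts = []
        self.length = 0
```

Since the buffer belongs to the caller, an interrupting context that
prints its own messages cannot split or pollute the buffered line.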

John Ogness

Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling

2020-08-13 Thread John Ogness
On 2020-08-13, Sergey Senozhatsky  wrote:
> This is not an unseen pattern, I'm afraid. And the problem here can
> be more general:
>
>   pr_info("text");
>   pr_cont("1");
>   exception/IRQ/NMI
>   pr_alert("text\n");
>   pr_cont("2");
>   pr_cont("\n");
>
> I guess the solution would be to store "last log_level" in task_struct
> and get current (new) timestamp for broken cont line?

(Warning: new ideas ahead)

The fundamental problem is that there is no real association between
the cont parts. So any interruption results in a broken record. If we
really want to do this correctly, we need real association.

With the new finalize flag for records, I thought about perhaps adding
support for chaining data blocks.

A data block currently stores an unsigned long for the ID of the
associated descriptor. But it could optionally include a second unsigned
long, which is the lpos of the next text part. All the data blocks of a
chain would point back to the same descriptor. The descriptor would only
point to the first data block of the chain and include a flag that it is
using chained data blocks.

Then we would only need to track the sequence number of the open record
and new data blocks could be added to the data block chain of the
correct record. Readers cannot see the record until it is finalized.

Also, since only finalized records can be invalidated, there are no
races of chains becoming invalidated while being appended.

My concerns about this idea:

- What if the printk user does not correctly terminate the cont message?
  There is no mechanism to allow that open record to be force-finalized
  so that readers can read newer records.

- For tasks, the sequence number of the open record could be stored on
  the task_struct. For non-tasks, we could use a global per-cpu variable
  where each CPU stores 2 sequence numbers: the sequence number of the
  open record for the non-task and the sequence number of the open
  record for an interrupting NMI. Is that sufficient?

John Ogness

Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling

2020-08-12 Thread John Ogness
On 2020-08-12, Petr Mladek  wrote:
> So, I have one crazy idea to add one more state bit so that we
> could have:
>
>   + committed: set when the data are written into the data ring.
>   + final: set when the data block can no longer get reopened
>   + reuse: set when the descriptor/data block can get reused
>
> "final" bit will define when the descriptor can no longer
> get reopened (cleared committed bit) and the data block could
> not get extended.

I had not thought of extending data blocks. That is clever!

I implemented this solution for myself and am currently running more
tests. Some things that I changed from your suggestion:

1. I created a separate prb_reserve_cont() function. The reason for this
is because the caller needs to understand what is happening. The caller
is getting an existing record with existing data and must append new
data. The @text_len field of the info reports how long the existing data
is. So the LOG_CONT handling code in printk.c looks something like this:

	if (lflags & LOG_CONT) {
		struct prb_reserved_entry e;
		struct printk_record r;

		prb_rec_init_wr(&r, text_len, 0);

		if (prb_reserve_cont(&e, prb, &r, caller_id)) {
			memcpy(&r.text_buf[r.info->text_len], text, text_len);
			r.info->text_len += text_len;

			if (lflags & LOG_NEWLINE)
				r.info->flags |= LOG_NEWLINE;

			if (r.info->flags & LOG_NEWLINE)
				prb_commit_finalize(&e);
			else
				prb_commit(&e);

			return text_len;
		}
	}

This seemed simpler than trying to extend prb_reserve() to secretly
support LOG_CONT records.

2. I haven't yet figured out how to preserve calling context when a
newline appears. For example:

pr_info("text");
pr_cont(" 1");
pr_cont(" 2\n");
pr_cont("3");
pr_cont(" 4\n");

For "3" the calling context (info, timestamp) is lost because with "2"
the record is finalized. Perhaps the above is invalid usage of LOG_CONT?

3. There are some memory barriers introduced, but it looks like it
shouldn't add too much complexity.

I will continue to refine my working version and post a patch so that we
have something to work with. This looks to be the most promising way
forward. Thanks.

John Ogness

Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-08-12 Thread John Ogness
On 2020-08-11, Nick Desaulniers  wrote:
> From what I can tell, I think this patch ("printk: use the lockless
> ringbuffer") breaks lx-dmesg in CONFIG_GDB_SCRIPTS.
>
> (gdb) lx-dmesg
> Python Exception  No symbol "log_first_idx" in specified 
> context.: 
> Error occurred in Python: No symbol "log_first_idx" in specified context.
>
> This command is used to dump the printk log buffer.
>
> It looks like the only places left in the kernel that reference are:
>
> - Documentation/admin-guide/kdump/gdbmacros.txt
> - Documentation/admin-guide/kdump/vmcoreinfo.rst
> - scripts/gdb/linux/dmesg.py
>
> I believe this commit removed log_first_idx, so all of the above
> probably need to be fixed up, too.

Thanks for pointing this out! I will get to work on a patch for this.

John Ogness

Re: [PATCH 2/4] printk: store instead of processing cont parts

2020-07-21 Thread John Ogness
On 2020-07-21, Sergey Senozhatsky  wrote:
>> That said, we have traditionally used not just "current process", but
>> also "last irq-level" as the context information, so I do think it
>> would be good to continue to do that.
>
> OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> we will have more info than just "0x80000000 + raw_smp_processor_id()".

If bit31 is set, the upper 8 bits could specify what the lower 24 bits
represent. That would give some freedom for the future.

For example:

0x80 = cpu id (generic context)
0x81 = interrupt number
0x82 = cpu id (nmi context)

Or maybe ascii should be used instead?

0x80 | '\0' = cpu id (generic context)
0x80 | 'i'  = interrupt number
0x80 | 'n'  = cpu id (nmi context)

Just an idea.

John Ogness

[PATCH v2][next] printk: ringbuffer: support dataless records

2020-07-21 Thread John Ogness
With commit ("printk: use the lockless ringbuffer"), printk()
started silently dropping messages without text because such
records are not supported by the new printk ringbuffer.

Add support for such records.

Currently dataless records are denoted by INVALID_LPOS in order
to recognize failed prb_reserve() calls. Change the ringbuffer
to instead use two different identifiers (FAILED_LPOS and
NO_LPOS) to distinguish between failed prb_reserve() records and
successful dataless records, respectively.
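
The distinction can be illustrated with a short Python sketch. It is not
the ringbuffer code: the classify() helper is invented for the example,
and while FAILED_LPOS is 0x1 in printk_ringbuffer.h, the NO_LPOS value
used here is an assumption (any other odd value would serve the sketch).

```python
# Sketch of how a reader could tell the two data-less cases apart.
FAILED_LPOS = 0x1   # value used in printk_ringbuffer.h
NO_LPOS = 0x3       # assumed value for this sketch


def lpos_dataless(lpos):
    """Real data blocks are aligned, so an odd lpos can mark 'no data'."""
    return bool(lpos & 1)


def classify(begin, next_):
    """Classify a block description as 'data', 'empty' or 'failed'."""
    if lpos_dataless(begin) and lpos_dataless(next_):
        if begin == NO_LPOS and next_ == NO_LPOS:
            return "empty"      # successful record without text
        return "failed"         # reservation failed or data was lost
    return "data"
```

An "empty" record is still a valid record for readers (and is counted as
a line), whereas a "failed" one carries no usable text.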

Fixes: ("printk: use the lockless ringbuffer")
Fixes: https://lkml.kernel.org/r/20200718121053.ga691...@elver.google.com
Reported-by: Marco Elver 
Signed-off-by: John Ogness 
---
 based on next-20200721

 changes since v1:
 - Instead of handling empty text messages as special case errors,
   allow such messages to be handled as any other valid messages.
   This also allows the empty text message to be counted as a line.

 kernel/printk/printk_ringbuffer.c | 72 +++
 kernel/printk/printk_ringbuffer.h | 15 ---
 2 files changed, 43 insertions(+), 44 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 7355ca99e852..0659b50872b5 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -264,6 +264,9 @@
 /* Determine how many times the data array has wrapped. */
 #define DATA_WRAPS(data_ring, lpos)((lpos) >> (data_ring)->size_bits)
 
+/* Determine if a logical position refers to a data-less block. */
+#define LPOS_DATALESS(lpos)((lpos) & 1UL)
+
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
 ((lpos) & ~DATA_SIZE_MASK(data_ring))
@@ -320,21 +323,13 @@ static unsigned int to_blk_size(unsigned int size)
  * block does not exceed the maximum possible size that could fit within the
  * ringbuffer. This function provides that basic size check so that the
  * assumption is safe.
- *
- * Writers are also not allowed to write 0-sized (data-less) records. Such
- * records are used only internally by the ringbuffer.
  */
 static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
 {
struct prb_data_block *db = NULL;
 
-   /*
-* Writers are not allowed to write data-less records. Such records
-* are used only internally by the ringbuffer to denote records where
-* their data failed to allocate or have been lost.
-*/
if (size == 0)
-   return false;
+   return true;
 
/*
 * Ensure the alignment padded size could possibly fit in the data
@@ -568,8 +563,8 @@ static bool data_push_tail(struct printk_ringbuffer *rb,
unsigned long tail_lpos;
unsigned long next_lpos;
 
-   /* If @lpos is not valid, there is nothing to do. */
-   if (lpos == INVALID_LPOS)
+   /* If @lpos is from a data-less block, there is nothing to do. */
+   if (LPOS_DATALESS(lpos))
return true;
 
/*
@@ -962,8 +957,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
if (size == 0) {
/* Specify a data-less block. */
-   blk_lpos->begin = INVALID_LPOS;
-   blk_lpos->next = INVALID_LPOS;
+   blk_lpos->begin = NO_LPOS;
+   blk_lpos->next = NO_LPOS;
return NULL;
}
 
@@ -976,8 +971,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
if (!data_push_tail(rb, data_ring, next_lpos - 
DATA_SIZE(data_ring))) {
/* Failed to allocate, specify a data-less block. */
-   blk_lpos->begin = INVALID_LPOS;
-   blk_lpos->next = INVALID_LPOS;
+   blk_lpos->begin = FAILED_LPOS;
+   blk_lpos->next = FAILED_LPOS;
return NULL;
}
 
@@ -1025,6 +1020,10 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 static unsigned int space_used(struct prb_data_ring *data_ring,
   struct prb_data_blk_lpos *blk_lpos)
 {
+   /* Data-less blocks take no space. */
+   if (LPOS_DATALESS(blk_lpos->begin))
+   return 0;
+
if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next)) {
/* Data block does not wrap. */
return (DATA_INDEX(data_ring, blk_lpos->next) -
@@ -1080,11 +1079,8 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
goto fail;
 
-   /* Records are allowed to not have dictionaries. */
-   if (r->dict_buf_size) {
-   if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
-   goto fail;
-   }
+ 

Re: [PATCH][next] printk: ringbuffer: support dataless records

2020-07-21 Thread John Ogness
On 2020-07-21, Sergey Senozhatsky  wrote:
>> @@ -1402,7 +1396,9 @@ static int prb_read(struct printk_ringbuffer *rb, u64 seq,
>>  /* Copy text data. If it fails, this is a data-less record. */
>>  if (!copy_data(&rb->text_data_ring, &desc.text_blk_lpos, desc.info.text_len,
>> r->text_buf, r->text_buf_size, line_count)) {
>> -return -ENOENT;
>> +/* Report an error if there should have been data. */
>> +if (desc.info.text_len != 0)
>> +return -ENOENT;
>>  }
>
> If this is a dataless record then should copy_data() return error?

You are correct. That makes more sense. I will send a v2.

John Ogness



Re: [PATCH 1/4] printk: ringbuffer: support dataless records

2020-07-20 Thread John Ogness
On 2020-07-18, John Ogness  wrote:
> In order to support storage of continuous lines, dataless records must
> be allowed. For example, these are generated with the legal calls:
>
> pr_info("");
> pr_cont("\n");
>
> Currently dataless records are denoted by INVALID_LPOS in order to
> recognize failed prb_reserve() calls. Change the code to use two
> different identifiers (FAILED_LPOS and NO_LPOS) to distinguish
> between failed prb_reserve() records and successful dataless records.

This patch has been re-posted [0] as a regression fix for the first
series that is already in linux-next. Only the commit message has been
changed to reflect the regression fix rather than preparing for
continuous line support.

Assuming that patch is accepted, this one should be dropped.

John Ogness

[0] https://lkml.kernel.org/r/20200720140111.19935-1-john.ogn...@linutronix.de



[PATCH][next] printk: ringbuffer: support dataless records

2020-07-20 Thread John Ogness
With commit ("printk: use the lockless ringbuffer"), printk()
started silently dropping messages without text because such
records are not supported by the new printk ringbuffer.

Add support for such records.

Currently dataless records are denoted by INVALID_LPOS in order
to recognize failed prb_reserve() calls. Change the ringbuffer
to instead use two different identifiers (FAILED_LPOS and
NO_LPOS) to distinguish between failed prb_reserve() records and
successful dataless records, respectively.

Fixes: ("printk: use the lockless ringbuffer")
Fixes: https://lkml.kernel.org/r/20200718121053.ga691...@elver.google.com
Signed-off-by: John Ogness 
---
 based on next-20200720

 kernel/printk/printk_ringbuffer.c | 58 ++-
 kernel/printk/printk_ringbuffer.h | 15 
 2 files changed, 35 insertions(+), 38 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 7355ca99e852..54b0a6324dbf 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -264,6 +264,9 @@
 /* Determine how many times the data array has wrapped. */
 #define DATA_WRAPS(data_ring, lpos)((lpos) >> (data_ring)->size_bits)
 
+/* Determine if a logical position refers to a data-less block. */
+#define LPOS_DATALESS(lpos)((lpos) & 1UL)
+
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
 ((lpos) & ~DATA_SIZE_MASK(data_ring))
@@ -320,21 +323,13 @@ static unsigned int to_blk_size(unsigned int size)
  * block does not exceed the maximum possible size that could fit within the
  * ringbuffer. This function provides that basic size check so that the
  * assumption is safe.
- *
- * Writers are also not allowed to write 0-sized (data-less) records. Such
- * records are used only internally by the ringbuffer.
  */
 static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
 {
struct prb_data_block *db = NULL;
 
-   /*
-* Writers are not allowed to write data-less records. Such records
-* are used only internally by the ringbuffer to denote records where
-* their data failed to allocate or have been lost.
-*/
if (size == 0)
-   return false;
+   return true;
 
/*
 * Ensure the alignment padded size could possibly fit in the data
@@ -568,8 +563,8 @@ static bool data_push_tail(struct printk_ringbuffer *rb,
unsigned long tail_lpos;
unsigned long next_lpos;
 
-   /* If @lpos is not valid, there is nothing to do. */
-   if (lpos == INVALID_LPOS)
+   /* If @lpos is from a data-less block, there is nothing to do. */
+   if (LPOS_DATALESS(lpos))
return true;
 
/*
@@ -962,8 +957,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
if (size == 0) {
/* Specify a data-less block. */
-   blk_lpos->begin = INVALID_LPOS;
-   blk_lpos->next = INVALID_LPOS;
+   blk_lpos->begin = NO_LPOS;
+   blk_lpos->next = NO_LPOS;
return NULL;
}
 
@@ -976,8 +971,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
if (!data_push_tail(rb, data_ring, next_lpos - 
DATA_SIZE(data_ring))) {
/* Failed to allocate, specify a data-less block. */
-   blk_lpos->begin = INVALID_LPOS;
-   blk_lpos->next = INVALID_LPOS;
+   blk_lpos->begin = FAILED_LPOS;
+   blk_lpos->next = FAILED_LPOS;
return NULL;
}
 
@@ -1025,6 +1020,10 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 static unsigned int space_used(struct prb_data_ring *data_ring,
   struct prb_data_blk_lpos *blk_lpos)
 {
+   /* Data-less blocks take no space. */
+   if (LPOS_DATALESS(blk_lpos->begin))
+   return 0;
+
if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next)) {
/* Data block does not wrap. */
return (DATA_INDEX(data_ring, blk_lpos->next) -
@@ -1080,11 +1079,8 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
goto fail;
 
-   /* Records are allowed to not have dictionaries. */
-   if (r->dict_buf_size) {
-   if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
-   goto fail;
-   }
+   if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
+   goto fail;
 
/*
 * Descriptors in the reserved state act as blockers to all further
@@ -1212,10 +1208,8 @@ static char *get_data(struct prb_data_rin

Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-20 Thread John Ogness
On 2020-07-18, Marco Elver  wrote:
> It seems this causes a regression observed at least with newline-only
> printks.
> [...]
> -- >8 --
>
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1039,6 +1039,10 @@ asmlinkage __visible void __init start_kernel(void)
>   sfi_init_late();
>   kcsan_init();
>  
> + pr_info("EXPECT BLANK LINE --vv\n");
> + pr_info("\n");
> + pr_info("EXPECT BLANK LINE --^^\n");
> +
>   /* Do the rest non-__init'ed, we're now alive */
>   arch_call_rest_init();

Thanks for the example. This is an unintentional regression in the
series. I will submit a patch to fix this.

Note that this regression does not exist when the followup series [0]
(reimplementing LOG_CONT) is applied. All the more reason that the 1st
series should be fixed before pushing the 2nd series to linux-next.

John Ogness

[0] https://lkml.kernel.org/r/20200717234818.8622-1-john.ogn...@linutronix.de



Re: [PATCH 0/4] printk: reimplement LOG_CONT handling

2020-07-18 Thread John Ogness
On 2020-07-17, Linus Torvalds  wrote:
> Make sure you test the case of "fast concurrent readers". The last
> time we did things like this, it was a disaster, because a concurrent
> reader would see and return the _incomplete_ line, and the next entry
> was still being generated on another CPU.
>
> The reader would then decide to return that incomplete line, because
> it had something.
>
> And while in theory this could then be handled properly in user space,
> in practice it wasn't. So you'd see a lot of logging tools that would
> then report all those continuations as separate log events.
>
> Which is the whole point of LOG_CONT - for that *not* to happen.

I expect this is handled correctly since the reader is not given any
parts until a full line is ready, but I will put more focus on testing
this to make sure. Thanks for the regression and testing tips.

> So this is just a heads-up that I will not pull something that breaks
> LOG_CONT because it thinks "user space can handle it". No. User space
> does not handle it, and we need to handle it for the user.

Understood. Petr and Sergey are also strict about this. We are making a
serious effort to avoid breaking things for userspace.

John Ogness



[PATCH 1/4] printk: ringbuffer: support dataless records

2020-07-17 Thread John Ogness
In order to support storage of continuous lines, dataless records must
be allowed. For example, these are generated with the legal calls:

pr_info("");
pr_cont("\n");

Currently dataless records are denoted by INVALID_LPOS in order to
recognize failed prb_reserve() calls. Change the code to use two
different identifiers (FAILED_LPOS and NO_LPOS) to distinguish
between failed prb_reserve() records and successful dataless records.

Signed-off-by: John Ogness 
---
 kernel/printk/printk_ringbuffer.c | 58 ++-
 kernel/printk/printk_ringbuffer.h | 15 
 2 files changed, 35 insertions(+), 38 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 7355ca99e852..54b0a6324dbf 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -264,6 +264,9 @@
 /* Determine how many times the data array has wrapped. */
 #define DATA_WRAPS(data_ring, lpos)((lpos) >> (data_ring)->size_bits)
 
+/* Determine if a logical position refers to a data-less block. */
+#define LPOS_DATALESS(lpos)((lpos) & 1UL)
+
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
 ((lpos) & ~DATA_SIZE_MASK(data_ring))
@@ -320,21 +323,13 @@ static unsigned int to_blk_size(unsigned int size)
  * block does not exceed the maximum possible size that could fit within the
  * ringbuffer. This function provides that basic size check so that the
  * assumption is safe.
- *
- * Writers are also not allowed to write 0-sized (data-less) records. Such
- * records are used only internally by the ringbuffer.
  */
 static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
 {
struct prb_data_block *db = NULL;
 
-   /*
-* Writers are not allowed to write data-less records. Such records
-* are used only internally by the ringbuffer to denote records where
-* their data failed to allocate or have been lost.
-*/
if (size == 0)
-   return false;
+   return true;
 
/*
 * Ensure the alignment padded size could possibly fit in the data
@@ -568,8 +563,8 @@ static bool data_push_tail(struct printk_ringbuffer *rb,
unsigned long tail_lpos;
unsigned long next_lpos;
 
-   /* If @lpos is not valid, there is nothing to do. */
-   if (lpos == INVALID_LPOS)
+   /* If @lpos is from a data-less block, there is nothing to do. */
+   if (LPOS_DATALESS(lpos))
return true;
 
/*
@@ -962,8 +957,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
if (size == 0) {
/* Specify a data-less block. */
-   blk_lpos->begin = INVALID_LPOS;
-   blk_lpos->next = INVALID_LPOS;
+   blk_lpos->begin = NO_LPOS;
+   blk_lpos->next = NO_LPOS;
return NULL;
}
 
@@ -976,8 +971,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
if (!data_push_tail(rb, data_ring, next_lpos - 
DATA_SIZE(data_ring))) {
/* Failed to allocate, specify a data-less block. */
-   blk_lpos->begin = INVALID_LPOS;
-   blk_lpos->next = INVALID_LPOS;
+   blk_lpos->begin = FAILED_LPOS;
+   blk_lpos->next = FAILED_LPOS;
return NULL;
}
 
@@ -1025,6 +1020,10 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 static unsigned int space_used(struct prb_data_ring *data_ring,
   struct prb_data_blk_lpos *blk_lpos)
 {
+   /* Data-less blocks take no space. */
+   if (LPOS_DATALESS(blk_lpos->begin))
+   return 0;
+
if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, 
blk_lpos->next)) {
/* Data block does not wrap. */
return (DATA_INDEX(data_ring, blk_lpos->next) -
@@ -1080,11 +1079,8 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
goto fail;
 
-   /* Records are allowed to not have dictionaries. */
-   if (r->dict_buf_size) {
-   if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
-   goto fail;
-   }
+   if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
+   goto fail;
 
/*
 * Descriptors in the reserved state act as blockers to all further
@@ -1212,10 +1208,8 @@ static char *get_data(struct prb_data_ring *data_ring,
struct prb_data_block *db;
 
/* Data-less data block description. */
-   if (blk_lpos->begin == INVALID_LPOS &&
-   blk_lpos->next == INVALID_LPOS

[PATCH 2/4] printk: store instead of processing cont parts

2020-07-17 Thread John Ogness
Instead of buffering continuous line parts before storing the full
line into the ringbuffer, store each part as its own record.

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 114 -
 1 file changed, 11 insertions(+), 103 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fec71229169e..c4274c867771 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -495,9 +495,14 @@ static void truncate_msg(u16 *text_len, u16 *trunc_msg_len)
*trunc_msg_len = 0;
 }
 
+static inline u32 printk_caller_id(void)
+{
+   return in_task() ? task_pid_nr(current) :
+   0x80000000 + raw_smp_processor_id();
+}
+
 /* insert record into the buffer, discard old ones, update heads */
-static int log_store(u32 caller_id, int facility, int level,
-enum log_flags flags, u64 ts_nsec,
+static int log_store(int facility, int level, enum log_flags flags,
 const char *dict, u16 dict_len,
 const char *text, u16 text_len)
 {
@@ -525,11 +530,8 @@ static int log_store(u32 caller_id, int facility, int 
level,
r.info->facility = facility;
r.info->level = level & 7;
r.info->flags = flags & 0x1f;
-   if (ts_nsec > 0)
-   r.info->ts_nsec = ts_nsec;
-   else
-   r.info->ts_nsec = local_clock();
-   r.info->caller_id = caller_id;
+   r.info->ts_nsec = local_clock();
+   r.info->caller_id = printk_caller_id();
 
/* insert message */
prb_commit(&e);
@@ -1874,100 +1876,6 @@ static inline void printk_delay(void)
}
 }
 
-static inline u32 printk_caller_id(void)
-{
-   return in_task() ? task_pid_nr(current) :
-   0x80000000 + raw_smp_processor_id();
-}
-
-/*
- * Continuation lines are buffered, and not committed to the record buffer
- * until the line is complete, or a race forces it. The line fragments
- * though, are printed immediately to the consoles to ensure everything has
- * reached the console in case of a kernel crash.
- */
-static struct cont {
-   char buf[LOG_LINE_MAX];
-   size_t len; /* length == 0 means unused buffer */
-   u32 caller_id;  /* printk_caller_id() of first print */
-   u64 ts_nsec;/* time of first print */
-   u8 level;   /* log level of first message */
-   u8 facility;/* log facility of first message */
-   enum log_flags flags;   /* prefix, newline flags */
-} cont;
-
-static void cont_flush(void)
-{
-   if (cont.len == 0)
-   return;
-
-   log_store(cont.caller_id, cont.facility, cont.level, cont.flags,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-   cont.len = 0;
-}
-
-static bool cont_add(u32 caller_id, int facility, int level,
-enum log_flags flags, const char *text, size_t len)
-{
-   /* If the line gets too long, split it up in separate records. */
-   if (cont.len + len > sizeof(cont.buf)) {
-   cont_flush();
-   return false;
-   }
-
-   if (!cont.len) {
-   cont.facility = facility;
-   cont.level = level;
-   cont.caller_id = caller_id;
-   cont.ts_nsec = local_clock();
-   cont.flags = flags;
-   }
-
-   memcpy(cont.buf + cont.len, text, len);
-   cont.len += len;
-
-   // The original flags come from the first line,
-   // but later continuations can add a newline.
-   if (flags & LOG_NEWLINE) {
-   cont.flags |= LOG_NEWLINE;
-   cont_flush();
-   }
-
-   return true;
-}
-
-static size_t log_output(int facility, int level, enum log_flags lflags, const 
char *dict, size_t dictlen, char *text, size_t text_len)
-{
-   const u32 caller_id = printk_caller_id();
-
-   /*
-* If an earlier line was buffered, and we're a continuation
-* write from the same context, try to add it to the buffer.
-*/
-   if (cont.len) {
-   if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
-   if (cont_add(caller_id, facility, level, lflags, text, 
text_len))
-   return text_len;
-   }
-   /* Otherwise, make sure it's flushed */
-   cont_flush();
-   }
-
-   /* Skip empty continuation lines that couldn't be added - they just 
flush */
-   if (!text_len && (lflags & LOG_CONT))
-   return 0;
-
-   /* If it doesn't end in a newline, try to buffer the current line */
-   if (!(lflags & LOG_NEWLINE)) {
-   if (cont_add(caller_id, facility, level, lflags, text, 
text_len))
-   return text_len;
-   }
-
-   /* Store it in the record

[PATCH 3/4] printk: process cont records during reading

2020-07-17 Thread John Ogness
Readers of the printk ringbuffer can use the continuous line interface
to read full lines. The interface buffers continuous line parts until
the full line is available or that line was interrupted by a writer
from another context.

The continuous line interface automatically throws out partial lines if
a reader jumps to older sequence numbers. If a reader jumps to higher
sequence numbers, any cached partial lines are flushed.

The continuous line interface is used by:

  - console printing
  - syslog
  - devkmsg

devkmsg has the additional requirement that it must show a line for
every sequence number if the corresponding continuous line record was
not dropped. The continuous line interface supports this by allowing
the reader to provide a printk_record struct that will be filled in
with placeholder information (but no text) in case a full line is not
yet available.

Note that kmsg_dump does not use the continuous line interface.

The continuous line interface discards dictionaries of continuous lines.
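
The core buffering rule can be sketched in userspace roughly as follows. This is only an illustration of "buffer parts until the line is complete or interrupted"; the types, flag names, and cont_feed() are hypothetical and much simpler than the code in this patch (no sequence numbers, no placeholder records, no dropped counts). Treating a non-continuation part from the same context as an interruption is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN	128

/* Simplified stand-ins for the record flags and meta-data. */
enum { F_NEWLINE = 1, F_CONT = 2 };

struct part {
	uint32_t caller_id;
	unsigned int flags;
	const char *text;
};

struct line_cache {
	char buf[LINE_MAX_LEN];
	size_t len;
	uint32_t caller_id;
	bool set;
};

enum cont_result {
	CONT_BUFFERED,		/* part consumed, line not complete yet */
	CONT_LINE,		/* part consumed, full line copied to @out */
	CONT_INTERRUPTED,	/* cached partial line copied to @out,
				 * part NOT consumed: feed it again */
};

static enum cont_result cont_feed(struct line_cache *c, const struct part *p,
				  char *out, size_t out_size)
{
	size_t n = strlen(p->text);

	/* Another context, or a non-cont part, ends the cached line. */
	if (c->set && (c->caller_id != p->caller_id || !(p->flags & F_CONT))) {
		snprintf(out, out_size, "%.*s", (int)c->len, c->buf);
		c->len = 0;
		c->set = false;
		return CONT_INTERRUPTED;
	}

	if (!c->set) {
		c->caller_id = p->caller_id;
		c->set = true;
	}

	/* Truncate rather than overflow the cache. */
	if (n > sizeof(c->buf) - c->len)
		n = sizeof(c->buf) - c->len;
	memcpy(c->buf + c->len, p->text, n);
	c->len += n;

	if (!(p->flags & F_NEWLINE))
		return CONT_BUFFERED;

	/* The line is complete: hand it out and reset the cache. */
	snprintf(out, out_size, "%.*s", (int)c->len, c->buf);
	c->len = 0;
	c->set = false;
	return CONT_LINE;
}
```

The point of the sketch is that the reader never sees a part on its own: it only gets output once CONT_LINE or CONT_INTERRUPTED is returned.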

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 455 +
 1 file changed, 371 insertions(+), 84 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c4274c867771..363ef290f313 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -657,6 +657,287 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
return p - buf;
 }
 
+/*
+ * Readers of the printk ringbuffer can use the continuous line interface
+ * to read full lines. The interface buffers continuous line parts until
+ * the full line is available or that line was interrupted by a writer
+ * from another context.
+ *
+ * The continuous line interface automatically throws out partial lines if a
+ * reader jumps to older sequence numbers. If a reader jumps to higher
+ * sequence numbers, any cached partial lines are flushed.
+ *
+ * The continuous line interface is used by:
+ *
+ *   - console printing
+ *   - syslog
+ *   - devkmsg
+ *
+ * devkmsg has the additional requirement that it must show a line for every
+ * sequence number if the corresponding continuous line record was not dropped.
+ * The continuous line interface supports this by allowing the reader to
+ * provide a printk_record struct that will be filled in with placeholder
+ * information (but no text) in case a full line is not yet available.
+ *
+ * Note that kmsg_dump does not use the continuous line interface.
+ *
+ * The continuous line interface discards dictionaries of continuous lines.
+ */
+
+struct cont_record {
+   struct printk_recordr;
+   struct printk_info  info;
+   chartext[LOG_LINE_MAX + PREFIX_MAX];
+   boolset;
+};
+
+/*
+ * The continuous line buffer manager.
+ *
+ * @cr:record buffers for reading and caching continuous lines
+ * @dict:  the dictionary used when reading a record
+ * @cache_ind: index of the cache record in @cr
+ * @begin_seq: the minimal sequence number of the current continuous line
+ * @end_seq:   the maximal sequence number of the current continuous line
+ * @dropped:   count of dropped records during the current continuous line
+ */
+struct cont {
+   struct cont_record  cr[2];
+   chardict[LOG_LINE_MAX];
+   int cache_ind;
+   u64 begin_seq;
+   u64 end_seq;
+   unsigned long   dropped;
+};
+
+/*
+ * Initialize the continuous line manager. As an alternative, it is also
+ * acceptable if the structure is set to all zeros.
+ */
+static void cont_init(struct cont *c, u64 seq)
+{
+   c->cr[0].set = false;
+   c->cr[1].set = false;
+   c->cache_ind = 0;
+   c->begin_seq = seq;
+   c->end_seq = seq;
+   c->dropped = 0;
+}
+
+/* Get the continuous line cache, if one exists. */
+static struct printk_record *cont_cache(struct cont *c)
+{
+   struct cont_record *cr = &c->cr[c->cache_ind];
+
+   if (!cr->set)
+   return NULL;
+   return &cr->r;
+}
+
+/*
+ * Like cont_cache(), but also flushes the dropped count, clears the
+ * dictionary, and switches to the other record buffer for future caching.
+ */
+static struct printk_record *cont_flush(struct cont *c, unsigned long *dropped)
+{
+   struct cont_record *cr = &c->cr[c->cache_ind];
+
+   c->cache_ind ^= 1;
+
+   if (!cr->set)
+   return NULL;
+
+   if (dropped)
+   *dropped = c->dropped;
+   c->dropped = 0;
+
+   c->begin_seq = cr->info.seq;
+   cr->info.dict_len = 0;
+   cr->set = false;
+
+   return &cr->r;
+}
+
+/*
+ * Wrapper for prb_read_valid() that reads a new record into the
+ * non-caching record buffer.
+ */
+static struct printk_record *cont_read(struct cont *c, u64 seq)
+{
+   struct cont_record *cr = &c->cr[c->cache_ind ^ 1];
+   struc

[PATCH 0/4] printk: reimplement LOG_CONT handling

2020-07-17 Thread John Ogness
Hello,

Here is the second series to rework the printk subsystem. This series
removes LOG_CONT handling from printk() callers, storing all LOG_CONT
parts individually in the ringbuffer. With this series, LOG_CONT
handling is moved to the ringbuffer readers that provide the record
contents to users (console printing, syslog, /dev/kmsg).

This change is necessary in order to support the upcoming move to a
fully lockless printk() implementation.

This series is in line with the agreements [0] made at the meeting
during LPC2019 in Lisbon, with 1 exception: For the /dev/kmsg
interface, empty line placeholder records are reported for the
LOG_CONT parts.

Using placeholders avoids tools such as systemd-journald from
erroneously reporting missed messages. However, it also means that
empty placeholder records are visible in systemd-journald logs and
displayed in tools such as dmesg.

The effect can be easily observed with the sysrq help:

$ echo h | sudo tee /proc/sysrq-trigger
$ sudo dmesg | tail -n 30
$ sudo journalctl -k -n 30

Providing the placeholder entries allows a userspace tool to identify
if records were actually lost. IMHO this is an important feature. Its
side effect can be addressed by userspace tools if they change to
silently consume empty records.

For dump tools that process the ringbuffer directly (such as crash,
makedumpfile, kexec-tools), they will need to implement LOG_CONT
handling if they want to present clean continuous line messages.

Finally, by moving LOG_CONT handling from writers to readers, some
incorrect pr_cont() usage is revealed. Patch 4 of this series
addresses one such example.

This series is based on the printk git tree [1] printk-rework branch.

[0] https://lkml.kernel.org/r/87k1acz5rx@linutronix.de
[1] https://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git 
(printk-rework branch)

John Ogness (4):
  printk: ringbuffer: support dataless records
  printk: store instead of processing cont parts
  printk: process cont records during reading
  ipconfig: cleanup printk usage

 kernel/printk/printk.c| 569 --
 kernel/printk/printk_ringbuffer.c |  58 ++-
 kernel/printk/printk_ringbuffer.h |  15 +-
 net/ipv4/ipconfig.c   |  25 +-
 4 files changed, 434 insertions(+), 233 deletions(-)

-- 
2.20.1




[PATCH 4/4] ipconfig: cleanup printk usage

2020-07-17 Thread John Ogness
The use of pr_info() and pr_cont() was not ordered correctly for
all cases. Order it so that all cases provide the expected output.
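
The ordering problem and its fix can be shown with a small userspace sketch. The old code keyed the line start on index 0, so a set entry at a later index was emitted as a continuation of some unrelated line, and the newline was emitted even if nothing had been printed. The sketch below uses a flag instead, like the patch does. format_servers() and the NONE value are stand-ins invented for this example, with snprintf() taking the place of pr_info()/pr_cont().

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NONE 0u	/* stand-in for an unset server slot */

/*
 * Format the set entries of @servers into @buf. The first entry that is
 * actually set starts the line, later ones continue it, and the newline is
 * only emitted if something was printed at all.
 */
static void format_servers(char *buf, size_t size, const char *name,
			   const unsigned int *servers, unsigned int count)
{
	bool started = false;
	size_t len = 0;
	unsigned int i;

	buf[0] = '\0';
	for (i = 0; i < count && len < size; i++) {
		int n;

		if (servers[i] == NONE)
			continue;
		n = snprintf(buf + len, size - len, "%s%s%u=%u",
			     started ? ", " : " ", name, i, servers[i]);
		if (n < 0)
			break;
		len += (size_t)n;
		started = true;
	}
	if (started && len < size)
		snprintf(buf + len, size - len, "\n");
}
```

With slots { NONE, 10, 20 } this produces " nameserver1=10, nameserver2=20\n", and with all slots unset it produces nothing.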

Signed-off-by: John Ogness 
---
 net/ipv4/ipconfig.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 561f15b5a944..0f4bd7a59310 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1442,6 +1442,9 @@ static int __init ip_auto_config(void)
 #endif
int err;
unsigned int i;
+#ifndef IPCONFIG_SILENT
+   bool pr0;
+#endif
 
/* Initialise all name servers and NTP servers to NONE (but only if the
 * "ip=" or "nfsaddrs=" kernel command line parameters weren't decoded,
@@ -1575,31 +1578,37 @@ static int __init ip_auto_config(void)
if (ic_dev_mtu)
pr_cont(", mtu=%d", ic_dev_mtu);
/* Name servers (if any): */
+   pr0 = false;
for (i = 0; i < CONF_NAMESERVERS_MAX; i++) {
if (ic_nameservers[i] != NONE) {
-   if (i == 0)
+   if (!pr0) {
pr_info(" nameserver%u=%pI4",
i, &ic_nameservers[i]);
-   else
+   pr0 = true;
+   } else {
pr_cont(", nameserver%u=%pI4",
i, &ic_nameservers[i]);
+   }
}
-   if (i + 1 == CONF_NAMESERVERS_MAX)
-   pr_cont("\n");
}
+   if (pr0)
+   pr_cont("\n");
/* NTP servers (if any): */
+   pr0 = false;
for (i = 0; i < CONF_NTP_SERVERS_MAX; i++) {
if (ic_ntp_servers[i] != NONE) {
-   if (i == 0)
+   if (!pr0) {
pr_info(" ntpserver%u=%pI4",
i, &ic_ntp_servers[i]);
-   else
+   pr0 = true;
+   } else {
pr_cont(", ntpserver%u=%pI4",
i, &ic_ntp_servers[i]);
+   }
}
-   if (i + 1 == CONF_NTP_SERVERS_MAX)
-   pr_cont("\n");
}
+   if (pr0)
+   pr_cont("\n");
 #endif /* !SILENT */
 
/*
-- 
2.20.1




[PATCH v4 4/4] printk: use the lockless ringbuffer

2020-07-15 Thread John Ogness
Replace the existing ringbuffer usage and implementation with
lockless ringbuffer usage. Even though the new ringbuffer does not
require locking, all existing locking is left in place. Therefore,
this change is purely replacing the underlying ringbuffer.

Changes that exist due to the ringbuffer replacement:

- The VMCOREINFO has been updated for the new structures.

- Dictionary data is now stored in a separate data buffer from the
  human-readable messages. The dictionary data buffer is set to the
  same size as the message buffer. Therefore, the total required
  memory for both dictionary and message data is
  2 * (2 ^ CONFIG_LOG_BUF_SHIFT) for the initial static buffers and
  2 * log_buf_len (the kernel parameter) for the dynamic buffers.

- Record meta-data is now stored in a separate array of descriptors.
  This is an additional 72 * (2 ^ (CONFIG_LOG_BUF_SHIFT - 5)) bytes
  for the static array and 72 * (log_buf_len >> 5) bytes for the
  dynamic array.
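
To make the arithmetic concrete, here is a small sketch that evaluates the formulas above for the static buffers. The helper names are invented for this example, and it assumes a shift of at least 5.

```c
#include <stddef.h>

#define DESC_SIZE	72	/* bytes per descriptor, as quoted above */
#define DESC_SHIFT	5	/* one descriptor per 32 bytes of message buffer */

/* Message buffer plus an equally sized dictionary buffer. */
static size_t prb_data_bytes(unsigned int log_buf_shift)
{
	return 2 * ((size_t)1 << log_buf_shift);
}

/* Descriptor array: 72 * (2 ^ (shift - 5)) bytes. */
static size_t prb_desc_bytes(unsigned int log_buf_shift)
{
	return DESC_SIZE * ((size_t)1 << (log_buf_shift - DESC_SHIFT));
}

/* Total static memory for data buffers and descriptors. */
static size_t prb_total_bytes(unsigned int log_buf_shift)
{
	return prb_data_bytes(log_buf_shift) + prb_desc_bytes(log_buf_shift);
}
```

Taking CONFIG_LOG_BUF_SHIFT=17 as an example value, this gives 262144 bytes for the two data buffers and 294912 bytes for the descriptors, 557056 bytes in total. The same formulas apply to the dynamic buffers with log_buf_len in place of 2 ^ CONFIG_LOG_BUF_SHIFT.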

Signed-off-by: John Ogness 
---
 kernel/printk/printk.c | 940 +
 1 file changed, 493 insertions(+), 447 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1b41e1b98221..4c6b4e68ad07 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -55,6 +55,7 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/printk.h>
 
+#include "printk_ringbuffer.h"
 #include "console_cmdline.h"
 #include "braille.h"
 #include "internal.h"
@@ -294,30 +295,24 @@ enum con_msg_format_flags {
 static int console_msg_format = MSG_FORMAT_DEFAULT;
 
 /*
- * The printk log buffer consists of a chain of concatenated variable
- * length records. Every record starts with a record header, containing
- * the overall length of the record.
+ * The printk log buffer consists of a sequenced collection of records, each
+ * containing variable length message and dictionary text. Every record
+ * also contains its own meta-data (@info).
  *
- * The heads to the first and last entry in the buffer, as well as the
- * sequence numbers of these entries are maintained when messages are
- * stored.
+ * Every record meta-data carries the timestamp in microseconds, as well as
+ * the standard userspace syslog level and syslog facility. The usual kernel
+ * messages use LOG_KERN; userspace-injected messages always carry a matching
+ * syslog facility, by default LOG_USER. The origin of every message can be
+ * reliably determined that way.
  *
- * If the heads indicate available messages, the length in the header
- * tells the start next message. A length == 0 for the next message
- * indicates a wrap-around to the beginning of the buffer.
+ * The human readable log message of a record is available in @text, the
+ * length of the message text in @text_len. The stored message is not
+ * terminated.
  *
- * Every record carries the monotonic timestamp in microseconds, as well as
- * the standard userspace syslog level and syslog facility. The usual
- * kernel messages use LOG_KERN; userspace-injected messages always carry
- * a matching syslog facility, by default LOG_USER. The origin of every
- * message can be reliably determined that way.
- *
- * The human readable log message directly follows the message header. The
- * length of the message text is stored in the header, the stored message
- * is not terminated.
- *
- * Optionally, a message can carry a dictionary of properties (key/value pairs),
- * to provide userspace with a machine-readable message context.
+ * Optionally, a record can carry a dictionary of properties (key/value
+ * pairs), to provide userspace with a machine-readable message context. The
+ * length of the dictionary is available in @dict_len. The dictionary is not
+ * terminated.
  *
  * Examples for well-defined, commonly used property names are:
  *   DEVICE=b12:8   device identifier
@@ -331,21 +326,19 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
  * follows directly after a '=' character. Every property is terminated by
  * a '\0' character. The last property is not terminated.
  *
- * Example of a message structure:
- *   0000  ff 8f 00 00 00 00 00 00      monotonic time in nsec
- *   0008  34 00                        record is 52 bytes long
- *   000a        0b 00                  text is 11 bytes long
- *   000c              1f 00            dictionary is 23 bytes long
- *   000e                    03 00      LOG_KERN (facility) LOG_ERR (level)
- *   0010  69 74 27 73 20 61 20 6c      "it's a l"
- *         69 6e 65                     "ine"
- *   001b           44 45 56 49 43      "DEVIC"
- *         45 3d 62 38 3a 32 00 44      "E=b8:2\0D"
- *         52 49 56 45 52 3d 62 75      "RIVER=bu"
- *         67                           "g"
- *   0032        00 00 00               padding to next message header
- *
- * The 'struct printk_log' buffer header must never be di

[PATCH v4 2/4] printk: add lockless ringbuffer

2020-07-15 Thread John Ogness
Introduce a multi-reader multi-writer lockless ringbuffer for storing
the kernel log messages. Readers and writers may use their API from
any context (including scheduler and NMI). This ringbuffer will make
it possible to decouple printk() callers from any context, locking,
or console constraints. It also makes it possible for readers to have
full access to the ringbuffer contents at any time and context (for
example from any panic situation).

The printk_ringbuffer is made up of 3 internal ringbuffers:

desc_ring:
A ring of descriptors. A descriptor contains all record meta data
(sequence number, timestamp, loglevel, etc.) as well as internal state
information about the record and logical positions specifying where in
the other ringbuffers the text and dictionary strings are located.

text_data_ring:
A ring of data blocks. A data block consists of an unsigned long
integer (ID) that maps to a desc_ring index followed by the text
string of the record.

dict_data_ring:
A ring of data blocks. A data block consists of an unsigned long
integer (ID) that maps to a desc_ring index followed by the dictionary
string of the record.

The internal state information of a descriptor is the key element to
allow readers and writers to locklessly synchronize access to the data.
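
To illustrate why the descriptor state matters, here is a minimal
userspace sketch of an ID and a state packed into one atomic variable,
as the overview comment in this patch describes for @state_var. The bit
layout (two flag bits at the top), the names, and the use of C11 atomics
are assumptions for illustration only, not the code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout: the top two bits are state flags, the rest is the ID. */
#define SV_BITS		(sizeof(unsigned long) * 8)
#define SV_COMMITTED	(1UL << (SV_BITS - 1))
#define SV_REUSABLE	(1UL << (SV_BITS - 2))
#define SV_ID_MASK	(~(SV_COMMITTED | SV_REUSABLE))

enum desc_state { DESC_MISS, DESC_RESERVED, DESC_COMMITTED, DESC_REUSABLE };

/* Decode a snapshot of the combined variable for the ID the caller expects. */
enum desc_state query_state(unsigned long expected_id, unsigned long sv)
{
	if ((sv & SV_ID_MASK) != expected_id)
		return DESC_MISS;	/* descriptor was recycled for another ID */
	if (sv & SV_REUSABLE)
		return DESC_REUSABLE;
	if (sv & SV_COMMITTED)
		return DESC_COMMITTED;
	return DESC_RESERVED;
}

/* Move a reserved descriptor to committed; fails if ID or state changed. */
bool commit_desc(atomic_ulong *state_var, unsigned long id)
{
	unsigned long expected = id;			/* reserved: no flags set */
	unsigned long desired = id | SV_COMMITTED;

	assert((id & ~SV_ID_MASK) == 0);		/* ID must fit in the ID bits */
	return atomic_compare_exchange_strong(state_var, &expected, desired);
}
```

Because the ID and the flags live in the same word, a single
compare-and-exchange can check "this is still my descriptor" and change
its state in one step, which is what allows the synchronization to work
without a lock.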

Signed-off-by: John Ogness 
Co-developed-by: Petr Mladek 
Reviewed-by: Petr Mladek 
Reviewed-by: Paul E. McKenney 
---
 kernel/printk/Makefile|1 +
 kernel/printk/printk_ringbuffer.c | 1676 +
 kernel/printk/printk_ringbuffer.h |  399 +++
 3 files changed, 2076 insertions(+)
 create mode 100644 kernel/printk/printk_ringbuffer.c
 create mode 100644 kernel/printk/printk_ringbuffer.h

diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 4d052fc6bcde..eee3dc9b60a9 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -2,3 +2,4 @@
 obj-y  = printk.o
 obj-$(CONFIG_PRINTK)   += printk_safe.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
+obj-$(CONFIG_PRINTK)   += printk_ringbuffer.o
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
new file mode 100644
index ..f4a670f7289d
--- /dev/null
+++ b/kernel/printk/printk_ringbuffer.c
@@ -0,0 +1,1676 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "printk_ringbuffer.h"
+
+/**
+ * DOC: printk_ringbuffer overview
+ *
+ * Data Structure
+ * --------------
+ * The printk_ringbuffer is made up of 3 internal ringbuffers:
+ *
+ *   desc_ring
+ * A ring of descriptors. A descriptor contains all record meta data
+ * (sequence number, timestamp, loglevel, etc.) as well as internal state
+ * information about the record and logical positions specifying where in
+ * the other ringbuffers the text and dictionary strings are located.
+ *
+ *   text_data_ring
+ * A ring of data blocks. A data block consists of an unsigned long
+ * integer (ID) that maps to a desc_ring index followed by the text
+ * string of the record.
+ *
+ *   dict_data_ring
+ * A ring of data blocks. A data block consists of an unsigned long
+ * integer (ID) that maps to a desc_ring index followed by the dictionary
+ * string of the record.
+ *
+ * The internal state information of a descriptor is the key element to allow
+ * readers and writers to locklessly synchronize access to the data.
+ *
+ * Implementation
+ * --------------
+ *
+ * Descriptor Ring
+ * ~~~~~~~~~~~~~~~
+ * The descriptor ring is an array of descriptors. A descriptor contains all
+ * the meta data of a printk record as well as blk_lpos structs pointing to
+ * associated text and dictionary data blocks (see "Data Rings" below). Each
+ * descriptor is assigned an ID that maps directly to index values of the
+ * descriptor array and has a state. The ID and the state are bitwise combined
+ * into a single descriptor field named @state_var, allowing ID and state to
+ * be synchronously and atomically updated.
+ *
+ * Descriptors have three states:
+ *
+ *   reserved
+ * A writer is modifying the record.
+ *
+ *   committed
+ * The record and all its data are complete and available for reading.
+ *
+ *   reusable
+ * The record exists, but its text and/or dictionary data may no longer
+ * be available.
+ *
+ * Querying the @state_var of a record requires providing the ID of the
+ * descriptor to query. This can yield a possible fourth (pseudo) state:
+ *
+ *   miss
+ * The descriptor being queried has an unexpected ID.
+ *
+ * The descriptor ring has a @tail_id that contains the ID of the oldest
+ * descriptor and @head_id that contains the ID of the newest descriptor.
+ *
+ * When a new descriptor should be created (and the ring is full), the tail
+ * descriptor is invalidated by first transitioning to the reusable state and
+ * then invalidating all tail data blocks up to and including the data blocks
+ * associated with

[PATCH v5 2/4] printk: add lockless ringbuffer

2020-07-15 Thread John Ogness
Introduce a multi-reader multi-writer lockless ringbuffer for storing
the kernel log messages. Readers and writers may use their API from
any context (including scheduler and NMI). This ringbuffer will make
it possible to decouple printk() callers from any context, locking,
or console constraints. It also makes it possible for readers to have
full access to the ringbuffer contents at any time and context (for
example from any panic situation).

The printk_ringbuffer is made up of 3 internal ringbuffers:

desc_ring:
A ring of descriptors. A descriptor contains all record meta data
(sequence number, timestamp, loglevel, etc.) as well as internal state
information about the record and logical positions specifying where in
the other ringbuffers the text and dictionary strings are located.

text_data_ring:
A ring of data blocks. A data block consists of an unsigned long
integer (ID) that maps to a desc_ring index followed by the text
string of the record.

dict_data_ring:
A ring of data blocks. A data block consists of an unsigned long
integer (ID) that maps to a desc_ring index followed by the dictionary
string of the record.

The internal state information of a descriptor is the key element to
allow readers and writers to locklessly synchronize access to the data.
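
The data block layout and the notion of a logical position can be
sketched in userspace roughly as follows. Everything here is
illustrative: the struct and function names are hypothetical, and the
power-of-two ring size and the rounding of blocks to the size of the ID
field are assumptions, not statements about the patch.

```c
#include <assert.h>
#include <stddef.h>

/* A data block: the owning descriptor's ID followed by the string bytes. */
struct data_blk {
	unsigned long id;
	char data[];
};

/* Hypothetical data ring whose size in bytes is 2 ^ size_bits. */
struct data_ring {
	unsigned int size_bits;
	char *buf;
};

/* Map an ever-increasing logical position to a byte offset in the ring. */
size_t lpos_to_index(const struct data_ring *ring, unsigned long lpos)
{
	assert(ring->size_bits < sizeof(unsigned long) * 8);
	return lpos & ((1UL << ring->size_bits) - 1);
}

/* Number of times a logical position has wrapped around the ring. */
unsigned long lpos_to_wraps(const struct data_ring *ring, unsigned long lpos)
{
	assert(ring->size_bits < sizeof(unsigned long) * 8);
	return lpos >> ring->size_bits;
}

/* Bytes one block occupies: ID header plus string, rounded up to ID size. */
size_t blk_size(size_t text_len)
{
	size_t align = sizeof(unsigned long);
	size_t raw = offsetof(struct data_blk, data) + text_len;

	return (raw + align - 1) & ~(align - 1);
}
```

Keeping positions logical (always increasing) rather than storing raw
array indexes lets a reader tell an old position from a current one even
after the ring has wrapped.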

Signed-off-by: John Ogness 
Co-developed-by: Petr Mladek 
Reviewed-by: Petr Mladek 
Reviewed-by: Paul E. McKenney 
---
 kernel/printk/Makefile|1 +
 kernel/printk/printk_ringbuffer.c | 1687 +
 kernel/printk/printk_ringbuffer.h |  399 +++
 3 files changed, 2087 insertions(+)
 create mode 100644 kernel/printk/printk_ringbuffer.c
 create mode 100644 kernel/printk/printk_ringbuffer.h

diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 4d052fc6bcde..eee3dc9b60a9 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -2,3 +2,4 @@
 obj-y  = printk.o
 obj-$(CONFIG_PRINTK)   += printk_safe.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
+obj-$(CONFIG_PRINTK)   += printk_ringbuffer.o
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
new file mode 100644
index ..7355ca99e852
--- /dev/null
+++ b/kernel/printk/printk_ringbuffer.c
@@ -0,0 +1,1687 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "printk_ringbuffer.h"
+
+/**
+ * DOC: printk_ringbuffer overview
+ *
+ * Data Structure
+ * --------------
+ * The printk_ringbuffer is made up of 3 internal ringbuffers:
+ *
+ *   desc_ring
+ * A ring of descriptors. A descriptor contains all record meta data
+ * (sequence number, timestamp, loglevel, etc.) as well as internal state
+ * information about the record and logical positions specifying where in
+ * the other ringbuffers the text and dictionary strings are located.
+ *
+ *   text_data_ring
+ * A ring of data blocks. A data block consists of an unsigned long
+ * integer (ID) that maps to a desc_ring index followed by the text
+ * string of the record.
+ *
+ *   dict_data_ring
+ * A ring of data blocks. A data block consists of an unsigned long
+ * integer (ID) that maps to a desc_ring index followed by the dictionary
+ * string of the record.
+ *
+ * The internal state information of a descriptor is the key element to allow
+ * readers and writers to locklessly synchronize access to the data.
+ *
+ * Implementation
+ * --------------
+ *
+ * Descriptor Ring
+ * ~~~~~~~~~~~~~~~
+ * The descriptor ring is an array of descriptors. A descriptor contains all
+ * the meta data of a printk record as well as blk_lpos structs pointing to
+ * associated text and dictionary data blocks (see "Data Rings" below). Each
+ * descriptor is assigned an ID that maps directly to index values of the
+ * descriptor array and has a state. The ID and the state are bitwise combined
+ * into a single descriptor field named @state_var, allowing ID and state to
+ * be synchronously and atomically updated.
+ *
+ * Descriptors have three states:
+ *
+ *   reserved
+ * A writer is modifying the record.
+ *
+ *   committed
+ * The record and all its data are complete and available for reading.
+ *
+ *   reusable
+ * The record exists, but its text and/or dictionary data may no longer
+ * be available.
+ *
+ * Querying the @state_var of a record requires providing the ID of the
+ * descriptor to query. This can yield a possible fourth (pseudo) state:
+ *
+ *   miss
+ * The descriptor being queried has an unexpected ID.
+ *
+ * The descriptor ring has a @tail_id that contains the ID of the oldest
+ * descriptor and @head_id that contains the ID of the newest descriptor.
+ *
+ * When a new descriptor should be created (and the ring is full), the tail
+ * descriptor is invalidated by first transitioning to the reusable state and
+ * then invalidating all tail data blocks up to and including the data blocks
+ * associated with

[PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-15 Thread John Ogness
Replace the existing ringbuffer usage and implementation with
lockless ringbuffer usage. Even though the new ringbuffer does not
require locking, all existing locking is left in place. Therefore,
this change is purely replacing the underlying ringbuffer.

Changes that exist due to the ringbuffer replacement:

- The VMCOREINFO has been updated for the new structures.

- Dictionary data is now stored in a separate data buffer from the
  human-readable messages. The dictionary data buffer is set to the
  same size as the message buffer. Therefore, the total required
  memory for both dictionary and message data is
  2 * (2 ^ CONFIG_LOG_BUF_SHIFT) for the initial static buffers and
  2 * log_buf_len (the kernel parameter) for the dynamic buffers.

- Record meta-data is now stored in a separate array of descriptors.
  This is an additional 72 * (2 ^ (CONFIG_LOG_BUF_SHIFT - 5)) bytes
  for the static array and 72 * (log_buf_len >> 5) bytes for the
  dynamic array.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 940 +
 1 file changed, 493 insertions(+), 447 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1b41e1b98221..fec71229169e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -55,6 +55,7 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+#include "printk_ringbuffer.h"
 #include "console_cmdline.h"
 #include "braille.h"
 #include "internal.h"
@@ -294,30 +295,24 @@ enum con_msg_format_flags {
 static int console_msg_format = MSG_FORMAT_DEFAULT;
 
 /*
- * The printk log buffer consists of a chain of concatenated variable
- * length records. Every record starts with a record header, containing
- * the overall length of the record.
+ * The printk log buffer consists of a sequenced collection of records, each
+ * containing variable length message and dictionary text. Every record
+ * also contains its own meta-data (@info).
  *
- * The heads to the first and last entry in the buffer, as well as the
- * sequence numbers of these entries are maintained when messages are
- * stored.
+ * Every record meta-data carries the timestamp in microseconds, as well as
+ * the standard userspace syslog level and syslog facility. The usual kernel
+ * messages use LOG_KERN; userspace-injected messages always carry a matching
+ * syslog facility, by default LOG_USER. The origin of every message can be
+ * reliably determined that way.
  *
- * If the heads indicate available messages, the length in the header
- * tells the start next message. A length == 0 for the next message
- * indicates a wrap-around to the beginning of the buffer.
+ * The human readable log message of a record is available in @text, the
+ * length of the message text in @text_len. The stored message is not
+ * terminated.
  *
- * Every record carries the monotonic timestamp in microseconds, as well as
- * the standard userspace syslog level and syslog facility. The usual
- * kernel messages use LOG_KERN; userspace-injected messages always carry
- * a matching syslog facility, by default LOG_USER. The origin of every
- * message can be reliably determined that way.
- *
- * The human readable log message directly follows the message header. The
- * length of the message text is stored in the header, the stored message
- * is not terminated.
- *
- * Optionally, a message can carry a dictionary of properties (key/value pairs),
- * to provide userspace with a machine-readable message context.
+ * Optionally, a record can carry a dictionary of properties (key/value
+ * pairs), to provide userspace with a machine-readable message context. The
+ * length of the dictionary is available in @dict_len. The dictionary is not
+ * terminated.
  *
  * Examples for well-defined, commonly used property names are:
  *   DEVICE=b12:8   device identifier
@@ -331,21 +326,19 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
  * follows directly after a '=' character. Every property is terminated by
  * a '\0' character. The last property is not terminated.
  *
- * Example of a message structure:
- *   0000  ff 8f 00 00 00 00 00 00      monotonic time in nsec
- *   0008  34 00                        record is 52 bytes long
- *   000a        0b 00                  text is 11 bytes long
- *   000c              1f 00            dictionary is 23 bytes long
- *   000e                    03 00      LOG_KERN (facility) LOG_ERR (level)
- *   0010  69 74 27 73 20 61 20 6c      "it's a l"
- *         69 6e 65                     "ine"
- *   001b           44 45 56 49 43      "DEVIC"
- *         45 3d 62 38 3a 32 00 44      "E=b8:2\0D"
- *         52 49 56 45 52 3d 62 75      "RIVER=bu"
- *         67                           "g"
- *   0032        00 00 00               padding to next message header
- *
- * The 'struct printk_log' buffer header m

Re: [PATCH v4 0/4] printk: replace ringbuffer

2020-07-10 Thread John Ogness
On 2020-07-10, Petr Mladek  wrote:
>> The next series in the printk-rework (move LOG_CONT handling from
>> writers to readers) makes some further changes that, while not
>> incompatible, could affect the output of existing tools. It may be a
>> good idea to let the new ringbuffer sit in linux-next until the next
>> series has been discussed/reviewed/merged. After the next series,
>> everything will be in place (with regard to userspace tools) to
>> finish the rework.
>
> I know that it might be premature question. But I wonder what kind
> of changes are expected because of the continuous lines.

I will be posting the next series quite soon, so I think it will be
better to discuss it when we have a working example in front of us.

> Do you expect some changes in the ring buffer structures so that
> the debugging tools would need yet another update to actually
> access the data?

The next series will be modifying the ringbuffer to allow data-less
records. This is necessary to support the thousands of

pr_cont("\n");

calls in the kernel code. Failed dataring allocations will still be
detected because the message flags for those records will be 0. For the
above pr_cont() line, they will be LOG_NEWLINE|LOG_CONT.
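
A dump tool could tell the two cases apart along these lines. This is
only a sketch of the rule described above: the struct, the enum names
for the result, and the numeric flag values are assumptions for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values assumed for this sketch. */
enum log_flags { LOG_NEWLINE = 2, LOG_CONT = 8 };

/* Hypothetical view of the fields a dump tool would look at. */
struct record_view {
	unsigned int flags;
	size_t text_len;
};

enum record_kind { REC_HAS_TEXT, REC_EMPTY_ON_PURPOSE, REC_ALLOC_FAILED };

/* Classify a record: no text and no flags means the allocation failed. */
enum record_kind classify_record(const struct record_view *r)
{
	assert(r != NULL);

	if (r->text_len > 0)
		return REC_HAS_TEXT;
	return r->flags ? REC_EMPTY_ON_PURPOSE : REC_ALLOC_FAILED;
}
```

A record produced by pr_cont("\n") would carry LOG_NEWLINE|LOG_CONT with
no text and be classified as intentionally empty, while a record with no
text and flags of 0 would be reported as a failed allocation.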

Since the dump tools need to make changes for the new ringbuffer anyway,
I think it would be good to hammer out the accepted LOG_CONT
implementation first, just in case we do need to make any subtle
internal changes.

> Or do you expect backward compatible changes that would allow
> to pass related parts of the continuous lines via syslog/dev_kmsg
> interface and join them later in userspace?

For users of console, non-extended netconsole, syslog, and kmsg_dump,
there will be no external changes whatsoever. These interfaces have no
awareness of sequence numbers, which will allow the kernel to
re-assemble the LOG_CONT messages for them.

Users of /dev/kmsg and extended netconsole see sequence numbers. Offlist
we discussed various hacks for getting around this without causing errors
for existing software, but they were all ugly.

IMHO users of these sequence number interfaces need to see all the
records individually and reassemble the LOG_CONT messages themselves if
they want to. I believe that is the only sane path forward. To do this,
the caller id will no longer be optional to the sequence number output
since that is vital information to re-assemble the LOG_CONT messages.
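
To make the idea concrete, here is a rough userspace sketch of how a
reader might join the pieces using the caller id. The record layout, the
function name, and the flag semantics (the first piece has no LOG_CONT,
later pieces do, and LOG_NEWLINE ends the line) are assumptions for
illustration and not a description of any existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag values assumed for this sketch. */
enum log_flags { LOG_NEWLINE = 2, LOG_CONT = 8 };

/* Hypothetical parsed record as a reader might hold it. */
struct rec {
	unsigned long long seq;		/* sequence number of the record */
	unsigned int caller_id;		/* context that emitted the record */
	unsigned int flags;
	const char *text;
};

/*
 * Join the pieces of the line that starts at recs[start] into out.
 * Returns the number of records that were used for that line.
 */
size_t join_cont(const struct rec *recs, size_t n, size_t start,
		 char *out, size_t out_size)
{
	size_t used = 0, consumed = 0;

	assert(start < n && out_size > 0);
	out[0] = '\0';

	for (size_t i = start; i < n; i++) {
		const struct rec *r = &recs[i];
		size_t len;

		if (r->caller_id != recs[start].caller_id)
			continue;	/* interleaved record from another context */
		if (i != start && !(r->flags & LOG_CONT))
			break;		/* same caller started a new message */

		len = strlen(r->text);
		if (len > out_size - 1 - used)
			len = out_size - 1 - used;	/* truncate, never overflow */
		memcpy(out + used, r->text, len);
		used += len;
		out[used] = '\0';
		consumed++;

		if (r->flags & LOG_NEWLINE)
			break;		/* line is complete */
	}
	return consumed;
}
```

Without the caller id, the record from the other context that lands
between two pieces could not be told apart from a continuation.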

Keep in mind that current software already needs to be able to handle
the caller id being shown. Also, currently in mainline there is no
guarantee that LOG_CONT messages are contiguous. So current software
must also be ready to accept broken up LOG_CONT messages. This is why I
think it would be acceptable to make this change for /dev/kmsg and
extended netconsole. But I understand it is controversial since tools
like systemd and dmesg use /dev/kmsg. Until they are modified to
re-assemble LOG_CONT messages, they will present the user with the
ugliness of LOG_CONT pieces (always, rather than only rarely as now).

John Ogness
