Re: [PATCH v10 1/9] drm/panic: Add drm panic locking

2024-03-15 Thread John Ogness
On 2024-03-15, Jocelyn Falempe  wrote:
> +static inline int drm_panic_trylock(struct drm_device *dev, unsigned long 
> *flags)
> +{
> + return raw_spin_trylock_irqsave(>mode_config.panic_lock, *flags);
> +}

[...]

> +static inline unsigned long drm_panic_lock(struct drm_device *dev)
> +{
> + unsigned long flags;
> +
> + raw_spin_lock_irqsave(>mode_config.panic_lock, flags);
> + return flags;
> +}

I find it a bit strange that @flags is passed as an argument in the
trylock variant but returned in the lock variant. I would pass it as an
argument in both variants. Just a suggestion.

John Ogness


Re: [PATCH v9 1/9] drm/panic: Add drm panic locking

2024-03-07 Thread John Ogness
On 2024-03-07, Jocelyn Falempe  wrote:
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> b/drivers/gpu/drm/drm_atomic_helper.c
> index 39ef0a6addeb..c0bb91312fb2 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -3099,6 +3100,7 @@ int drm_atomic_helper_swap_state(struct 
> drm_atomic_state *state,
>   }
>   }
>  
> + drm_panic_lock(state->dev);
>   for_each_oldnew_plane_in_state(state, plane, old_plane_state, 
> new_plane_state, i) {
>   WARN_ON(plane->state != old_plane_state);
>  
> @@ -3108,6 +3110,7 @@ int drm_atomic_helper_swap_state(struct 
> drm_atomic_state *state,
>   state->planes[i].state = old_plane_state;
>   plane->state = new_plane_state;
>   }
> + drm_panic_unlock(state->dev);

Is there a reason irqsave/irqrestore variants are not used? Maybe this
code path is too hot?

By leaving interrupts enabled, there is the risk that a panic from
within any interrupt handler may block the drm panic handler.

John Ogness


Re: [RFC] drm/panic: Add drm panic locking

2024-03-05 Thread John Ogness
Hi Daniel,

Great to see this moving forward!

On 2024-03-01, Daniel Vetter  wrote:
> But for the initial cut of a drm panic printing support I don't think
> we need that, because the critical sections are extremely small and
> only happen once per display refresh. So generally just 60 tiny locked
> sections per second, which is nothing compared to a serial console
> running a 115kbaud doing really slow mmio writes for each byte. So for
> now the raw spintrylock in drm panic notifier callback should be good
> enough.

Is there a reason you do not use the irqsave/irqrestore variants? By
leaving interrupts enabled, there is the risk that a panic from any
interrupt handler may block the drm panic handler.

John Ogness


Re: [RFC] How to test panic handlers, without crashing the kernel

2024-03-04 Thread John Ogness
[Added printk maintainer and kdb folks]

Hi Jocelyn,

On 2024-03-01, Jocelyn Falempe  wrote:
> While writing a panic handler for drm devices [1], I needed a way to 
> test it without crashing the machine.
> So from debugfs, I called 
> atomic_notifier_call_chain(_notifier_list, ...), but it has the 
> side effect of calling all other panic notifiers registered.
>
> So Sima suggested to move that to the generic panic code, and test all 
> panic notifiers with a dedicated debugfs interface.
>
> I can move that code to kernel/, but before doing that, I would like to 
> know if you think that's the right way to test the panic code.

One major event that happens before the panic notifiers is
panic_other_cpus_shutdown(). This can cause special situations because
CPUs can be stopped while holding resources (such as raw spin
locks). And these are the situations that make it so tricky to have safe
and reliable notifiers. If triggered from debugfs, these situations will
never occur.

My concern is that the tests via debugfs will always succeed, but in the
real world panic notifiers are failing/hanging/exploding. IMHO useful
panic testing requires real panic'ing.

For my printk panic tests I trigger unknown NMIs while booting with
"unknown_nmi_panic". Particularly with Qemu this is quite easy and
amazingly effective at catching problems. In fact, a recent printk
series [0] fixed seven issues that were found through this method of
panic testing.

> The second question is how to simulate a panic context in a
> non-destructive way, so we can test the panic notifiers in CI, without
> crashing the machine.

I'm wondering if a "fake panic" can be implemented that quiesces all the
other CPUs via NMI (similar to kdb) and then calls the panic
notifiers. And finally releases everything back to normal. That might
produce a fairly realistic panic situation and should be fairly
non-destructive (depending on what the notifiers do and how long they
take).

> The worst case for a panic notifier, is when the panic occurs in NMI
> context, but I don't know how to simulate that. The goal would be to
> find early if a panic notifier tries to sleep, or do other things that
> are not allowed in a panic context.

Maybe with a new boot argument "unknown_nmi_fake_panic" that triggers
the fake panic instead?

John Ogness

[0] 
https://lore.kernel.org/lkml/20240207134103.1357162-1-john.ogn...@linutronix.de


[PATCH v3] drm/nouveau: fix incorrect conversion to dma_resv_wait_timeout()

2023-04-14 Thread John Ogness
Commit 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")
converted from ttm_bo_wait_ctx() to dma_resv_wait_timeout().
However, dma_resv_wait_timeout() returns greater than zero on
success as opposed to ttm_bo_wait_ctx(). As a result, relocs
will fail and log errors even when it was a success.

Change the return code handling to match that of
nouveau_gem_ioctl_cpu_prep(), which was already using
dma_resv_wait_timeout() correctly.

Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")
Reported-by: Tanmay Bhushan <0070472...@gmail.com>
Link: https://lore.kernel.org/lkml/20230119225351.71657-1-0070472...@gmail.com
Signed-off-by: John Ogness 
---
 I just realized that the nouveau driver style prefers to scope
 variables used only in loops.

 v3: Define @lret within the for-loop.

 drivers/gpu/drm/nouveau/nouveau_gem.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index f77e44958037..ab9062e50977 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -645,7 +645,7 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
struct drm_nouveau_gem_pushbuf_reloc *reloc,
struct drm_nouveau_gem_pushbuf_bo *bo)
 {
-   long ret = 0;
+   int ret = 0;
unsigned i;
 
for (i = 0; i < req->nr_relocs; i++) {
@@ -653,6 +653,7 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
struct drm_nouveau_gem_pushbuf_bo *b;
struct nouveau_bo *nvbo;
uint32_t data;
+   long lret;
 
if (unlikely(r->bo_index >= req->nr_buffers)) {
NV_PRINTK(err, cli, "reloc bo index invalid\n");
@@ -703,13 +704,18 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
data |= r->vor;
}
 
-   ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
-   DMA_RESV_USAGE_BOOKKEEP,
-   false, 15 * HZ);
-   if (ret == 0)
+   lret = dma_resv_wait_timeout(nvbo->bo.base.resv,
+DMA_RESV_USAGE_BOOKKEEP,
+false, 15 * HZ);
+   if (!lret)
ret = -EBUSY;
+   else if (lret > 0)
+   ret = 0;
+   else
+   ret = lret;
+
if (ret) {
-   NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
+   NV_PRINTK(err, cli, "reloc wait_idle failed: %d\n",
  ret);
break;
}

base-commit: 09a9639e56c01c7a00d6c0ca63f4c7c41abe075d
-- 
2.30.2


[PATCH v2] drm/nouveau: fix incorrect conversion to dma_resv_wait_timeout()

2023-04-14 Thread John Ogness
Commit 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")
converted from ttm_bo_wait_ctx() to dma_resv_wait_timeout().
However, dma_resv_wait_timeout() returns greater than zero on
success as opposed to ttm_bo_wait_ctx(). As a result, relocs
will fail and log errors even when it was a success.

Change the return code handling to match that of
nouveau_gem_ioctl_cpu_prep(), which was already using
dma_resv_wait_timeout() correctly.

Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")
Reported-by: Tanmay Bhushan <0070472...@gmail.com>
Link: https://lore.kernel.org/lkml/20230119225351.71657-1-0070472...@gmail.com
Signed-off-by: John Ogness 
---
 The original report was actually a patch that needed fixing.
 Since nobody has stepped up to fix this regression correctly,
 I'm posting the v2.

 This is a real regression introduced in 6.3-rc1.

 drivers/gpu/drm/nouveau/nouveau_gem.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index f77e44958037..346839c24273 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -645,8 +645,9 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
struct drm_nouveau_gem_pushbuf_reloc *reloc,
struct drm_nouveau_gem_pushbuf_bo *bo)
 {
-   long ret = 0;
+   int ret = 0;
unsigned i;
+   long lret;
 
for (i = 0; i < req->nr_relocs; i++) {
struct drm_nouveau_gem_pushbuf_reloc *r = [i];
@@ -703,13 +704,18 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
data |= r->vor;
}
 
-   ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
-   DMA_RESV_USAGE_BOOKKEEP,
-   false, 15 * HZ);
-   if (ret == 0)
+   lret = dma_resv_wait_timeout(nvbo->bo.base.resv,
+DMA_RESV_USAGE_BOOKKEEP,
+false, 15 * HZ);
+   if (!lret)
ret = -EBUSY;
+   else if (lret > 0)
+   ret = 0;
+   else
+   ret = lret;
+
if (ret) {
-   NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
+   NV_PRINTK(err, cli, "reloc wait_idle failed: %d\n",
  ret);
break;
}

base-commit: 09a9639e56c01c7a00d6c0ca63f4c7c41abe075d
-- 
2.30.2



Re: [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau

2023-03-27 Thread John Ogness
On 2023-01-19, Tanmay Bhushan <0070472...@gmail.com> wrote:
> dma_resv_wait_timeout returns greater than zero on success
> as opposed to ttm_bo_wait_ctx. As a result of that relocs
> will fail and give failure even when it was a success.

Today I switched my workstation from 6.2 to 6.3-rc3 and started seeing
lots of new kernel messages:

[  642.138313][ T1751] nouveau :f0:10.0: X[1751]: reloc wait_idle failed: 
1500
[  642.138389][ T1751] nouveau :f0:10.0: X[1751]: reloc apply: 1500
[  646.123490][ T1751] nouveau :f0:10.0: X[1751]: reloc wait_idle failed: 
1500
[  646.123573][ T1751] nouveau :f0:10.0: X[1751]: reloc apply: 1500

The graphics seemed to go slower or hang a bit when these messages would
appear. I then found your patch! However, I have some comments about it.

First, it should include a fixes tag:

Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")

> Signed-off-by: Tanmay Bhushan <0070472...@gmail.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_gem.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
> b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index f77e44958037..0e3690459144 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -706,9 +706,8 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
>   ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
>   DMA_RESV_USAGE_BOOKKEEP,
>   false, 15 * HZ);
> - if (ret == 0)
> + if (ret <= 0) {
>   ret = -EBUSY;

This is incorrect for 2 reasons:

* it treats restarts as timeouts

* this function now returns >0 on success

> - if (ret) {
>   NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
> ret);
>   break;

I rearranged things to basically correctly translate the return code of
dma_resv_wait_timeout() to match the previous ttm_bo_wait():

ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
DMA_RESV_USAGE_BOOKKEEP,
false, 15 * HZ);
if (ret == 0)
ret = -EBUSY;
if (ret > 0)
ret = 0;
if (ret) {
NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
  ret);
break;
}

So the patch just becomes:

@@ -708,6 +708,8 @@ nouveau_gem_pushbuf_reloc_apply(struct n
false, 15 * HZ);
if (ret == 0)
ret = -EBUSY;
+   if (ret > 0)
+   ret = 0;
if (ret) {
NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
  ret);

With this variant, everything runs correctly on my workstation again.

It probably deserves a comment about why @ret is being translated. Or
perhaps a new variable should be introduced to separate the return value
of dma_resv_wait_timeout() from the return value of this function.

Either way, this is an important fix for 6.3-rc!

John Ogness


[PATCH printk v5 32/40] printk, xen: fbfront: create/use safe function for forcing preferred

2022-11-16 Thread John Ogness
With commit 9e124fe16ff2("xen: Enable console tty by default in domU
if it's not a dummy") a hack was implemented to make sure that the
tty console remains the console behind the /dev/console device. The
main problem with the hack is that, after getting the console pointer
to the tty console, it is assumed the pointer is still valid after
releasing the console_sem. This assumption is incorrect and unsafe.

Make the hack safe by introducing a new function
console_force_preferred_locked() and perform the full operation
under the console_list_lock.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 drivers/video/fbdev/xen-fbfront.c | 12 +++-
 include/linux/console.h   |  1 +
 kernel/printk/printk.c| 49 +--
 3 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/drivers/video/fbdev/xen-fbfront.c 
b/drivers/video/fbdev/xen-fbfront.c
index 4d2694d904aa..8752d389e382 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -504,18 +504,14 @@ static void xenfb_make_preferred_console(void)
if (console_set_on_cmdline)
return;
 
-   console_lock();
+   console_list_lock();
for_each_console(c) {
if (!strcmp(c->name, "tty") && c->index == 0)
break;
}
-   console_unlock();
-   if (c) {
-   unregister_console(c);
-   c->flags |= CON_CONSDEV;
-   c->flags &= ~CON_PRINTBUFFER; /* don't print again */
-   register_console(c);
-   }
+   if (c)
+   console_force_preferred_locked(c);
+   console_list_unlock();
 }
 
 static int xenfb_resume(struct xenbus_device *dev)
diff --git a/include/linux/console.h b/include/linux/console.h
index f716e1dd9eaf..9cea254b34b8 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -291,6 +291,7 @@ enum con_flush_mode {
 };
 
 extern int add_preferred_console(char *name, int idx, char *options);
+extern void console_force_preferred_locked(struct console *con);
 extern void register_console(struct console *);
 extern int unregister_console(struct console *);
 extern void console_lock(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 410d3f2cdeb3..ece34abbc9cc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -247,9 +247,10 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int 
write,
 void console_list_lock(void)
 {
/*
-* In unregister_console(), synchronize_srcu() is called with the
-* console_list_lock held. Therefore it is not allowed that the
-* console_list_lock is taken with the srcu_lock held.
+* In unregister_console() and console_force_preferred_locked(),
+* synchronize_srcu() is called with the console_list_lock held.
+* Therefore it is not allowed that the console_list_lock is taken
+* with the srcu_lock held.
 *
 * Detecting if this context is really in the read-side critical
 * section is only possible if the appropriate debug options are
@@ -3489,6 +3490,48 @@ int unregister_console(struct console *console)
 }
 EXPORT_SYMBOL(unregister_console);
 
+/**
+ * console_force_preferred_locked - force a registered console preferred
+ * @con: The registered console to force preferred.
+ *
+ * Must be called under console_list_lock().
+ */
+void console_force_preferred_locked(struct console *con)
+{
+   struct console *cur_pref_con;
+
+   if (!console_is_registered_locked(con))
+   return;
+
+   cur_pref_con = console_first();
+
+   /* Already preferred? */
+   if (cur_pref_con == con)
+   return;
+
+   /*
+* Delete, but do not re-initialize the entry. This allows the console
+* to continue to appear registered (via any hlist_unhashed_lockless()
+* checks), even though it was briefly removed from the console list.
+*/
+   hlist_del_rcu(>node);
+
+   /*
+* Ensure that all SRCU list walks have completed so that the console
+* can be added to the beginning of the console list and its forward
+* list pointer can be re-initialized.
+*/
+   synchronize_srcu(_srcu);
+
+   con->flags |= CON_CONSDEV;
+   WARN_ON(!con->device);
+
+   /* Only the new head can have CON_CONSDEV set. */
+   console_srcu_write_flags(cur_pref_con, cur_pref_con->flags & 
~CON_CONSDEV);
+   hlist_add_head_rcu(>node, _list);
+}
+EXPORT_SYMBOL(console_force_preferred_locked);
+
 /*
  * Initialize the console device. This is called *early*, so
  * we can't necessarily depend on lots of kernel help here.
-- 
2.30.2



[PATCH printk v5 00/40] reduce console_lock scope

2022-11-16 Thread John Ogness
This is v5 of a series to prepare for threaded/atomic
printing. v4 is here [0]. This series focuses on reducing the
scope of the BKL console_lock. It achieves this by switching to
SRCU and a dedicated mutex for console list iteration and
modification, respectively. The console_lock will no longer
offer this protection.

Also, during the review of v2 it came to our attention that
many console drivers are checking CON_ENABLED to see if they
are registered. Because this flag can change without
unregistering and because this flag does not represent an
atomic point when an (un)registration process is complete,
a new console_is_registered() function is introduced. This
function uses the console_list_lock to synchronize with the
(un)registration process to provide a reliable status.

All users of the console_lock for list iteration have been
modified. For the call sites where the console_lock is still
needed (for other reasons), comments are added to explain
exactly why the console_lock is needed.

All users of CON_ENABLED for registration status have been
modified to use console_is_registered(). Note that there are
still users of CON_ENABLED, but this is for legitimate purposes
about a registered console being able to print.

The base commit for this series is from Paul McKenney's RCU tree
and provides an NMI-safe SRCU implementation [1]. Without the
NMI-safe SRCU implementation, this series is not less safe than
mainline. But we will need the NMI-safe SRCU implementation for
atomic consoles anyway, so we might as well get it in
now. Especially since it _does_ increase the reliability for
mainline in the panic path.

Changes since v4:

printk:

- Introduce console_init_seq() to handle the now rather complex
  procedure to find an appropriate start sequence number for a
  new console upon registration.

- When registering a non-boot console and boot consoles are
  registered, try to flush all the consoles to get the next @seq
  value before falling back to use the @seq of the enabled boot
  console that is furthest behind.

- For console_force_preferred_locked(), make the console the
  head of the console list.

John Ogness

[0] 
https://lore.kernel.org/lkml/20221114162932.141883-1-john.ogn...@linutronix.de
[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/log/?h=srcunmisafe.2022.11.09a

John Ogness (38):
  printk: Prepare for SRCU console list protection
  printk: register_console: use "registered" for variable names
  printk: move @seq initialization to helper
  printk: fix setting first seq for consoles
  um: kmsg_dump: only dump when no output console available
  tty: serial: kgdboc: document console_lock usage
  tty: tty_io: document console_lock usage
  proc: consoles: document console_lock usage
  printk: introduce console_list_lock
  console: introduce wrappers to read/write console flags
  um: kmsg_dumper: use srcu console list iterator
  kdb: use srcu console list iterator
  printk: console_flush_all: use srcu console list iterator
  printk: __pr_flush: use srcu console list iterator
  printk: console_is_usable: use console_srcu_read_flags
  printk: console_unblank: use srcu console list iterator
  printk: console_flush_on_panic: use srcu console list iterator
  printk: console_device: use srcu console list iterator
  console: introduce console_is_registered()
  serial_core: replace uart_console_enabled() with
uart_console_registered()
  tty: nfcon: use console_is_registered()
  efi: earlycon: use console_is_registered()
  tty: hvc: use console_is_registered()
  tty: serial: earlycon: use console_is_registered()
  tty: serial: pic32_uart: use console_is_registered()
  tty: serial: samsung_tty: use console_is_registered()
  tty: serial: xilinx_uartps: use console_is_registered()
  usb: early: xhci-dbc: use console_is_registered()
  netconsole: avoid CON_ENABLED misuse to track registration
  printk, xen: fbfront: create/use safe function for forcing preferred
  tty: tty_io: use console_list_lock for list synchronization
  proc: consoles: use console_list_lock for list iteration
  tty: serial: kgdboc: use srcu console list iterator
  tty: serial: kgdboc: use console_list_lock for list traversal
  tty: serial: kgdboc: synchronize tty_find_polling_driver() and
register_console()
  tty: serial: kgdboc: use console_list_lock to trap exit
  printk: relieve console_lock of list synchronization duties
  tty: serial: sh-sci: use setup() callback for early console

Thomas Gleixner (2):
  serial: kgdboc: Lock console list in probe function
  printk: Convert console_drivers list to hlist

 .clang-format   |   1 +
 arch/m68k/emu/nfcon.c   |   9 +-
 arch/um/kernel/kmsg_dump.c  |  24 +-
 drivers/firmware/efi/earlycon.c |   8 +-
 drivers/net/netconsole.c|  21 +-
 drivers/tty/hvc/hvc_console.c   |   4 +-
 drivers/tty/serial/8250/8250_core.c |   2 +-
 drivers/tty/serial/earlycon.c   |   4 +-
 drivers/tty/seria

Re: [PATCH printk v4 31/39] printk, xen: fbfront: create/use safe function for forcing preferred

2022-11-14 Thread John Ogness
Hi,

After more detailed runtime testing I discovered that I didn't re-insert
the console to the correct place in the list. More below...

On 2022-11-14, John Ogness  wrote:
> diff --git a/include/linux/console.h b/include/linux/console.h
> index f716e1dd9eaf..9cea254b34b8 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -291,6 +291,7 @@ enum con_flush_mode {
>  };
>  
>  extern int add_preferred_console(char *name, int idx, char *options);
> +extern void console_force_preferred_locked(struct console *con);
>  extern void register_console(struct console *);
>  extern int unregister_console(struct console *);
>  extern void console_lock(void);
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index e770b1ede6c9..dff76c1cef80 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3461,6 +3462,48 @@ int unregister_console(struct console *console)
>  }
>  EXPORT_SYMBOL(unregister_console);
>  
> +/**
> + * console_force_preferred_locked - force a registered console preferred
> + * @con: The registered console to force preferred.
> + *
> + * Must be called under console_list_lock().
> + */
> +void console_force_preferred_locked(struct console *con)
> +{
> + struct console *cur_pref_con;
> +
> + if (!console_is_registered_locked(con))
> + return;
> +
> + cur_pref_con = console_first();
> +
> + /* Already preferred? */
> + if (cur_pref_con == con)
> + return;
> +
> + /*
> +  * Delete, but do not re-initialize the entry. This allows the console
> +  * to continue to appear registered (via any hlist_unhashed_lockless()
> +  * checks), even though it was briefly removed from the console list.
> +  */
> + hlist_del_rcu(>node);
> +
> + /*
> +  * Ensure that all SRCU list walks have completed so that the console
> +  * can be added to the beginning of the console list and its forward
> +  * list pointer can be re-initialized.
> +  */
> + synchronize_srcu(_srcu);
> +
> + con->flags |= CON_CONSDEV;
> + WARN_ON(!con->device);
> +
> + /* Only the new head can have CON_CONSDEV set. */
> + console_srcu_write_flags(cur_pref_con, cur_pref_con->flags & 
> ~CON_CONSDEV);
> + hlist_add_behind_rcu(>node, console_list.first);

This is adding the console as the 2nd item. It should be the new
head. The patch below fixes it.

I have done careful runtime testing with this fixup. After the
force_preferred, the console is the new head and sending data to
/dev/console redirects to that console.

It would be nice if we could fold this in. Sorry.

John Ogness

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8d635467882f..4b77586cf4cb 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3494,7 +3494,7 @@ void console_force_preferred_locked(struct console *con)
 
/* Only the new head can have CON_CONSDEV set. */
console_srcu_write_flags(cur_pref_con, cur_pref_con->flags & 
~CON_CONSDEV);
-   hlist_add_behind_rcu(>node, console_list.first);
+   hlist_add_head_rcu(>node, _list);
 }
 EXPORT_SYMBOL(console_force_preferred_locked);
 


[PATCH printk v4 31/39] printk, xen: fbfront: create/use safe function for forcing preferred

2022-11-14 Thread John Ogness
With commit 9e124fe16ff2("xen: Enable console tty by default in domU
if it's not a dummy") a hack was implemented to make sure that the
tty console remains the console behind the /dev/console device. The
main problem with the hack is that, after getting the console pointer
to the tty console, it is assumed the pointer is still valid after
releasing the console_sem. This assumption is incorrect and unsafe.

Make the hack safe by introducing a new function
console_force_preferred_locked() and perform the full operation
under the console_list_lock.

Signed-off-by: John Ogness 
---
 drivers/video/fbdev/xen-fbfront.c | 12 +++-
 include/linux/console.h   |  1 +
 kernel/printk/printk.c| 49 +--
 3 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/drivers/video/fbdev/xen-fbfront.c 
b/drivers/video/fbdev/xen-fbfront.c
index 4d2694d904aa..8752d389e382 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -504,18 +504,14 @@ static void xenfb_make_preferred_console(void)
if (console_set_on_cmdline)
return;
 
-   console_lock();
+   console_list_lock();
for_each_console(c) {
if (!strcmp(c->name, "tty") && c->index == 0)
break;
}
-   console_unlock();
-   if (c) {
-   unregister_console(c);
-   c->flags |= CON_CONSDEV;
-   c->flags &= ~CON_PRINTBUFFER; /* don't print again */
-   register_console(c);
-   }
+   if (c)
+   console_force_preferred_locked(c);
+   console_list_unlock();
 }
 
 static int xenfb_resume(struct xenbus_device *dev)
diff --git a/include/linux/console.h b/include/linux/console.h
index f716e1dd9eaf..9cea254b34b8 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -291,6 +291,7 @@ enum con_flush_mode {
 };
 
 extern int add_preferred_console(char *name, int idx, char *options);
+extern void console_force_preferred_locked(struct console *con);
 extern void register_console(struct console *);
 extern int unregister_console(struct console *);
 extern void console_lock(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e770b1ede6c9..dff76c1cef80 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -247,9 +247,10 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int 
write,
 void console_list_lock(void)
 {
/*
-* In unregister_console(), synchronize_srcu() is called with the
-* console_list_lock held. Therefore it is not allowed that the
-* console_list_lock is taken with the srcu_lock held.
+* In unregister_console() and console_force_preferred_locked(),
+* synchronize_srcu() is called with the console_list_lock held.
+* Therefore it is not allowed that the console_list_lock is taken
+* with the srcu_lock held.
 *
 * Detecting if this context is really in the read-side critical
 * section is only possible if the appropriate debug options are
@@ -3461,6 +3462,48 @@ int unregister_console(struct console *console)
 }
 EXPORT_SYMBOL(unregister_console);
 
+/**
+ * console_force_preferred_locked - force a registered console preferred
+ * @con: The registered console to force preferred.
+ *
+ * Must be called under console_list_lock().
+ */
+void console_force_preferred_locked(struct console *con)
+{
+   struct console *cur_pref_con;
+
+   if (!console_is_registered_locked(con))
+   return;
+
+   cur_pref_con = console_first();
+
+   /* Already preferred? */
+   if (cur_pref_con == con)
+   return;
+
+   /*
+* Delete, but do not re-initialize the entry. This allows the console
+* to continue to appear registered (via any hlist_unhashed_lockless()
+* checks), even though it was briefly removed from the console list.
+*/
+   hlist_del_rcu(>node);
+
+   /*
+* Ensure that all SRCU list walks have completed so that the console
+* can be added to the beginning of the console list and its forward
+* list pointer can be re-initialized.
+*/
+   synchronize_srcu(_srcu);
+
+   con->flags |= CON_CONSDEV;
+   WARN_ON(!con->device);
+
+   /* Only the new head can have CON_CONSDEV set. */
+   console_srcu_write_flags(cur_pref_con, cur_pref_con->flags & 
~CON_CONSDEV);
+   hlist_add_behind_rcu(>node, console_list.first);
+}
+EXPORT_SYMBOL(console_force_preferred_locked);
+
 /*
  * Initialize the console device. This is called *early*, so
  * we can't necessarily depend on lots of kernel help here.
-- 
2.30.2



[PATCH printk v4 00/39] reduce console_lock scope

2022-11-14 Thread John Ogness
This is v4 of a series to prepare for threaded/atomic
printing. v3 is here [0]. This series focuses on reducing the
scope of the BKL console_lock. It achieves this by switching to
SRCU and a dedicated mutex for console list iteration and
modification, respectively. The console_lock will no longer
offer this protection.

Also, during the review of v2 it came to our attention that
many console drivers are checking CON_ENABLED to see if they
are registered. Because this flag can change without
unregistering and because this flag does not represent an
atomic point when an (un)registration process is complete,
a new console_is_registered() function is introduced. This
function uses the console_list_lock to synchronize with the
(un)registration process to provide a reliable status.

All users of the console_lock for list iteration have been
modified. For the call sites where the console_lock is still
needed (for other reasons), comments are added to explain
exactly why the console_lock was needed.

All users of CON_ENABLED for registration status have been
modified to use console_is_registered(). Note that there are
still users of CON_ENABLED, but this is for legitimate purposes
about a registered console being able to print.

The base commit for this series is from Paul McKenney's RCU tree
and provides an NMI-safe SRCU implementation [1]. Without the
NMI-safe SRCU implementation, this series is not less safe than
mainline. But we will need the NMI-safe SRCU implementation for
atomic consoles anyway, so we might as well get it in
now. Especially since it _does_ increase the reliability for
mainline in the panic path.

Changes since v3:

general:

- Implement console_srcu_read_flags() and console_srcu_write_flags()
  to be used for console->flags access under the srcu_read_lock or
  console_list_lock, respectively. The functions document their
  relationship to one another and use data_race(), READ_ONCE(), and
  WRITE_ONCE() macros to annotate their relationship. They also make
  use of lockdep to warn if used in improper contexts.

- Replace all console_is_enabled() usage with
  console_srcu_read_flags() (all were under the srcu_read_lock).

serial_core:

- For uart_console_registered(), check uart_console() before taking
  the console_list_lock to avoid unnecessary lock contention for
  non-console ports.

m68k/emu/nfcon:

- Only explicitly enable the console if registering via debug=nfcon.

tty/serial/sh-sci:

- Add comments about why @options will always be NULL for the
  earlyprintk console.

kdb:

- Add comments explaining the expectations for console drivers to
  work correctly.

printk:

- Some code sites under SRCU were checking flags directly. Use
  console_srcu_read_flags() instead.

- In register_console() rename bootcon_enabled/realcon_enabled to
  bootcon_registered/realcon_registered to avoid confusion.

- In register_console() only check for boot console sequences if a
  boot console is registered and !keep_bootcon. In this case, also
  take the console_lock to guarantee safe access to console->seq.

- In console_force_preferred_locked() use hlist_del_rcu() instead of
  hlist_del_init_rcu() so that there is never a window where the
  console can be viewed as unregistered.

John Ogness

[0] 
https://lore.kernel.org/lkml/20221107141638.3790965-1-john.ogn...@linutronix.de
[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/log/?h=srcunmisafe.2022.11.09a

John Ogness (37):
  printk: Prepare for SRCU console list protection
  printk: register_console: use "registered" for variable names
  printk: fix setting first seq for consoles
  um: kmsg_dump: only dump when no output console available
  tty: serial: kgdboc: document console_lock usage
  tty: tty_io: document console_lock usage
  proc: consoles: document console_lock usage
  printk: introduce console_list_lock
  console: introduce wrappers to read/write console flags
  um: kmsg_dumper: use srcu console list iterator
  kdb: use srcu console list iterator
  printk: console_flush_all: use srcu console list iterator
  printk: __pr_flush: use srcu console list iterator
  printk: console_is_usable: use console_srcu_read_flags
  printk: console_unblank: use srcu console list iterator
  printk: console_flush_on_panic: use srcu console list iterator
  printk: console_device: use srcu console list iterator
  console: introduce console_is_registered()
  serial_core: replace uart_console_enabled() with
uart_console_registered()
  tty: nfcon: use console_is_registered()
  efi: earlycon: use console_is_registered()
  tty: hvc: use console_is_registered()
  tty: serial: earlycon: use console_is_registered()
  tty: serial: pic32_uart: use console_is_registered()
  tty: serial: samsung_tty: use console_is_registered()
  tty: serial: xilinx_uartps: use console_is_registered()
  usb: early: xhci-dbc: use console_is_registered()
  netconsole: avoid CON_ENABLED misuse to track registration
  printk, xen: fbfront: create/u

Re: [PATCH printk v3 33/40] printk, xen: fbfront: create/use safe function for forcing preferred

2022-11-10 Thread John Ogness
On 2022-11-10, Petr Mladek  wrote:
 +  /* Only the new head can have CON_CONSDEV set. */
 +  WRITE_ONCE(cur_pref_con->flags, cur_pref_con->flags & ~CON_CONSDEV);
>>>
>>> As mentioned in the reply for 7th patch, I would prefer to hide this
>>> WRITE_ONCE into a wrapper, e.g. console_set_flag(). It might also
>>> check that the console_list_lock is taken...
>> 
>> Agreed. For v4 it will become:
>> 
>> console_srcu_write_flags(cur_pref_con, cur_pref_con->flags & ~CON_CONSDEV);
>
> I am happy that your are going to introduce an API for this.
>
> Just to be sure. The _srcu_ in the name means that the write
> will use WRITE_ONCE() so that it can be read safely in SRCU
> context using READ_ONCE(). Do I get it correctly, please?

Yes.

> I expect that the counter part will be console_srcu_read_flags().
> I like the name. It is better than _unsafe_ that I proposed earlier.

Since the only flag that is ever checked in this way is CON_ENABLED, I
was planning on calling it console_srcu_is_enabled(). But I suppose I
could have it be: (console_srcu_read_flags() & CON_ENABLED). That would
keep it more abstract in case anyone should want to read other flag bits
under SRCU.

There are only 4 call sites that need this, so I suppose we don't need a
special _is_enabled() variant. Thanks for the suggestion!

John


Re: [PATCH printk v3 33/40] printk, xen: fbfront: create/use safe function for forcing preferred

2022-11-10 Thread John Ogness
On 2022-11-10, Petr Mladek  wrote:
>> +void console_force_preferred_locked(struct console *con)
>> +{
>> +struct console *cur_pref_con;
>> +
>> +if (!console_is_registered_locked(con))
>> +return;
>> +
>> +cur_pref_con = console_first();
>> +
>> +/* Already preferred? */
>> +if (cur_pref_con == con)
>> +return;
>> +
>> +hlist_del_init_rcu(>node);
>
> We actually should re-initialize the node only after all existing
> console list walks are finished. Se we should use here:
>
>   hlist_del_rcu(>node);

hlist_del_init_rcu() only re-initializes @pprev pointer. But maybe you
are concerned that there is a window where list_unhashed() becomes true?
I agree that it should be changed to hlist_del_rcu() because there
should not be a window where this console appears unregistered.

>> +/* Only the new head can have CON_CONSDEV set. */
>> +WRITE_ONCE(cur_pref_con->flags, cur_pref_con->flags & ~CON_CONSDEV);
>
> As mentioned in the reply for 7th patch, I would prefer to hide this
> WRITE_ONCE into a wrapper, e.g. console_set_flag(). It might also
> check that the console_list_lock is taken...

Agreed. For v4 it will become:

console_srcu_write_flags(cur_pref_con->flags & ~CON_CONSDEV);

John


[PATCH printk v3 33/40] printk, xen: fbfront: create/use safe function for forcing preferred

2022-11-07 Thread John Ogness
With commit 9e124fe16ff2("xen: Enable console tty by default in domU
if it's not a dummy") a hack was implemented to make sure that the
tty console remains the console behind the /dev/console device. The
main problem with the hack is that, after getting the console pointer
to the tty console, it is assumed the pointer is still valid after
releasing the console_sem. This assumption is incorrect and unsafe.

Make the hack safe by introducing a new function
console_force_preferred_locked() and perform the full operation
under the console_list_lock.

Signed-off-by: John Ogness 
---
 drivers/video/fbdev/xen-fbfront.c | 12 +++--
 include/linux/console.h   |  1 +
 kernel/printk/printk.c| 44 ---
 3 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/drivers/video/fbdev/xen-fbfront.c 
b/drivers/video/fbdev/xen-fbfront.c
index 4d2694d904aa..8752d389e382 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -504,18 +504,14 @@ static void xenfb_make_preferred_console(void)
if (console_set_on_cmdline)
return;
 
-   console_lock();
+   console_list_lock();
for_each_console(c) {
if (!strcmp(c->name, "tty") && c->index == 0)
break;
}
-   console_unlock();
-   if (c) {
-   unregister_console(c);
-   c->flags |= CON_CONSDEV;
-   c->flags &= ~CON_PRINTBUFFER; /* don't print again */
-   register_console(c);
-   }
+   if (c)
+   console_force_preferred_locked(c);
+   console_list_unlock();
 }
 
 static int xenfb_resume(struct xenbus_device *dev)
diff --git a/include/linux/console.h b/include/linux/console.h
index cdae70e27377..b6b5d796d15c 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -273,6 +273,7 @@ enum con_flush_mode {
 };
 
 extern int add_preferred_console(char *name, int idx, char *options);
+extern void console_force_preferred_locked(struct console *con);
 extern void register_console(struct console *);
 extern int unregister_console(struct console *);
 extern void console_lock(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index be40a9688403..d74e6e609f7d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -247,9 +247,10 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int 
write,
 void console_list_lock(void)
 {
/*
-* In unregister_console(), synchronize_srcu() is called with the
-* console_list_lock held. Therefore it is not allowed that the
-* console_list_lock is taken with the srcu_lock held.
+* In unregister_console() and console_force_preferred_locked(),
+* synchronize_srcu() is called with the console_list_lock held.
+* Therefore it is not allowed that the console_list_lock is taken
+* with the srcu_lock held.
 *
 * Whether or not this context is in the read-side critical section
 * can only be detected if the appropriate debug options are enabled.
@@ -3457,6 +3458,43 @@ int unregister_console(struct console *console)
 }
 EXPORT_SYMBOL(unregister_console);
 
+/**
+ * console_force_preferred_locked - force a registered console preferred
+ * @con: The registered console to force preferred.
+ *
+ * Must be called under console_list_lock().
+ */
+void console_force_preferred_locked(struct console *con)
+{
+   struct console *cur_pref_con;
+
+   if (!console_is_registered_locked(con))
+   return;
+
+   cur_pref_con = console_first();
+
+   /* Already preferred? */
+   if (cur_pref_con == con)
+   return;
+
+   hlist_del_init_rcu(>node);
+
+   /*
+* Ensure that all SRCU list walks have completed so that the console
+* can be added to the beginning of the console list and its forward
+* list pointer can be re-initialized.
+*/
+   synchronize_srcu(_srcu);
+
+   con->flags |= CON_CONSDEV;
+   WARN_ON(!con->device);
+
+   /* Only the new head can have CON_CONSDEV set. */
+   WRITE_ONCE(cur_pref_con->flags, cur_pref_con->flags & ~CON_CONSDEV);
+   hlist_add_behind_rcu(>node, console_list.first);
+}
+EXPORT_SYMBOL(console_force_preferred_locked);
+
 /*
  * Initialize the console device. This is called *early*, so
  * we can't necessarily depend on lots of kernel help here.
-- 
2.30.2



[PATCH printk v3 00/40] reduce console_lock scope

2022-11-07 Thread John Ogness
This is v3 of a series to prepare for threaded/atomic
printing. v2 is here [0]. This series focuses on reducing the
scope of the BKL console_lock. It achieves this by switching to
SRCU and a dedicated mutex for console list iteration and
modification, respectively. The console_lock will no longer
offer this protection and is completely removed from
(un)register_console() and console_stop/start() code.

Also, during the review of v2 it came to our attention that
many console drivers are checking CON_ENABLED to see if they
are registered. Because this flag can change without
unregistering and because this flag does not represent an
atomic point when an (un)registration process is complete,
a new console_is_registered() function is introduced. This
function uses the console_list_lock to synchronize with the
(un)registration process to provide a reliable status.

All users of the console_lock for list iteration have been
modified. For the call sites where the console_lock is still
needed (because of other reasons), comments are added to
explain exactly why the console_lock was needed.

All users of CON_ENABLED for registration status have been
modified to use console_is_registered(). Note that there are
still users of CON_ENABLED, but this is for legitimate purposes
about a registered console being able to print.

The base commit for this series is from Paul McKenney's RCU tree
and provides an NMI-safe SRCU implementation [1]. Without the
NMI-safe SRCU implementation, this series is not less safe than
mainline. But we will need the NMI-safe SRCU implementation for
atomic consoles anyway, so we might as well get it in
now. Especially since it _does_ increase the reliability for
mainline in the panic path.

Changes since v3:

general:

- introduce a synchronized console_is_registered() to query if
  a console is registered, meant to replace CON_ENABLED
  (mis)use for this purpose

- directly read console->flags for registered consoles if it is
  race-free (and document that it is so)

- replace uart_console_enabled() with a new
  uart_console_registered() based on console_is_registered()

- change comments about why console_lock is used to synchronize
  console->device() by providing an example

registration check fixups:

- the following drivers were modified to use the new
  console_is_registered() instead of CON_ENABLED checks

   - arch/m68k/emu/nfcon.c
   - drivers/firmware/efi/earlycon.c
   - drivers/net/netconsole.c
   - drivers/tty/hvc/hvc_console.c
   - drivers/tty/serial/8250/8250_core.c
   - drivers/tty/serial/earlycon.c
   - drivers/tty/serial/pic32_uart.c
   - drivers/tty/serial/samsung_tty.c
   - drivers/tty/serial/serial_core.c
   - drivers/tty/serial/xilinx_uartps.c
   - drivers/usb/early/xhci-dbc.c

um: kmsg_dumper:

- change stdout dump criteria to match original intention

kgdb/kdb:

- in configure_kgdboc(), take console_list_lock to synchronize
  tty_find_polling_driver() against register_console()

- add comments explaining why calling console->write() without
  locking might work

tty: sh-sci:

- use a setup() callback to setup the early console

fbdev: xen:

- implement a cleaner approach for
  console_force_preferred_locked()

rcu:

- implement debug_lockdep_rcu_enabled() for
  !CONFIG_DEBUG_LOCK_ALLOC

printk:

- check CONFIG_DEBUG_LOCK_ALLOC for srcu_read_lock_held()
  availability

- for console_lock/_trylock/_unlock, replace "lock the console
  system" language with "block the console subsystem from
  printing"

- use WRITE_ONCE() for updating console->flags of registered
  consoles

- expand comments of synchronize_srcu() calls to explain why
  they are needed, and also expand comments to explain when it
  is not needed

- change CON_BOOT consoles to always begin at earliest message

- for non-BOOT/non-PRINTBUFFER consoles, initialize @seq to the
  minimal @seq of any of the enabled boot consoles

- add comments and lockdep assertion to
  unregister_console_locked() because it is not clear from the
  name which lock is implied

- dropped patches that caused unnecessary churn in the series

John Ogness

[0] 
https://lore.kernel.org/lkml/20221019145600.1282823-1-john.ogn...@linutronix.de
[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/log/?h=srcunmisafe.2022.10.21a

John Ogness (38):
  rcu: implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC
  printk: Prepare for SRCU console list protection
  printk: fix setting first seq for consoles
  um: kmsg_dump: only dump when no output console available
  console: introduce console_is_enabled() wrapper
  printk: use console_is_enabled()
  um: kmsg_dump: use console_is_enabled()
  kdb: kdb_io: use console_is_enabled()
  um: kmsg_dumper: use srcu console list iterator
  tty: serial: kgdboc: document console_lock usage
  tty: tty_io: document console_lock usage
  proc: consoles: document console_lock usage
  kdb: use srcu console list iterator
  printk: console_flush_all: use src

Re: [PATCH printk v2 38/38] printk, xen: fbfront: create/use safe function for forcing preferred

2022-10-27 Thread John Ogness
On 2022-10-27, Petr Mladek  wrote:
>> -if (c) {
>> -unregister_console(c);
>> -c->flags |= CON_CONSDEV;
>> -c->flags &= ~CON_PRINTBUFFER; /* don't print again */
>> -register_console(c);
>> -}
>> +if (c)
>> +console_force_preferred(c);
>
> I would prefer to fix this a clean way.
>
> [...]
>
> I would suggest to implement:
>
> [...]
>
> It is a more code. But it is race-free. Also it is much more clear
> what is going on.
>
> How does this sound, please?

I wasn't sure if any of the other preferred-console magic in
register_console() was needed, which is why I kept a full
register_console() call. But if it really is just about forcing it the
head and setting a new CON_CONSDEV, then your suggestion is much
simpler. Thanks.

John


[PATCH printk v2 24/38] xen: fbfront: use srcu console list iterator

2022-10-19 Thread John Ogness
Since the console_lock is not being used for anything other than
safe console list traversal, use srcu console list iteration instead.

Signed-off-by: John Ogness 
---
 drivers/video/fbdev/xen-fbfront.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/video/fbdev/xen-fbfront.c 
b/drivers/video/fbdev/xen-fbfront.c
index 4d2694d904aa..2552c853c6c2 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -500,16 +500,18 @@ static int xenfb_probe(struct xenbus_device *dev,
 static void xenfb_make_preferred_console(void)
 {
struct console *c;
+   int cookie;
 
if (console_set_on_cmdline)
return;
 
-   console_lock();
-   for_each_console(c) {
+   cookie = console_srcu_read_lock();
+   for_each_console_srcu(c) {
if (!strcmp(c->name, "tty") && c->index == 0)
break;
}
-   console_unlock();
+   console_srcu_read_unlock(cookie);
+
if (c) {
unregister_console(c);
c->flags |= CON_CONSDEV;
-- 
2.30.2



[PATCH printk v2 38/38] printk, xen: fbfront: create/use safe function for forcing preferred

2022-10-19 Thread John Ogness
With commit 9e124fe16ff2("xen: Enable console tty by default in domU
if it's not a dummy") a hack was implemented to make sure that the
tty console remains the console behind the /dev/console device. The
main problem with the hack is that, after getting the console pointer
to the tty console, it is assumed the pointer is still valid after
releasing the console_sem. This assumption is incorrect and unsafe.

Make the hack safe by introducing a new function
console_force_preferred() to perform the full operation under
the console_list_lock.

Signed-off-by: John Ogness 
---
 drivers/video/fbdev/xen-fbfront.c |  8 +---
 include/linux/console.h   |  1 +
 kernel/printk/printk.c| 69 +++
 3 files changed, 46 insertions(+), 32 deletions(-)

diff --git a/drivers/video/fbdev/xen-fbfront.c 
b/drivers/video/fbdev/xen-fbfront.c
index 2552c853c6c2..aa362b25a60f 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -512,12 +512,8 @@ static void xenfb_make_preferred_console(void)
}
console_srcu_read_unlock(cookie);
 
-   if (c) {
-   unregister_console(c);
-   c->flags |= CON_CONSDEV;
-   c->flags &= ~CON_PRINTBUFFER; /* don't print again */
-   register_console(c);
-   }
+   if (c)
+   console_force_preferred(c);
 }
 
 static int xenfb_resume(struct xenbus_device *dev)
diff --git a/include/linux/console.h b/include/linux/console.h
index bf1e8136424a..41378b00bbdd 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -235,6 +235,7 @@ enum con_flush_mode {
 };
 
 extern int add_preferred_console(char *name, int idx, char *options);
+extern void console_force_preferred(struct console *c);
 extern void register_console(struct console *);
 extern int unregister_console(struct console *);
 extern void console_lock(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 840d581c4b23..9a056a42b8d8 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3207,38 +3207,17 @@ static void try_enable_default_console(struct console 
*newcon)
 
 static int unregister_console_locked(struct console *console);
 
-/*
- * The console driver calls this routine during kernel initialization
- * to register the console printing procedure with printk() and to
- * print any messages that were printed by the kernel before the
- * console driver was initialized.
- *
- * This can happen pretty early during the boot process (because of
- * early_printk) - sometimes before setup_arch() completes - be careful
- * of what kernel features are used - they may not be initialised yet.
- *
- * There are two types of consoles - bootconsoles (early_printk) and
- * "real" consoles (everything which is not a bootconsole) which are
- * handled differently.
- *  - Any number of bootconsoles can be registered at any time.
- *  - As soon as a "real" console is registered, all bootconsoles
- *will be unregistered automatically.
- *  - Once a "real" console is registered, any attempt to register a
- *bootconsoles will be rejected
- */
-void register_console(struct console *newcon)
+static void register_console_locked(struct console *newcon)
 {
struct console *con;
bool bootcon_enabled = false;
bool realcon_enabled = false;
int err;
 
-   console_list_lock();
-
for_each_console(con) {
if (WARN(con == newcon, "console '%s%d' already registered\n",
 con->name, con->index)) {
-   goto unlock;
+   return;
}
 
if (con->flags & CON_BOOT)
@@ -3251,7 +3230,7 @@ void register_console(struct console *newcon)
if (newcon->flags & CON_BOOT && realcon_enabled) {
pr_info("Too late to register bootconsole %s%d\n",
newcon->name, newcon->index);
-   goto unlock;
+   return;
}
 
/*
@@ -3282,7 +3261,7 @@ void register_console(struct console *newcon)
 
/* printk() messages are not printed to the Braille console. */
if (err || newcon->flags & CON_BRL)
-   goto unlock;
+   return;
 
/*
 * If we have a bootconsole, and are switching to a real console,
@@ -3346,7 +3325,31 @@ void register_console(struct console *newcon)
unregister_console_locked(con);
}
}
-unlock:
+}
+
+/*
+ * The console driver calls this routine during kernel initialization
+ * to register the console printing procedure with printk() and to
+ * print any messages that were printed by the kernel before the
+ * console driver was initialized.
+ *
+ * This can happen pretty early during the boot process (because of
+ * early

[PATCH printk v2 00/38] reduce console_lock scope

2022-10-19 Thread John Ogness
This is v2 of a series to prepare for threaded/atomic
printing. It is a rework of patches 6-12 of the v1 [0]. From
the v1, patches 1-5 are already mainline and a rework of
patches >12 will be posted in a later series.

This series focuses on reducing the scope of the BKL
console_lock. It achieves this by switching to SRCU and a
dedicated mutex for console list iteration and modification,
respectively. The console_lock will no longer offer this
protection and is completely removed from
(un)register_console() and console_stop/start() code.

All users of the console_lock for list iteration have been
modified. For the call sites where the console_lock is still
needed (because of other reasons), I added comments to explain
exactly why the console_lock was needed.

The base commit for this series is from Paul McKenney's RCU tree
and provides an NMI-safe SRCU implementation [1]. Without the
NMI-safe SRCU implementation, this series is not less safe than
mainline. But we will need the NMI-safe SRCU implementation for
atomic consoles anyway, so we might as well get it in
now. Especially since it _does_ increase the reliability for
mainline in the panic path.

Changes since v2:

general:

- introduce console_is_enabled() to document safe data race on
  console->flags

- switch all "console->flags & CON_ENABLED" code sites to
  console_is_enabled()

- add "for_each_console_srcu" to .clang-format

- cleanup/clarify comments relating to console_lock
  coverage/usage

um:

- kmsg_dumper: use srcu instead of console_lock for list
  iteration

kgdb/kdb:

- configure_kgdboc: keep console_lock for console->device()
  synchronization, use srcu for list iteration

- kgdboc_earlycon_pre_exp_handler: use srcu instead of
  documenting unsafety for list iteration

- kgdboc_earlycon_init: use console_list_lock instead of
  console_lock to lock list

- kdb_msg_write: use srcu instead of documenting unsafety for
  list iteration

tty:

- show_cons_active: keep console_lock for console->device()
  synchronization

fbdev:

- xen-fbfront: xenfb_probe: use srcu instead of console_lock
  for list iteration, introduce console_force_preferred() to
  safely implement hack

proc/consoles:

- show_console_dev: keep console_lock for console->device()
  synchronization

- c_next: use hlist_entry_safe() instead of
  hlist_for_each_entry_continue()

printk:

- remove console_lock from console_stop/start() and
  (un)register_console()

- introduce console_srcu_read_(un)lock() to wrap scru read
  (un)lock

- rename cons_first() macro to console_first()

- for_each_console: add lockdep check instead of introducing
  new for_each_registered_console()

- console_list_lock: add warning if in read-side critical
  section

- release srcu read lock on handover

- console_flush_all: use srcu instead of relying on console
  lock for list iteration

- console_unblank: use srcu instead of relying on console_lock
  for list iteration

- console_flush_on_panic: use srcu for list iteration and
  document console->seq race

- device: keep console_lock for console->device()
  synchronization, usr srcu for list iteration

- register_console: split list adding logic into the 3 distinct
  scenarios

- register_console: set initial sequence number before adding
  to list

- unregister_console: fix ENODEV return value if the console is
  not registered

- console_stop: synchronize srcu

- printk_late_init: use _safe variant of iteration

- __pr_flush: use srcu instead of relying on console_lock for
  list iteration

John Ogness

[0] https://lore.kernel.org/r/20220924000454.3319186-1-john.ogn...@linutronix.de
[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/log/?h=srcunmisafe.2022.10.18b

John Ogness (37):
  printk: Convert console_drivers list to hlist
  printk: Prepare for SRCU console list protection
  printk: introduce console_is_enabled() wrapper
  printk: use console_is_enabled()
  tty: nfcon: use console_is_enabled()
  um: kmsg_dump: use console_is_enabled()
  efi: earlycon: use console_is_enabled()
  netconsole: use console_is_enabled()
  tty: hvc: use console_is_enabled()
  tty: serial: earlycon: use console_is_enabled()
  tty: serial: kgdboc: use console_is_enabled()
  tty: serial: pic32_uart: use console_is_enabled()
  tty: serial: samsung_tty: use console_is_enabled()
  tty: serial: serial_core: use console_is_enabled()
  tty: serial: xilinx_uartps: use console_is_enabled()
  tty: tty_io: use console_is_enabled()
  usb: early: xhci-dbc: use console_is_enabled()
  kdb: kdb_io: use console_is_enabled()
  um: kmsg_dumper: use srcu console list iterator
  serial: kgdboc: use srcu console list iterator
  serial: kgdboc: document console_lock usage
  tty: tty_io: document console_lock usage
  xen: fbfront: use srcu console list iterator
  proc: consoles: document console_lock usage
  kdb: use srcu console list iterator
  printk: console_flush_all: use srcu console list iterator
  printk:

Re: [PATCH] drm: fb-helper: Avoid nesting spinlock_t into raw_spinlock_t

2022-02-15 Thread John Ogness
On 2022-02-15, Sebastian Siewior  wrote:
>> From: Jiri Kosina 
>> 
>> drm_fb_helper_damage() is acquiring spinlock_t (helper->damage_lock), 
>> while it can be called from contexts where raw_spinlock_t is held (e.g. 
>> console_owner lock obtained on vprintk_emit() codepath).
>> 
>> As the critical sections protected by damage_lock are super-tiny, let's 
>> fix this by converting it to raw_spinlock_t in order not to violate 
>> PREEMPT_RT-imposed lock nesting rules.
>> 
>> This fixes the splat below.
>> 
>>  =
>>  [ BUG: Invalid wait context ]
>>  5.17.0-rc4-2-gd567f5db412e #1 Not tainted
>
> rc4. Is this also the case in the RT tree which includes John's printk
> changes?

In the RT tree, the fbcon's write() callback is only called in
preemptible() contexts. So this is only a mainline issue.

The series I recently posted to LKML [0] should also address this issue
(if/when it gets accepted).

John

[0] 
https://lore.kernel.org/lkml/20220207194323.273637-1-john.ogn...@linutronix.de


Re: printk deadlock due to double lock attempt on current CPU's runqueue

2021-11-11 Thread John Ogness
On 2021-11-10, Sultan Alsawaf  wrote:
> On Wed, Nov 10, 2021 at 11:13:37AM +0106, John Ogness wrote:
>> Even after we introduce kthread printers, there will still be
>> situations where direct printing is used: booting (before kthreads
>> exist) and shutdown/suspend/crash situations, when the kthreads may
>> not be active.
>
> Although I'm unaware of any ongoing kthread printer work, I'm curious
> to know how a kthread approach wouldn't employ a try_to_wake_up() from
> directly inside printk(), since the try_to_wake_up() hit from inside
> the fbcon code is what caused my deadlock.

The kthread approach triggers irq_work from printk(). The kthread
printer is then woken from the irq_work.

John Ogness


Re: printk deadlock due to double lock attempt on current CPU's runqueue

2021-11-10 Thread John Ogness
On 2021-11-10, Daniel Vetter  wrote:
> I'm a bit out of the loop but from lwn articles my understanding is
> that part of upstreaming from -rt we no longer have the explicit "I'm
> a safe console for direct printing" opt-in. Which I get from a
> backwards compat pov, but I still think for at least fbcon we really
> should never attempt a direct printk con->write, it's just all around
> terrible.

Right now we don't have an explicit "I'm a safe console for direct
printing" option. Right now all printing is direct. But it sounds to me
that we should add this console flag when we introduce kthread printers.

> So yeah for fbcon at least I think we really should throw out direct
> con->write from printk completely.

Even after we introduce kthread printers, there will still be situations
where direct printing is used: booting (before kthreads exist) and
shutdown/suspend/crash situations, when the kthreads may not be
active.

I will introduce a console flag so that consoles can opt-out for direct
printing. (opt-out rather than opt-in is probably easier, since there
are only a few that would need to opt-out).

Since kthread printers do not yet exist (hoping to get them in for
5.17), I am not sure how we should address the reported bug for existing
kernels.

John Ogness


kmemleak report: 5.15.0-rc3: nouveau_fence_new

2021-09-29 Thread John Ogness
Hello,

With 5.15.0-rc3 on my ppc64 (PowerMac G5) I am seeing kmemleak
reports. They are always 96 bytes and with the same stacktrace.

unreferenced object 0xc00011d2a7e0 (size 96):
  comm "X", pid 1743, jiffies 4295010075 (age 5457.040s)
  hex dump (first 32 bytes):
c0 00 00 00 0b 9f f0 00 c0 00 3d 00 00 b0 85 90  ..=.
00 00 00 a9 77 41 30 23 c0 00 00 00 08 db b7 c8  wA0#
  backtrace:
[<6f102108>] .nouveau_fence_new+0x4c/0x120 [nouveau]
[<395e0a83>] .nouveau_bo_move+0x4f0/0x870 [nouveau]
[<f17bc6da>] .ttm_bo_handle_move_mem+0xb4/0x1e0 [ttm]
[<fb36762f>] .ttm_bo_validate+0x144/0x230 [ttm]
[<a84dc7b3>] .nouveau_bo_validate+0x70/0xc0 [nouveau]
[<b4e870a2>] .nouveau_gem_ioctl_pushbuf+0x6e0/0x1a90 [nouveau]
[<7b7c5c38>] .drm_ioctl_kernel+0x104/0x180 [drm]
[<0af76e30>] .drm_ioctl+0x244/0x490 [drm]
[<ebb759e8>] .nouveau_drm_ioctl+0x78/0x140 [nouveau]
[<263274a7>] .__se_sys_ioctl+0xfc/0x160
[<88c39f3d>] .system_call_exception+0x178/0x2a0
[<0cfdf34f>] system_call_common+0xec/0x250

If I decode this stacktrace using decode_stacktrace.sh so that the line
numbers can be seen, I get the following:

.nouveau_fence_new+0x4c/0x120 [nouveau]
linux-5.15-rc3/include/linux/slab.h:591
linux-5.15-rc3/include/linux/slab.h:721
linux-5.15-rc3/drivers/gpu/drm/nouveau/nouveau_fence.c:424

.nouveau_bo_move+0x4f0/0x870 [nouveau]
linux-5.15-rc3/drivers/gpu/drm/nouveau/nouveau_bo.c:821
linux-5.15-rc3/drivers/gpu/drm/nouveau/nouveau_bo.c:1032

.ttm_bo_handle_move_mem+0xb4/0x1e0 [ttm]
linux-5.15-rc3/drivers/gpu/drm/ttm/ttm_bo.c:197

.ttm_bo_validate+0x144/0x230 [ttm]
linux-5.15-rc3/drivers/gpu/drm/ttm/ttm_bo.c:904
linux-5.15-rc3/drivers/gpu/drm/ttm/ttm_bo.c:981

.nouveau_bo_validate+0x70/0xc0 [nouveau]
linux-5.15-rc3/drivers/gpu/drm/nouveau/nouveau_bo.c:647

.nouveau_gem_ioctl_pushbuf+0x6e0/0x1a90 [nouveau]
linux-5.15-rc3/drivers/gpu/drm/nouveau/nouveau_gem.c:548
linux-5.15-rc3/drivers/gpu/drm/nouveau/nouveau_gem.c:605
linux-5.15-rc3/drivers/gpu/drm/nouveau/nouveau_gem.c:799

.drm_ioctl_kernel+0x104/0x180 [drm]
linux-5.15-rc3/drivers/gpu/drm/drm_ioctl.c:795

.drm_ioctl+0x244/0x490 [drm]
linux-5.15-rc3/include/linux/thread_info.h:185
linux-5.15-rc3/include/linux/thread_info.h:218
linux-5.15-rc3/include/linux/uaccess.h:199
linux-5.15-rc3/drivers/gpu/drm/drm_ioctl.c:899

.nouveau_drm_ioctl+0x78/0x140 [nouveau]
linux-5.15-rc3/drivers/gpu/drm/nouveau/nouveau_drm.c:1163

.__se_sys_ioctl+0xfc/0x160
linux-5.15-rc3/fs/ioctl.c:51
linux-5.15-rc3/fs/ioctl.c:874
linux-5.15-rc3/fs/ioctl.c:860

.system_call_exception+0x178/0x2a0
.system_call_exception
linux-5.15-rc3/arch/powerpc/kernel/interrupt.c:233

system_call_common+0xec/0x250
linux-5.15-rc3/arch/powerpc/kernel/interrupt_64.S:314

Here are all enabled DRM and NOUVEAU configs in my kernel:

CONFIG_DRM=m
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_FBDEV_EMULATION=y
CONFIG_DRM_FBDEV_OVERALLOC=100
CONFIG_DRM_TTM=m
CONFIG_DRM_TTM_HELPER=m
CONFIG_DRM_NOUVEAU=m
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
CONFIG_DRM_NOUVEAU_BACKLIGHT=y
CONFIG_DRM_PANEL=y
CONFIG_DRM_BRIDGE=y
CONFIG_DRM_PANEL_BRIDGE=y
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=m

And lspci output:

:f0:10.0 VGA compatible controller: NVIDIA Corporation NV34 [GeForce FX 
5200 Ultra] (rev a1)

I have been running 5.12 on my machine without these reports. So it
might be something that showed up in 5.13 or 5.14 as well.

I do not know if this is a good channel for reporting this, so please
let me know if I should report it somewhere else. Also let me know if
you need any additional information from me.

John Ogness


Re: [PATCH 1/3] printk: use CONFIG_CONSOLE_LOGLEVEL_* directly

2021-02-03 Thread John Ogness
On 2021-02-02, Masahiro Yamada  wrote:
> CONSOLE_LOGLEVEL_DEFAULT is nothing more than a shorthand of
> CONFIG_CONSOLE_LOGLEVEL_DEFAULT.
>
> When you change CONFIG_CONSOLE_LOGLEVEL_DEFAULT from Kconfig, almost
> all objects are rebuilt because CONFIG_CONSOLE_LOGLEVEL_DEFAULT is
> used in , which is included from most of source files.
>
> In fact, there are only 4 users of CONSOLE_LOGLEVEL_DEFAULT:
>
>   arch/x86/platform/uv/uv_nmi.c
>   drivers/firmware/efi/libstub/efi-stub-helper.c
>   drivers/tty/sysrq.c
>   kernel/printk/printk.c
>
> So, when you change CONFIG_CONSOLE_LOGLEVEL_DEFAULT and rebuild the
> kernel, it is enough to recompile those 4 files.
>
> Remove the CONSOLE_LOGLEVEL_DEFAULT definition from ,
> and use CONFIG_CONSOLE_LOGLEVEL_DEFAULT directly.

With commit a8fe19ebfbfd ("kernel/printk: use symbolic defines for
console loglevels") it can be seen that various drivers used to
hard-code their own values. The introduction of the macros in an
intuitive location (include/linux/printk.h) made it easier for authors
to find/use the various available printk settings and thresholds.

Technically there is no problem using Kconfig macros directly. But will
authors bother to hunt down available Kconfig settings? Or will they
only look in printk.h to see what is available?

IMHO if code wants to use settings from a foreign subsystem, it should
be taking those from headers of that subsystem, rather than using some
Kconfig settings from that subsystem. Headers exist to make information
available to external code. Kconfig (particularly for a subsystem) exist
to configure that subsystem.

But my feeling on this may be misguided. Is it generally accepted in the
kernel that any code can use Kconfig settings of any other code?

John Ogness
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v9 04/18] kunit: test: add kunit_stream a std::stream like logger

2019-08-02 Thread John Ogness
On 2019-08-01, Brendan Higgins  wrote:
> On Fri, Jul 26, 2019 at 1:31 AM Petr Mladek  wrote:
>> On Thu 2019-07-25 13:21:12, Brendan Higgins wrote:
>>> On Wed, Jul 24, 2019 at 12:31 AM Petr Mladek  wrote:
>>>> On Mon 2019-07-22 16:54:10, Stephen Boyd wrote:
>>>>> Quoting Brendan Higgins (2019-07-22 15:30:49)
>>>>>> On Mon, Jul 22, 2019 at 1:03 PM Stephen Boyd  wrote:
>>>>>>> What's the calling context of the assertions and expectations? I
>>>>>>> still don't like the fact that string stream needs to allocate
>>>>>>> buffers and throw them into a list somewhere because the calling
>>>>>>> context matters there.
>>>>>>
>>>>>> The calling context is the same as before, which is anywhere.
>>>>>
>>>>> Ok. That's concerning then.
>>>>>
>>>>>>> I'd prefer we just wrote directly to the console/log via printk
>>>>>>> instead. That way things are simple because we use the existing
>>>>>>> buffering path of printk, but maybe there's some benefit to the
>>>>>>> string stream that I don't see? Right now it looks like it
>>>>>>> builds a string and then dumps it to printk so I'm sort of lost
>>>>>>> what the benefit is over just writing directly with printk.
>>>>>>
>>>>>> It's just buffering it so the whole string gets printed
>>>>>> uninterrupted.  If we were to print out piecemeal to printk,
>>>>>> couldn't we have another call to printk come in causing it to
>>>>>> garble the KUnit message we are in the middle of printing?
>>>>>
>>>>> Yes, printing piecemeal by calling printk many times could lead to
>>>>> interleaving of messages if something else comes in such as an
>>>>> interrupt printing something. Printk has some support to hold
>>>>> "records" but I'm not sure how that would work here because
>>>>> KERN_CONT talks about only being used early on in boot code. I
>>>>> haven't looked at printk in detail though so maybe I'm all wrong
>>>>> and KERN_CONT just works?
>>>>
>>>> KERN_CONT does not guarantee that the message will get printed
>>>> together. The pieces get interleaved with messages printed in
>>>> parallel.
>>>>
>>>> Note that KERN_CONT was originally really meant to be used only
>>>> during boot. It was later used more widely and ended in the best
>>>> effort category.
>>>>
>>>> There were several attempts to make it more reliable. But it was
>>>> always either too complicated or error prone or both.
>>>>
>>>> You need to use your own buffering if you rely want perfect output.
>>>> The question is if it is really worth the complexity. Also note
>>>> that any buffering reduces the chance that the messages will reach
>>>> the console.
>>>
>>> Seems like that settles it then. Thanks!
>>>
>>>> BTW: There is a work in progress on a lockless printk ring buffer.
>>>> It will make printk() more secure regarding deadlocks. But it might
>>>> make transparent handling of continuous lines even more tricky.
>>>>
>>>> I guess that local buffering, before calling printk(), will be
>>>> even more important then. Well, it might really force us to create
>>>> an API for it.
>>>
>>> Cool! Can you CC me on that discussion?
>>
>> Adding John Oggness into CC.
>>
>> John, please CC Brendan Higgins on the patchsets eventually switching
>> printk() into the lockless buffer. The test framework is going to
>> do its own buffering to keep the related messages together.
>>
>> The lockless ringbuffer might make handling of related (partial)
>> lines worse or better. It might justify KUnit's extra buffering
>> or it might allow to get rid of it.
>
> Thanks for CC'ing me on the printk ringbuffer thread. It looks like it
> actually probably won't affect my needs for KUnit logging. The biggest
> reason I need some sort of buffering system is to be able to compose
> messages piece meal into a single message that will be printed out to
> the user as a single message with no messages from other printk
> callers printed out in the middle of mine.

printk has this same requirement for its CONT messages. You can read
about how I propose to implement that here[0], using a sep

Re: [PATCH v2 1/3] drm: Add support for panic message output

2019-03-15 Thread John Ogness
On 2019-03-14, John Ogness  wrote:
> On 2019-03-14, Daniel Vetter  wrote:
>> That's why we came up with the trylock + immediate bail out design if
>> that fails. Plus really only render the oops int whatever is the
>> current display buffer, so that we don't have to do any hw
>> programming at all.
>
> I think this is your best option. The real work will be identifying
> any/all spin locking that currently exists. For all of those, the code
> needs to change to:
>
> 1. trylock if oops_in_progress, otherwise spinlock

On second thought, you shouldn't use oops_in_progress. It would be
better if DRM had its own flag to signify that it is currently being
used in kmsg_dump context.

> 2. if trylock fails, the code must have a sane failure
>
> The 2nd point will be the difficult one. For example, you may have
> functions without a return value taking spinlocks. But now those
> functions could fail.

John Ogness
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 1/3] drm: Add support for panic message output

2019-03-15 Thread John Ogness
On 2019-03-14, Daniel Vetter  wrote:
> That's why we came up with the trylock + immediate bail out design if
> that fails. Plus really only render the oops int whatever is the
> current display buffer, so that we don't have to do any hw programming
> at all.

I think this is your best option. The real work will be identifying
any/all spin locking that currently exists. For all of those, the code
needs to change to:

1. trylock if oops_in_progress, otherwise spinlock

2. if trylock fails, the code must have a sane failure

The 2nd point will be the difficult one. For example, you may have
functions without a return value taking spinlocks. But now those
functions could fail.

John Ogness
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 1/3] drm: Add support for panic message output

2019-03-13 Thread John Ogness
On 2019-03-12, Ahmed S. Darwish  wrote:
>>>> +
>>>> +static void drm_panic_kmsg_dump(struct kmsg_dumper *dumper,
>>>> +  enum kmsg_dump_reason reason)
>>>> +{
>>>> +  class_for_each_device(drm_class, NULL, dumper, drm_panic_dev_iter);
>>>
>>> class_for_each_device uses klist, which only uses an irqsave
>>> spinlock. I think that's good enough. Comment to that effect would
>>> be good e.g.
>>>
>>> /* based on klist, which uses only a spin_lock_irqsave, which we
>>>  * assume still works */
>>>
>>> If we aim for perfect this should be a trylock still, maybe using
>>> our own device list.
>>>
>
> I definitely agree here.
>
> The lock may already be locked either by a stopped CPU, or by the
> very same CPU we execute panic() on (e.g. NMI panic() on the
> printing CPU).
>
> This is why it's very common for example in serial consoles, which
> are usually careful about re-entrance and panic contexts, to do:
>
>   xx_console_write(...) {
>   if (oops_in_progress)
>   locked = spin_trylock_irqsave(>lock, flags);
>   else
>   spin_lock_irqsave(>lock, flags);
>   }
>
> I'm quite positive we should do the same for panic drm drivers.

This construction will continue, even if the trylock fails. It only
makes sense to do this if the driver has a chance of being
successful. Ignoring locks is a serious issue. I personally am quite
unhappy that the serial drivers do this, which was part of my motivation
for the new printk design I'm working on.

If the driver is not capable of doing something useful on a failed
trylock, then I recommend just skipping that device. Maybe trying it
again later after trying all the devices?

John Ogness
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel