Re: [PATCH 09/15] staging: fbtft: fb_ssd1351.c: Introduce backlight_is_blank()

2023-01-09 Thread Daniel Thompson
On Sun, Jan 08, 2023 at 09:29:43PM +0100, Sam Ravnborg wrote:
> Hi Stephen,
>
> On Sun, Jan 08, 2023 at 08:28:17PM +0100, Stephen Kitt wrote:
> > On Sat, 07 Jan 2023 19:26:23 +0100, Sam Ravnborg via B4 Submission Endpoint
> >  wrote:
> >
> > > From: Sam Ravnborg 
> > >
> > > Avoiding direct access to backlight_properties.props.
> > >
> > > Access to the deprecated props.fb_blank replaced by backlight_is_blank().
> > > Access to props.power is dropped - it was only used for debug.
> > >
> > > Signed-off-by: Sam Ravnborg 
> > > Cc: Stephen Kitt 
> > > Cc: Greg Kroah-Hartman 
> > > Cc: Daniel Thompson 
> > > Cc: Andy Shevchenko 
> > > Cc: linux-fb...@vger.kernel.org
> > > ---
> > >  drivers/staging/fbtft/fb_ssd1351.c | 9 +++--
> > >  1 file changed, 3 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/staging/fbtft/fb_ssd1351.c
> > > b/drivers/staging/fbtft/fb_ssd1351.c index b8d55aa8c5c7..995fbd2f3dc6 
> > > 100644
> > > --- a/drivers/staging/fbtft/fb_ssd1351.c
> > > +++ b/drivers/staging/fbtft/fb_ssd1351.c
> > > @@ -190,15 +190,12 @@ static struct fbtft_display display = {
> > >  static int update_onboard_backlight(struct backlight_device *bd)
> > >  {
> > >   struct fbtft_par *par = bl_get_data(bd);
> > > - bool on;
> > > + bool blank = backlight_is_blank(bd);
> > >
> > > - fbtft_par_dbg(DEBUG_BACKLIGHT, par,
> > > -   "%s: power=%d, fb_blank=%d\n",
> > > -   __func__, bd->props.power, bd->props.fb_blank);
> > > + fbtft_par_dbg(DEBUG_BACKLIGHT, par, "%s: blank=%d\n", __func__,
> > > blank);
> > > - on = !backlight_is_blank(bd);
> > >   /* Onboard backlight connected to GPIO0 on SSD1351, GPIO1 unused */
> > > - write_reg(par, 0xB5, on ? 0x03 : 0x02);
> > > + write_reg(par, 0xB5, !blank ? 0x03 : 0x02);
> > >
> > >   return 0;
> > >  }
> > >
> > > --
> > > 2.34.1
> >
> > For debugging purposes here, would there be any point in logging 
> > props.state?
> > As in
> >
> > fbtft_par_dbg(DEBUG_BACKLIGHT, par,
> > - "%s: power=%d, fb_blank=%d\n",
> > - __func__, bd->props.power, bd->props.fb_blank);
> > +     "%s: power=%d, state=%u\n",
> > + __func__, bd->props.power, bd->props.state);
>
> Thanks for the suggestion - and the reviews!
>
> I was tempted to just remove the debugging.
> If we require debugging, then this could be added in the backlight core,
> thus everyone would benefit from it.

I had the same instinct to remove the debug print here (esp. ones with
__func__ in them) but I think that's probably a much more widely scoped
clean up for fbtft ;-).

On that basis:
Reviewed-by: Daniel Thompson 


Daniel.


Re: [PATCH 15/15] backlight: backlight: Drop the deprecated fb_blank property

2023-01-09 Thread Daniel Thompson
On Sat, Jan 07, 2023 at 07:26:29PM +0100, Sam Ravnborg via B4 Submission 
Endpoint wrote:
> From: Sam Ravnborg 
>
> With all users gone remove the deprecated fb_blank member in
> backlight_properties.
>
> Signed-off-by: Sam Ravnborg 
> Cc: Lee Jones 
> Cc: Daniel Thompson 
> Cc: Jingoo Han 


Reviewed-by: Daniel Thompson 


PS Please don't treat this like a maintainer Acked-by: and merge it
   (Lee's not on holiday so work with Lee to figure out the merge
   strategy ;-) ).


Re: [PATCH 14/15] backlight: tosa: Use backlight helper

2023-01-09 Thread Daniel Thompson
On Sat, Jan 07, 2023 at 07:26:28PM +0100, Sam Ravnborg via B4 Submission 
Endpoint wrote:
> From: Stephen Kitt 
>
> Instead of retrieving the backlight brightness in struct
> backlight_properties manually, and then checking whether the backlight
> should be on at all, use backlight_get_brightness() which does all
> this and insulates this from future changes.
>
> Signed-off-by: Stephen Kitt 
> Signed-off-by: Sam Ravnborg 

Just for completeness...
Reviewed-by: Daniel Thompson 


Daniel.


Re: [PATCH 13/15] backlight: omap1: Use backlight helpers

2023-01-09 Thread Daniel Thompson
On Sat, Jan 07, 2023 at 07:26:27PM +0100, Sam Ravnborg via B4 Submission 
Endpoint wrote:
> From: Sam Ravnborg 
>
> Rework backlight handling to avoid access to the deprecated
> backlight_properties.fb_blank member.
>
> The rework includes removal of get_brightness() operation,
> because there was no read back from HW so no use for it.
>
> Signed-off-by: Sam Ravnborg 
> Cc: Lee Jones 
> Cc: Daniel Thompson 
> Cc: Jingoo Han 

This rework will result in additional calls to omapbl_send_enable()
during updates so, if anyone who cares about omap1, is filtering LKML
then a Tested-by: would be nice.

However, I doubt the additional calls will do any harm so even with
that absent:
Reviewed-by: Daniel Thompson 


Daniel.


Re: [PATCH v5 00/33] objtool: add base support for arm64

2022-06-22 Thread Daniel Thompson
On Wed, Jun 22, 2022 at 11:48:47PM +0800, Chen Zhongjin wrote:
> This series enables objtool to start doing stack validation and orc
> generation on arm64 kernel builds.
> 
> Based on Julien's previous work(1)(2), Now I have finished most of work
> for objtool enable on arm64. This series includes objtool part [1-13]
> and arm64 support part [14-33], the second part is to make objtool run
> correctly with no warning on arm64 so if necessary it can be taken apart
> as two series.
> 
> ORC generation feature is implemented but not used because we don't have
> an unwinder_orc on arm64, now it only be used to check whether objtool
> has correct validation.
> 
> This series depends on (https://lkml.org/lkml/2022/6/22/463)
> I moved some changes which work for all architectures to that series
> because this one becomes too big now.
> And it is rebased to tip/objtool/core branch.

What is the sha1 of the base?

With b4 and git am the patch series is derailing at patch 6 and I'm even
after a bit of fixup (had to use rediff) I'm still getting a cascade of
errors in later patches to decode.c .


Daniel.


Re: [PATCH] backlight: Use backlight helper

2022-06-08 Thread Daniel Thompson
^^^
Subject seems a bit generic...

On Tue, Jun 07, 2022 at 08:34:11PM +0200, Stephen Kitt wrote:
> diff --git a/drivers/macintosh/via-pmu-backlight.c 
> b/drivers/macintosh/via-pmu-backlight.c
> index 2194016122d2..c2d87e7fa85b 100644
> --- a/drivers/macintosh/via-pmu-backlight.c
> +++ b/drivers/macintosh/via-pmu-backlight.c
> @@ -71,12 +71,7 @@ static int pmu_backlight_get_level_brightness(int level)
>  static int __pmu_backlight_update_status(struct backlight_device *bd)
>  {
>   struct adb_request req;
> - int level = bd->props.brightness;
> -
> -
> - if (bd->props.power != FB_BLANK_UNBLANK ||
> - bd->props.fb_blank != FB_BLANK_UNBLANK)
> - level = 0;
> + int level = backlight_get_brightness(bd);

Other than that LGTM.


Daniel.


Re: [PATCH 02/22] s3c: Replace comments with C99 initializers

2022-03-28 Thread Daniel Thompson
On Sat, Mar 26, 2022 at 05:58:49PM +0100, Benjamin Stürz wrote:
> This replaces comments with C99's designated
> initializers because the kernel supports them now.

I'm a bit puzzled by "because the kernel supports them now". Designated
initializers are not purely a C99 feature... it is also a GNU C extension
to C89. This language feature has been used by the kernel for a very long time
(well over a decade).

On other words it would be much more effective to advocate for the
change by saying "because the code is clearer and easier to read" rather
than "because we can".


> Signed-off-by: Benjamin Stürz 
> ---
>  arch/arm/mach-s3c/bast-irq.c | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm/mach-s3c/bast-irq.c b/arch/arm/mach-s3c/bast-irq.c
> index d299f124e6dc..bd5471f9973b 100644
> --- a/arch/arm/mach-s3c/bast-irq.c
> +++ b/arch/arm/mach-s3c/bast-irq.c
> @@ -29,22 +29,22 @@
>   * the irq is not implemented
>  */
>  static const unsigned char bast_pc104_irqmasks[] = {
> - 0,   /* 0 */
> - 0,   /* 1 */
> - 0,   /* 2 */
> - 1,   /* 3 */
> - 0,   /* 4 */
> - 2,   /* 5 */
> - 0,   /* 6 */
> - 4,   /* 7 */
> - 0,   /* 8 */
> - 0,   /* 9 */
> - 8,   /* 10 */
> - 0,   /* 11 */
> - 0,   /* 12 */
> - 0,   /* 13 */
> - 0,   /* 14 */
> - 0,   /* 15 */
> + [0]  = 0,
> + [1]  = 0,
> + [2]  = 0,
> + [3]  = 1,
> + [4]  = 0,
> + [5]  = 2,
> + [6]  = 0,
> + [7]  = 4,
> + [8]  = 0,
> + [9]  = 0,
> + [10] = 8,
> + [11] = 0,
> + [12] = 0,
> + [13] = 0,
> + [14] = 0,
> + [15] = 0,

Shouldn't this just be as follows (in order to match bast_pc104_irqs)?

+static const unsigned char bast_pc104_irqmasks[16] = {
+   [3]  = 1,
+   [5]  = 2,
+   [7]  = 4,
+   [10] = 8,
 };
 
 static const unsigned char bast_pc104_irqs[] = { 3, 5, 7, 10 };


Daniel.


Re: [PATCH 01/22] orion5x: Replace comments with C99 initializers

2022-03-28 Thread Daniel Thompson
On Sat, Mar 26, 2022 at 05:58:48PM +0100, Benjamin Stürz wrote:
> This replaces comments with C99's designated
> initializers because the kernel supports them now.

This commit description seems wrong to me. This patch doesn't include
use C99 designated initializers (or AFAICT any other language feature
that has recently been enabled in the kernel).

The changes here are just plain constant-expressions in enumeration
lists and were included in C89/C90.


Daniel.


> 
> Signed-off-by: Benjamin Stürz 
> ---
>  arch/arm/mach-orion5x/dns323-setup.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/mach-orion5x/dns323-setup.c 
> b/arch/arm/mach-orion5x/dns323-setup.c
> index 87cb47220e82..d762248c6512 100644
> --- a/arch/arm/mach-orion5x/dns323-setup.c
> +++ b/arch/arm/mach-orion5x/dns323-setup.c
> @@ -61,9 +61,9 @@
>  
>  /* Exposed to userspace, do not change */
>  enum {
> - DNS323_REV_A1,  /* 0 */
> - DNS323_REV_B1,  /* 1 */
> - DNS323_REV_C1,  /* 2 */
> + DNS323_REV_A1 = 0,
> + DNS323_REV_B1 = 1,
> + DNS323_REV_C1 = 2,
>  };
>  
>  
> -- 
> 2.35.1
> 


Re: [PATCH 00/22] Replace comments with C99 initializers

2022-03-28 Thread Daniel Thompson
On Sun, Mar 27, 2022 at 02:46:00PM +0200, Benjamin Stürz wrote:
> This patch series replaces comments with C99's designated initializers
> in a few places. It also adds some enum initializers. This is my first
> time contributing to the Linux kernel, therefore I'm probably doing a
> lot of things the wrong way. I'm sorry for that.

Welcome!


> I've gotten a few emails so far stating that this patch series is
> unnecessary. Yes, in fact this patch series is not necessary by itself,
> but it could help me understand how the whole process works and maybe I
> could help somewhere, where help is actually needed.

Have you been told the series is unnecessary or too big?

Although all patches represent a variant of the same mechanical
transformation but they are mostly unrelated to each other and, if
accepted, they will be applied by many different people.

Taken as a whole presenting this to maintainers as a 22 patch set is too
big. I'd recommend starting with a smaller patch or patch series where
all the patches get picked up by the same maintainer.


> This patch itself is a no-op.

PATCH 0/XX is for the covering letter. You should generate a template for
it using the --cover-letter option of git format-patch. That way patch 0
will contain the diffstat for the whole series (which is often useful
to help understand what the series is for) and there is no need to
make no-op changes.


Daniel.

> 
> Signed-off-by: Benjamin Stürz 
> ---
>  .gitignore | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/.gitignore b/.gitignore
> index 7afd412dadd2..706f667261eb 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -20,7 +20,7 @@
>  *.dtb
>  *.dtbo
>  *.dtb.S
> -*.dwo
> +*.dwo
>  *.elf
>  *.gcno
>  *.gz
> -- 
> 2.35.1


Re: [Kgdb-bugreport] [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-03 Thread Daniel Thompson
On Thu, Mar 03, 2022 at 03:26:57PM +0800, Xiaomeng Tong wrote:
> On Thu, 3 Mar 2022 04:58:23 +, David Laight wrote:
> > on 3 Mar 2022 10:27:29 +0800, Xiaomeng Tong wrote:
> > > The problem is the mis-use of iterator outside the loop on exit, and
> > > the iterator will be the HEAD's container_of pointer which pointers
> > > to a type-confused struct. Sidenote: The *mis-use* here refers to
> > > mistakely access to other members of the struct, instead of the
> > > list_head member which acutally is the valid HEAD.
> >
> > The problem is that the HEAD's container_of pointer should never
> > be calculated at all.
> > This is what is fundamentally broken about the current definition.
> 
> Yes, the rule is "the HEAD's container_of pointer should never be
> calculated at all outside the loop", but how do you make sure everyone
> follows this rule?

Your formulation of the rule is correct: never run container_of() on HEAD
pointer.

However the rule that is introduced by list_for_each_entry_inside() is
*not* this rule. The rule it introduces is: never access the iterator
variable outside the loop.

Making the iterator NULL on loop exit does follow the rule you proposed
but using a different technique: do not allow HEAD to be stored in the
iterator variable after loop exit. This also makes it impossible to run
container_of() on the HEAD pointer.


> Everyone makes mistakes, but we can eliminate them all from the beginning
> with the help of compiler which can catch such use-after-loop things.

Indeed but if we introduce new interfaces then we don't have to worry
about existing usages and silent regressions. Code will have been
written knowing the loop can exit with the iterator set to NULL.

Sure it is still possible for programmers to make mistakes and
dereference the NULL pointer but C programmers are well training w.r.t.
NULL pointer checking so such mistakes are much less likely than with
the current list_for_each_entry() macro. This risk must be offset
against the way a NULLify approach can lead to more elegant code when we
are doing a list search.


Daniel.


Re: [PATCH v2 4/5] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-01-28 Thread Daniel Thompson
On Thu, Jan 27, 2022 at 11:28:09AM +, Christophe Leroy wrote:
> Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC to allow architectures
> to request having modules data in vmalloc area instead of module area.
> 
> This is required on powerpc book3s/32 in order to set data non
> executable, because it is not possible to set executability on page
> basis, this is done per 256 Mbytes segments. The module area has exec
> right, vmalloc area has noexec.
> 
> This can also be useful on other powerpc/32 in order to maximize the
> chance of code being close enough to kernel core to avoid branch
> trampolines.
> 
> Signed-off-by: Christophe Leroy 
> Cc: Jason Wessel 
> Cc: Daniel Thompson 
> Cc: Douglas Anderson 

Thanks for diligence in making sure kdb is up to date!

Acked-by: Daniel Thompson 


Daniel.


Re: [PATCH printk v1 03/10] kgdb: delay roundup if holding printk cpulock

2021-08-06 Thread Daniel Thompson
On Thu, Aug 05, 2021 at 05:52:43AM +0206, John Ogness wrote:
> On 2021-08-04, Daniel Thompson  wrote:
> > On Wed, Aug 04, 2021 at 02:12:22PM +0200, Petr Mladek wrote:
> >> On Wed 2021-08-04 12:31:59, Daniel Thompson wrote:
> >> > On Tue, Aug 03, 2021 at 05:36:32PM +0206, John Ogness wrote:
> >> > > On 2021-08-03, Daniel Thompson  wrote:
> >> > > > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
> >> > > >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
> >> > > >> during cpu roundup. This will conflict with the printk cpulock.
> >> > > >
> >> > > > When the full vision is realized what will be the purpose of the 
> >> > > > printk
> >> > > > cpulock?
> >> > > >
> >> > > > I'm asking largely because it's current role is actively unhelpful
> >> > > > w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
> >> > > > be a better (and safer) solution. However it sounds like there is a
> >> > > > larger role planned for the printk cpulock...
> >> > > 
> >> > > The printk cpulock is used as a synchronization mechanism for
> >> > > implementing atomic consoles, which need to be able to safely interrupt
> >> > > the console write() activity at any time and immediately continue with
> >> > > their own printing. The ultimate goal is to move all console printing
> >> > > into per-console dedicated kthreads, so the primary function of the
> >> > > printk cpulock is really to immediately _stop_ the CPU/kthread
> >> > > performing write() in order to allow write_atomic() (from any context 
> >> > > on
> >> > > any CPU) to safely and reliably take over.
> >> > 
> >> > I see.
> >> > 
> >> > Is there any mileage in allowing in_dbg_master() to suppress taking
> >> > the console lock?
> >> > 
> >> > There's a couple of reasons to worry about the current approach.
> >> > 
> >> > The first is that we don't want this code to trigger in the case when
> >> > kgdb is enabled and kdb is not since it is only kdb (a self-hosted
> >> > debugger) than uses the consoles. This case is relatively trivial to
> >> > address since we can rename it kdb_roundup_delay() and alter the way it
> >> > is conditionally compiled.
> 
> Well, _I_ want this code to trigger even without kdb. The printk cpulock
> is meant to be the innermost locking for the entire kernel. No code is
> allowed to block/spin on any kind of lock if holding the printk
> cpulock. This is the only way to guarantee the functionality of the
> atomic consoles.
>
> For example, if the kernel were to crash while inside kgdb code, we want
> to see the backtrace.

That would certainly help me debug any such problems in kgdb ;-) .


> Since kgdb _does_ take locks (spinning on @dbg_slave_lock during roundup
> and the master's own cpu lock as a retry loop on @dbg_master_lock),
> clearly it is not allowed to hold the printk cpulock. The simplest
> solution I could find was just to make sure kgdb_cpu_enter() isn't
> called while holding the printk cpulock.

We might have to come back to this. I'm pretty certain your patch
does not currently achieve this goal.


> >> > The second is more of a problem however. kdb will only call into the
> >> > console code from the debug master. By default this is the CPU that
> >> > takes the debug trap so initial prints will work fine. However it is
> >> > possible to switch to a different master (so we can read per-CPU
> >> > registers and things like that). This will result in one of the CPUs
> >> > that did the IPI round up calling into console code and this is unsafe
> >> > in that instance.
> 
> It is only unsafe if a CPU enters "kgdb/kdb context" while holding the
> printk cpulock. That is what I want to prevent.

Currently you can preventing this only for CPUs that enter the debugger
via an IPI. CPUs that enter due to a breakpoint (and there can be more
than one at a time) cannot just continue until the lock is dropped
since they would end up re-executing the breakpoint instruction.


> >> > There are a couple of tricks we could adopt to work around this but
> >> > given the slightly odd calling context for kdb (all CPUs quiesced, no
> >> > log interleaving possible) it sounds like it would remain safe to
> >> > bypass the lock if in_dbg

Re: [PATCH printk v1 03/10] kgdb: delay roundup if holding printk cpulock

2021-08-04 Thread Daniel Thompson
On Wed, Aug 04, 2021 at 02:12:22PM +0200, Petr Mladek wrote:
> On Wed 2021-08-04 12:31:59, Daniel Thompson wrote:
> > On Tue, Aug 03, 2021 at 05:36:32PM +0206, John Ogness wrote:
> > > On 2021-08-03, Daniel Thompson  wrote:
> > > > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
> > > >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
> > > >> during cpu roundup. This will conflict with the printk cpulock.
> > > >
> > > > When the full vision is realized what will be the purpose of the printk
> > > > cpulock?
> > > >
> > > > I'm asking largely because it's current role is actively unhelpful
> > > > w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
> > > > be a better (and safer) solution. However it sounds like there is a
> > > > larger role planned for the printk cpulock...
> > > 
> > > The printk cpulock is used as a synchronization mechanism for
> > > implementing atomic consoles, which need to be able to safely interrupt
> > > the console write() activity at any time and immediately continue with
> > > their own printing. The ultimate goal is to move all console printing
> > > into per-console dedicated kthreads, so the primary function of the
> > > printk cpulock is really to immediately _stop_ the CPU/kthread
> > > performing write() in order to allow write_atomic() (from any context on
> > > any CPU) to safely and reliably take over.
> > 
> > I see.
> > 
> > Is there any mileage in allowing in_dbg_master() to suppress taking
> > the console lock?
> > 
> > There's a couple of reasons to worry about the current approach.
> > 
> > The first is that we don't want this code to trigger in the case when
> > kgdb is enabled and kdb is not since it is only kdb (a self-hosted
> > debugger) than uses the consoles. This case is relatively trivial to
> > address since we can rename it kdb_roundup_delay() and alter the way it
> > is conditionally compiled.
> > 
> > The second is more of a problem however. kdb will only call into the
> > console code from the debug master. By default this is the CPU that
> > takes the debug trap so initial prints will work fine. However it is
> > possible to switch to a different master (so we can read per-CPU
> > registers and things like that). This will result in one of the CPUs
> > that did the IPI round up calling into console code and this is unsafe
> > in that instance.
> > 
> > There are a couple of tricks we could adopt to work around this but
> > given the slightly odd calling context for kdb (all CPUs quiesced, no
> > log interleaving possible) it sounds like it would remain safe to
> > bypass the lock if in_dbg_master() is true.
> > 
> > Bypassing an inconvenient lock might sound icky but:
> > 
> > 1. If the lock is not owned by any CPU then what kdb will do is safe.
> >
> > 2. If the lock is owned by any CPU then we have quiesced it anyway
> >and this makes is safe for the owning CPU to share its ownership
> >(since it isn't much different to recursive acquisition on a single
> >CPU)
> 
> I think about the following:
> 
> void kgdb_roundup_cpus(void)
> {
>   __printk_cpu_lock();
>   __kgdb_roundup_cpus();
> }
> 
> , where __printk_cpu_lock() waits/takes printk_cpu_lock()
>   __kgdb_roundup_cpus() is the original kgdb_roundup_cpus();
> 
> 
> The idea is that kgdb_roundup_cpus() caller takes the printk_cpu lock.
> The owner will be well defined.
> 
> As a result any other CPU will not be able to take the printk_cpu lock
> as long as it is owned by the kgdb lock. But as you say, kgdb will
> make sure that everything is serialized at this stage. So that
> the original raw_printk_cpu_lock_irqsave() might just disable
> IRQs when called under debugger.
> 
> Does it make any sense?

Yes but I think it is still has problems.

Primarily is doesn't solve the issue I raised. It would still be unsafe
to change debug master: we can guarantee the initial master owns the
lock but if it has been multiply acquired we cannot transfer ownership
when we want to change master.

Additionally it will delay the round up of cores that do not own the
lock. The quiescing is never atomic and the operator needs to know
that but the longer CPUs are allows to execute for the more confusing
things can become for the operator.

Finally on machines without an NMI this could cause trouble with the
interrupt disable in raw_printk_cpu_lock_irqsave() (or any outer level
interrupt disable). If the master get the lock 

Re: [PATCH printk v1 03/10] kgdb: delay roundup if holding printk cpulock

2021-08-04 Thread Daniel Thompson
On Tue, Aug 03, 2021 at 05:36:32PM +0206, John Ogness wrote:
> On 2021-08-03, Daniel Thompson  wrote:
> > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
> >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
> >> during cpu roundup. This will conflict with the printk cpulock.
> >
> > When the full vision is realized what will be the purpose of the printk
> > cpulock?
> >
> > I'm asking largely because it's current role is actively unhelpful
> > w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
> > be a better (and safer) solution. However it sounds like there is a
> > larger role planned for the printk cpulock...
> 
> The printk cpulock is used as a synchronization mechanism for
> implementing atomic consoles, which need to be able to safely interrupt
> the console write() activity at any time and immediately continue with
> their own printing. The ultimate goal is to move all console printing
> into per-console dedicated kthreads, so the primary function of the
> printk cpulock is really to immediately _stop_ the CPU/kthread
> performing write() in order to allow write_atomic() (from any context on
> any CPU) to safely and reliably take over.

I see.

Is there any mileage in allowing in_dbg_master() to suppress taking
the console lock?

There's a couple of reasons to worry about the current approach.

The first is that we don't want this code to trigger in the case when
kgdb is enabled and kdb is not since it is only kdb (a self-hosted
debugger) than uses the consoles. This case is relatively trivial to
address since we can rename it kdb_roundup_delay() and alter the way it
is conditionally compiled.

The second is more of a problem however. kdb will only call into the
console code from the debug master. By default this is the CPU that
takes the debug trap so initial prints will work fine. However it is
possible to switch to a different master (so we can read per-CPU
registers and things like that). This will result in one of the CPUs
that did the IPI round up calling into console code and this is unsafe
in that instance.

There are a couple of tricks we could adopt to work around this but
given the slightly odd calling context for kdb (all CPUs quiesced, no
log interleaving possible) it sounds like it would remain safe to
bypass the lock if in_dbg_master() is true.

Bypassing an inconvenient lock might sound icky but:

1. If the lock is not owned by any CPU then what kdb will do is safe.

2. If the lock is owned by any CPU then we have quiesced it anyway
   and this makes is safe for the owning CPU to share its ownership
   (since it isn't much different to recursive acquisition on a single
   CPU)


> Atomic consoles are actually quite similar to the kgdb_io ops. For
> example, comparing:
> 
> serial8250_console_write_atomic() + serial8250_console_putchar_locked()
> 
> with
> 
> serial8250_put_poll_char()
> 
> The difference is that serial8250_console_write_atomic() is line-based
> and synchronizing with serial8250_console_write() so that if the kernel
> crashes while outputing to the console, write() can be interrupted by
> write_atomic() and cleanly formatted crash data can be output.
> 
> Also serial8250_put_poll_char() is calling into __pm_runtime_resume(),
> which includes a spinlock and possibly sleeping. This would not be
> acceptable for atomic consoles.

spinlocks aren't allowed in polled I/O either.

However IIRC there is a rather nasty trick being played here to allow
code sharing. I believe there was a deliberate unbalanced resume in the
poll_init() function that results (again IIRC) in the PM calls in
poll_char() becoming nothing but atomic add and subtract (e.g. enabling
polled I/O effectively suppresses PM activity).


Daniel.

> Although, as Andy pointed out [0], I
> will need to figure out how to deal with suspended consoles. Or just
> implement a policy that registered atomic consoles may never be
> suspended.
> 
> I had not considered merging kgdb_io ops with atomic console ops. But
> now that I look at it more closely, there may be some useful overlap. I
> will consider this. Thank you for this idea.
> 
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index 3d0c933937b4..1b546e117f10 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -214,6 +215,7 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, 
> >> int write,
> >>  #ifdef CONFIG_SMP
> >>  static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
> >>  static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
> >> +static unsigned int kgdb_cpu = -1;
> >
> > Is this the flag to provoke retriggering? It appears to be a write-only
> &g

Re: [PATCH printk v1 03/10] kgdb: delay roundup if holding printk cpulock

2021-08-03 Thread Daniel Thompson
On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
> during cpu roundup. This will conflict with the printk cpulock.

When the full vision is realized what will be the purpose of the printk
cpulock?

I'm asking largely because it's current role is actively unhelpful
w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
be a better (and safer) solution. However it sounds like there is a
larger role planned for the printk cpulock...


> Therefore, a CPU must ensure that it is not holding the printk
> cpulock when calling kgdb_cpu_enter(). If it is, it must allow its
> printk context to complete first.
> 
> A new helper function kgdb_roundup_delay() is introduced for kgdb
> to determine if it is holding the printk cpulock. If so, a flag is
> set so that when the printk cpulock is released, kgdb will be
> re-triggered for that CPU.
> 
> Signed-off-by: John Ogness 
> ---
>  arch/powerpc/include/asm/smp.h |  1 +
>  arch/powerpc/kernel/kgdb.c | 10 +++-
>  arch/powerpc/kernel/smp.c  |  5 
>  arch/x86/kernel/kgdb.c |  9 ---
>  include/linux/kgdb.h   |  3 +++
>  include/linux/printk.h |  8 ++
>  kernel/debug/debug_core.c  | 45 --
>  kernel/printk/printk.c | 12 +
>  8 files changed, 70 insertions(+), 23 deletions(-)
> 
> [...]
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 3d0c933937b4..1b546e117f10 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -44,6 +44,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -214,6 +215,7 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, 
> int write,
>  #ifdef CONFIG_SMP
>  static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
>  static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
> +static unsigned int kgdb_cpu = -1;

Is this the flag to provoke retriggering? It appears to be a write-only
variable (at least in this patch). How is it consumed?


Daniel.


>  /**
>   * __printk_wait_on_cpu_lock() - Busy wait until the printk cpu-reentrant
> @@ -325,6 +327,16 @@ void __printk_cpu_unlock(void)
>  -1); /* LMM(__printk_cpu_unlock:B) */
>  }
>  EXPORT_SYMBOL(__printk_cpu_unlock);
> +
> +bool kgdb_roundup_delay(unsigned int cpu)
> +{
> + if (cpu != atomic_read(_cpulock_owner))
> + return false;
> +
> + kgdb_cpu = cpu;
> + return true;
> +}
> +EXPORT_SYMBOL(kgdb_roundup_delay);
>  #endif /* CONFIG_SMP */
>  
>  /* Number of registered extended console drivers. */
> -- 
> 2.20.1
> 


Re: [RFC HACK PATCH] PCI: dwc: layerscape: Hack around enumeration problems with Honeycomb LX2K

2020-12-14 Thread Daniel Thompson
On Fri, Dec 11, 2020 at 05:05:58PM +, Daniel Thompson wrote:
> On Fri, Dec 11, 2020 at 08:37:40AM -0600, Rob Herring wrote:
> > On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson
> > > BTW I noticed many other pcie-designware drivers take advantage
> > > of a function called dw_pcie_wait_for_link() in their init paths...
> > > but my naive attempts to add it to the layerscape driver results
> > > in non-booting systems so I haven't embarrassed myself by including
> > > that in the patch!
> > 
> > You need to look at what's pending for v5.11, because I reworked this
> > to be more unified. The ordering of init is also possibly changed. The
> > sequence is now like this:
> > 
> > dw_pcie_setup_rc(pp);
> > dw_pcie_msi_init(pp);
> > 
> > if (!dw_pcie_link_up(pci) && pci->ops->start_link) {
> > ret = pci->ops->start_link(pci);
> > if (ret)
> > goto err_free_msi;
> > }
> > 
> > /* Ignore errors, the link may come up later */
> > dw_pcie_wait_for_link(pci);
> 
> Thanks. That looks likely to fix it since IIUC dw_pcie_wait_for_link()
> will end up waiting somewhat like the double check I added to
> ls_pcie_link_up().
> 
> I'll take a look at let you know.

Yes. These changes have fixed the enumeration problems for me.

I tested pci/next and I cherry picked your patch series onto v5.10 and
both are working well.

Given this fixes a bug for me, do you think there is any scope for me
to whittle down your series into patches for the stable kernels or am
I likely to find too many extra bits being pulled in?


Daniel.


Re: [RFC HACK PATCH] PCI: dwc: layerscape: Hack around enumeration problems with Honeycomb LX2K

2020-12-11 Thread Daniel Thompson
On Fri, Dec 11, 2020 at 08:37:40AM -0600, Rob Herring wrote:
> On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson
>  wrote:
> >
> > I have been chasing down a problem enumerating an NVMe drive on a
> > Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate
> > successfully if the we are emitting lots of console messages via a UART.
> > If the system is booted with `quiet` set then enumeration fails.
> 
> We really need to get away from work-arounds for device X on host Y. I
> suspect they are not limited to that combination only...

No objection on that. This patch was essentially sharing the result of
an investigation where I got stuck at the "now fix it properly" stage!


> How exactly does it fail to enumerate? There's a (racy) linkup check
> on config accesses, is it reporting link down and skipping config
> accesses?

In dmesg terms it looked like this:

--- nvme_borked_gpu_working.linted.dmesg2020-11-18 15:28:35.544118213 
+
+++ nvme_working_gpu_working.linted.dmesg   2020-11-18 15:28:56.180076946 
+
@@ -241,11 +241,19 @@
 pci :00:00.0: reg 0x38: [mem 0x904800-0x9048ff pref]
 pci :00:00.0: supports D1 D2
 pci :00:00.0: PME# supported from D0 D1 D2 D3hot
+pci :01:00.0: [15b7:5009] type 00 class 0x010802
+pci :01:00.0: reg 0x10: [mem 0x904900-0x9049003fff 64bit]
+pci :01:00.0: reg 0x20: [mem 0x9049004000-0x90490040ff 64bit]
 pci :00:00.0: BAR 1: assigned [mem 0x904000-0x9043ff]
 pci :00:00.0: BAR 0: assigned [mem 0x904400-0x9044ff]
 pci :00:00.0: BAR 6: assigned [mem 0x904500-0x9045ff pref]
+pci :00:00.0: BAR 14: assigned [mem 0x904600-0x90460f]
+pci :01:00.0: BAR 0: assigned [mem 0x904600-0x9046003fff 64bit]
+pci :01:00.0: BAR 4: assigned [mem 0x9046004000-0x90460040ff 64bit]
 pci :00:00.0: PCI bridge to [bus 01-ff]
+pci :00:00.0:   bridge window [mem 0x904600-0x90460f]
 pci :00:00.0: Max Payload Size set to  256/ 256 (was  128), Max Read Rq  
256
+pci :01:00.0: Max Payload Size set to  256/ 512 (was  128), Max Read Rq  
256
 layerscape-pcie 380.pcie: host bridge /soc/pcie@380 ranges:
 layerscape-pcie 380.pcie:  MEM 0xa04000..0xa07fff -> 
0x004000
 layerscape-pcie 380.pcie: PCI host bridge to bus 0001:00

... and be aware that the last three lines here are another PCIe
controller coming up just fine and it is about to detect the GPU
(which like the NVMe is also gen3) just fine whether or not we
add a short delay.


> > I guessed this would be due to the timing impact of printk-to-UART and
> > tried to find out where a delay could be added to provoke a successful
> > enumeration.
> >
> > This patch contains the results. The delay length (1ms) was selected by
> > binary searching downwards until the delay was not effective for these
> > devices (Honeycomb LX2K and a Western Digital WD Blue SN550).
> >
> > I have also included the workaround twice (conditionally compiled). The
> > first change is the *latest* possible code path that we can deploy a
> > delay whilst the second is the earliest place I could find.
> >
> > The summary is that the critical window were we are currently relying on
> > a call to the console UART code can "mend" the driver runs from calling
> > dw_pcie_setup_rc() in host init to just before we read the state in the
> > link up callback.
> >
> > Signed-off-by: Daniel Thompson 
> > ---
> >
> > Notes:
> > This patch is RFC (and HACK) because I don't have much clue *why* this
> > patch works... merely that this is the smallest possible change I need
> > to replicate the current accidental printk() workaround.  Perhaps one
> > could argue that RFC here stands for request-for-clue.  All my
> > observations and changes here are empirical and I don't know how best to
> > turn them into something that is not a hack!
> >
> > BTW I noticed many other pcie-designware drivers take advantage
> > of a function called dw_pcie_wait_for_link() in their init paths...
> > but my naive attempts to add it to the layerscape driver results
> > in non-booting systems so I haven't embarrassed myself by including
> > that in the patch!
> 
> You need to look at what's pending for v5.11, because I reworked this
> to be more unified. The ordering of init is also possibly changed. The
> sequence is now like this:
> 
> dw_pcie_setup_rc(pp);
> dw_pcie_msi_init(pp);
> 
> if (!dw_pcie_link_up(pci) && pci->ops->start_link) {
> ret = pci->ops->start_link(pci);
> if (ret)
> goto err_free_msi;
> }
> 
> /* Ignore errors, the link may come up later */
> dw_pcie_wait_for_link(pci);

Thanks. That looks likely to fix it since IIUC dw_pcie_wait_for_link()
will end up waiting somewhat like the double check I added to
ls_pcie_link_up().

I'll take a look at let you know.


Daniel.


[RFC HACK PATCH] PCI: dwc: layerscape: Hack around enumeration problems with Honeycomb LX2K

2020-12-11 Thread Daniel Thompson
I have been chasing down a problem enumerating an NVMe drive on a
Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate
successfully if the we are emitting lots of console messages via a UART.
If the system is booted with `quiet` set then enumeration fails.

I guessed this would be due to the timing impact of printk-to-UART and
tried to find out where a delay could be added to provoke a successful
enumeration.

This patch contains the results. The delay length (1ms) was selected by
binary searching downwards until the delay was not effective for these
devices (Honeycomb LX2K and a Western Digital WD Blue SN550).

I have also included the workaround twice (conditionally compiled). The
first change is the *latest* possible code path that we can deploy a
delay whilst the second is the earliest place I could find.

The summary is that the critical window were we are currently relying on
a call to the console UART code can "mend" the driver runs from calling
dw_pcie_setup_rc() in host init to just before we read the state in the
link up callback.

Signed-off-by: Daniel Thompson 
---

Notes:
This patch is RFC (and HACK) because I don't have much clue *why* this
patch works... merely that this is the smallest possible change I need
to replicate the current accidental printk() workaround.  Perhaps one
could argue that RFC here stands for request-for-clue.  All my
observations and changes here are empirical and I don't know how best to
turn them into something that is not a hack!

BTW I noticed many other pcie-designware drivers take advantage
of a function called dw_pcie_wait_for_link() in their init paths...
but my naive attempts to add it to the layerscape driver results
in non-booting systems so I haven't embarrassed myself by including
that in the patch!

 drivers/pci/controller/dwc/pci-layerscape.c | 35 +
 1 file changed, 35 insertions(+)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index f24f79a70d9a..c354904b90ef 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -22,6 +22,8 @@

 #include "pcie-designware.h"

+#define WORKAROUND_LATEST_POSSIBLE
+
 /* PEX1/2 Misc Ports Status Register */
 #define SCFG_PEXMSCPORTSR(pex_idx) (0x94 + (pex_idx) * 4)
 #define LTSSM_STATE_SHIFT  20
@@ -113,10 +115,31 @@ static int ls_pcie_link_up(struct dw_pcie *pci)
struct ls_pcie *pcie = to_ls_pcie(pci);
u32 state;

+   /*
+* Strictly speaking *this* (before the ioread32) is the latest
+* point a simple delay can be effective. If we move the delay
+* after the ioread32 then the NVMe does not enumerate.
+*
+* However this function appears to be frequently called so an
+* unconditional delay here causes noticeable delay at boot
+* time. Hence we implement the workaround by retrying the read
+* after a short delay if we think we might need to return false.
+*/
+
state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >>
 pcie->drvdata->ltssm_shift) &
 LTSSM_STATE_MASK;

+#ifdef WORKAROUND_LATEST_POSSIBLE
+   if (state < LTSSM_PCIE_L0) {
+   /* see comment above */
+   mdelay(1);
+   state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >>
+pcie->drvdata->ltssm_shift) &
+LTSSM_STATE_MASK;
+   }
+#endif
+
if (state < LTSSM_PCIE_L0)
return 0;

@@ -152,6 +175,18 @@ static int ls_pcie_host_init(struct pcie_port *pp)

dw_pcie_setup_rc(pp);

+#ifdef WORKAROUND_EARLIEST_POSSIBLE
+   /*
+* This is the earliest point the delay is effective.
+* If we move it before dw_pcie_setup_rc() then the
+* NVMe does not enumerate.
+*
+* 500us is too short to reliably work around the issue
+* hence adopting 1000us here.
+*/
+   mdelay(1);
+#endif
+
return 0;
 }


base-commit: 0477e92881850d44910a7e94fc2c46f96faa131f
--
2.29.2



Re: [PATCH 0/5] drop unused BACKLIGHT_GENERIC option

2020-12-01 Thread Daniel Thompson
On Mon, Nov 30, 2020 at 03:21:32PM +, Andrey Zhizhikin wrote:
> Since the removal of generic_bl driver from the source tree in commit
> 7ecdea4a0226 ("backlight: generic_bl: Remove this driver as it is
> unused") BACKLIGHT_GENERIC config option became obsolete as well and
> therefore subject to clean-up from all configuration files.
> 
> This series introduces patches to address this removal, separated by
> architectures in the kernel tree.
> 
> Andrey Zhizhikin (5):
>   ARM: configs: drop unused BACKLIGHT_GENERIC option
>   arm64: defconfig: drop unused BACKLIGHT_GENERIC option
>   MIPS: configs: drop unused BACKLIGHT_GENERIC option
>   parisc: configs: drop unused BACKLIGHT_GENERIC option
>   powerpc/configs: drop unused BACKLIGHT_GENERIC option

Whole series:
Acked-by: Daniel Thompson 


Daniel.


Re: [PATCH 04/20] Documentation: kgdb: eliminate duplicated word

2020-07-08 Thread Daniel Thompson
On Tue, Jul 07, 2020 at 11:03:58AM -0700, Randy Dunlap wrote:
> Drop the doubled word "driver".
> 
> Signed-off-by: Randy Dunlap 
> Cc: Jonathan Corbet 
> Cc: linux-...@vger.kernel.org
> Cc: Jason Wessel 
> Cc: Daniel Thompson 
> Cc: Douglas Anderson 
> Cc: kgdb-bugrep...@lists.sourceforge.net

Acked-by: Daniel Thompson 


Daniel.


Re: [PATCH v2 2/2] kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops

2018-12-19 Thread Daniel Thompson
On Thu, Dec 06, 2018 at 08:07:40PM +, Christophe Leroy wrote:
> checkpatch.pl reports the following:
> 
>   WARNING: struct kgdb_arch should normally be const
>   #28: FILE: arch/mips/kernel/kgdb.c:397:
>   +struct kgdb_arch arch_kgdb_ops = {
> 
> This report makes sense, as all other ops struct, this
> one should also be const. This patch does the change.
> 
> Cc: Vineet Gupta 
> Cc: Russell King 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Yoshinori Sato 
> Cc: Richard Kuo 
> Cc: Michal Simek 
> Cc: Ralf Baechle 
> Cc: Paul Burton 
> Cc: James Hogan 
> Cc: Ley Foon Tan 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Rich Felker 
> Cc: "David S. Miller" 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: x...@kernel.org
> Acked-by: Daniel Thompson 
> Acked-by: Paul Burton 
> Signed-off-by: Christophe Leroy 

I've not heard any objections from the arch/ maintainers so...

Applied! Thanks.


> -
> ---
>  v2: Added CCs to all maintainers/supporters identified by get_maintainer.pl 
> and Acks from Daniel and Paul.
> 
>  arch/arc/kernel/kgdb.c| 2 +-
>  arch/arm/kernel/kgdb.c| 2 +-
>  arch/arm64/kernel/kgdb.c  | 2 +-
>  arch/h8300/kernel/kgdb.c  | 2 +-
>  arch/hexagon/kernel/kgdb.c| 2 +-
>  arch/microblaze/kernel/kgdb.c | 2 +-
>  arch/mips/kernel/kgdb.c   | 2 +-
>  arch/nios2/kernel/kgdb.c  | 2 +-
>  arch/powerpc/kernel/kgdb.c| 2 +-
>  arch/sh/kernel/kgdb.c | 2 +-
>  arch/sparc/kernel/kgdb_32.c   | 2 +-
>  arch/sparc/kernel/kgdb_64.c   | 2 +-
>  arch/x86/kernel/kgdb.c| 2 +-
>  include/linux/kgdb.h  | 2 +-
>  14 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
> index 9a3c34af2ae8..bfd04b442e36 100644
> --- a/arch/arc/kernel/kgdb.c
> +++ b/arch/arc/kernel/kgdb.c
> @@ -204,7 +204,7 @@ void kgdb_roundup_cpus(unsigned long flags)
>   local_irq_disable();
>  }
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* breakpoint instruction: TRAP_S 0x3 */
>  #ifdef CONFIG_CPU_BIG_ENDIAN
>   .gdb_bpt_instr  = {0x78, 0x7e},
> diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
> index caa0dbe3dc61..21a6d5958955 100644
> --- a/arch/arm/kernel/kgdb.c
> +++ b/arch/arm/kernel/kgdb.c
> @@ -274,7 +274,7 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
>   * and we handle the normal undef case within the do_undefinstr
>   * handler.
>   */
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>  #ifndef __ARMEB__
>   .gdb_bpt_instr  = {0xfe, 0xde, 0xff, 0xe7}
>  #else /* ! __ARMEB__ */
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index a20de58061a8..fe1d1f935b90 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -357,7 +357,7 @@ void kgdb_arch_exit(void)
>   unregister_die_notifier(_notifier);
>  }
>  
> -struct kgdb_arch arch_kgdb_ops;
> +const struct kgdb_arch arch_kgdb_ops;
>  
>  int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
>  {
> diff --git a/arch/h8300/kernel/kgdb.c b/arch/h8300/kernel/kgdb.c
> index 1a1d30cb0609..602e478afbd5 100644
> --- a/arch/h8300/kernel/kgdb.c
> +++ b/arch/h8300/kernel/kgdb.c
> @@ -129,7 +129,7 @@ void kgdb_arch_exit(void)
>   /* Nothing to do */
>  }
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* Breakpoint instruction: trapa #2 */
>   .gdb_bpt_instr = { 0x57, 0x20 },
>  };
> diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
> index 16c24b22d0b2..f1924d483e78 100644
> --- a/arch/hexagon/kernel/kgdb.c
> +++ b/arch/hexagon/kernel/kgdb.c
> @@ -83,7 +83,7 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
>   { "syscall_nr", GDB_SIZEOF_REG, offsetof(struct pt_regs, syscall_nr)},
>  };
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* trap0(#0xDB) 0x0cdb0054 */
>   .gdb_bpt_instr = {0x54, 0x00, 0xdb, 0x0c},
>  };
> diff --git a/arch/microblaze/kernel/kgdb.c b/arch/microblaze/kernel/kgdb.c
> index 6366f69d118e..130cd0f064ce 100644
> --- a/arch/microblaze/kernel/kgdb.c
> +++ b/arch/microblaze/kernel/kgdb.c
> @@ -143,7 +143,7 @@ void kgdb_arch_exit(void)
>  /*
>   * Global data
>   */
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>  #ifdef __MICROBLAZEEL__
>   .gdb_bpt_instr = {0x18, 0x00, 0x0c, 0xba}, /* brki r16, 0x18 */
>  #else
> diff --git a/arch/mips

Re: [PATCH v2 1/2] mips/kgdb: prepare arch_kgdb_ops for constness

2018-12-19 Thread Daniel Thompson
On Thu, Dec 06, 2018 at 08:07:38PM +, Christophe Leroy wrote:
> MIPS is the only architecture modifying arch_kgdb_ops during init.
> This patch makes the init static, so that it can be changed to
> const in following patch, as recommended by checkpatch.pl
> 
> Suggested-by: Paul Burton 
> Acked-by: Daniel Thompson 
> Acked-by: Paul Burton 
> Signed-off-by: Christophe Leroy 

Applied! Thanks.


> -
> ---
>  v2: Added acks from Daniel and Paul.
> 
>  arch/mips/kernel/kgdb.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
> index eb6c0d582626..31eff1bec577 100644
> --- a/arch/mips/kernel/kgdb.c
> +++ b/arch/mips/kernel/kgdb.c
> @@ -394,18 +394,16 @@ int kgdb_arch_handle_exception(int vector, int signo, 
> int err_code,
>   return -1;
>  }
>  
> -struct kgdb_arch arch_kgdb_ops;
> +struct kgdb_arch arch_kgdb_ops = {
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> + .gdb_bpt_instr = { spec_op << 2, 0x00, 0x00, break_op },
> +#else
> + .gdb_bpt_instr = { break_op, 0x00, 0x00, spec_op << 2 },
> +#endif
> +};
>  
>  int kgdb_arch_init(void)
>  {
> - union mips_instruction insn = {
> - .r_format = {
> - .opcode = spec_op,
> - .func   = break_op,
> - }
> - };
> - memcpy(arch_kgdb_ops.gdb_bpt_instr, insn.byte, BREAK_INSTR_SIZE);
> -
>   register_die_notifier(_notifier);
>  
>   return 0;
> -- 
> 2.13.3
> 


Re: [REPOST PATCH v6 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-12-19 Thread Daniel Thompson
On Tue, Dec 04, 2018 at 07:38:26PM -0800, Douglas Anderson wrote:
> When I had lockdep turned on and dropped into kgdb I got a nice splat
> on my system.  Specifically it hit:
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> 
> Specifically it looked like this:
>   sysrq: SysRq : DEBUG
>   [ cut here ]
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 
> lockdep_hardirqs_on+0xf0/0x160
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
>   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
>   pc : lockdep_hardirqs_on+0xf0/0x160
>   ...
>   Call trace:
>lockdep_hardirqs_on+0xf0/0x160
>trace_hardirqs_on+0x188/0x1ac
>kgdb_roundup_cpus+0x14/0x3c
>kgdb_cpu_enter+0x53c/0x5cc
>kgdb_handle_exception+0x180/0x1d4
>kgdb_compiled_brk_fn+0x30/0x3c
>brk_handler+0x134/0x178
>do_debug_exception+0xfc/0x178
>el1_dbg+0x18/0x78
>kgdb_breakpoint+0x34/0x58
>sysrq_handle_dbg+0x54/0x5c
>__handle_sysrq+0x114/0x21c
>handle_sysrq+0x30/0x3c
>qcom_geni_serial_isr+0x2dc/0x30c
>   ...
>   ...
>   irq event stamp: ...45
>   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
>   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
>   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
>   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
>   ---[ end trace adf21f830c46e638 ]---
> 
> Looking closely at it, it seems like a really bad idea to be calling
> local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
> like it could violate spinlock semantics and cause a deadlock.
> 
> Instead, let's use a private csd alongside
> smp_call_function_single_async() to round up the other CPUs.  Using
> smp_call_function_single_async() doesn't require interrupts to be
> enabled so we can remove the offending bit of code.
> 
> In order to avoid duplicating this across all the architectures that
> use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
> to debug_core.c.
> 
> Looking at all the people who previously had copies of this code,
> there were a few variants.  I've attempted to keep the variants
> working like they used to.  Specifically:
> * For arch/arc we passed NULL to kgdb_nmicallback() instead of
>   get_irq_regs().
> * For arch/mips there was a bit of extra code around
>   kgdb_nmicallback()
> 
> NOTE: In this patch we will still get into trouble if we try to round
> up a CPU that failed to round up before.  We'll try to round it up
> again and potentially hang when we try to grab the csd lock.  That's
> not new behavior but we'll still try to do better in a future patch.
> 
> Suggested-by: Daniel Thompson 
> Signed-off-by: Douglas Anderson 
> Cc: Vineet Gupta 
> Cc: Russell King 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Richard Kuo 
> Cc: Ralf Baechle 
> Cc: Paul Burton 
> Cc: James Hogan 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Yoshinori Sato 
> Cc: Rich Felker 
> Cc: "David S. Miller" 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Acked-by: Will Deacon 

I've not heard any objections from the arch/ maintainers so...

Applied! Thanks.


> -
> ---
> 
> Changes in v6:
> - Moved smp_call_function_single_async() error check to patch 3.
> 
> Changes in v5:
> - Add a comment about get_irq_regs().
> - get_cpu() => raw_smp_processor_id() in kgdb_roundup_cpus().
> - for_each_cpu() => for_each_online_cpu()
> - Error check smp_call_function_single_async()
> 
> Changes in v4: None
> Changes in v3:
> - No separate init call.
> - Don't round up the CPU that is doing the rounding up.
> - Add "#ifdef CONFIG_SMP" to match the rest of the file.
> - Updated desc saying we don't solve the "failed to roundup" case.
> - Document the ignored parameter.
> 
> Changes in v2:
> - Removing irq flags separated from fixing lockdep splat.
> - Don't use smp_call_function (Daniel).
> 
>  arch/arc/kernel/kgdb.c | 10 ++
>  arch/arm/kernel/kgdb.c | 12 ---
>  arch/arm64/kernel/kgdb.c   | 12 ---
>  arch/hexagon/kernel/kgdb.c | 27 -
>  arch/mips/kernel/kgdb.c|  9 +
>  arch/powerpc/kernel/kgdb.c |  4 ++--
>  arch/sh/kernel/kgdb.c  | 12 ---
>  include/linux/kgdb.h   | 15 --
>  kernel/debug/debug_core.c  | 41 ++
>  9 files changed, 59 insertions(+), 83 deletions(-)
> 
> diff --git a/arch/arc/kernel/kgdb.c b/arch

Re: [REPOST PATCH v6 1/4] kgdb: Remove irq flags from roundup

2018-12-19 Thread Daniel Thompson
On Tue, Dec 04, 2018 at 07:38:25PM -0800, Douglas Anderson wrote:
> The function kgdb_roundup_cpus() was passed a parameter that was
> documented as:
> 
> > the flags that will be used when restoring the interrupts. There is
> > local_irq_save() call before kgdb_roundup_cpus().
> 
> Nobody used those flags.  Anyone who wanted to temporarily turn on
> interrupts just did local_irq_enable() and local_irq_disable() without
> looking at them.  So we can definitely remove the flags.
> 
> Signed-off-by: Douglas Anderson 
> Cc: Vineet Gupta 
> Cc: Russell King 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Richard Kuo 
> Cc: Ralf Baechle 
> Cc: Paul Burton 
> Cc: James Hogan 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Yoshinori Sato 
> Cc: Rich Felker 
> Cc: "David S. Miller" 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Acked-by: Will Deacon 

I've not heard any objections from the arch/ maintainers so...

Applied! Thanks.


> ---
> 
> Changes in v6: None
> Changes in v5: None
> Changes in v4: None
> Changes in v3: None
> Changes in v2:
> - Removing irq flags separated from fixing lockdep splat.
> 
>  arch/arc/kernel/kgdb.c | 2 +-
>  arch/arm/kernel/kgdb.c | 2 +-
>  arch/arm64/kernel/kgdb.c   | 2 +-
>  arch/hexagon/kernel/kgdb.c | 9 ++---
>  arch/mips/kernel/kgdb.c| 2 +-
>  arch/powerpc/kernel/kgdb.c | 2 +-
>  arch/sh/kernel/kgdb.c  | 2 +-
>  arch/sparc/kernel/smp_64.c | 2 +-
>  arch/x86/kernel/kgdb.c | 9 ++---
>  include/linux/kgdb.h   | 9 ++---
>  kernel/debug/debug_core.c  | 2 +-
>  11 files changed, 14 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
> index 9a3c34af2ae8..0932851028e0 100644
> --- a/arch/arc/kernel/kgdb.c
> +++ b/arch/arc/kernel/kgdb.c
> @@ -197,7 +197,7 @@ static void kgdb_call_nmi_hook(void *ignored)
>   kgdb_nmicallback(raw_smp_processor_id(), NULL);
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>   local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
> index caa0dbe3dc61..f21077b077be 100644
> --- a/arch/arm/kernel/kgdb.c
> +++ b/arch/arm/kernel/kgdb.c
> @@ -175,7 +175,7 @@ static void kgdb_call_nmi_hook(void *ignored)
> kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> local_irq_enable();
> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index a20de58061a8..12c339ff6e75 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -289,7 +289,7 @@ static void kgdb_call_nmi_hook(void *ignored)
>   kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>   local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
> index 16c24b22d0b2..012e0e230ac2 100644
> --- a/arch/hexagon/kernel/kgdb.c
> +++ b/arch/hexagon/kernel/kgdb.c
> @@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned 
> long pc)
>  
>  /**
>   * kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - * @flags: Current IRQ state
>   *
>   * On SMP systems, we need to get the attention of the other CPUs
>   * and get them be in a known state.  This should do what is needed
>   * to get the other CPUs to call kgdb_wait(). Note that on some arches,
>   * the NMI approach is not used for rounding up all the CPUs. For example,
> - * in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - * this case, we have to make sure that interrupts are enabled before
> - * calling smp_call_function(). The argument to this function is
> - * the flags that will be used when restoring the interrupts. There is
> - * local_irq_save() call before kgdb_roundup_cpus().
> + * in case of MIPS, smp_call_function() is used to roundup CPUs.
>   *
>   * On non-SMP systems, this is not called.
>   */
> @@ -139,7 +134,7 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
>   kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>   local_irq_enable();
>   smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
> diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
> index eb6c0d582626..2b05effc17b4 100644
> --- a/arch/mips/kernel/kgdb.c
> +++ b/arch/mips/kernel/kgdb.c
> @@ -219,7 +219,7 @@ static void kgdb_call_nmi_hook(void *ignored)
>   set_fs(old_fs);
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>   local_irq_enable();
>   

Re: [PATCH 1/2] mips/kgdb: prepare arch_kgdb_ops for constness

2018-12-06 Thread Daniel Thompson
On Wed, Dec 05, 2018 at 04:41:09AM +, Christophe Leroy wrote:
> MIPS is the only architecture modifying arch_kgdb_ops during init.
> This patch makes the init static, so that it can be changed to
> const in following patch, as recommended by checkpatch.pl
> 
> Suggested-by: Paul Burton 
> Signed-off-by: Christophe Leroy 

>From my side this is
Acked-by: Daniel Thompson 

Since this is a dependency for the next patch I'd be happy to take via
my tree... but would need an ack from the MIPS guys to do that.


Daniel.

> ---
>  arch/mips/kernel/kgdb.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
> index eb6c0d582626..31eff1bec577 100644
> --- a/arch/mips/kernel/kgdb.c
> +++ b/arch/mips/kernel/kgdb.c
> @@ -394,18 +394,16 @@ int kgdb_arch_handle_exception(int vector, int signo, 
> int err_code,
>   return -1;
>  }
>  
> -struct kgdb_arch arch_kgdb_ops;
> +struct kgdb_arch arch_kgdb_ops = {
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> + .gdb_bpt_instr = { spec_op << 2, 0x00, 0x00, break_op },
> +#else
> + .gdb_bpt_instr = { break_op, 0x00, 0x00, spec_op << 2 },
> +#endif
> +};
>  
>  int kgdb_arch_init(void)
>  {
> - union mips_instruction insn = {
> - .r_format = {
> - .opcode = spec_op,
> - .func   = break_op,
> - }
> - };
> - memcpy(arch_kgdb_ops.gdb_bpt_instr, insn.byte, BREAK_INSTR_SIZE);
> -
>   register_die_notifier(_notifier);
>  
>   return 0;
> -- 
> 2.13.3
> 


Re: [PATCH 2/2] kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops

2018-12-06 Thread Daniel Thompson
On Wed, Dec 05, 2018 at 04:41:11AM +, Christophe Leroy wrote:
> checkpatch.pl reports the following:
> 
>   WARNING: struct kgdb_arch should normally be const
>   #28: FILE: arch/mips/kernel/kgdb.c:397:
>   +struct kgdb_arch arch_kgdb_ops = {
> 
> This report makes sense, as all other ops struct, this
> one should also be const. This patch does the change.
> 
> Signed-off-by: Christophe Leroy 

Acked-by: Daniel Thompson 

Similar to https://patchwork.kernel.org/patch/10701129/ I would be more
comfortable to see a resend with the relevant arch maintainers
explicitly called out with a Cc: entry here.


> ---
>  arch/arc/kernel/kgdb.c| 2 +-
>  arch/arm/kernel/kgdb.c| 2 +-
>  arch/arm64/kernel/kgdb.c  | 2 +-
>  arch/h8300/kernel/kgdb.c  | 2 +-
>  arch/hexagon/kernel/kgdb.c| 2 +-
>  arch/microblaze/kernel/kgdb.c | 2 +-
>  arch/mips/kernel/kgdb.c   | 2 +-
>  arch/nios2/kernel/kgdb.c  | 2 +-
>  arch/powerpc/kernel/kgdb.c| 2 +-
>  arch/sh/kernel/kgdb.c | 2 +-
>  arch/sparc/kernel/kgdb_32.c   | 2 +-
>  arch/sparc/kernel/kgdb_64.c   | 2 +-
>  arch/x86/kernel/kgdb.c| 2 +-
>  include/linux/kgdb.h  | 2 +-
>  14 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
> index 9a3c34af2ae8..bfd04b442e36 100644
> --- a/arch/arc/kernel/kgdb.c
> +++ b/arch/arc/kernel/kgdb.c
> @@ -204,7 +204,7 @@ void kgdb_roundup_cpus(unsigned long flags)
>   local_irq_disable();
>  }
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* breakpoint instruction: TRAP_S 0x3 */
>  #ifdef CONFIG_CPU_BIG_ENDIAN
>   .gdb_bpt_instr  = {0x78, 0x7e},
> diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
> index caa0dbe3dc61..21a6d5958955 100644
> --- a/arch/arm/kernel/kgdb.c
> +++ b/arch/arm/kernel/kgdb.c
> @@ -274,7 +274,7 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
>   * and we handle the normal undef case within the do_undefinstr
>   * handler.
>   */
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>  #ifndef __ARMEB__
>   .gdb_bpt_instr  = {0xfe, 0xde, 0xff, 0xe7}
>  #else /* ! __ARMEB__ */
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index a20de58061a8..fe1d1f935b90 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -357,7 +357,7 @@ void kgdb_arch_exit(void)
>   unregister_die_notifier(_notifier);
>  }
>  
> -struct kgdb_arch arch_kgdb_ops;
> +const struct kgdb_arch arch_kgdb_ops;
>  
>  int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
>  {
> diff --git a/arch/h8300/kernel/kgdb.c b/arch/h8300/kernel/kgdb.c
> index 1a1d30cb0609..602e478afbd5 100644
> --- a/arch/h8300/kernel/kgdb.c
> +++ b/arch/h8300/kernel/kgdb.c
> @@ -129,7 +129,7 @@ void kgdb_arch_exit(void)
>   /* Nothing to do */
>  }
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* Breakpoint instruction: trapa #2 */
>   .gdb_bpt_instr = { 0x57, 0x20 },
>  };
> diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
> index 16c24b22d0b2..f1924d483e78 100644
> --- a/arch/hexagon/kernel/kgdb.c
> +++ b/arch/hexagon/kernel/kgdb.c
> @@ -83,7 +83,7 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
>   { "syscall_nr", GDB_SIZEOF_REG, offsetof(struct pt_regs, syscall_nr)},
>  };
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* trap0(#0xDB) 0x0cdb0054 */
>   .gdb_bpt_instr = {0x54, 0x00, 0xdb, 0x0c},
>  };
> diff --git a/arch/microblaze/kernel/kgdb.c b/arch/microblaze/kernel/kgdb.c
> index 6366f69d118e..130cd0f064ce 100644
> --- a/arch/microblaze/kernel/kgdb.c
> +++ b/arch/microblaze/kernel/kgdb.c
> @@ -143,7 +143,7 @@ void kgdb_arch_exit(void)
>  /*
>   * Global data
>   */
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>  #ifdef __MICROBLAZEEL__
>   .gdb_bpt_instr = {0x18, 0x00, 0x0c, 0xba}, /* brki r16, 0x18 */
>  #else
> diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
> index 31eff1bec577..edfdc2ec2d16 100644
> --- a/arch/mips/kernel/kgdb.c
> +++ b/arch/mips/kernel/kgdb.c
> @@ -394,7 +394,7 @@ int kgdb_arch_handle_exception(int vector, int signo, int 
> err_code,
>   return -1;
>  }
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>  #ifdef CONFIG_CPU_BIG_ENDIAN
>   .gdb_bpt_instr = { spec_op << 2, 0x00, 0x00, break_op },
>  #else
> diff --git a/arch/nios2/kernel/kgdb.c b/arch/nios2/kernel/k

Re: [PATCH v6 1/4] kgdb: Remove irq flags from roundup

2018-12-03 Thread Daniel Thompson
On Tue, Nov 27, 2018 at 09:38:36AM -0800, Douglas Anderson wrote:
> The function kgdb_roundup_cpus() was passed a parameter that was
> documented as:
> 
> > the flags that will be used when restoring the interrupts. There is
> > local_irq_save() call before kgdb_roundup_cpus().
> 
> Nobody used those flags.  Anyone who wanted to temporarily turn on
> interrupts just did local_irq_enable() and local_irq_disable() without
> looking at them.  So we can definitely remove the flags.
> 
> Signed-off-by: Douglas Anderson 
> Acked-by: Will Deacon 
> ---
> 
> Changes in v6: None
> Changes in v5: None
> Changes in v4: None
> Changes in v3: None
> Changes in v2:
> - Removing irq flags separated from fixing lockdep splat.
> 
>  arch/arc/kernel/kgdb.c | 2 +-
>  arch/arm/kernel/kgdb.c | 2 +-
>  arch/arm64/kernel/kgdb.c   | 2 +-
>  arch/hexagon/kernel/kgdb.c | 9 ++---
>  arch/mips/kernel/kgdb.c| 2 +-
>  arch/powerpc/kernel/kgdb.c | 2 +-
>  arch/sh/kernel/kgdb.c  | 2 +-
>  arch/sparc/kernel/smp_64.c | 2 +-
>  arch/x86/kernel/kgdb.c | 9 ++---

Again, please formally Cc: the arch maintainers in the patch header
(just after your Signed-off-by: if I'm not being explicitly clear).


Daniel.

>  include/linux/kgdb.h   | 9 ++---
>  kernel/debug/debug_core.c  | 2 +-
>  11 files changed, 14 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
> index 9a3c34af2ae8..0932851028e0 100644
> --- a/arch/arc/kernel/kgdb.c
> +++ b/arch/arc/kernel/kgdb.c
> @@ -197,7 +197,7 @@ static void kgdb_call_nmi_hook(void *ignored)
>   kgdb_nmicallback(raw_smp_processor_id(), NULL);
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>   local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
> index caa0dbe3dc61..f21077b077be 100644
> --- a/arch/arm/kernel/kgdb.c
> +++ b/arch/arm/kernel/kgdb.c
> @@ -175,7 +175,7 @@ static void kgdb_call_nmi_hook(void *ignored)
> kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> local_irq_enable();
> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index a20de58061a8..12c339ff6e75 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -289,7 +289,7 @@ static void kgdb_call_nmi_hook(void *ignored)
>   kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>   local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
> index 16c24b22d0b2..012e0e230ac2 100644
> --- a/arch/hexagon/kernel/kgdb.c
> +++ b/arch/hexagon/kernel/kgdb.c
> @@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned 
> long pc)
>  
>  /**
>   * kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - * @flags: Current IRQ state
>   *
>   * On SMP systems, we need to get the attention of the other CPUs
>   * and get them be in a known state.  This should do what is needed
>   * to get the other CPUs to call kgdb_wait(). Note that on some arches,
>   * the NMI approach is not used for rounding up all the CPUs. For example,
> - * in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - * this case, we have to make sure that interrupts are enabled before
> - * calling smp_call_function(). The argument to this function is
> - * the flags that will be used when restoring the interrupts. There is
> - * local_irq_save() call before kgdb_roundup_cpus().
> + * in case of MIPS, smp_call_function() is used to roundup CPUs.
>   *
>   * On non-SMP systems, this is not called.
>   */
> @@ -139,7 +134,7 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
>   kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>   local_irq_enable();
>   smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
> diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
> index eb6c0d582626..2b05effc17b4 100644
> --- a/arch/mips/kernel/kgdb.c
> +++ b/arch/mips/kernel/kgdb.c
> @@ -219,7 +219,7 @@ static void kgdb_call_nmi_hook(void *ignored)
>   set_fs(old_fs);
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>   local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
> index 59c578f865aa..b0e804844be0 100644
> --- a/arch/powerpc/kernel/kgdb.c
> +++ b/arch/powerpc/kernel/kgdb.c
> @@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
>  }
>  
>  #ifdef CONFIG_SMP
> -void 

Re: [PATCH v6 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-12-03 Thread Daniel Thompson
On Tue, Nov 27, 2018 at 09:38:37AM -0800, Douglas Anderson wrote:
> When I had lockdep turned on and dropped into kgdb I got a nice splat
> on my system.  Specifically it hit:
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> 
> Specifically it looked like this:
>   sysrq: SysRq : DEBUG
>   [ cut here ]
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 
> lockdep_hardirqs_on+0xf0/0x160
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
>   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
>   pc : lockdep_hardirqs_on+0xf0/0x160
>   ...
>   Call trace:
>lockdep_hardirqs_on+0xf0/0x160
>trace_hardirqs_on+0x188/0x1ac
>kgdb_roundup_cpus+0x14/0x3c
>kgdb_cpu_enter+0x53c/0x5cc
>kgdb_handle_exception+0x180/0x1d4
>kgdb_compiled_brk_fn+0x30/0x3c
>brk_handler+0x134/0x178
>do_debug_exception+0xfc/0x178
>el1_dbg+0x18/0x78
>kgdb_breakpoint+0x34/0x58
>sysrq_handle_dbg+0x54/0x5c
>__handle_sysrq+0x114/0x21c
>handle_sysrq+0x30/0x3c
>qcom_geni_serial_isr+0x2dc/0x30c
>   ...
>   ...
>   irq event stamp: ...45
>   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
>   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
>   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
>   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
>   ---[ end trace adf21f830c46e638 ]---
> 
> Looking closely at it, it seems like a really bad idea to be calling
> local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
> like it could violate spinlock semantics and cause a deadlock.
> 
> Instead, let's use a private csd alongside
> smp_call_function_single_async() to round up the other CPUs.  Using
> smp_call_function_single_async() doesn't require interrupts to be
> enabled so we can remove the offending bit of code.
> 
> In order to avoid duplicating this across all the architectures that
> use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
> to debug_core.c.
> 
> Looking at all the people who previously had copies of this code,
> there were a few variants.  I've attempted to keep the variants
> working like they used to.  Specifically:
> * For arch/arc we passed NULL to kgdb_nmicallback() instead of
>   get_irq_regs().
> * For arch/mips there was a bit of extra code around
>   kgdb_nmicallback()
> 
> NOTE: In this patch we will still get into trouble if we try to round
> up a CPU that failed to round up before.  We'll try to round it up
> again and potentially hang when we try to grab the csd lock.  That's
> not new behavior but we'll still try to do better in a future patch.
> 
> Suggested-by: Daniel Thompson 
> Signed-off-by: Douglas Anderson 
> ---
> 
> Changes in v6:
> - Moved smp_call_function_single_async() error check to patch 3.
> 
> Changes in v5:
> - Add a comment about get_irq_regs().
> - get_cpu() => raw_smp_processor_id() in kgdb_roundup_cpus().
> - for_each_cpu() => for_each_online_cpu()
> - Error check smp_call_function_single_async()
> 
> Changes in v4: None
> Changes in v3:
> - No separate init call.
> - Don't round up the CPU that is doing the rounding up.
> - Add "#ifdef CONFIG_SMP" to match the rest of the file.
> - Updated desc saying we don't solve the "failed to roundup" case.
> - Document the ignored parameter.
> 
> Changes in v2:
> - Removing irq flags separated from fixing lockdep splat.
> - Don't use smp_call_function (Daniel).
> 
>  arch/arc/kernel/kgdb.c | 10 ++
>  arch/arm/kernel/kgdb.c | 12 ---
>  arch/arm64/kernel/kgdb.c   | 12 ---
>  arch/hexagon/kernel/kgdb.c | 27 -
>  arch/mips/kernel/kgdb.c|  9 +
>  arch/powerpc/kernel/kgdb.c |  4 ++--
>  arch/sh/kernel/kgdb.c  | 12 ---

Please could you re-send with the arch maintainers for these platforms
included in the Cc: of the patch description rather than just in the
e-mail header.

I'd like to make sure arch maintainers have a chance to opt out if they
want to (it's a weak symbol so an arch could chose not to use it).

Otherwise I shall start testing shortly!


Daniel.


>  include/linux/kgdb.h   | 15 --
>  kernel/debug/debug_core.c  | 41 ++
>  9 files changed, 59 insertions(+), 83 deletions(-)
> 
> diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
> index 0932851028e0..68d9fe4b5aa7 100644
> --- a/arch/arc/kernel/kgdb.c
> +++ b/arch/arc/kernel/kgdb.c
> @@ -192,18 +192,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned 
> long ip)
>   i

Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-03 Thread Daniel Thompson
On Wed, Oct 31, 2018 at 02:41:14PM -0700, Doug Anderson wrote:
> > As mentioned in another part of the thread we can also add robustness
> > by skipping a cpu where csd->flags != 0 (and adding an appropriately
> > large comment regarding why). Doing the check directly is abusing
> > internal knowledge that smp.c normally keeps to itself so an accessor
> > of some kind would be needed.
> 
> Sure.  I could add smp_async_func_finished() that just looked like:
> 
> int smp_async_func_finished(call_single_data_t *csd)
> {
>   return !(csd->flags & CSD_FLAG_LOCK);
> }
> 
> My understanding of all the mutual exclusion / memory barrier concepts
> employed by smp.c is pretty weak, though.  I'm hoping that it's safe
> to just access the structure and check the bit directly.
> 
> ...but do you think adding a generic accessor like this is better than
> just keeping track of this in kgdb directly?  I could avoid the
> accessor by adding a "rounding_up" member to "struct
> debuggerinfo_struct" and doing something like this in roundup:
> 
>   /* If it didn't round up last time, don't try again */
>   if (kgdb_info[cpu].rounding_up)
> continue
> 
>   kgdb_info[cpu].rounding_up = true
>   smp_call_function_single_async(cpu, csd);
> 
> ...and then in kgdb_nmicallback() I could just add:
> 
>   kgdb_info[cpu].rounding_up = false
> 
> In that case we're not adding a generic accessor to smp.c that most
> people should never use.

Whilst it is very tempting to make a sarcastic reply here ("Of course! What
kgdb really needs is yet another set of condition variables") I can't
because I actually agree with the proposal. I don't really want kgdb to
be too "special" especially when it doesn't need to be.

Only thing to note is that rounding_up will not be manipulated within a
common spin lock so you might have to invest a bit of thought to make
sure any races between the master and slave as the slave CPU clears the
flag are benign.


Daniel.


Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-10-31 Thread Daniel Thompson
On Tue, Oct 30, 2018 at 03:18:43PM -0700, Douglas Anderson wrote:
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index f3cadda45f07..9a3f952de6ed 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -55,6 +55,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -220,6 +221,39 @@ int __weak kgdb_skipexception(int exception, struct 
> pt_regs *regs)
>   return 0;
>  }
>  
> +/*
> + * Default (weak) implementation for kgdb_roundup_cpus
> + */
> +
> +static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
> +
> +void __weak kgdb_call_nmi_hook(void *ignored)
> +{
> + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> +}
> +
> +void __weak kgdb_roundup_cpus(void)
> +{
> + call_single_data_t *csd;
> + int cpu;
> +
> + for_each_cpu(cpu, cpu_online_mask) {
> + csd = _cpu(kgdb_roundup_csd, cpu);
> + smp_call_function_single_async(cpu, csd);
> + }

smp_call_function() automatically skips the calling CPU but this code does
not. It isn't a hard bug since kgdb_nmicallback() does a re-entrancy
check but I'd still prefer to skip the calling CPU.

As mentioned in another part of the thread we can also add robustness
by skipping a cpu where csd->flags != 0 (and adding an appropriately
large comment regarding why). Doing the check directly is abusing
internal knowledge that smp.c normally keeps to itself so an accessor
of some kind would be needed.


> +}
> +
> +static void kgdb_generic_roundup_init(void)
> +{
> + call_single_data_t *csd;
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + csd = _cpu(kgdb_roundup_csd, cpu);
> + csd->func = kgdb_call_nmi_hook;
> + }
> +}

I can't help noticing this code is very similar to kgdb_roundup_cpus. Do
we really gain much from ahead-of-time initializing csd->func?


Daniel.


Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-10-31 Thread Daniel Thompson
On Wed, Oct 31, 2018 at 02:49:26PM +0100, Peter Zijlstra wrote:
> On Tue, Oct 30, 2018 at 03:18:43PM -0700, Douglas Anderson wrote:
> > Looking closely at it, it seems like a really bad idea to be calling
> > local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
> > like it could violate spinlock semantics and cause a deadlock.
> > 
> > Instead, let's use a private csd alongside
> > smp_call_function_single_async() to round up the other CPUs.  Using
> > smp_call_function_single_async() doesn't require interrupts to be
> > enabled so we can remove the offending bit of code.
> 
> You might want to mention that the only reason this isn't a deadlock
> itself is because there is a timeout on waiting for the slaves to
> check-in.

dbg_master_lock must be owned to call kgdb_roundup_cpus() so
the calls to smp_call_function_single_async() should never deadlock the
calling CPU unless there has been a previous failure to round up (e.g.
cores that cannot react to the round up signal).

When there is a failure to round up when we resume, there is a window (before
whatever locks that prevented the IPI being serviced are released) during which
the system will deadlock if the debugger is re entered.

I don't think there is any point trying to round up a CPU that did not
previously respond... it should still have an IPI pending. The deadlock
can be eliminated by getting the round up code to avoid CPUs whose csd->flags
are non-zero either by checking the flag in the kgdb code or adding something
like smp_trycall_function_single_async().


Daniel.


Re: [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb

2018-10-30 Thread Daniel Thompson
On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> Looking back, this is pretty much two series squashed that could be
> treated indepdently.  The first is a serial series and the second is a
> kgdb series.

Indeed.

I couldn't work out the link between the first 5 patches and the last 2
until I read this...

Is anything in the 01->05 patch set even related to kgdb? From the stack
traces it looks to me like the lock dep warning would trigger for any
sysrq. I think separating into two threads for v2 would be sensible.


Daniel.


> 
> For all serial patches I'd expect them to go through the tty tree once
> they've been reviewed.
> 
> If folks are OK w/ the 'smp' patch it probably should go in some core
> kernel tree.  The kgdb patch won't work without it, though, so to land
> that we'd need coordination between the folks landing that and the
> folks landing the 'smp' patch.
> 
> 
> Douglas Anderson (7):
>   serial: qcom_geni_serial: Finish supporting sysrq
>   serial: core: Allow processing sysrq at port unlock time
>   serial: qcom_geni_serial: Process sysrq at port unlock time
>   serial: core: Include console.h from serial_core.h
>   serial: 8250: Process sysrq at port unlock time
>   smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
>   kgdb: Remove irq flags and local_irq_enable/disable from roundup
> 
>  arch/arc/kernel/kgdb.c  |  4 +--
>  arch/arm/kernel/kgdb.c  |  4 +--
>  arch/arm64/kernel/kgdb.c|  4 +--
>  arch/hexagon/kernel/kgdb.c  | 11 ++
>  arch/mips/kernel/kgdb.c |  4 +--
>  arch/powerpc/kernel/kgdb.c  |  2 +-
>  arch/sh/kernel/kgdb.c   |  4 +--
>  arch/sparc/kernel/smp_64.c  |  2 +-
>  arch/x86/kernel/kgdb.c  |  9 ++---
>  drivers/tty/serial/8250/8250_aspeed_vuart.c |  6 +++-
>  drivers/tty/serial/8250/8250_fsl.c  |  6 +++-
>  drivers/tty/serial/8250/8250_omap.c |  6 +++-
>  drivers/tty/serial/8250/8250_port.c |  8 ++---
>  drivers/tty/serial/qcom_geni_serial.c   | 10 --
>  include/linux/kgdb.h|  9 ++---
>  include/linux/serial_core.h | 38 -
>  kernel/debug/debug_core.c   |  2 +-
>  kernel/smp.c|  4 ++-
>  18 files changed, 80 insertions(+), 53 deletions(-)
> 
> -- 
> 2.19.1.568.g152ad8e336-goog
> 


Re: [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup

2018-10-30 Thread Daniel Thompson
On Mon, Oct 29, 2018 at 11:07:07AM -0700, Douglas Anderson wrote:
> The function kgdb_roundup_cpus() was passed a parameter that was
> documented as:
> 
> > the flags that will be used when restoring the interrupts. There is
> > local_irq_save() call before kgdb_roundup_cpus().
> 
> Nobody used those flags.  Anyone who wanted to temporarily turn on
> interrupts just did local_irq_enable() and local_irq_disable() without
> looking at them.  So we can definitely remove the flags.

On the whole I'd rather that this change...


> Speaking of calling local_irq_enable(), it seems like a bad idea and
> it caused a nice splat on my system with lockdep turned on.
> Specifically it hit:
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)

... and fixes for this this were in separate patches. They don't appear
especially related.


Daniel.

 
> See the previous patch in this series ("smp: Don't yell about IRQs
> disabled in kgdb_roundup_cpus()") for more details, but the last few
> things on the stack were this on my arm64 board:
>   lockdep_hardirqs_on+0xf0/0x160
>   trace_hardirqs_on+0x188/0x1ac
>   kgdb_roundup_cpus+0x14/0x3c
> 
> As agrued in the the commit text of ("smp: Don't yell about IRQs
> disabled in kgdb_roundup_cpus()"), it seems better to make
> smp_call_function() lenient about kgdb than to locally turn on IRQs
> here.  Thus let's totally remove all the local_irq_enable() and
> local_irq_disable() calls from all of the kgdb_roundup_cpus() calls.
> 
> Signed-off-by: Douglas Anderson 
> ---
> 
>  arch/arc/kernel/kgdb.c |  4 +---
>  arch/arm/kernel/kgdb.c |  4 +---
>  arch/arm64/kernel/kgdb.c   |  4 +---
>  arch/hexagon/kernel/kgdb.c | 11 ++-
>  arch/mips/kernel/kgdb.c|  4 +---
>  arch/powerpc/kernel/kgdb.c |  2 +-
>  arch/sh/kernel/kgdb.c  |  4 +---
>  arch/sparc/kernel/smp_64.c |  2 +-
>  arch/x86/kernel/kgdb.c |  9 ++---
>  include/linux/kgdb.h   |  9 ++---
>  kernel/debug/debug_core.c  |  2 +-
>  11 files changed, 14 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
> index 9a3c34af2ae8..d94d3cb7f9e8 100644
> --- a/arch/arc/kernel/kgdb.c
> +++ b/arch/arc/kernel/kgdb.c
> @@ -197,11 +197,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>   kgdb_nmicallback(raw_smp_processor_id(), NULL);
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> - local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> - local_irq_disable();
>  }
>  
>  struct kgdb_arch arch_kgdb_ops = {
> diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
> index caa0dbe3dc61..a80e9259f7e9 100644
> --- a/arch/arm/kernel/kgdb.c
> +++ b/arch/arm/kernel/kgdb.c
> @@ -175,11 +175,9 @@ static void kgdb_call_nmi_hook(void *ignored)
> kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -   local_irq_enable();
> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -   local_irq_disable();
>  }
>  
>  static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index a20de58061a8..5d171c26788f 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -289,11 +289,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>   kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> - local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> - local_irq_disable();
>  }
>  
>  static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
> index 16c24b22d0b2..30fbc491cf45 100644
> --- a/arch/hexagon/kernel/kgdb.c
> +++ b/arch/hexagon/kernel/kgdb.c
> @@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned 
> long pc)
>  
>  /**
>   * kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - * @flags: Current IRQ state
>   *
>   * On SMP systems, we need to get the attention of the other CPUs
>   * and get them be in a known state.  This should do what is needed
>   * to get the other CPUs to call kgdb_wait(). Note that on some arches,
>   * the NMI approach is not used for rounding up all the CPUs. For example,
> - * in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - * this case, we have to make sure that interrupts are enabled before
> - * calling smp_call_function(). The argument to this function is
> - * the flags that will be used when restoring the interrupts. There is
> - * local_irq_save() call before kgdb_roundup_cpus().
> + * in case of MIPS, smp_call_function() is used to roundup CPUs.
>   *
>   * On non-SMP systems, this is not called.
>   */
> @@ -139,11 +134,9 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
>   

Re: [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()

2018-10-30 Thread Daniel Thompson
On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> In kgdb_roundup_cpus() we've got code that looks like:
>   local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
>   local_irq_disable();
> 
> In certain cases when we drop into kgdb (like with sysrq-g on a serial
> console) we'll get a big yell that looks like:
> 
>   sysrq: SysRq : DEBUG
>   [ cut here ]
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 
> lockdep_hardirqs_on+0xf0/0x160
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
>   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
>   pc : lockdep_hardirqs_on+0xf0/0x160
>   ...
>   Call trace:
>lockdep_hardirqs_on+0xf0/0x160
>trace_hardirqs_on+0x188/0x1ac
>kgdb_roundup_cpus+0x14/0x3c
>kgdb_cpu_enter+0x53c/0x5cc
>kgdb_handle_exception+0x180/0x1d4
>kgdb_compiled_brk_fn+0x30/0x3c
>brk_handler+0x134/0x178
>do_debug_exception+0xfc/0x178
>el1_dbg+0x18/0x78
>kgdb_breakpoint+0x34/0x58
>sysrq_handle_dbg+0x54/0x5c
>__handle_sysrq+0x114/0x21c
>handle_sysrq+0x30/0x3c
>qcom_geni_serial_isr+0x2dc/0x30c
>   ...
>   ...
>   irq event stamp: ...45
>   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
>   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
>   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
>   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
>   ---[ end trace adf21f830c46e638 ]---
> 
> Let's add kgdb to the list of reasons not to warn in
> smp_call_function_many().  That will allow us (in a future patch) to
> stop calling local_irq_enable() which will get rid of the original
> splat.
> 
> NOTE: with this change comes the obvious question: will we start
> deadlocking more often now when we drop into the debugger.  I can't
> say that for sure one way or the other, but the fact that we do the
> same logic for "oops_in_progress" makes me feel like it shouldn't
> happen too often.  Also note that the old logic of turning on
> interrupts temporarily wasn't exactly safe since (I presume) that
> could have violated spin_lock_irqsave() semantics and ended up with a
> deadlock of its own.

This is part of the code to bring all the cores to a halt and since
the other cores are still running kgdb isn't yet able to use the fact
all the CPUs are halted to bend the rules. It is better for this code
to play by the rules if it can.

Is is possible to get the roundup functions to use a private csd
alongside smp_call_function_single_async()? We could add a helper
function to the debug core to avoid having to add cpu_online loops into
every kgdb_roundup_cpus() implementaton.


Daniel.



> 
> Signed-off-by: Douglas Anderson 
> ---
> 
>  kernel/smp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 163c451af42e..bb581e58c8dc 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "smpboot.h"
>  
> @@ -413,7 +414,8 @@ void smp_call_function_many(const struct cpumask *mask,
>* can't happen.
>*/
>   WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> -  && !oops_in_progress && !early_boot_irqs_disabled);
> +  && !oops_in_progress && !early_boot_irqs_disabled
> +  && !in_dbg_master());
>  
>   /* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
>   cpu = cpumask_first_and(mask, cpu_online_mask);
> -- 
> 2.19.1.568.g152ad8e336-goog
> 


Re: Re: [PATCH] kdb: use correct pointer when 'btc' calls 'btt'

2018-09-26 Thread Daniel Thompson

On 16/09/2018 20:06, Daniel Thompson wrote:

On Fri, Sep 14, 2018 at 12:35:44PM +, Christophe Leroy wrote:

On a powerpc 8xx, 'btc' fails as follows:

Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0x0

when booting the kernel with 'debug_boot_weak_hash', it fails as well

Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0xba99ad80

On other platforms, Oopses have been observed too, see
https://github.com/linuxppc/linux/issues/139

This is due to btc calling 'btt' with %p pointer as an argument.

This patch replaces %p by %px to get the real pointer value as
expected by 'btt'

Signed-off-by: Christophe Leroy 
Cc:  # 4.15+


Would a Fixes: be better here?
Fixes: ad67b74d2469d9b82 ("printk: hash addresses printed with %p")


Christophe, When you add the Fixes: could you also add my

Reviewed-by: Daniel Thompson 


Thanks.




No blame attached to Tobin, but the fixes makes it super clear what
changed and why this breaks kdb (which was not explicitly called out
the patch description).


Daniel.


---
  kernel/debug/kdb/kdb_bt.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
index 6ad4a9fcbd6f..7921ae4fca8d 100644
--- a/kernel/debug/kdb/kdb_bt.c
+++ b/kernel/debug/kdb/kdb_bt.c
@@ -179,14 +179,14 @@ kdb_bt(int argc, const char **argv)
kdb_printf("no process for cpu %ld\n", cpu);
return 0;
}
-   sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
+   sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
kdb_parse(buf);
return 0;
}
kdb_printf("btc: cpu status: ");
kdb_parse("cpu\n");
for_each_online_cpu(cpu) {
-   sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
+   sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
kdb_parse(buf);
touch_nmi_watchdog();
}
--
2.13.3





Re: [PATCH] kdb: use correct pointer when 'btc' calls 'btt'

2018-09-16 Thread Daniel Thompson
On Fri, Sep 14, 2018 at 12:35:44PM +, Christophe Leroy wrote:
> On a powerpc 8xx, 'btc' fails as follows:
> 
> Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
> kdb> btc
> btc: cpu status: Currently on cpu 0
> Available cpus: 0
> kdb_getarea: Bad address 0x0
> 
> when booting the kernel with 'debug_boot_weak_hash', it fails as well
> 
> Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
> kdb> btc
> btc: cpu status: Currently on cpu 0
> Available cpus: 0
> kdb_getarea: Bad address 0xba99ad80
> 
> On other platforms, Oopses have been observed too, see
> https://github.com/linuxppc/linux/issues/139
> 
> This is due to btc calling 'btt' with %p pointer as an argument.
> 
> This patch replaces %p by %px to get the real pointer value as
> expected by 'btt'
> 
> Signed-off-by: Christophe Leroy 
> Cc:  # 4.15+

Would a Fixes: be better here?
Fixes: ad67b74d2469d9b82 ("printk: hash addresses printed with %p")

No blame attached to Tobin, but the fixes makes it super clear what
changed and why this breaks kdb (which was not explicitly called out
the patch description).


Daniel.

> ---
>  kernel/debug/kdb/kdb_bt.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
> index 6ad4a9fcbd6f..7921ae4fca8d 100644
> --- a/kernel/debug/kdb/kdb_bt.c
> +++ b/kernel/debug/kdb/kdb_bt.c
> @@ -179,14 +179,14 @@ kdb_bt(int argc, const char **argv)
>   kdb_printf("no process for cpu %ld\n", cpu);
>   return 0;
>   }
> - sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
> + sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
>   kdb_parse(buf);
>   return 0;
>   }
>   kdb_printf("btc: cpu status: ");
>   kdb_parse("cpu\n");
>   for_each_online_cpu(cpu) {
> - sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
> + sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
>   kdb_parse(buf);
>   touch_nmi_watchdog();
>   }
> -- 
> 2.13.3
> 


Re: [PATCH 2/2] backlight: Fix old-style function definition

2018-01-03 Thread Daniel Thompson
On Tue, Dec 26, 2017 at 03:12:41PM +0100, Mathieu Malaterre wrote:
> Fix warning:
> 
> drivers/macintosh/via-pmu-backlight.c: In function ‘pmu_backlight_init’:
> drivers/macintosh/via-pmu-backlight.c:140:13: warning: old-style function 
> definition [-Wold-style-definition]
>  void __init pmu_backlight_init()
>  ^~
> 
> Signed-off-by: Mathieu Malaterre <ma...@debian.org>

Acked-by: Daniel Thompson <daniel.thomp...@linaro.org>

> ---
>  drivers/macintosh/via-pmu-backlight.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/macintosh/via-pmu-backlight.c 
> b/drivers/macintosh/via-pmu-backlight.c
> index 89ed51571b62..50ada02ae75d 100644
> --- a/drivers/macintosh/via-pmu-backlight.c
> +++ b/drivers/macintosh/via-pmu-backlight.c
> @@ -137,7 +137,7 @@ void pmu_backlight_set_sleep(int sleep)
>  }
>  #endif /* CONFIG_PM */
>  
> -void __init pmu_backlight_init()
> +void __init pmu_backlight_init(void)
>  {
>   struct backlight_properties props;
>   struct backlight_device *bd;
> -- 
> 2.11.0
> 


Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

2016-03-01 Thread Daniel Thompson

On 18/12/15 17:00, Daniel Thompson wrote:

The MCE handlers should only call printk() when they decide to panic
and *after* busting the spinlocks. At this point deferring printk()
until it is safe is not very helpful.

When we bust the spinlocks we should probably restore the normal
printk() function to give best chance of the failure messages making
it out.


The problem is that we do not know what locks need to be busted. There
are too many consoles and too many locks involved. Also busting locks
open another can of worms.


Yes, I agree that busting the spinlocks doesn't avoid all risk of deadlock.

Probably I've been placing too much weight on the importance of getting
messages out when dying. You're right that surviving far enough through
a panic to trigger kdump or reset is equally (or more) important in many
scenarios than getting a failure message out.

However on a system with nothing but "while(1) {}" hooked up to panic()
then its worth risking a lock up. In this case restoring normal printk()
behavior and dumping the NMI buffers would be worthwhile.


An a (much) later thread[1] Andrew Morton described this comment as 
non-committal. Sorry for that.


I don't object to the overall approach taken by Petr, merely that I 
think there are cases where the current patchset doesn't put in quite 
enough effort to issue messages.


Panic triggered during NMI is the only case I can think of and that, I 
think, could be addressed by adding an extra call to printk_nmi_flush() 
during panic(). It should probably only cover cases where we *don't* 
call kdump and the panic handler doesn't restart the machine... so just 
after the pr_emerg("...end kernel panic") would be OK for me.



Daniel.


[1]: http://thread.gmane.org/gmane.linux.ports.arm.kernel/482845

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

2015-12-18 Thread Daniel Thompson

On 11/12/15 23:26, Jiri Kosina wrote:

On Fri, 11 Dec 2015, Russell King - ARM Linux wrote:


I'm personally happy with the existing code, and I've been wondering why
there's this effort to apply further cleanups - to me, the changelogs
don't seem to make that much sense, unless we want to start using
printk() extensively in NMI functions - using the generic nmi backtrace
code surely gets us something that works across all architectures...


It is already being used extensively, and not only for all-CPU backtraces.
For starters, please consider

- WARN_ON(in_nmi())
- BUG_ON(in_nmi())


Sorry to join in so late but...

Today we risk deadlock when we try to issue these diagnostic errors 
directly from NMI context.


After this change we will still risk deadlock, because that's what the 
diagnostic code is trying to tell us, *and* we delay actually reporting 
the error until, and only if, the NMI handler completes.


I'm not entirely sure that this is an improvement.



- anything being printed out from MCE handlers


The MCE handlers should only call printk() when they decide to panic and 
*after* busting the spinlocks. At this point deferring printk() until it 
is safe is not very helpful.


When we bust the spinlocks we should probably restore the normal 
printk() function to give best chance of the failure messages making it out.



Daniel.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

2015-12-18 Thread Daniel Thompson

On 18/12/15 14:52, Petr Mladek wrote:

On Fri 2015-12-18 10:18:08, Daniel Thompson wrote:

On 11/12/15 23:26, Jiri Kosina wrote:

On Fri, 11 Dec 2015, Russell King - ARM Linux wrote:


I'm personally happy with the existing code, and I've been wondering why
there's this effort to apply further cleanups - to me, the changelogs
don't seem to make that much sense, unless we want to start using
printk() extensively in NMI functions - using the generic nmi backtrace
code surely gets us something that works across all architectures...


It is already being used extensively, and not only for all-CPU backtraces.
For starters, please consider

- WARN_ON(in_nmi())
- BUG_ON(in_nmi())


Sorry to join in so late but...

Today we risk deadlock when we try to issue these diagnostic errors
directly from NMI context.

After this change we will still risk deadlock, because that's what
the diagnostic code is trying to tell us, *and* we delay actually
reporting the error until, and only if, the NMI handler completes.


I think that NMI messages about a possible deadlock are the ones
from

 kernel/locking/rtmutex.c
 kernel/irq_work.c
 include/linux/hardirq.h

You are right that if the deadlock happens, this patch set lowers the
chance to see the message.

On the other hand, all the other printk's in NMI seems to be non-fatal
warnings. In this case, this patch set increases the chance to see
them.


Maybe for a WARN_ON() the trade off is worth it but I don't think a 
BUG_ON() trace would ever make it out.




A compromise might be to explicitly call printk_nmi_flush() in the few
fatal cases. Alternatively we could force the messages on the
early_console when available.



- anything being printed out from MCE handlers


The MCE handlers should only call printk() when they decide to panic
and *after* busting the spinlocks. At this point deferring printk()
until it is safe is not very helpful.

When we bust the spinlocks we should probably restore the normal
printk() function to give best chance of the failure messages making
it out.


The problem is that we do not know what locks need to be busted. There
are too many consoles and too many locks involved. Also busting locks
open another can of worms.


Yes, I agree that busting the spinlocks doesn't avoid all risk of deadlock.

Probably I've been placing too much weight on the importance of getting 
messages out when dying. You're right that surviving far enough through 
a panic to trigger kdump or reset is equally (or more) important in many 
scenarios than getting a failure message out.


However on a system with nothing but "while(1) {}" hooked up to panic() 
then its worth risking a lock up. In this case restoring normal printk() 
behavior and dumping the NMI buffers would be worthwhile.



Daniel.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

2015-12-14 Thread Daniel Thompson

On 11/12/15 23:21, Russell King - ARM Linux wrote:

As I explained when I did that work, the vast majority of ARM platforms
are unable to trigger anything like a NMI - the FIQ is something that's
generally a property of the secure monitor, and is not accessible to
Linux.  However, there are platforms where it is accessible.

The work to add the FIQ-based variant never happened (I've no idea what
happened to that part, Daniel seems to have lost interest in working on
it.)  So, what we have is the IRQ-based variant merged in mainline, which
would be the fallback for the "FIQ not available" cases, and I carry a
local hack in my tree which provides the FIQ-based version - but if it
were to trigger, it takes out all interrupts (hence why I've not merged
my hack.)

I think the reason that the FIQ-based variant has never really happened
is that hooking into the interrupt controller code to clear down the FIQ
creates such a horrid layering violation, and also a locking mess that
I suspect it's just been given up with.


I haven't quite given up; I'm still looking into this stuff. However 
you're certainly right that connecting the FIQ handler to the GIC code 
in an elegant way is tough.


I've been working in parallel on an arm64 implementation with the result 
that I'm now two lumps of code that are almost, but not quite, ready.


Right now I hope to share latest arm code fairly late in the this 
devcycle (for review rather than merge) followed up with a new version 
very early in v4.6. Even now I think the code needs a long soak in -next 
just in case there are any lurking regressions on particular platforms.


I don't expect anyone to base decisions on my aspirations above but 
would like to reassure Russell that I haven't given up on it.



Daniel.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev