Re: [PATCH v6 7/7] arm64: kgdb: Roundup cpus using IPI as NMI

2020-11-01 Thread Sumit Garg
On Thu, 29 Oct 2020 at 22:09, Daniel Thompson
 wrote:
>
> On Thu, Oct 29, 2020 at 04:22:34PM +, Daniel Thompson wrote:
> > On Thu, Oct 29, 2020 at 08:26:27PM +0530, Sumit Garg wrote:
> > > arm64 platforms with GICv3 or later supports pseudo NMIs which can be
> > > leveraged to roundup CPUs which are stuck in hard lockup state with
> > > interrupts disabled that wouldn't be possible with a normal IPI.
> > >
> > > So instead switch to roundup CPUs using IPI turned as NMI. And in
> > > case a particular arm64 platform doesn't supports pseudo NMIs,
> > > it will switch back to default kgdb CPUs roundup mechanism.
> > >
> > > Signed-off-by: Sumit Garg 
> > > ---
> > >  arch/arm64/include/asm/kgdb.h |  9 +
> > >  arch/arm64/kernel/ipi_nmi.c   |  5 +
> > >  arch/arm64/kernel/kgdb.c  | 35 +++
> > >  3 files changed, 49 insertions(+)
> > >
> > > diff --git a/arch/arm64/include/asm/kgdb.h b/arch/arm64/include/asm/kgdb.h
> > > index 21fc85e..c3d2425 100644
> > > --- a/arch/arm64/include/asm/kgdb.h
> > > +++ b/arch/arm64/include/asm/kgdb.h
> > > @@ -24,6 +24,15 @@ static inline void arch_kgdb_breakpoint(void)
> > >  extern void kgdb_handle_bus_error(void);
> > >  extern int kgdb_fault_expected;
> > >
> > > +#ifdef CONFIG_KGDB
> > > +extern bool kgdb_ipi_nmicallback(int cpu, void *regs);
> > > +#else
> > > +static inline bool kgdb_ipi_nmicallback(int cpu, void *regs)
> > > +{
> > > +   return false;
> > > +}
> > > +#endif
> > > +
> > >  #endif /* !__ASSEMBLY__ */
> > >
> > >  /*
> > > diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c
> > > index 597dcf7..6ace182 100644
> > > --- a/arch/arm64/kernel/ipi_nmi.c
> > > +++ b/arch/arm64/kernel/ipi_nmi.c
> > > @@ -8,6 +8,7 @@
> > >
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >
> > > @@ -45,10 +46,14 @@ bool arch_trigger_cpumask_backtrace(const cpumask_t 
> > > *mask, bool exclude_self)
> > >  static irqreturn_t ipi_nmi_handler(int irq, void *data)
> > >  {
> > > irqreturn_t ret = IRQ_NONE;
> > > +   unsigned int cpu = smp_processor_id();
> > >
> > > if (nmi_cpu_backtrace(get_irq_regs()))
> > > ret = IRQ_HANDLED;
> > >
> > > +   if (kgdb_ipi_nmicallback(cpu, get_irq_regs()))
> > > +   ret = IRQ_HANDLED;
> > > +
> > > return ret;
> >
> > It would be better to declare existing return value for
> > kgdb_nmicallback() to be dangerously stupid and fix it so it returns an
> > irqreturn_t (that's easy since most callers do not need to check the
> > return value).
> >
> > Then this code simply becomes:
> >
> >   return kgdb_nmicallback(cpu, get_irq_regs());
>
> Actually, reflecting on this maybe it is better to keep kgdb_nmicallin()
> and kgdb_nmicallback() aligned w.r.t. return codes (even if they are a
> little unusual).
>
> I'm still not sure why we'd keep kgdb_ipi_nmicallback() though.
> kgdb_nmicallback() is intended to be called from arch code...
>

I added kgdb_ipi_nmicallback() just to add a check for "kgdb_active"
prior to entry into kgdb as here we are sharing NMI among backtrace
and kgdb.

But after your comments, I looked carefully into kgdb_nmicallback()
and I see the "raw_spin_is_locked(&dbg_master_lock)" check as well. So
it looked sufficient to me for calling kgdb_nmicallback() directly
from the arch code and hence I will remove kgdb_ipi_nmicallback() in
the next version.

>
> Daniel.
>
>
> >
> >
> > >  }
> > >
> > > diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> > > index 1a157ca3..c26e710 100644
> > > --- a/arch/arm64/kernel/kgdb.c
> > > +++ b/arch/arm64/kernel/kgdb.c
> > > @@ -17,6 +17,7 @@
> > >
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >
> > >  struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
> > > @@ -353,3 +354,37 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt 
> > > *bpt)
> > > return aarch64_insn_write((void *)bpt->bpt_addr,
> > > *(u32 *)bpt->saved_instr);
> > >  }
> > > +
> > > +bool kgdb_ipi_nmicallback(int cpu, void *regs)
> > > +{
> > > +   if (atomic_read(&kgdb_active) != -1) {
> > > +   kgdb_nmicallback(cpu, regs);
> > > +   return true;
> > > +   }
> > > +
> > > +   return false;
> > > +}
> >
> > I *really* don't like this function.
> >
> > If the return code of kgdb_nmicallback() is broken then fix it, don't
> > just wrap it and invent a new criteria for the return code.
> >
> > To be honest I don't actually think the logic in kgdb_nmicallback() is
> > broken. As mentioned above the return value has a weird definition (0
> > for "handled it OK" and 1 for "nothing for me to do") but the logic to
> > calculate the return code looks OK.
> >

Makes sense, will remove it instead.

> >
> > > +
> > > +static void kgdb_smp_callback(void *data)
> > > +{
> > > +   unsigned int cpu = smp_processor_id();
> > > +
> > > +   if (atomic_read(&kgdb_active) != -1)
> > > +   kgdb_nmicallback(cpu, get_irq_regs());

Re: [PATCH v6 7/7] arm64: kgdb: Roundup cpus using IPI as NMI

2020-10-29 Thread Daniel Thompson
On Thu, Oct 29, 2020 at 04:22:34PM +, Daniel Thompson wrote:
> On Thu, Oct 29, 2020 at 08:26:27PM +0530, Sumit Garg wrote:
> > arm64 platforms with GICv3 or later supports pseudo NMIs which can be
> > leveraged to roundup CPUs which are stuck in hard lockup state with
> > interrupts disabled that wouldn't be possible with a normal IPI.
> > 
> > So instead switch to roundup CPUs using IPI turned as NMI. And in
> > case a particular arm64 platform doesn't supports pseudo NMIs,
> > it will switch back to default kgdb CPUs roundup mechanism.
> > 
> > Signed-off-by: Sumit Garg 
> > ---
> >  arch/arm64/include/asm/kgdb.h |  9 +
> >  arch/arm64/kernel/ipi_nmi.c   |  5 +
> >  arch/arm64/kernel/kgdb.c  | 35 +++
> >  3 files changed, 49 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/kgdb.h b/arch/arm64/include/asm/kgdb.h
> > index 21fc85e..c3d2425 100644
> > --- a/arch/arm64/include/asm/kgdb.h
> > +++ b/arch/arm64/include/asm/kgdb.h
> > @@ -24,6 +24,15 @@ static inline void arch_kgdb_breakpoint(void)
> >  extern void kgdb_handle_bus_error(void);
> >  extern int kgdb_fault_expected;
> >  
> > +#ifdef CONFIG_KGDB
> > +extern bool kgdb_ipi_nmicallback(int cpu, void *regs);
> > +#else
> > +static inline bool kgdb_ipi_nmicallback(int cpu, void *regs)
> > +{
> > +   return false;
> > +}
> > +#endif
> > +
> >  #endif /* !__ASSEMBLY__ */
> >  
> >  /*
> > diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c
> > index 597dcf7..6ace182 100644
> > --- a/arch/arm64/kernel/ipi_nmi.c
> > +++ b/arch/arm64/kernel/ipi_nmi.c
> > @@ -8,6 +8,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -45,10 +46,14 @@ bool arch_trigger_cpumask_backtrace(const cpumask_t 
> > *mask, bool exclude_self)
> >  static irqreturn_t ipi_nmi_handler(int irq, void *data)
> >  {
> > irqreturn_t ret = IRQ_NONE;
> > +   unsigned int cpu = smp_processor_id();
> >  
> > if (nmi_cpu_backtrace(get_irq_regs()))
> > ret = IRQ_HANDLED;
> >  
> > +   if (kgdb_ipi_nmicallback(cpu, get_irq_regs()))
> > +   ret = IRQ_HANDLED;
> > +
> > return ret;
> 
> It would be better to declare existing return value for
> kgdb_nmicallback() to be dangerously stupid and fix it so it returns an
> irqreturn_t (that's easy since most callers do not need to check the
> return value).
> 
> Then this code simply becomes:
> 
>   return kgdb_nmicallback(cpu, get_irq_regs());

Actually, reflecting on this maybe it is better to keep kgdb_nmicallin()
and kgdb_nmicallback() aligned w.r.t. return codes (even if they are a
little unusual).

I'm still not sure why we'd keep kgdb_ipi_nmicallback() though.
kgdb_nmicallback() is intended to be called from arch code...


Daniel.


> 
> 
> >  }
> >  
> > diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> > index 1a157ca3..c26e710 100644
> > --- a/arch/arm64/kernel/kgdb.c
> > +++ b/arch/arm64/kernel/kgdb.c
> > @@ -17,6 +17,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  
> >  struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
> > @@ -353,3 +354,37 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
> > return aarch64_insn_write((void *)bpt->bpt_addr,
> > *(u32 *)bpt->saved_instr);
> >  }
> > +
> > +bool kgdb_ipi_nmicallback(int cpu, void *regs)
> > +{
> > +   if (atomic_read(&kgdb_active) != -1) {
> > +   kgdb_nmicallback(cpu, regs);
> > +   return true;
> > +   }
> > +
> > +   return false;
> > +}
> 
> I *really* don't like this function.
> 
> If the return code of kgdb_nmicallback() is broken then fix it, don't
> just wrap it and invent a new criteria for the return code.
> 
> To be honest I don't actually think the logic in kgdb_nmicallback() is
> broken. As mentioned above the return value has a weird definition (0
> for "handled it OK" and 1 for "nothing for me to do") but the logic to
> calculate the return code looks OK.
> 
> 
> > +
> > +static void kgdb_smp_callback(void *data)
> > +{
> > +   unsigned int cpu = smp_processor_id();
> > +
> > +   if (atomic_read(&kgdb_active) != -1)
> > +   kgdb_nmicallback(cpu, get_irq_regs());
> > +}
> 
> This is Unused. I presume it is litter from a previous revision of the
> code and can be deleted?
> 
> 
> > +
> > +bool kgdb_arch_roundup_cpus(void)
> > +{
> > +   struct cpumask mask;
> > +
> > +   if (!arm64_supports_nmi())
> > +   return false;
> > +
> > +   cpumask_copy(&mask, cpu_online_mask);
> > +   cpumask_clear_cpu(raw_smp_processor_id(), &mask);
> > +   if (cpumask_empty(&mask))
> > +   return false;
> 
> Why do we need to fallback if there is no work to do? There will still
> be no work to do when we call the fallback.
> 
> 
> Daniel.


Re: [PATCH v6 7/7] arm64: kgdb: Roundup cpus using IPI as NMI

2020-10-29 Thread Daniel Thompson
On Thu, Oct 29, 2020 at 08:26:27PM +0530, Sumit Garg wrote:
> arm64 platforms with GICv3 or later supports pseudo NMIs which can be
> leveraged to roundup CPUs which are stuck in hard lockup state with
> interrupts disabled that wouldn't be possible with a normal IPI.
> 
> So instead switch to roundup CPUs using IPI turned as NMI. And in
> case a particular arm64 platform doesn't supports pseudo NMIs,
> it will switch back to default kgdb CPUs roundup mechanism.
> 
> Signed-off-by: Sumit Garg 
> ---
>  arch/arm64/include/asm/kgdb.h |  9 +
>  arch/arm64/kernel/ipi_nmi.c   |  5 +
>  arch/arm64/kernel/kgdb.c  | 35 +++
>  3 files changed, 49 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kgdb.h b/arch/arm64/include/asm/kgdb.h
> index 21fc85e..c3d2425 100644
> --- a/arch/arm64/include/asm/kgdb.h
> +++ b/arch/arm64/include/asm/kgdb.h
> @@ -24,6 +24,15 @@ static inline void arch_kgdb_breakpoint(void)
>  extern void kgdb_handle_bus_error(void);
>  extern int kgdb_fault_expected;
>  
> +#ifdef CONFIG_KGDB
> +extern bool kgdb_ipi_nmicallback(int cpu, void *regs);
> +#else
> +static inline bool kgdb_ipi_nmicallback(int cpu, void *regs)
> +{
> + return false;
> +}
> +#endif
> +
>  #endif /* !__ASSEMBLY__ */
>  
>  /*
> diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c
> index 597dcf7..6ace182 100644
> --- a/arch/arm64/kernel/ipi_nmi.c
> +++ b/arch/arm64/kernel/ipi_nmi.c
> @@ -8,6 +8,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -45,10 +46,14 @@ bool arch_trigger_cpumask_backtrace(const cpumask_t 
> *mask, bool exclude_self)
>  static irqreturn_t ipi_nmi_handler(int irq, void *data)
>  {
>   irqreturn_t ret = IRQ_NONE;
> + unsigned int cpu = smp_processor_id();
>  
>   if (nmi_cpu_backtrace(get_irq_regs()))
>   ret = IRQ_HANDLED;
>  
> + if (kgdb_ipi_nmicallback(cpu, get_irq_regs()))
> + ret = IRQ_HANDLED;
> +
>   return ret;

It would be better to declare existing return value for
kgdb_nmicallback() to be dangerously stupid and fix it so it returns an
irqreturn_t (that's easy since most callers do not need to check the
return value).

Then this code simply becomes:

return kgdb_nmicallback(cpu, get_irq_regs());


>  }
>  
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index 1a157ca3..c26e710 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -17,6 +17,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
> @@ -353,3 +354,37 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
>   return aarch64_insn_write((void *)bpt->bpt_addr,
>   *(u32 *)bpt->saved_instr);
>  }
> +
> +bool kgdb_ipi_nmicallback(int cpu, void *regs)
> +{
> + if (atomic_read(&kgdb_active) != -1) {
> + kgdb_nmicallback(cpu, regs);
> + return true;
> + }
> +
> + return false;
> +}

I *really* don't like this function.

If the return code of kgdb_nmicallback() is broken then fix it, don't
just wrap it and invent a new criteria for the return code.

To be honest I don't actually think the logic in kgdb_nmicallback() is
broken. As mentioned above the return value has a weird definition (0
for "handled it OK" and 1 for "nothing for me to do") but the logic to
calculate the return code looks OK.


> +
> +static void kgdb_smp_callback(void *data)
> +{
> + unsigned int cpu = smp_processor_id();
> +
> + if (atomic_read(&kgdb_active) != -1)
> + kgdb_nmicallback(cpu, get_irq_regs());
> +}

This is Unused. I presume it is litter from a previous revision of the
code and can be deleted?


> +
> +bool kgdb_arch_roundup_cpus(void)
> +{
> + struct cpumask mask;
> +
> + if (!arm64_supports_nmi())
> + return false;
> +
> + cpumask_copy(&mask, cpu_online_mask);
> + cpumask_clear_cpu(raw_smp_processor_id(), &mask);
> + if (cpumask_empty(&mask))
> + return false;

Why do we need to fallback if there is no work to do? There will still
be no work to do when we call the fallback.


Daniel.