Re: [PATCH 6/6] powerpc/64: irq replay remove decrementer overflow check

2020-09-18 Thread Michael Ellerman
Nicholas Piggin  writes:
> This is an ad-hoc way to catch some cases of decrementer overflow. It
> won't catch cases where interrupts were hard disabled before any soft
> masked interrupts fired, for example. And it doesn't catch cases that
> have overflowed an even number of times.
>
> It's not clear what exactly what problem s being solved here. A lost
> timer when we have an IRQ off latency of more than ~4.3 seconds could
> be avoided (so long as it's also less than ~8.6s) but this is already
> a hard lockup order of magnitude event, and the decrementer will wrap
> again and provide a timer interrupt within the same latency magnitdue.
>
> So the test catches some cases of lost decrementers in very exceptional
> (buggy) latency event cases, reducing timer interrupt latency in that
> case by up to 4.3 seconds. And for large decrementer, it's useless. It
> is performed in potentially quite a hot path, reading the TB can be
> a noticable overhead.
>
> Perhaps more importantly it allows the clunky MSR[EE] vs
> PACA_IRQ_HARD_DIS incoherency to be removed.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/irq.c | 50 +--
>  1 file changed, 1 insertion(+), 49 deletions(-)

Seems to be unhappy on qemu ppc64e:

  kernel BUG at arch/powerpc/kernel/irq.c:153!

Which is:

notrace unsigned int __check_irq_replay(void)
{
...

/* There should be nothing left ! */
BUG_ON(local_paca->irq_happened != 0);

return 0;
}

Full log below.

cheers


spawn qemu-system-ppc64 -nographic -M ppce500 -cpu e5500 -m 2G -kernel 
/home/michael/build/adhoc/ci_output/build/corenet64_smp_defconfig@ppc64@korg@10.1.0/uImage
 -initrd ppc64-novsx-rootfs.cpio.gz -append noreboot
MMU: Supported page sizes
 4 KB as direct
  4096 KB as direct
 16384 KB as direct
 65536 KB as direct
262144 KB as direct
   1048576 KB as direct
MMU: Book3E HW tablewalk not supported
Linux version 5.9.0-rc2-00187-gf523995cc1ee (linuxppc@e054daee57c9) 
(powerpc64-linux-gnu-gcc (GCC) 10.1.0, GNU ld (GNU Binutils) 2.34) #1 SMP Fri 
Sep 18 11:52:25 Australia 2020
Found initrd at 0xc500:0xc51e9a47
Using QEMU e500 machine description
ioremap() called early from .find_legacy_serial_ports+0x6cc/0x7bc. Use 
early_ioremap() instead
printk: bootconsole [udbg0] enabled
CPU maps initialized for 1 thread per core
-
phys_mem_size = 0x8000
dcache_bsize  = 0x40
icache_bsize  = 0x40
cpu_features  = 0x0003008001b4
  possible= 0x0003009003b6
  always  = 0x0003008003b4
cpu_user_features = 0xcc008000 0x0800
mmu_features  = 0x000a0010
firmware_features = 0x
-
qemu_e500_setup_arch()
barrier-nospec: using isync; sync as speculation barrier
Zone ranges:
  DMA  [mem 0x-0x7fff]
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x7fff]
Initmem setup node 0 [mem 0x-0x7fff]
MMU: Allocated 2112 bytes of context maps for 255 contexts
percpu: Embedded 28 pages/cpu s77400 r0 d37288 u1048576
Built 1 zonelists, mobility grouping on.  Total pages: 517120
Kernel command line: noreboot
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 1977432K/2097152K available (12048K kernel code, 2204K rwdata, 3788K 
rodata, 460K init, 321K bss, 119720K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
rcu: Hierarchical RCU implementation.
rcu:RCU event tracing is enabled.
rcu:RCU restricting CPUs from NR_CPUS=24 to nr_cpu_ids=1.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
mpic: ISU size: 256, shift: 8, mask: ff
mpic: Initializing for 256 sources
random: get_random_u64 called from .start_kernel+0x498/0x70c with crng_init=0
clocksource: timebase: mask: 0x max_cycles: 0x5c4093a7d1, 
max_idle_ns: 440795210635 ns
clocksource: timebase mult[280] shift[24] registered
Console: colour dummy device 80x25
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
e500 family performance monitor hardware support registered
rcu: Hierarchical SRCU implementation.
smp: Bringing up secondary CPUs ...
smp: Brought up 1 node, 1 CPU
devtmpfs: initialized
clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 
764504178510 ns
futex hash table entries: 

[PATCH 6/6] powerpc/64: irq replay remove decrementer overflow check

2020-09-15 Thread Nicholas Piggin
This is an ad-hoc way to catch some cases of decrementer overflow. It
won't catch cases where interrupts were hard disabled before any soft
masked interrupts fired, for example. And it doesn't catch cases that
have overflowed an even number of times.

It's not clear what exactly what problem s being solved here. A lost
timer when we have an IRQ off latency of more than ~4.3 seconds could
be avoided (so long as it's also less than ~8.6s) but this is already
a hard lockup order of magnitude event, and the decrementer will wrap
again and provide a timer interrupt within the same latency magnitdue.

So the test catches some cases of lost decrementers in very exceptional
(buggy) latency event cases, reducing timer interrupt latency in that
case by up to 4.3 seconds. And for large decrementer, it's useless. It
is performed in potentially quite a hot path, reading the TB can be
a noticable overhead.

Perhaps more importantly it allows the clunky MSR[EE] vs
PACA_IRQ_HARD_DIS incoherency to be removed.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/irq.c | 50 +--
 1 file changed, 1 insertion(+), 49 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 631e6d236c97..d7162f142f24 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -102,14 +102,6 @@ static inline notrace unsigned long get_irq_happened(void)
return happened;
 }
 
-static inline notrace int decrementer_check_overflow(void)
-{
-   u64 now = get_tb_or_rtc();
-   u64 *next_tb = this_cpu_ptr(_next_tb);
- 
-   return now >= *next_tb;
-}
-
 #ifdef CONFIG_PPC_BOOK3E
 
 /* This is called whenever we are re-enabling interrupts
@@ -142,35 +134,6 @@ notrace unsigned int __check_irq_replay(void)
trace_hardirqs_on();
trace_hardirqs_off();
 
-   /*
-* We are always hard disabled here, but PACA_IRQ_HARD_DIS may
-* not be set, which means interrupts have only just been hard
-* disabled as part of the local_irq_restore or interrupt return
-* code. In that case, skip the decrementr check becaus it's
-* expensive to read the TB.
-*
-* HARD_DIS then gets cleared here, but it's reconciled later.
-* Either local_irq_disable will replay the interrupt and that
-* will reconcile state like other hard interrupts. Or interrupt
-* retur will replay the interrupt and in that case it sets
-* PACA_IRQ_HARD_DIS by hand (see comments in entry_64.S).
-*/
-   if (happened & PACA_IRQ_HARD_DIS) {
-   local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
-
-   /*
-* We may have missed a decrementer interrupt if hard disabled.
-* Check the decrementer register in case we had a rollover
-* while hard disabled.
-*/
-   if (!(happened & PACA_IRQ_DEC)) {
-   if (decrementer_check_overflow()) {
-   local_paca->irq_happened |= PACA_IRQ_DEC;
-   happened |= PACA_IRQ_DEC;
-   }
-   }
-   }
-
if (happened & PACA_IRQ_DEC) {
local_paca->irq_happened &= ~PACA_IRQ_DEC;
return 0x900;
@@ -229,18 +192,6 @@ void replay_soft_interrupts(void)
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
WARN_ON_ONCE(mfmsr() & MSR_EE);
 
-   if (happened & PACA_IRQ_HARD_DIS) {
-   /*
-* We may have missed a decrementer interrupt if hard disabled.
-* Check the decrementer register in case we had a rollover
-* while hard disabled.
-*/
-   if (!(happened & PACA_IRQ_DEC)) {
-   if (decrementer_check_overflow())
-   happened |= PACA_IRQ_DEC;
-   }
-   }
-
/*
 * Force the delivery of pending soft-disabled interrupts on PS3.
 * Any HV call will have this side effect.
@@ -345,6 +296,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
WARN_ON_ONCE(!(mfmsr() & MSR_EE));
__hard_irq_disable();
+   local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
} else {
/*
 * We should already be hard disabled here. We had bugs
-- 
2.23.0