On Wed, 2021-05-05 at 13:59 -0400, Bruce Ashfield wrote:
> On Wed, May 5, 2021 at 12:36 PM Richard Purdie
> wrote:
> >
> > We've been seeing a lot of the qemuppc shutdown issue and I decided to look
> > into it. The really worrying thing looking at the logs locally is the serial
> > ports are showing irq issues and becoming disabled as nobody would handle
> > them.
> > Errors like:
> >
> > [9.194886] irq 36: nobody cared (try booting with the "irqpoll" option)
> > [9.198712] CPU: 0 PID: 127 Comm: bootlogd Not tainted
> > 5.10.30-yocto-standard #1
> > [9.202283] Call Trace:
> > [9.205611] [d1005f00] [c00a0da8] __report_bad_irq+0x50/0x138
> > (unreliable)
> > [9.209347] [d1005f30] [c00a0cc0] note_interrupt+0x324/0x378
> > [9.212855] [d1005f70] [c009d138] handle_irq_event+0xe8/0x104
> > [9.216353] [d1005fa0] [c00a1d9c] handle_fasteoi_irq+0xc0/0x29c
> > [9.219960] [d1005fc0] [c009b798] generic_handle_irq+0x40/0x5c
> > [9.223496] [d1005fd0] [c00075d0] __do_irq+0x58/0x188
> > [9.226948] [d1005ff0] [c0010040] call_do_irq+0x20/0x38
> > [9.230391] [d29eda60] [c0007788] do_IRQ+0x88/0xfc
> > [9.233860] [d29eda90] [c0016454] ret_from_except+0x0/0x14
> > [9.237288] --- interrupt: 501 at __setup_irq+0x3c4/0x838
> > [9.237288] LR = __setup_irq+0x790/0x838
> > [9.244155] [d29edb88] [c009f0a4] request_threaded_irq+0x114/0x1c8
> > [9.247672] [d29edbb8] [c07a5a18] pmz_startup+0x17c/0x32c
> > [9.251203] [d29edbd8] [c07a1140] uart_port_startup+0x184/0x2f8
> > [9.254651] [d29edc08] [c07a1974] uart_port_activate+0x78/0xf4
> > [9.258141] [d29edc28] [c07839f8] tty_port_open+0xd4/0x170
> > [9.261579] [d29edc58] [c079db74] uart_open+0x2c/0x48
> > [9.265116] [d29edc68] [c077a288] tty_open+0x168/0x640
> > [9.268574] [d29edcd8] [c0280be8] chrdev_open+0x138/0x2a4
> > [9.272123] [d29edd18] [c027421c] do_dentry_open+0x228/0x410
> > [9.275643] [d29edd48] [c028e9f4] path_openat+0xb04/0xf28
> > [9.279184] [d29eddd8] [c02917e4] do_filp_open+0x120/0x164
> > [9.282535] [d29ede98] [c0276238] do_sys_openat2+0xd8/0x19c
> > [9.285790] [d29edee8] [c0276574] sys_openat+0x88/0xdc
> > [9.289096] [d29edf38] [c00160d8] ret_from_syscall+0x0/0x34
> > [9.292620] --- interrupt: c01 at 0xfec3738
> > [9.292620] LR = 0xfec36e0
> > [9.299035] handlers:
> > [9.302312] [<7f7f7da8>] pmz_interrupt
> > [9.305541] Disabling IRQ #36
> >
> > (and the irqpoll option does not help)
> >
> > This is problematic as the shutdown test uses the serial interface to shut
> > down
> > the system. If the serial interface fails to login or run the command, game
> > over
> > for the test.
> >
> > Looking at the defconfig, the serial stuff is handled through
> > CONFIG_SERIAL_PMACZILOG
> > and the pmac_zliog.c driver. There is an interesting section of code in
> > there for
> > CONFIG_SERIAL_PMACZILOG_CONSOLE which had me asking "really?" about what it
> > does
> > with interrupts.
> >
> > Chucking:
> >
> > # CONFIG_SERIAL_PMACZILOG_CONSOLE is not set
> >
> > into a .cfg for the kernel meant the above traces went away. Whether than
> > was due to
> > the console logging changing or it really fixed some kind of race, hard to
> > say.
> >
> > My instinct says we should disable the above and test for a bit, see if
> > the shutdown issues don't happen so much.
> >
> > Bruce: Any thoughts on that?
> >
>
> It all looks reasonable to me. I rolled the change into the BSP and
> just sent the SRCREV bump.
Unfortunately, whilst this does help, presumably by removing a number of writes
from
the serial port which helps, we've still a issue here.
https://autobuilder.yoctoproject.org/typhoon/#/builders/63/builds/3392
In this case it "only" threw a warning about being unable to login to the
serial port. Looking at dmesg from that run, I see:
[0.00] Linux version 5.10.34-yocto-standard (oe-user@oe-host)
(powerpc-poky-linux-gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1.20210209) #1
PREEMPT Mon May 3 02:17:32 UTC 2021
[...]
[ 16.845298] ext4 filesystem being remounted at / supports timestamps until
2038 (0x7fff)
[ 24.895128] irq 37: nobody cared (try booting with the "irqpoll" option)
[ 24.895349] CPU: 0 PID: 495 Comm: getty Not tainted 5.10.34-yocto-standard #1
[ 24.895592] Call Trace:
[ 24.895718] [e1005f10] [c00a1180] __report_bad_irq+0x50/0x138 (unreliable)
[ 24.895899] [e1005f40] [c00a1098] note_interrupt+0x310/0x364
[ 24.895979] [e1005f70] [c009d530] handle_irq_event+0xe8/0x104
[ 24.896017] [e1005fa0] [c00a2174] handle_fasteoi_irq+0xc0/0x29c
[ 24.896051] [e1005fc0] [c009bafc] generic_handle_irq+0x40/0x5c
[ 24.896088] [e1005fd0] [c00075cc] __do_irq+0x58/0x188
[ 24.896121] [e1005ff0] [c0010178] call_do_irq+0x20/0x38
[ 24.896152] [e10d1cb0] [c0007784] do_IRQ+0x88/0xfc
[ 24.896186] [e10d1ce0] [c0016454] ret_from_except+0x0/0x14
[ 24.896242] --- interrupt: 501 at uart_write+0x124/0x2c0
[