On Sun, 2023-10-08 at 09:23 +0100, Richard Purdie via lists.openembedded.org wrote: > On Sat, 2023-10-07 at 23:05 +0100, Richard Purdie via > lists.openembedded.org wrote: > > I thought I'd summarise where things are at with the 6.5 kernel. > > > > We've fixed: > > * the ARM LTP OOM lockup (kernel patch) > > * the locale ARM selftest failure which was OOM due to silly buffer > > allocations in 6.5 (kernel commandline) > > * the ARM jitterentropy errors (kernel patch) > > * the cryptodev build failures (recipe updated) > > > > We've also: > > * disabled the strace tests that fail with 6.5. > > * made sure the serial ports and getty counts match > > * added ttyrun which wraps serial consoles and avoids hacks > > * made the qemurunner logging save all the port logs > > * made the qemurunner write the binary data it is sent verbatim > > * made sure to use nodelay on qemu's tcpserial > > > > This leaves an annoying serial console problem where ttyS1 never has > > the getty login prompt appear. > > > > What we know: > > > > * We've only seen this on x86 more recently (yesterday/today) but have > > seen it on ARM in the days before that. > > > > * It affects both sysvinit and systemd images. > > > > * Systemd does print that it started a getty on ttyS0 and ttyS1 when > > the failure occurs > > > > * There is a getty running according to "ps" when the failure occurs > > > > * There are only ever one or three characters received to ttyS1 in the > > failure case (0x0d and 0x0a chars, i.e. CR and LF) > > > > * It can't be any kind of utf-8 conversion issue since the login prompt > > isn't visible in the binary log > > > > * the kernel boot logs do show the serial port created with the same > > ioport and irq on x86. > > > > Previously we did see some logs with timing issues on the ttyS0 port > > but the nodelay parameter may have helped with that. > > > > There are debug patches in master-next against qemurunner which try and > > poke around to gather more debug when things fail using ttyS0. > > > > The best failure log we have is now this one: > > > > https://autobuilder.yoctoproject.org/typhoon/#/builders/79/builds/5874/steps/14/logs/stdio > > > > where I've saved the logs: > > > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/j/qemu_boot_log.20231007084853 > > and > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/j/qemu_boot_log.20231007084853.2 > > > > You can see ttyS1 times out after 1000 seconds and the port only has a > > single byte (in the .2 file). The other log shows ps output showing the > > getty running for ttyS1. > > > > Ideas welcome on where from here. > > > > I've tweaked master-next to keep reading the ttyS1 port after we poke > > it from ttyS0 to see if that reveals anything next time it fails (build > > running). > > Testing overnight with the new debug yielded: > > https://autobuilder.yoctoproject.org/typhoon/#/builders/87/builds/5895/steps/14/logs/stdio > > The interesting bit being: > > """ > WARNING: core-image-full-cmdline-1.0-r0 do_testimage: Extra read data: > Poky (Yocto Project Reference Distro) 4.2+snapshot- > 7cb4ffbd8380b0509d7fac9191095379af321686 qemux86-64 ttyS1 > > qemux86-64 login: helloA > > Poky (Yocto Project Reference Distro) 4.2+snapshot- > 7cb4ffbd8380b0509d7fac9191095379af321686 qemux86-64 ttyS1 > qemux86-64 login: > > """ > > i.e. the getty didn't appear in 1000s but sometime in shutdown the > original prompt, the "helloA" and the new getty prompt did. > > So the data *is* there but stuck in a buffer somehow. Kernel or qemu > side, I don't know.
To update, the latest debug is: https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/5836/steps/14/logs/stdio which shows that it is enough to echo "helloA" to /dev/ttyS1 to cause the data to be flushed, we don't need to restart the getty. It also shows there can be other system console messages "stuck" in the queue. I did also test forcing a read onto the socket but that just triggers a BlockingIOError exception and doesn't help. I'm going to try writing "\n\n" to the port after 120s if the getty hasn't appeared, see if that helps. Cheers, Richard
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#188835): https://lists.openembedded.org/g/openembedded-core/message/188835 Mute This Topic: https://lists.openembedded.org/mt/101824562/21656 Group Owner: [email protected] Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [[email protected]] -=-=-=-=-=-=-=-=-=-=-=-
