On Mon, 2023-12-18 at 23:01 +0000, Richard Purdie via lists.openembedded.org wrote: > On Mon, 2023-12-18 at 10:07 -0800, Khem Raj wrote: > > On Mon, Dec 18, 2023 at 9:58 AM Richard Purdie > > <[email protected]> wrote: > > > > > > On Mon, 2023-12-18 at 09:45 -0800, Khem Raj wrote: > > > > I tried the two patches in this series. It did improve the situation > > > > but I am still getting SSH timeouts. But this time its 13 tests > > > > earlier it used to be 40+ > > > > btw. my images are using systemd. So it might be good to see if we see > > > > this with poky-altcfg as well or not. > > > > > > Do you have the log.do_testimage and the ${WORKDIR}/testimage/ files? > > > > yes, further I ran the failing tests in loop one after another still > > one test gzip fails with ssh timeouts > > > > https://busybox.net/~kraj/log.do_testimage.503 > > https://busybox.net/~kraj/testimage/ > > > > there are two runs in the testimages folder. In one you see the RCU > > stall and in second you do not > > but it fails with same ssh timeout issue. > > > > > > > > Did you still see rcu stalls in the logs? > > What is interesting is there is ~3MB of nulls in the .2 serial log. The > rcu stall is also: > > [ 88.261687] serial8250_tx_chars+0xea/0x2b0 > [ 88.261689] serial8250_handle_irq+0x1e9/0x330 > [ 88.261691] serial8250_default_handle_irq+0x4a/0x90 > [ 88.261693] serial8250_interrupt+0x66/0xc0 > [ 88.261696] __handle_irq_event_percpu+0x54/0x1c0 > [ 88.261701] handle_irq_event+0x3d/0x80 > > i.e. it is stalled in the serial TX path. > > The big question is why is there so many nulls on the serial port. I > see a few on my local x86 test runs but only ~4kb, not megabytes of > them. I hadn't worked out where/what they are from yet. > > I suspect something in the serial/kernel/qemu space isn't interacting > correctly. > > Find the cause of the nulls and we might make progress on this.
I went digging and it is a mess. That 3MB file is actually a wtmp file from /var/log. The reason is a failure of the "target_dumper" commands and that the simplistic serial command handling doesn't like binary files. I've a few patches which should improve the situation but we're clearly multiple issues at play here where issues are stacking on top of other issues. My rough list of issues: a) stdout logging from qemu wasn't being read and could overflow buffers locking qemu b) serial data from commands wasn't being fully read and could overflow buffers locking qemu c) the dumper log reading command is reading binary files like wtmp d) the dumper logs overwrite each other so are useless anyway e) the dumper logs run on every failed command f) qmp monitor dump command also run on every failed command I think we'd be much better off if we drop the dumper/monitor stuff and qemu runs might be quite a bit faster too. Cheers, Richard
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#192724): https://lists.openembedded.org/g/openembedded-core/message/192724 Mute This Topic: https://lists.openembedded.org/mt/103244414/21656 Group Owner: [email protected] Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [[email protected]] -=-=-=-=-=-=-=-=-=-=-=-
