On Fri, 2023-08-25 at 09:27 +0300, Mikko Rapeli wrote:
> Hi,
>
> On Thu, Aug 24, 2023 at 09:18:03PM +0100, Richard Purdie wrote:
> > On Thu, 2023-08-24 at 15:04 +0100, Richard Purdie via
> > lists.openembedded.org wrote:
> > > On Wed, 2023-08-23 at 22:16 +0100, Richard Purdie via
> > > lists.openembedded.org wrote:
> > > > On Tue, 2023-08-22 at 23:01 +0100, Richard Purdie via
> > > > lists.openembedded.org wrote:
> > > > > so the commands are stopping mid flow for unknown reasons or the ssh
> > > > > connection fails. I can't tell if this coincides with an rcu stall or
> > > > > not. Both logs do have rcu stalls in.
> > > > >
> > > > > After these failures the system does continue to otherwise work
> > > > > normally and subsequent tests pass.
> > > > >
> > > > > I wonder if the slow emulation might be causing the networking to
> > > > > glitch and break the ssh connection.
> > > > >
> > > > > I'm at a bit of a loss on where from here.
> > > >
> > > > I thought I'd update the thread with new information.
> > > >
> > > > I went back to the start with this and looked again at what is going
> > > > on. Interestingly, I found one of the autobuilder workers would
> > > > consistently fail the qemuppc-alt configuration for
> > > > core-image-sato-sdk. I paused the worker and experimented.
> > > >
> > > > I saw two different failures (included below). One shows systemd-udevd
> > > > timing out on its watchdog after 3 minutes and resetting, including
> > > > taking out an ssh session running the cpio configure command. There was
> > > > no RCU stall reported.
> > > >
> > > > The second failure shows systemd-logind as well as systemd-udevd with
> > > > the 3 minute time out, the kernel complaining about missed IRQs, an RCU
> > > > stall and lots of breakage following including cut ssh commands.
> > > >
> > > > I could not get the cpio build test to complete.
> > > >
> > > > Interestingly, I came back to the same image/worker later this evening
> > > > and now it all works fine. The difference is earlier there was a world
> > > > build running on the worker, which continued to wind down even after I
> > > > paused the worker. By the evening, that background load was no longer
> > > > present and the ppc image works in isolation. This tells us the issue
> > > > is system load dependent and only occurs on loaded systems.
> > > >
> > > > I suspect I need to replicate the load and retry locally, see if I can
> > > > reliably reproduce the hang. The watchdog won't be present on sysvinit
> > > > systems which also show the issues but I'd guess there is still some
> > > > other starvation/timeout occurring.
> > >
> > > I've now seen the failure on the autobuilder:
> > >
> > > * with linux-yocto 6.1.38
> > > * with linux-yocto 6.1.46
> > > * with qemu 8.0.4
> > > * with qemu 8.0.3
> > > * with qemu 8.0.0
> > >
> > > I was a little suspicious of:
> > >
> > > "hw/ppc: Fix clock update drift"
> > > https://gitlab.com/qemu-project/qemu/-/commit/73d6ac24c81f1aeae554d469616c9181511e6523
> > >
> > > but we've tested with and without that.
> > >
> > > qemu has just released 8.1.0 so perhaps we should try that next.
> >
> > qemu 8.1.0 brings with it a new set of problems but I've reproduced the
> > hang with 8.1.0 so it does not solve that.
> >
> > I'm really struggling to understand which change brought in these
> > issues for qemuppc.
>
> Are these issues visible on mickledore branch? Maybe mickledore with kernel
> 6.1 stable update or qemu 7.2 update to 8.y.x could be tested too. At least
> then kernel or qemu could be blamed for the issues.

Not that I know of.

I have now also reproduced the failure with glibc 2.37 instead of 2.38
including the fortify sources change and the 6.1.34 kernel, so there is
something else causing this.

I've wondered if we need to try going back to qemu 7.2. It may also be
worth ruling out binutils. It shouldn't be systemd as the sysvinit
images show the issue too.

Cheers,

Richard
