[Dilemma on changes - merge or not to merge (e.g. 6.4)] On 14/08/2023 (Mon 10:54) Richard Purdie wrote:
> I'm becoming a little weary/wary of some of the changes that are coming
> in. The challenge is that once they merge, issues become the problem of
> a very small number of people.
>
> My current dilemma is the 6.4 kernel. People would like it, we'd really
> ideally use it for the next release but there are issues.
>
> I've worked through a few, at least pinning down where the issues were
> then resolving them with the help of others (thanks Bruce, Jon, Ross).
>
> Remaining are:
> * an error upon boot on preempt-rt on qemux86-64
>   (e.g.
>   https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/7616/steps/36/logs/stdio)
>   We'll probably just have to ignore it in parselogs as it has been
>   around for a while and nobody seems interested in fixing it upstream.

Just back from vacation and I see an internal report of 10-ish at boot

   NOHZ tick-stop error: local softirq work is pending, handler #80!!!

..on the 6.1.43-rt10-yocto-preempt-rt kernel, on real hardware.

So it seems we can't blame that one entirely on v6.4 kernel (or qemu).

We used to get (late 3.x and 4.x era) pretty common "NOHZ: local softirq
pending" messages even on common/popular distro kernels. But I haven't
seen those for a long time and they didn't scream "error" or have the
alarmist three exclamation marks either.

I'll see if I can dig into that further. This instance is new to me, so
any additional context or information I might not turn up myself would
be useful.

> * some random hangs:
>
>   https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/349/steps/12/logs/stdio
>
>   https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/354/steps/12/logs/stdio
>
> The latter are rare and intermittent, mainly taking out CI test builds.
> Most people aren't affected by them, find them hard to reproduce let
> alone fix and will ignore them. That will leave me/Bruce/PaulG holding
> the pieces.

Ugh. The RCU one is ugly and the Silent Boot Death one is no better.
Nobody likes SBD cases. They suck.

>
> I know Bruce spends a ton of time debugging weird things just to get
> the kernel to the point we can even consider merging and nobody ever
> really sees or appreciates that work :(.

Well, not "nobody". There are at least two people who have a good idea
of what Bruce does. :-P

Paul.
--

>
> Systemd was a similar challenge recently, multiple patches causing
> multiple issues with a significant impact on CI. In that case the
> issues weren't intermittent so resolution wasn't so bad.
>
> Rust and reproducibility was given a pass so the rest of the changes
> could merge for it. That just meant there was less pressure and the
> reproducibility issue is still there with people saying it's too hard.
> That issue is now spreading down the chain to other recipes.
>
> The toolchain test reports have thousands of failures nobody is really
> looking at. Similarly the now consistent ltp controllers failures
> (previously the reports weren't even consistent!).
>
> I'm worried the access control patches changing the tar format are
> going to destabilise and once merged, people will move on to other
> things leaving any remaining intermittent issues to me. Already we're
> seeing things like sstate being blamed as it is easiest to do that. I
> end up having to "prove" it isn't that.
>
> There are intermittent ptests on the autobuilder too. I took mdadm
> ptest patches on the basis there was help to fix them. We are still
> seeing a lot of failures in CI from there. The glib-networking
> intermittent failures continue, I know Trevor has tried to dig into
> those but he is alone in doing it in code which isn't easy to navigate
> (and I don't know how to help there).
>
> As an idea of impact, every time one of these things fails in CI,
> someone has to triage that failure. The bug triage team has to triage
> the bugs too.
>
> I don't know how we fix this but we really could do with more people
> able to dive in and help with these intermittent issues.
> I'm really really apprehensive about merging some patches as I can just
> tell they're going to cause pain :(.
>
> Cheers,
>
> Richard
>
