[Dilemma on changes - merge or not to merge (e.g. 6.4)]
On 14/08/2023 (Mon 10:54) Richard Purdie wrote:

> I'm becoming a little weary/wary of some of the changes that are coming
> in. The challenge is that once they merge, issues become the problem of
> a very small number of people.
> 
> My current dilemma is the 6.4 kernel. People would like it, we'd really
> ideally use it for the next release but there are issues.
> 
> I've worked through a few, at least pinning down where the issues were
> then resolving them with the help of others (thanks Bruce, Jon, Ross).
> 
> Remaining are:
>   * an error upon boot on preempt-rt on qemux86-64
>      (e.g. 
> https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/7616/steps/36/logs/stdio)
>      We'll probably just have to ignore it in parselogs as it has been
>      around for a while and nobody seems interested in fixing it upstream.

Just back from vacation and I see an internal report of 10-ish at boot

  NOHZ tick-stop error: local softirq work is pending, handler #80!!!

..on the 6.1.43-rt10-yocto-preempt-rt kernel, on real hardware.  So it
seems we can't blame that one entirely on v6.4 kernel (or qemu).

We used to get (late 3.x and 4.x era) pretty common "NOHZ: local softirq
pending" messages even on common/popular distro kernels.  But I haven't
seen those for a long time and they didn't scream "error" or have the
alarmist three exclamation marks either.

I'll see if I can dig into that further.  This instance is new to me, so
any additional context or information I might not turn up myself would
be useful.

>   * some random hangs:
>      
> https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/349/steps/12/logs/stdio
>      
> https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/354/steps/12/logs/stdio
> 
> The latter are rare and intermittent, mainly taking out CI test builds.
> Most people aren't affected by them, find them hard to reproduce let
> alone fix and will ignore them. That will leave me/Bruce/PaulG holding
> the pieces.

Ugh.  The RCU one is ugly and the Silent Boot Death one is no better.
Nobody likes SBD cases.  They suck.

> 
> I know Bruce spends a ton of time debugging weird things just to get
> the kernel to the point we can even consider merging and nobody ever
> really sees or appreciates that work :(.

Well, not "nobody".  There are at least two people who have a good idea
of what Bruce does.  :-P

Paul.
--

> 
> Systemd was a similar challenge recently, multiple patches causing
> multiple issues with a significant impact on CI. In that case the
> issues weren't intermittent so resolution wasn't so bad.
> 
> Rust and reproducibility was given a pass so the rest of the changes
> could merge for it. That just meant there was less pressure and the
> reproducibility issue is still there with people saying it's too hard.
> That issue is now spreading down the chain to other recipes.
> 
> The toolchain test reports have thousands of failures nobody is really
> looking at. Similarly the now consistent ltp controllers failures
> (previously the reports weren't even consistent!).
> 
> I'm worried the access control patches changing the tar format are
> going to destabilise and once merged, people will move on to other
> things leaving any remaining intermittent issues to me. Already we're
> seeing things like sstate being blamed as it is easiest to do that. I
> end up having to "prove" it isn't that.
> 
> There are intermittent ptests on the autobuilder too. I took mdadm
> ptest patches on the basis there was help to fix them. We still see
> a lot of failures in CI from there. The glib-networking intermittent
> failures continue, I know Trevor has tried to dig into those but he is
> alone in doing it in code which isn't easy to navigate (and I don't
> know how to help there).
> 
> As an idea of impact, every time one of these things fails in CI,
> someone has to triage that failure. The bug triage team has to triage the
> bugs too.
> 
> I don't know how we fix this but we really could do with more people
> able to dive in and help with these intermittent issues. I'm really
> really apprehensive about merging some patches as I can just tell
> they're going to cause pain :(.
> 
> Cheers,
> 
> Richard
> 