On Fri, 2026-02-06 at 07:42 +0000, Peter Marko via lists.openembedded.org wrote:
> > On Fri Feb 6, 2026 at 6:24 AM CET, Hemanth.KumarMD via
> > lists.openembedded.org wrote:
> > > Hello Peter,
> >
> > (Yoann here)
> >
> > > We sometimes see regressions in glibc test runs depending on the local
> > > environment. In such cases, it can help to re-trigger the tests to
> > > check whether the failures are consistently reproducible. It may also
> > > be useful to cross-check the results with autobuilder runs, which
> > > generally provide a more stable baseline before concluding on
> > > regressions.
>
> I am planning to run the testsuites again during the weekend.
> I also find it weird that every update shows different "before" results
> than the "after" results shown in the previous one.
> Not sure if it's flakiness or local conditions like CPU overload at the
> time of running the tests.
For context, many of the 'toolchain' testsuites have some flaky tests in
them and we don't always see consistent output. For example, the 5.0.14
regression report mentions:

https://downloads.yoctoproject.org/releases/yocto/yocto-5.0.14/testresults/testresult-regressions-report.txt

Regression:  oeselftest_almalinux-8.10_qemuarm_20251014024649
             oeselftest_almalinux-9.7_qemuarm_20251118011611

Total: 3 new regression(s):

1 regression(s) for ptestresult.gcc-libstdc++-v3-user
    ptestresult.gcc-libstdc++-v3-user.30_threads/async/async.cc execution test: PASS -> FAIL

2 regression(s) for ptestresult.glibc-user
    ptestresult.glibc-user.misc/tst-linux-mremap1: UNSUPPORTED -> FAIL
    ptestresult.glibc-user.misc/tst-pidfd: UNSUPPORTED -> FAIL

or:

5 regression(s) for ptestresult.glibc
    ptestresult.glibc.elf/ifuncmain8: PASS -> No matching test result
    ptestresult.glibc.elf/tst-tls20: PASS -> FAIL
    ptestresult.glibc.iconvdata/mtrace-tst-loading: PASS -> FAIL
    ptestresult.glibc.iconvdata/tst-loading: PASS -> FAIL
    ptestresult.glibc.nptl/tst-thread-affinity-pthread: PASS -> FAIL

Over time we've been trying to investigate and resolve these kinds of
issues, but we're obviously not there yet. I can say we've made big
improvements; you can see it in the numbers, e.g. comparing:

https://downloads.yoctoproject.org/releases/yocto/yocto-4.0/testreport.txt - 142,413 failures
https://downloads.yoctoproject.org/releases/yocto/milestones/yocto-5.3_M3/testreport.txt - 1,636 failures

which shows a definite improvement! The number of flaky test results has
also decreased, but that is obviously not as easy to measure.

We have a policy of no failures for ptests and rust, and we're trying to
get there with gcc/binutils/glibc/ltp.

The test results are stored in the testresults.json files, and resulttool
knows how to generate reports and compare files for regression reports (see
the sketch after my signature).

The plus side of the autobuilder results is that we have a long baseline
for comparison and a relatively stable testing environment, which should be
consistent.

Cheers,

Richard
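P.S. If anyone wants to experiment with the comparison locally, resulttool
lives in openembedded-core's scripts/ directory. As a rough illustration of
what the regression comparison boils down to, here's a minimal Python
sketch. It assumes the usual testresults.json layout of
result-id -> {"configuration": ..., "result": {testname: {"status": ...}}},
and the two input file paths are just placeholders; the real tool handles
merging, filtering and far more detail.

    #!/usr/bin/env python3
    # compare-results.py: minimal sketch of a regression comparison between
    # two testresults.json files (base vs. target), assuming the layout
    # described above. Not a replacement for resulttool.
    import json
    import sys

    def load_statuses(path):
        # Flatten all result sets in the file into one test -> status map.
        with open(path) as f:
            data = json.load(f)
        statuses = {}
        for result_id, entry in data.items():
            for test, info in entry.get("result", {}).items():
                statuses[test] = info.get("status")
        return statuses

    base = load_statuses(sys.argv[1])    # e.g. base/testresults.json
    target = load_statuses(sys.argv[2])  # e.g. target/testresults.json

    # Report tests that regressed from PASS, plus tests that disappeared.
    for test in sorted(base):
        if base[test] != "PASS":
            continue
        after = target.get(test, "No matching test result")
        if after != "PASS":
            print(f"{test}: {base[test]} -> {after}")

In practice you'd just point resulttool at the files instead, along the
lines of "resulttool regression-file <base.json> <target.json>", which is
the mechanism the regression reports above are built on.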
