On Fri, 2026-02-06 at 07:42 +0000, Peter Marko via lists.openembedded.org wrote:
> > On Fri Feb 6, 2026 at 6:24 AM CET, Hemanth.KumarMD via
> > lists.openembedded.org wrote:
> > > Hello Peter,
> > 
> > (Yoann here)
> > 
> > > We sometimes see regressions in glibc test runs depending on the local
> > > environment. In such cases, it can help to re-trigger the tests to
> > > check whether the failures are consistently reproducible. It may also
> > > be useful to cross-check the results with autobuilder runs, which
> > > generally provide a more stable baseline before concluding on
> > > regressions.
> 
> I am planning to run the testsuites again during weekend.
> I also find it weird that every update shows different "before" results 
> than the previous one showed in its "after" results.
> Not sure if it's flakiness, or local conditions like CPU overload at the 
> time the tests were run.
> 

For context, many of the 'toolchain' testsuites contain some flaky tests,
so we don't always see consistent output.

For example, the 5.0.14 regression report mentions:

https://downloads.yoctoproject.org/releases/yocto/yocto-5.0.14/testresults/testresult-regressions-report.txt

Regression:  oeselftest_almalinux-8.10_qemuarm_20251014024649
             oeselftest_almalinux-9.7_qemuarm_20251118011611
    Total: 3 new regression(s):
    1 regression(s) for ptestresult.gcc-libstdc++-v3-user
        ptestresult.gcc-libstdc++-v3-user.30_threads/async/async.cc execution test: PASS -> FAIL
    2 regression(s) for ptestresult.glibc-user
        ptestresult.glibc-user.misc/tst-linux-mremap1: UNSUPPORTED -> FAIL
        ptestresult.glibc-user.misc/tst-pidfd: UNSUPPORTED -> FAIL

or:

    5 regression(s) for ptestresult.glibc
        ptestresult.glibc.elf/ifuncmain8: PASS -> No matching test result
        ptestresult.glibc.elf/tst-tls20: PASS -> FAIL
        ptestresult.glibc.iconvdata/mtrace-tst-loading: PASS -> FAIL
        ptestresult.glibc.iconvdata/tst-loading: PASS -> FAIL
        ptestresult.glibc.nptl/tst-thread-affinity-pthread: PASS -> FAIL

Over time we've been trying to investigate and resolve these kinds of
issues, but we're obviously not there yet.


I can say we've made big improvements; you can see it in the numbers,
e.g. comparing:

https://downloads.yoctoproject.org/releases/yocto/yocto-4.0/testreport.txt - 142,413 failures
https://downloads.yoctoproject.org/releases/yocto/milestones/yocto-5.3_M3/testreport.txt - 1,636 failures

which shows a definite improvement! The number of flaky test results
has also decreased but is obviously not as easy to measure.

We have a zero-failure policy for ptests and for rust. We're trying to
get there with gcc/binutils/glibc/ltp as well.

The test results are stored in the testresults.json files, and
resulttool knows how to generate reports from them and compare files
for regression reports.
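To illustrate the kind of comparison involved, here is a minimal sketch of
detecting status regressions between two result files. It assumes a
simplified version of the testresults.json layout (result sets keyed by id,
each with a "result" map of test name to status); the real files carry more
metadata, and resulttool itself handles many more cases:

```python
# Hypothetical sketch of a PASS/SKIP -> FAIL regression comparison,
# loosely modelled on what resulttool's regression report does.
import json

# Simplified stand-ins for two testresults.json files.
base = json.loads("""{
  "oeselftest_qemuarm_base": {
    "result": {
      "ptestresult.glibc.elf/tst-tls20": {"status": "PASSED"},
      "ptestresult.glibc.misc/tst-pidfd": {"status": "SKIPPED"}
    }
  }
}""")

target = json.loads("""{
  "oeselftest_qemuarm_target": {
    "result": {
      "ptestresult.glibc.elf/tst-tls20": {"status": "FAILED"},
      "ptestresult.glibc.misc/tst-pidfd": {"status": "FAILED"}
    }
  }
}""")

def flatten(results):
    """Map test name -> status across all result sets in a file."""
    flat = {}
    for resultset in results.values():
        for name, data in resultset.get("result", {}).items():
            flat[name] = data["status"]
    return flat

def regressions(base_results, target_results):
    """List tests that newly FAILED compared to the baseline."""
    old, new = flatten(base_results), flatten(target_results)
    return sorted(
        (name, old[name], status)
        for name, status in new.items()
        if status == "FAILED" and old.get(name) not in (None, "FAILED")
    )

for name, before, after in regressions(base, target):
    print(f"{name}: {before} -> {after}")
```

The actual tooling lives in scripts/resulttool in poky/openembedded-core,
which is the authoritative implementation.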

The plus side of the autobuilder results is that we have a long
baseline for comparison and a relatively stable testing environment,
which should make them consistent.

Cheers,

Richard



