Hi Richard, sorry for the late reply. I see that you decided to apply the patch in the meantime, but here are my comments anyway.
On 11/4/23 04:13, Richard Purdie wrote:
> On Fri, 2023-11-03 at 13:50 -0700, Alexis Lothoré via
> lists.openembedded.org wrote:
>> From: Alexis Lothoré <[email protected]>
>> 5 regression(s) for oescripts
>> oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash: PASSED -> SKIPPED
>> oescripts.OEPybootchartguyTests.test_pybootchartguy_help: PASSED -> SKIPPED
>> oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output: PASSED -> SKIPPED
>> oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output: PASSED -> SKIPPED
>> oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output: PASSED -> SKIPPED
>
> Do you have a different example? This one is tricky as I happen to know
> that test depends on the host distro and the things available there.
> There are some distros it will pass on, there are some where it is
> always skipped. There was a recipetool test added recently which will
> do something similar depending upon host python version.

If you have not already done so, you can take a look at the report I manually generated for 4.3.rc2 (https://pastebin.com/fvRcqes4), which relies on this patch.

> The challenge is we run the test on different host distros so it is
> hard to see it as a regression. I don't know what we can do to make
> this "clear" to the report reader...
>
> The patch is probably ok as it doesn't make the output worse but it has
> probably already obfuscated things a bit.

I am not sure this patch obfuscates things any further, since it is mostly about grouping regressions to avoid the repeated "1 regression(s) for <..>" lines. I would say it _may_ obfuscate some things if there are many regressions for the same test kind (so that the display limit is triggered) AND some tests of this kind are meant to run only on specific distros while others must run on several.
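To make the display-limit concern concrete, here is a minimal Python sketch of grouping regressions by test kind with a cap on the number of lines shown. It assumes regressions are available as a dict mapping test name to a (base, target) status pair and that the "test kind" is the first dotted component of the name; `group_regressions` and `display_limit` are hypothetical names, not resulttool's actual code.

```python
from collections import defaultdict


def group_regressions(regressions, display_limit=10):
    """Group per-test status changes by test kind and cap the lines shown.

    `regressions` maps a test name such as "oescripts.OEGitproxyTests.test_x"
    to a (base_status, target_status) tuple. Hypothetical helper, not
    resulttool's implementation.
    """
    groups = defaultdict(list)
    # Sort so that kinds, and entries within each kind, come out in a stable order.
    for name, (base, target) in sorted(regressions.items()):
        kind = name.split(".", 1)[0]
        groups[kind].append(f"{name}: {base} -> {target}")

    lines = []
    for kind, entries in groups.items():
        # One header per kind instead of a repeated "1 regression(s) for ..." per test.
        lines.append(f"{len(entries)} regression(s) for {kind}")
        lines.extend(f"    {entry}" for entry in entries[:display_limit])
        hidden = len(entries) - display_limit
        if hidden > 0:
            # Entries past the limit are only counted, so their statuses are not visible.
            lines.append(f"    [{hidden} more regression(s) for {kind} not shown]")
    return lines


if __name__ == "__main__":
    report = {
        "oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash": ("PASSED", "SKIPPED"),
        "oescripts.OEPybootchartguyTests.test_pybootchartguy_help": ("PASSED", "SKIPPED"),
        "oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output": ("PASSED", "SKIPPED"),
        "oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output": ("PASSED", "SKIPPED"),
        "oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output": ("PASSED", "SKIPPED"),
    }
    print("\n".join(group_regressions(report, display_limit=2)))
```

With a limit of 2, only the first two oescripts entries are printed and the remaining three are reduced to a count. If one of those hidden entries were PASSED -> FAILED instead of PASSED -> SKIPPED, nothing in the summary would reveal it, which is exactly the mix described next.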
In this case, indeed, there may be a mix of (legitimately) skipped tests and genuinely failing tests, hidden by the display limit. But is that actually the case here?

Anyway, I agree with you on the main issue (false positives due to tests that are not meant to be compared across some distros/machines), but I have not yet found time to look into it properly and propose something relevant while making sure not to lose any meaningful comparison. The "dumb" way could be to detect that all tests in a result have a "SKIPPED" status on the target side, which hints that those tests are not relevant for the target (in which case we could simply discard the comparison silently), but I have to make sure this holds in most cases.

Alexis

>
> Cheers,
>
> Richard

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
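For reference, the "dumb" all-SKIPPED heuristic mentioned above could look roughly like the following Python sketch. It assumes each result is a dict mapping test name to a status string; `is_irrelevant_for_target` and `compare_results` are hypothetical helpers, not existing resulttool functions, and tests missing on the target side are simply ignored here.

```python
def is_irrelevant_for_target(target_results):
    """Return True when every test in the target result was skipped.

    `target_results` maps test name -> status string (e.g. "PASSED",
    "SKIPPED"). Hypothetical helper illustrating the heuristic only.
    An empty result is not considered irrelevant.
    """
    statuses = list(target_results.values())
    return bool(statuses) and all(status == "SKIPPED" for status in statuses)


def compare_results(base_results, target_results):
    """Return {test: (base, target)} for every status change, or {} when the
    heuristic decides the target result should not be compared at all."""
    if is_irrelevant_for_target(target_results):
        # Everything was skipped on the target: most likely these tests cannot
        # run on that host/machine, so the comparison is silently discarded.
        return {}
    changes = {}
    for name, base_status in base_results.items():
        target_status = target_results.get(name)
        # Tests absent from the target result are ignored in this sketch.
        if target_status is not None and target_status != base_status:
            changes[name] = (base_status, target_status)
    return changes
```

One limitation worth checking before relying on it: a genuine breakage that makes every test of a result skip on a host where they used to run would be silenced in the same way as a legitimately unsupported host.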
View/Reply Online (#190249): https://lists.openembedded.org/g/openembedded-core/message/190249
