Hi Richard,
sorry for the late reply. I see that you have decided to apply the patch in the
meantime, but here are my comments.

On 11/4/23 04:13, Richard Purdie wrote:
> On Fri, 2023-11-03 at 13:50 -0700, Alexis Lothoré via
> lists.openembedded.org wrote:
>> From: Alexis Lothoré <[email protected]>
>> 5 regression(s) for oescripts
>>     oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_help: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output: PASSED -> SKIPPED
>>
> 
> Do you have a different example? This one is tricky as I happen to know
> that test depends on the host distro and the things available there.
> There are some distros it will pass on, there are some where it is
> always skipped. There was a recipetool test added recently which will
> do something similar depending upon host python version.

If you have not already done so, you can take a look at the report I manually
generated for 4.3.rc2 (https://pastebin.com/fvRcqes4), which relies on this patch.

> The challenge is we run the test on different host distros so it is
> hard to see it as a regression. I don't know what we can do to make
> this "clear" to the report reader...
> 
> The patch is probably ok as it doesn't make the output worse but it has
> probably already obfuscated things a bit.

I am not sure this patch obfuscates things further, since it is mostly about
grouping regressions to avoid the repeated "1 regression(s) for <..>". I would
say it _may_ obfuscate some things if there are a lot of regressions for the
same test kind (so the display limit is triggered) AND some tests of this
specific kind are meant to run only on specific distros while others must run on
multiple distros. In that case, indeed, there may be a mix of (legitimately)
skipped tests and real failing tests, hidden by the display limit. But is that
actually the case?
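
To illustrate the scenario I have in mind, here is a rough sketch. It is not the
resulttool code: group_regressions, the tuple layout, the "more not shown" line
and the test names are all made up for the example.

```python
from collections import defaultdict


def group_regressions(regressions, limit=3):
    """Group (test_name, base_status, target_status) tuples by test kind.

    The "kind" is assumed to be the part of the test name before the first
    dot. At most `limit` entries are listed per kind.
    """
    groups = defaultdict(list)
    for name, base, target in regressions:
        groups[name.split(".", 1)[0]].append((name, base, target))

    lines = []
    for kind, entries in groups.items():
        lines.append(f"{len(entries)} regression(s) for {kind}")
        for name, base, target in entries[:limit]:
            lines.append(f"    {name}: {base} -> {target}")
        hidden = len(entries) - limit
        if hidden > 0:
            # Entries beyond the limit are only counted, not listed.
            lines.append(f"    [...] {hidden} more not shown")
    return lines


# Hypothetical input: two legitimate skips followed by one real failure.
regressions = [
    ("oescripts.SuiteA.test_one", "PASSED", "SKIPPED"),
    ("oescripts.SuiteA.test_two", "PASSED", "SKIPPED"),
    ("oescripts.SuiteB.test_three", "PASSED", "FAILED"),
]
report = group_regressions(regressions, limit=2)
```

With limit=2, only the two skipped tests are listed and the FAILED one falls
behind the display limit, which is the kind of mix I am wondering about.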

Anyway, I agree with you about the main issue (false positives due to tests not
meant to be compared between some distros/machines), but I have not yet found
time to take a closer look at this and propose something relevant while making
sure not to lose any relevant comparison. The "dumb" way could be to detect that
all tests in a result have a "SKIPPED" status on the target side, which hints
that those tests are not relevant for the target (in this case, we could simply
silently discard the comparison), but I have to ensure it is valid for most
cases.
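
A minimal sketch of that "dumb" check, assuming each result is a dict mapping a
test name to a dict with a "status" key (the function names and this layout are
assumptions for illustration, not what resulttool actually does):

```python
def all_skipped_on_target(target_result):
    """Return True if the target has tests and every one of them is SKIPPED."""
    statuses = [test.get("status") for test in target_result.values()]
    return bool(statuses) and all(status == "SKIPPED" for status in statuses)


def find_regressions(base_result, target_result):
    """List (name, base_status, target_status) for tests that no longer pass.

    If everything is SKIPPED on the target side, the comparison is assumed
    to be irrelevant for that target and is silently discarded.
    """
    if all_skipped_on_target(target_result):
        return []

    regressions = []
    for name, base in base_result.items():
        target = target_result.get(name)
        if target is None:
            continue  # test absent on target: nothing to compare
        if base["status"] == "PASSED" and target["status"] != "PASSED":
            regressions.append((name, base["status"], target["status"]))
    return regressions
```

Note that a result mixing skipped and failed tests is still reported in full, so
this only drops the case where the whole result is skipped on the target.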

Alexis

> 
> Cheers,
> 
> Richard

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

View/Reply Online (#190249):
https://lists.openembedded.org/g/openembedded-core/message/190249