Re: Plan / proposal: enable openQA update testing and potentially gating on Rawhide updates

2023-06-09 Thread Adam Williamson
On Thu, 2022-06-09 at 12:48 -0700, Adam Williamson wrote:
> Hi folks!
> 
> More significantly, I'd also propose that we turn on gating on openQA
> results for Rawhide updates. This would mean Rawhide updates would be
> held from going 'stable' (and included in the next compose) until the
> gating openQA tests had run and passed. We may want to do this a bit
> after turning on the tests; perhaps Fedora 37 branch point would be a
> natural time to do it.

Hi again folks! A quick update here. Now that the Rawhide update testing has
been running in production for over a year - and Kevin and I have been
"shadow gating" Rawhide for several months, untagging updates where
openQA tests indicate genuine bugs - I think it's time to go ahead and
enable gating for Rawhide updates. I've worked to make sure the tests
are reliable and failures are promptly investigated, and that Bodhi
provides accurate information on test and gating status. I've proposed
this as a FESCo ticket just to get some visibility and sign-off on the
idea:

https://pagure.io/fesco/issue/3011

thanks everyone!
-- 
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @ad...@fosstodon.org
https://www.happyassassin.net



___
test mailing list -- test@lists.fedoraproject.org
To unsubscribe send an email to test-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/test@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue


Re: Plan / proposal: enable openQA update testing and potentially gating on Rawhide updates

2022-07-28 Thread Adam Williamson
On Thu, 2022-06-09 at 17:05 -0700, Kevin Fenzi wrote:
> Big +1 from me... I think this would be great to enable. 

Update here: as the feedback was broadly positive, today I went ahead
and hit the big switch and enabled the *tests* on openQA production.

I did not yet enable *gating*.

So, you will now see automated test results from openQA for Rawhide
updates in Bodhi, but no updates will be gated on them. If your Rawhide
update is gated, it is not because of openQA. Yet. :D

Rawhide test results are still a bit messier than stable release ones,
but the level of messiness seems to be tolerable after a few months of
practice on staging. The debug kernels do seem to cause somewhat more
flakiness, and we do get more not-correctly-bundled updates on Rawhide,
and more change generally means more brokenness. But it doesn't seem
unmanageable. I'll plan to run things this way for a few more weeks,
and look into the idea that came up about improving the side tag chain
build experience a bit, before considering turning on gating.

Thanks folks!
-- 
Adam Williamson
Fedora QA
IRC: adamw | Twitter: adamw_ha
https://www.happyassassin.net



Re: Plan / proposal: enable openQA update testing and potentially gating on Rawhide updates

2022-06-22 Thread Kevin Fenzi
On Wed, Jun 22, 2022 at 06:18:08PM -0700, Adam Williamson wrote:
> On Thu, 2022-06-09 at 12:48 -0700, Adam Williamson wrote:
> > Hi folks!
> ...
> > I think doing this could really help us keep Rawhide solid and avoid
> > introducing major compose-breaking bugs, at minimal cost. But it's a
> > significant change and I wanted to see what folks think. In particular,
> > if you find the existing gating of updates for stable/branched releases
> > to cause problems in any way, I'd love to hear about it.
> > 
> > Thanks folks!
> 
> One thing I forgot to mention in the original email: the benefit here
> isn't theoretical - I've already caught several Rawhide-breaking bugs
> early, or been able to identify the cause more easily, because we have
> the tests running in staging. Here's an example I just caught: a new
> popt version that was sent out today seems to break authselect, which
> is a critical problem and breaks all new installs:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=2100287
> 
> If nirik catches my message in time before the next compose runs, he'll
> be able to untag the new build and the compose won't be completely
> broken. If we had this testing deployed in prod and gating turned on,
> the update would be blocked automatically.

It's been untagged from Rawhide and ELN.

kevin




Re: Plan / proposal: enable openQA update testing and potentially gating on Rawhide updates

2022-06-22 Thread Adam Williamson
On Thu, 2022-06-09 at 12:48 -0700, Adam Williamson wrote:
> Hi folks!
...
> I think doing this could really help us keep Rawhide solid and avoid
> introducing major compose-breaking bugs, at minimal cost. But it's a
> significant change and I wanted to see what folks think. In particular,
> if you find the existing gating of updates for stable/branched releases
> to cause problems in any way, I'd love to hear about it.
> 
> Thanks folks!

One thing I forgot to mention in the original email: the benefit here
isn't theoretical - I've already caught several Rawhide-breaking bugs
early, or been able to identify the cause more easily, because we have
the tests running in staging. Here's an example I just caught: a new
popt version that was sent out today seems to break authselect, which
is a critical problem and breaks all new installs:

https://bugzilla.redhat.com/show_bug.cgi?id=2100287

If nirik catches my message in time before the next compose runs, he'll
be able to untag the new build and the compose won't be completely
broken. If we had this testing deployed in prod and gating turned on,
the update would be blocked automatically.
-- 
Adam Williamson
Fedora QA
IRC: adamw | Twitter: adamw_ha
https://www.happyassassin.net



Re: Plan / proposal: enable openQA update testing and potentially gating on Rawhide updates

2022-06-17 Thread Adam Williamson
On Fri, 2022-06-17 at 19:33 +, Dan Čermák wrote:
> Sounds like a great addition Adam!
> 
> Just to double check: if you have not enabled gating, then openQA will not be 
> run at all?

No, the two are separable. We can enable the tests before we turn on
gating. Right now, the tests are already running, but only on the
stg/lab instance of openQA -
https://openqa.stg.fedoraproject.org/group_overview/2 (that shows all
update tests, not just Rawhide ones). We can turn them on in production
without turning on gating. But when we don't gate, an update that makes
the tests fail can get pushed stable, and then *all* subsequent updates
start failing the same way. This happened recently with the 389-ds-base
update that broke FreeIPA deployment: after it was pushed stable, all
Rawhide updates failed the FreeIPA tests until the fixed 389-ds-base
reached a compose yesterday. This is one big reason I would like to
enable gating. :D
Without gating, I have to run around cleaning up cases like that
manually - getting the issues fixed as fast as possible and then re-
running failed tests once the fix lands.
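A toy model makes the cascade concrete. This is a minimal Python sketch, not how openQA or Bodhi actually work: it assumes each update is tested against a base made of everything already pushed stable, and the `breaks_tests` flag and the package names other than 389-ds-base are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Update:
    name: str
    # Hypothetical flag for the toy model: does this update introduce a
    # bug that makes the tests fail?
    breaks_tests: bool = False


def simulate(updates: list[Update], gating: bool) -> dict[str, str]:
    """Return each update's test result, in submission order.

    Toy assumption: an update is tested on top of everything already pushed
    stable, so it fails if it is broken itself or if a broken update has
    already landed in the base.
    """
    base_broken = False
    results: dict[str, str] = {}
    for update in updates:
        passed = not (base_broken or update.breaks_tests)
        results[update.name] = "passed" if passed else "failed"
        # Without gating, every update is pushed stable whatever its result;
        # with gating, only updates that passed are pushed.
        pushed = passed or not gating
        if pushed and update.breaks_tests:
            base_broken = True
    return results


if __name__ == "__main__":
    queue = [
        Update("pkg-a"),
        Update("389-ds-base", breaks_tests=True),
        Update("pkg-b"),
        Update("pkg-c"),
    ]
    print("no gating:", simulate(queue, gating=False))
    print("gating:   ", simulate(queue, gating=True))
```

Without gating, pkg-b and pkg-c fail even though nothing is wrong with them; with gating, only the broken update is held and the later ones pass.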
-- 
Adam Williamson
Fedora QA
IRC: adamw | Twitter: adamw_ha
https://www.happyassassin.net



Re: Plan / proposal: enable openQA update testing and potentially gating on Rawhide updates

2022-06-10 Thread Kamil Paral
On Thu, Jun 9, 2022 at 9:48 PM Adam Williamson 
wrote:

> So, I'd like to propose that we enable Rawhide update testing on the
> production openQA instance also. This would cause results to appear on
> the Automated Tests tab in Bodhi, but they would be only informational
> (and unless the update was gated by a CI test, or somehow otherwise
> configured not to be pushed automatically, updates would continue to be
> pushed 'stable' almost immediately on creation, regardless of the
> openQA results).
>
> More significantly, I'd also propose that we turn on gating on openQA
> results for Rawhide updates. This would mean Rawhide updates would be
> held from going 'stable' (and included in the next compose) until the
> gating openQA tests had run and passed. We may want to do this a bit
> after turning on the tests; perhaps Fedora 37 branch point would be a
> natural time to do it.
>

+1 from me, thanks for working on this. Crossing fingers for a smooth ride.


Plan / proposal: enable openQA update testing and potentially gating on Rawhide updates

2022-06-09 Thread Adam Williamson
Hi folks!

We've had openQA testing of updates for stable and branched releases,
and gating based on those tests, enabled for a while now. I believe
this is going quite well, and I think we addressed the issues reported
when we first enabled gating - Bodhi's gating status updates work more
smoothly now, and openQA respects Bodhi's "re-run tests" button so
failed tests can be re-triggered.

A few weeks ago, I enabled testing of Rawhide updates in the openQA
lab/stg instance. This was to see how smoothly the tests run, how often
we run into unexpected failures or problems, and whether the hardware
resources we have are sufficient for the extra load.

So far this has been going more smoothly than I anticipated, if
anything. The workers seem to keep up with the test load, even though
one out of three worker systems for the stg instance is currently out
of commission (we're using it to investigate a bug). We do get
occasional failures which seem to be related to Rawhide kernel slowness
(e.g. operations timing out that usually don't otherwise time out), but
on the whole, the level of false failures is (I would say) acceptably
low, enough that my current regime of checking the test results daily
and restarting failed ones that don't seem to indicate a real bug
should be sufficient.

So, I'd like to propose that we enable Rawhide update testing on the
production openQA instance also. This would cause results to appear on
the Automated Tests tab in Bodhi, but they would be only informational
(and unless the update was gated by a CI test, or somehow otherwise
configured not to be pushed automatically, updates would continue to be
pushed 'stable' almost immediately on creation, regardless of the
openQA results).

More significantly, I'd also propose that we turn on gating on openQA
results for Rawhide updates. This would mean Rawhide updates would be
held from going 'stable' (and included in the next compose) until the
gating openQA tests had run and passed. We may want to do this a bit
after turning on the tests; perhaps Fedora 37 branch point would be a
natural time to do it.

Currently this would usually mean a wait from update submission to
'stable push' (which really means that the build goes into the
buildroot, and will go into the next Rawhide compose when it happens)
of somewhere between 45 minutes and a couple of hours. It would also
mean that if Rawhide updates for inter-dependent packages are not
correctly grouped, the dependent update(s) will fail testing and be
gated until the update they depend on has passed testing and been
pushed. The tests for the dependent update(s) would then need to be re-
run, either by someone hitting the button in Bodhi or an openQA admin
noticing and restarting them, before the dependent update(s) could be
pushed.

In the worst case, if updated packages A and B both need the other to
work correctly but the updates are submitted separately, both updates
may fail tests and be blocked. This could only be resolved by waiving
the failures, or replacing the separate updates with an update
containing both packages.

All of those considerations are already true for stable and branched
releases, but people are probably more used to grouping updates for
stable and branched than doing it for Rawhide, and the typical flow of
going from a build to an update provides more opportunity to create
grouped updates for branched/stable. For Rawhide, the easiest way to
group updates, if you need to, is to do the builds in a side tag and
use Bodhi's ability to create updates from a side tag.

As with branched/stable, only critical path updates would have the
tests run and be gated on the results. Non-critpath updates would be
unaffected. (There's a small allowlist of non-critpath packages for
which the tests are also run, but they are not currently gated on the
results).
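The critical path rule above reduces to a small decision function. This is a hedged Python sketch: the package sets are made-up examples (not the real critical path list or allowlist), and it assumes an update counts as critical path if any package in it does.

```python
from __future__ import annotations


def openqa_policy(
    packages: set[str], critpath: set[str], allowlist: set[str]
) -> tuple[bool, bool]:
    """Return (tests_run, gated) for an update containing `packages`.

    Assumption: an update is critical path if any package in it is.
    """
    is_critpath = bool(packages & critpath)
    on_allowlist = bool(packages & allowlist)
    # Critical path updates are tested and gated; allowlisted non-critpath
    # updates are tested but not gated; everything else is untouched.
    tests_run = is_critpath or on_allowlist
    gated = is_critpath
    return tests_run, gated


if __name__ == "__main__":
    # Hypothetical example sets, not the real Fedora lists.
    critpath = {"crit-pkg"}
    allowlist = {"allow-pkg"}
    for pkgs in ({"crit-pkg"}, {"allow-pkg"}, {"other-pkg"}, {"other-pkg", "crit-pkg"}):
        print(sorted(pkgs), openqa_policy(pkgs, critpath, allowlist))
```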

I think doing this could really help us keep Rawhide solid and avoid
introducing major compose-breaking bugs, at minimal cost. But it's a
significant change and I wanted to see what folks think. In particular,
if you find the existing gating of updates for stable/branched releases
to cause problems in any way, I'd love to hear about it.

Thanks folks!
-- 
Adam Williamson
Fedora QA
IRC: adamw | Twitter: adamw_ha
https://www.happyassassin.net