Hi Raoul and all,

Sorry for joining this discussion late!

Raoul Scarazzini <ra...@redhat.com> wrote:
TL;DR: we would like to change the way HA is tested upstream to avoid
being hit by avoidable bugs that the CI process should catch.

Long version:

Today HA testing upstream consists only of verifying that a
three-controller setup comes up correctly and can spawn an instance.
That's something, but it's far from enough, since we continuously see
"day two" bugs.
We started covering this more than a year ago in internal CI, and today
also on rdocloud, using a project named tripleo-quickstart-utils [1].
Despite its name, the project is not limited to tripleo-quickstart;
it covers three principal roles:

1 - stonith-config: a playbook that can be used to automate the creation
of fencing devices in the overcloud;
2 - instance-ha: a playbook that automates the seventeen manual steps
needed to configure instance HA in the overcloud, tests them via Rally
and verifies that instance HA works;
3 - validate-ha: a playbook that runs a series of disruptive actions in
the overcloud and verifies that it always behaves correctly by deploying
a heat-template that involves all the overcloud components.
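
As a rough, deployment-agnostic illustration of the validate-ha idea in
item 3 (inject a disruption, check that the cloud still works, restore,
repeat), here is a minimal Python sketch. It is not taken from
tripleo-quickstart-utils: the names Disruption, run_ha_validation and
is_healthy are hypothetical, and is_healthy merely stands in for
whatever end-to-end check is used (for example, deploying a heat
template). Checking health while the disruption is still active is also
an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Disruption:
    """One disruptive action plus the step that undoes it (hypothetical type)."""
    name: str
    inject: Callable[[], None]   # e.g. stop a service or power off a node
    recover: Callable[[], None]  # restore the environment afterwards


def run_ha_validation(disruptions: List[Disruption],
                      is_healthy: Callable[[], bool]) -> List[str]:
    """Apply each disruption in turn and return the names of those that
    left the cloud unhealthy. is_healthy is a placeholder for a real
    end-to-end check."""
    failures: List[str] = []
    for disruption in disruptions:
        disruption.inject()
        try:
            # Health is checked while the disruption is still in effect.
            if not is_healthy():
                failures.append(disruption.name)
        finally:
            # Always restore the environment before the next action.
            disruption.recover()
    return failures
```

Because the runner only sees callables, the same loop could in principle
drive Ansible playbooks, Yardstick test cases or anything else; only the
inject, recover and is_healthy implementations would be specific to a
deployment tool.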

Yes, a more rigorous approach to HA testing obviously has huge value,
not just for TripleO deployments, but also for any type of OpenStack
deployment.

To make this usable upstream, we need to understand where to put this
code. Here are some choices:

[snipped]

I do not work on TripleO, but I'm part of the wider OpenStack
sub-communities which focus on HA[0] and more recently,
self-healing[1].  With that hat on, I'd like to suggest that maybe
it's possible to collaborate on this in a manner which is agnostic to
the deployment mechanism.  There is an open spec on this:

   https://review.openstack.org/#/c/443504/

which was mentioned in the Denver PTG session on destructive testing
which you referenced[2].

As mentioned in the self-healing SIG's session in Dublin[3], the OPNFV
community has already put a lot of effort into testing HA scenarios,
and it would be great if this work was shared across the whole
OpenStack community.  In particular they have a project called
Yardstick:

   https://www.opnfv.org/community/projects/yardstick

which contains a bunch of HA test cases:

   
   http://docs.opnfv.org/en/latest/submodules/yardstick/docs/testing/user/userguide/15-list-of-tcs.html#h-a

Currently each sub-community and vendor seems to be reinventing HA
testing by itself to some extent, which is easier to accomplish in the
short-term, but obviously less efficient in the long-term.  It would
be awesome if we could break these silos down and join efforts! :-)

Cheers,
Adam

[0] #openstack-ha on Freenode IRC
[1] https://wiki.openstack.org/wiki/Self-healing_SIG
[2] https://etherpad.openstack.org/p/qa-queens-ptg-destructive-testing
[3] https://etherpad.openstack.org/p/self-healing-ptg-rocky

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
