Re: [openstack-dev] [TripleO][CI][QA][HA] Validating HA on upstream

2018-03-06 Thread Raoul Scarazzini
On 06/03/2018 13:27, Adam Spiers wrote:
> Hi Raoul and all,
> Sorry for joining this discussion late!
[...]
> I do not work on TripleO, but I'm part of the wider OpenStack
> sub-communities which focus on HA[0] and more recently,
> self-healing[1].  With that hat on, I'd like to suggest that maybe
> it's possible to collaborate on this in a manner which is agnostic to
> the deployment mechanism.  There is an open spec on this>    
> https://review.openstack.org/#/c/443504/
> which was mentioned in the Denver PTG session on destructive testing
> which you referenced[2].
[...]
>    https://www.opnfv.org/community/projects/yardstick
[...]
> Currently each sub-community and vendor seems to be reinventing HA
> testing by itself to some extent, which is easier to accomplish in the
> short-term, but obviously less efficient in the long-term.  It would
> be awesome if we could break these silos down and join efforts! :-)

Hi Adam,
First of all thanks for your detailed answer. Then let me be honest
while saying that I didn't know yardstick. I need to start from scratch
here to understand what this project is. In any case, the exact meaning
of this thread is to involve people and have a more comprehensive look
at what's around.
The point here is that, as you can see from the tripleo-ha-utils spec
[1] I've created, the project is meant for TripleO specifically. On one
side this is a significant limitation, but on the other one, due to the
pluggable nature of the project, I think that integrations with other
software like you are proposing is not impossible.
Feel free to add your comments to the review. In the meantime, I'll
check yardstick to see which kind of bridge we can build to avoid
reinventing the wheel.

Thanks a lot again for your involvement,

[1] https://review.openstack.org/#/c/548874/

-- 
Raoul Scarazzini
ra...@redhat.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][CI][QA][HA] Validating HA on upstream

2018-03-06 Thread Adam Spiers

Hi Raoul and all,

Sorry for joining this discussion late!

Raoul Scarazzini  wrote:

TL;DR: we would like to change the way HA is tested upstream to avoid
being hitten by evitable bugs that the CI process should discover.

Long version:

Today HA testing in upstream consist only in verifying that a three
controllers setup comes up correctly and can spawn an instance. That's
something, but it’s far from being enough since we continuously see "day
two" bugs.
We started covering this more than a year ago in internal CI and today
also on rdocloud using a project named tripleo-quickstart-utils [1].
Apart from his name, the project is not limited to tripleo-quickstart,
it covers three principal roles:

1 - stonith-config: a playbook that can be used to automate the creation
of fencing devices in the overcloud;
2 - instance-ha: a playbook that automates the seventeen manual steps
needed to configure instance HA in the overcloud, test them via rally
and verify that instance HA works;
3 - validate-ha: a playbook that runs a series of disruptive actions in
the overcloud and verifies it always behaves correctly by deploying a
heat-template that involves all the overcloud components;


Yes, a more rigorous approach to HA testing obviously has huge value,
not just for TripleO deployments, but also for any type of OpenStack
deployment.


To make this usable upstream, we need to understand where to put this
code. Here some choices:


[snipped]

I do not work on TripleO, but I'm part of the wider OpenStack
sub-communities which focus on HA[0] and more recently,
self-healing[1].  With that hat on, I'd like to suggest that maybe
it's possible to collaborate on this in a manner which is agnostic to
the deployment mechanism.  There is an open spec on this:

   https://review.openstack.org/#/c/443504/

which was mentioned in the Denver PTG session on destructive testing
which you referenced[2].

As mentioned in the self-healing SIG's session in Dublin[3], the OPNFV
community has already put a lot of effort into testing HA scenarios,
and it would be great if this work was shared across the whole
OpenStack community.  In particular they have a project called
Yardstick:

   https://www.opnfv.org/community/projects/yardstick

which contains a bunch of HA test cases:

   
http://docs.opnfv.org/en/latest/submodules/yardstick/docs/testing/user/userguide/15-list-of-tcs.html#h-a

Currently each sub-community and vendor seems to be reinventing HA
testing by itself to some extent, which is easier to accomplish in the
short-term, but obviously less efficient in the long-term.  It would
be awesome if we could break these silos down and join efforts! :-)

Cheers,
Adam

[0] #openstack-ha on Freenode IRC
[1] https://wiki.openstack.org/wiki/Self-healing_SIG
[2] https://etherpad.openstack.org/p/qa-queens-ptg-destructive-testing
[3] https://etherpad.openstack.org/p/self-healing-ptg-rocky

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev