Looks like I forgot to add the link for [1] in the first email:

[1] https://github.com/stackforge/haos

On Wed, Jun 3, 2015 at 12:50 PM, Timur Nurlygayanov <[email protected]> wrote:

> Hi team,
>
> I'm working on automated HA / destructive / recovery tests [1] for
> OpenStack clouds, and I'd like to hear what users, operators and
> developers expect for the speed of OpenStack recovery after destructive
> actions.
> For example, how long should a cluster be unavailable if one of three
> controllers is destroyed? I think the right answer is '0 seconds,
> no downtime' - users shouldn't see anything strange when we lose one
> controller in our cloud (if it is a 'true' HA configuration).
> In the real world, I can see that such destructive scenarios require some
> time to recover the cloud (1-15 minutes in different cases) - and I just
> want to get your expectations or requirements.
>
> How fast can / should we fully recover the cloud in the following cases:
> 1. Restart RabbitMQ services
> 2. Restart MySQL / Galera services
> 3. Restart Neutron services (like L3 agents)
> 4. Hard shutdown of any OpenStack controller
> 5. Shutdown of the ethernet interfaces of management / data networks
>
> Of course, it depends on the configuration, but we can describe some
> common, 'expected' acceptance values (SLA) for downtime in different
> destructive cases and use them to verify clouds today and in the future.
> We will use these values in the HAOS project [1], which will make it
> possible to validate any cloud with the same scenarios and the same SLA
> for recovery time.
>
> Any comments are welcome :)
> Thank you!
>
> --
>
> Timur,
> Senior QA Engineer
> OpenStack Projects
> Mirantis Inc
>

--

Timur,
Senior QA Engineer
OpenStack Projects
Mirantis Inc
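
To make the question about recovery time a bit more concrete, here is a minimal sketch of how downtime for any of the scenarios listed above could be measured and compared against an SLA value. This is not how HAOS does it: the names `measure_downtime` and `within_sla` are hypothetical, and `probe` stands in for whatever availability check fits the cloud (an API call, a ping to an instance, and so on). The result is only as precise as the polling interval.

```python
import time
from typing import Callable, Optional


def measure_downtime(
    probe: Callable[[], bool],
    duration: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `probe` for `duration` seconds and return the longest
    continuous outage observed, in seconds.

    `probe` is a hypothetical availability check supplied by the caller;
    it returns True when the cloud responds as expected.
    """
    start = clock()
    outage_start: Optional[float] = None
    longest = 0.0
    while True:
        now = clock()
        if now - start >= duration:
            break
        if probe():
            # Service is up: close any outage that was in progress.
            if outage_start is not None:
                longest = max(longest, now - outage_start)
                outage_start = None
        elif outage_start is None:
            # First failed probe marks the start of an outage.
            outage_start = now
        sleep(interval)
    # An outage still in progress counts up to the end of the window.
    if outage_start is not None:
        longest = max(longest, clock() - outage_start)
    return longest


def within_sla(downtime: float, sla_seconds: float) -> bool:
    """Return True if the measured downtime meets the agreed SLA value."""
    return downtime <= sla_seconds
```

In a test run, the destructive action (for example, restarting RabbitMQ) would be triggered while `measure_downtime` is polling, and the returned value would then be checked with `within_sla` against whatever acceptance value we agree on for that scenario.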
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
