Re: [openstack-dev] [Openstack-operators] Destructive / HA / fail-over scenarios

2016-12-01 Thread Jorge Cardoso (Cloud Operations and Analytics, IT R&D Division)

Hi Timur,

You could also consider Stepler from Mirantis: https://github.com/Mirantis/stepler
I have never tried it, but the documentation states: "Stepler framework is intended 
to provide the community with a testing framework that is capable of perform 
advanced scenario and destructive test cases, like batch instances launching, 
instances migration, services restarts and different HA-specific cases."

Cheers,
Jorge


-----Original Message-----
From: Adam Spiers [mailto:aspi...@suse.com] 
Sent: Monday, November 28, 2016 3:09 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Openstack-operators] Destructive / HA / fail-over 
scenarios

Timur Nurlygayanov wrote:
> Hi OpenStack developers and operators,
> 
> we are going to create the test suite for destructive testing of 
> OpenStack clouds. We want to hear your feedback and ideas about 
> possible destructive and failover scenarios which we need to check.
> 
> Which scenarios do we need to check if we want to make sure that an 
> OpenStack cluster is configured in High Availability mode and can be 
> published as a "production/enterprise" cluster?
> 
> Your ideas are welcome, let's discuss the ideas of test scenarios in 
> this email thread.

I applaud the effort to boost automated testing of failure scenarios!
And thanks a lot for polling the list before starting any work on this.

Regarding the implementation, did you consider reusing Cloud99, and if not, 
please could you? :-) Obviously it would be good to avoid reinventing wheels 
where possible.


https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/high-availability-and-resiliency-testing-strategies-for-openstack-clouds

https://github.com/cisco-oss-eng/Cloud99

If there are gaps between Cloud99 and what is needed, then it would be 
worth evaluating them to determine whether it makes sense to start 
from scratch or simply to develop Cloud99 further.

Also it would be great if you could join the #openstack-ha IRC channel where 
you will find friendly folks from the broader OpenStack HA sub-community who 
I'm sure will be happy to discuss this further.

You are also very welcome to join our weekly IRC meetings:

https://wiki.openstack.org/wiki/Meetings/HATeamMeeting

Thanks!
Adam

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



[openstack-dev] Cloud Reliability and Resilience for OpenStack (Fault Injection, Chaos Engineering, and Google SRE)

2016-09-01 Thread Jorge Cardoso (Cloud Operations and Analytics, IT R&D Division)

Hi all,

Is there any work being done on Reliability for OpenStack using e.g. 
fault-injection, Chaos Engineering from Netflix, and Site Reliability 
Engineering principles from Google?

The only related material I found in the documentation is this page:
http://docs.openstack.org/developer/performance-docs/test_results/reliability/index.html#openstack-reliability-testing

I am working on Cloud Reliability and Resilience and I would like to explore 
this area for OpenStack.
You can check some of my interests and work at: 
http://jorge-cardoso.github.io/research/

Any interest from you guys?
Any suggestions on how to proceed?


Best Regards,
Jorge Cardoso

