Re: [openstack-dev] Cloud Reliability and Resilience for OpenStack (Fault Injection, Chaos Engineering, and Google SRE)

2016-09-01 Thread Ilya Shakhat
Hi Jorge,

Reliability testing automation is what the OpenStack Performance team [1]
is working on right now. The whole solution will consist of:
 * The fault injection library os-failures [2], which provides an API for
   injecting different kinds of faults into different OpenStack clouds. The
   library is currently under active development; our primary goal is to
   support Fuel-based clouds first and DevStack-based clouds after that.
 * Rally as the engine for running different types of test scenarios. As
   Boris mentioned, the main feature is called "hooks": it allows executing
   arbitrary code at predefined points of a scenario. In our case we will
   have a plugin that uses the os-failures library.
 * A set of scenarios and their reports, which will go into the OpenStack
   Performance Docs [3].
 * A Rally plugin for results processing. We are mainly interested in
   calculating the following metrics:
    * Error count: the number of errors that appeared during scenario
      execution (e.g. number of failed requests)
    * Performance degradation: performance (e.g. operation duration) after
      the failure compared against sample data collected before it
    * MTTR: how long it takes for all errors to disappear and how long it
      takes for performance to return to normal
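
To make the three metrics above a bit more concrete, here is a minimal
sketch in Python. It is not the Rally plugin and does not use any Rally or
os-failures API; the Sample record, the function names, and the 1.2
tolerance factor are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Sample:
    """One observed request (hypothetical record, not a Rally type)."""
    timestamp: float  # seconds since the start of the scenario
    duration: float   # operation duration in seconds
    failed: bool      # True if the request ended with an error


def count_errors(samples: List[Sample]) -> int:
    """Number of failed requests in the given samples."""
    return sum(1 for s in samples if s.failed)


def degradation(baseline: List[Sample], after_fault: List[Sample]) -> float:
    """Mean duration of successful requests after the fault, divided by
    the mean duration of successful requests before it."""
    base = mean(s.duration for s in baseline if not s.failed)
    post = mean(s.duration for s in after_fault if not s.failed)
    return post / base


def mttr(samples: List[Sample], fault_time: float, baseline_mean: float,
         tolerance: float = 1.2) -> Optional[float]:
    """Seconds from fault injection to the first sample that follows the
    last 'bad' one. A sample is bad if it failed or was slower than
    tolerance * baseline_mean (the tolerance value is an assumption).
    Returns 0.0 if nothing went wrong, None if recovery was not observed."""
    after = sorted((s for s in samples if s.timestamp >= fault_time),
                   key=lambda s: s.timestamp)
    bad = [i for i, s in enumerate(after)
           if s.failed or s.duration > tolerance * baseline_mean]
    if not bad:
        return 0.0
    first_good = bad[-1] + 1
    if first_good >= len(after):
        return None
    return after[first_good].timestamp - fault_time
```

Defining recovery as "the first good sample after the last bad one" covers
both conditions at once (errors gone and durations back near the baseline);
a real implementation would probably want a smoothing window rather than
single samples.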

If you are interested in contributing, we have meetings on Tuesdays at
15:30 UTC in the #openstack-performance IRC channel.

Thanks,
Ilya

[1] https://wiki.openstack.org/wiki/Performance_Team
[2] https://github.com/openstack/os-failures
[3] http://docs.openstack.org/developer/performance-docs/


2016-09-01 10:33 GMT+03:00 Boris Pavlovic:

> Hi Jorge,
>
> The Rally team is working on a feature called "Hooks".
> "Hooks" will make it possible to use Rally to run workloads and inject
> arbitrary actions (including using existing Chaos frameworks).
>
> Here is the patch: https://review.openstack.org/#/c/352276/
> Here is the merged spec:
> https://github.com/openstack/rally/blob/master/doc/specs/in-progress/hook_section.rst
>
>
> You are very welcome to join this effort and help the Rally team deliver
> it faster.
>
> Thanks!
>
> Best regards,
> Boris Pavlovic
>
> On Wed, Aug 31, 2016 at 11:55 PM, Jorge Cardoso (Cloud Operations and
> Analytics, IT R&D Division) wrote:
>
>>
>>
>> Hi all,
>>
>>
>>
>> Is there any work being done on Reliability for OpenStack using e.g.
>> fault-injection, Chaos Engineering from Netflix, and Site Reliability
>> Engineering principles from Google?
>>
>>
>>
>> I only found this page in the documentation:
>> http://docs.openstack.org/developer/performance-docs/test_results/reliability/index.html#openstack-reliability-testing
>>
>>
>>
>> I am working on Cloud Reliability and Resilience and I would like to
>> explore this area for OpenStack.
>>
>> You can check some of my interests and work at:
>> http://jorge-cardoso.github.io/research/
>>
>>
>>
>> Any interest from you guys?
>>
>> Any suggestions on how to proceed?
>>
>>
>>
>>
>>
>> Best Regards,
>>
>> Jorge Cardoso
>>
>>
>>
>>
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>


Re: [openstack-dev] Cloud Reliability and Resilience for OpenStack (Fault Injection, Chaos Engineering, and Google SRE)

2016-09-01 Thread Boris Pavlovic
Hi Jorge,

The Rally team is working on a feature called "Hooks".
"Hooks" will make it possible to use Rally to run workloads and inject
arbitrary actions (including using existing Chaos frameworks).
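
To illustrate the general idea only, here is a minimal sketch in Python of
a runner that calls arbitrary actions at predefined iterations of a
workload. This is not Rally's hook API (see the spec for that); the
run_scenario function and its arguments are hypothetical names made up for
this example.

```python
from typing import Callable, Dict, List


def run_scenario(workload: Callable[[int], None],
                 iterations: int,
                 hooks: Dict[int, Callable[[], None]]) -> List[bool]:
    """Run `workload` once per iteration and record whether it succeeded.

    `hooks` maps an iteration number to an action that is called right
    before that iteration, e.g. to inject a fault or to undo it.
    (Hypothetical helper, not part of Rally.)
    """
    results = []
    for i in range(iterations):
        hook = hooks.get(i)
        if hook is not None:
            hook()  # arbitrary code at a predefined point of the scenario
        try:
            workload(i)
            results.append(True)
        except Exception:
            results.append(False)
    return results
```

A hook at one iteration could break a service and a later hook could
restore it; the list of per-iteration results then shows how the workload
behaved before, during, and after the fault.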

Here is the patch: https://review.openstack.org/#/c/352276/
Here is the merged spec:
https://github.com/openstack/rally/blob/master/doc/specs/in-progress/hook_section.rst


You are very welcome to join this effort and help the Rally team deliver
it faster.

Thanks!

Best regards,
Boris Pavlovic

On Wed, Aug 31, 2016 at 11:55 PM, Jorge Cardoso (Cloud Operations and
Analytics, IT R&D Division) wrote:

>
>
> Hi all,
>
>
>
> Is there any work being done on Reliability for OpenStack using e.g.
> fault-injection, Chaos Engineering from Netflix, and Site Reliability
> Engineering principles from Google?
>
>
>
> I only found this page in the documentation:
> http://docs.openstack.org/developer/performance-docs/test_results/reliability/index.html#openstack-reliability-testing
>
>
>
> I am working on Cloud Reliability and Resilience and I would like to
> explore this area for OpenStack.
>
> You can check some of my interests and work at:
> http://jorge-cardoso.github.io/research/
>
>
>
> Any interest from you guys?
>
> Any suggestions on how to proceed?
>
>
>
>
>
> Best Regards,
>
> Jorge Cardoso
>
>
>
>
>