On 11/30/2017 6:05 PM, Nematollah Bidokhti wrote:
Hi,

Our [Fault-Genes WG] has been working on defining the fault classifications for key OpenStack projects in an effort to support OpenStack fault management & self-healing.

We have been using unsupervised machine learning to examine all bugs and issues submitted by the community, but it has been very challenging to define the classifications entirely by machine.

We have decided to go with a supervised data set instead. To do this, we need to come up with our training data.

We need your help to generate the training data set. *Basically, we only need 2 or 3 unique fault classifications with a short description and the associated mitigations _from each member who is familiar with OpenStack design & operation_. This way we can build a focused library of faults & mitigations for each project.*

Once this data is accumulated, we will develop our own specific algorithms that can be applied to all future OpenStack issues.

Thanks in advance for your support.

| No. | Project | Fault Classification | Description | Root Cause | Mitigation |
|-----|---------|----------------------|-------------|------------|------------|
| 1   |         |                      |             |            |            |
| 2   |         |                      |             |            |            |
| 3   |         |                      |             |            |            |

Below are examples of what a couple of Neutron developers have provided. I am sure there are other types of fault classifications in Neutron that have not been captured in this table.

| Fault Classification | Root Cause | Mitigation |
|---|---|---|
| Network Connectivity Issues | Virtual interface in the VM admin down | Un-shut the virtual interface |
| | Virtual interface does not have IP address via DHCP | Depends on lower-level root cause |
| | Virtual network does not have interface to the router | Add virtual network as one of the router interfaces |
| | vNIC port of VM not active (stuck in build) | Depends on lower-level root cause |
| | Security group blocking traffic | Fix the security group to allow relevant traffic |
| Unable to Add Port to Bridge | Libvirtd in AppArmor is blocking | Allow Libvirtd profile in AppArmor |
| No Valid Host Found/insufficient hypervisor resources | Compute nodes do not have sufficient resources | Free up required compute, storage and memory resources on compute node |
| No Resource | Configuration issues | Change config setting |
| Authentication/permissions error | Configuration error such as port # or password | Make sure endpoints are properly configured |
| Gateway access not reachable | | Use custom keep-alive health-check |
| Design issue of OpenStack Network node | | Out-of-band health checking mechanism |
| Security Group Mis-configuration | The security group | Change security rules/Programming the security group |
| DNS Attack | | Implement CERT alerts updates |
| Network design issue | Network storm | Reduce L2 broadcast domain |
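
For anyone who would rather send rows in a structured form, here is a minimal sketch of how a few of the Neutron rows above might be written as labeled records and grouped into a per-project library of faults and mitigations. The `FaultRecord` type, its field names, the `build_library` helper and the "neutron" project key are all hypothetical, chosen only for illustration; this is not the WG's actual data format or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultRecord:
    """One labeled training example (hypothetical field names)."""
    project: str
    classification: str
    root_cause: Optional[str]  # None where the table leaves the cell empty
    mitigation: str


# A few rows copied from the Neutron examples above.
RECORDS = [
    FaultRecord("neutron", "Network Connectivity Issues",
                "Virtual interface in the VM admin down",
                "Un-shut the virtual interface"),
    FaultRecord("neutron", "Network Connectivity Issues",
                "Virtual network does not have interface to the router",
                "Add virtual network as one of the router interfaces"),
    FaultRecord("neutron", "Gateway access not reachable",
                None,
                "Use custom keep-alive health-check"),
    FaultRecord("neutron", "Network design issue",
                "Network storm",
                "Reduce L2 broadcast domain"),
]


def build_library(records):
    """Group mitigations by (project, classification), keeping input order."""
    library = defaultdict(list)
    for rec in records:
        library[(rec.project, rec.classification)].append(rec.mitigation)
    return dict(library)


library = build_library(RECORDS)
for (project, classification), mitigations in library.items():
    print(f"{project} / {classification}: {mitigations}")
```

Keeping the root cause optional lets a record carry a classification and mitigation even when the root cause is not yet known, as in some of the rows above.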

Nemat



__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


I'm not entirely sure how you classify some of this stuff.

For example, here is a nova/neutron bug in triage:

https://bugs.launchpad.net/nova/+bug/1730637

In this case, the user tries to attach a port to an instance and it fails with a port binding failure.

From the nova side, we have no idea if this is a user error or a problem in the networking backend. Therefore I wouldn't know how to classify this, or describe the root cause or how to mitigate it.

--

Thanks,

Matt
