[openstack-dev] [Neutron] Crash Issue: OVS-Agent status needs to be fully represented/processed

Robin Wang Fri, 01 Aug 2014 09:57:06 -0700

Recently we encountered some ovs-agent crash issues.  [1][2][3]

*[Root cause]*
1. Currently only a 'restarted' flag is used in rpc_loop() to identify ovs
status.
* ovs_restarted = self.check_ovs_restart() *


*True*: ovs is running, but a restart happened before this loop. rpc_loop()
resets bridges and re-processes ports.
*False*: ovs has been running since the last loop. rpc_loop() continues to
process in the normal way.

But if ovs is dead, or is not yet up during a restart, check_ovs_restart()
incorrectly returns "True". rpc_loop() then goes on to reset bridges and
apply other ovs operations, which eventually raise exceptions or crash the
agent. Related Bugs: [1] [2]

2. Also, ovs status is not checked at all during agent boot-up. When ovs is
dead, the agent crashes with no useful log info. Related Bug: [3]
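
A minimal sketch of why a two-valued flag cannot tell "restarted" from
"dead". It is not the Neutron code: FakeBridge, dump_canary_flows() and the
marker-flow mechanism are assumptions made up for illustration; only the
name check_ovs_restart() comes from the agent.

```python
class FakeBridge:
    """Hypothetical stand-in for the integration bridge (not Neutron code)."""

    def __init__(self, ovs_alive, canary_present):
        self.ovs_alive = ovs_alive
        self.canary_present = canary_present

    def dump_canary_flows(self):
        # Assumed behaviour: None when the switch cannot be reached,
        # "" when the marker flow is missing, a non-empty string otherwise.
        if not self.ovs_alive:
            return None
        return "table=23 actions=drop" if self.canary_present else ""


def check_ovs_restart(bridge):
    # Two-valued check: any falsy dump result is read as "restarted",
    # so an unreachable switch looks the same as a restarted one.
    return not bridge.dump_canary_flows()


restarted_after_restart = check_ovs_restart(FakeBridge(True, False))
restarted_when_dead = check_ovs_restart(FakeBridge(False, False))
restarted_when_normal = check_ovs_restart(FakeBridge(True, True))
```

In this sketch the dead case and the restarted case both yield True, so the
caller would go on to reset bridges against a switch that is not there.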

*[Proposal]*
1. Add constants {NORMAL, DEAD, RESTARTED} to represent ovs status.
NORMAL - ovs has been running since the last loop. rpc_loop() continues to
process in the normal way.
RESTARTED - ovs is running, but a restart happened before this loop.
rpc_loop() resets bridges and re-processes ports.
DEAD - keep the agent running, but rpc_loop() does not apply ovs operations,
to prevent unnecessary exceptions/crashes. Once ovs is back up, the agent
enters RESTARTED mode.

2. Check ovs status during agent boot-up. If it is DEAD, exit gracefully,
since subsequent operations would cause a crash, and log a message stating
that the agent terminated because ovs is dead.
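
The two proposal items could be sketched roughly as below. This is an
illustration under stated assumptions, not the patch under review: the
constant values, check_ovs_status(), loop_iteration(), check_at_boot() and
the "flows" argument (None when ovs is unreachable, "" when a marker flow
is missing) are all hypothetical names and conventions.

```python
import logging

LOG = logging.getLogger(__name__)

# Hypothetical constant values; the proposal only names the three states.
OVS_NORMAL = "normal"
OVS_DEAD = "dead"
OVS_RESTARTED = "restarted"


def check_ovs_status(flows):
    """Map an assumed flow-dump result to one of the three states."""
    if flows is None:        # switch could not be reached at all
        return OVS_DEAD
    if not flows:            # reachable, but the marker flow is gone
        return OVS_RESTARTED
    return OVS_NORMAL


def loop_iteration(status, actions):
    """One pass of the loop; 'actions' records what would be done."""
    if status == OVS_DEAD:
        # Stay alive, but skip every ovs operation for this iteration.
        LOG.warning("ovs is dead, skipping ovs operations in this loop")
        return
    if status == OVS_RESTARTED:
        actions.append("reset_bridges")
        actions.append("reprocess_all_ports")
    else:
        actions.append("process_updated_ports")


def check_at_boot(status):
    """Boot-time check: log the reason and exit instead of crashing later."""
    if status == OVS_DEAD:
        LOG.error("ovs is dead, terminating the agent")
        raise SystemExit(1)
```

With this shape, a sequence of dump results None, "", "some flow" maps to
DEAD, RESTARTED, NORMAL, so the agent falls into the RESTARTED path by
itself once ovs comes back.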

*[Code Review]* https://review.openstack.org/#/c/110538/
I would appreciate it if you could share some thoughts or do a quick code
review. Thanks.

Best,
Robin

[1] https://bugs.launchpad.net/neutron/+bug/1296202
[2] https://bugs.launchpad.net/neutron/+bug/1350179
[3] https://bugs.launchpad.net/neutron/+bug/1351135
