** Description changed:

  If ovsdb-server is down for a while and we are connecting via SSL,
  python-ovs will raise
  
  OpenSSL.SSL.SysCallError: (111, 'ECONNREFUSED')
  
  instead of just returning an error type. If this goes on for a bit, then
  the Connection thread will exit and be unrecoverable without restarting
  neutron-server.
  
  +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  
  SRU:
  
  [Impact]
  Any intermittent connection issue between neutron-server and ovsdb nb/sb
  causes neutron-server to stop handling ovsdb transactions, because
  exceptions are not handled properly during reconnection. This in turn
  causes failures in post-commit updates of resources and leads to
  neutron/ovn db inconsistencies.
  This fix catches the exceptions and retries the connection to ovsdb.
  
  [Test plan]
  * Deploy bionic-ussuri with neutron-server and ovn-central in HA using juju
    charms.
  * Launch a few instances and check that they are in the active state.
  * Simulate network communication issues by adding iptables rules for
    ports 6641 6643 6644 16642:
  
    - On ovn-central/0, drop packets from ovn-central/2 and neutron-server/2
    - On ovn-central/1, drop packets from ovn-central/2 and neutron-server/2
    - On ovn-central/2, drop packets from ovn-central/0, ovn-central/1,
      neutron-server/0, neutron-server/1
  
  DROP_PKTS_FROM_OVN_CENTRAL=
  DROP_PKTS_FROM_NEUTRON_SERVER=
  for ip in $DROP_PKTS_FROM_OVN_CENTRAL; do for port in 6641 6643 6644 16642; 
do iptables -I ufw-before-input 1 -s $ip -p tcp --dport $port -j REJECT; done; 
done
  for ip in $DROP_PKTS_FROM_NEUTRON_SERVER; do for port in 6641 16642; do 
iptables -I ufw-before-input 1 -s $ip -p tcp --dport $port -j REJECT; done; done
  
  * After a minute, remove the REJECT rules added above.
  * Launch around 5 new VMs (5 to ensure that some post creations land on
    neutron-server/2) and look for Timeout exceptions on neutron-server/2.
    If there are any Timeout exceptions, the neutron-server ovsdb connections
    are stale and are no longer handling ovsdb transactions.
    If there are no Timeout exceptions and port status updates arrive from
    ovsdb, neutron-server has reconnected successfully and resumed handling
    updates.
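  
  The cleanup step above (removing the REJECT rules after a minute) could
  look like the following sketch. It assumes the rules were inserted exactly
  as in the loops shown earlier, so that iptables -D with the same rule
  specification matches them. The remove_reject_rules function and the
  IPTABLES override are hypothetical names introduced here for illustration
  only.

```shell
# Hypothetical override so the commands can be previewed first, e.g. IPTABLES=echo
IPTABLES="${IPTABLES:-iptables}"

# Hypothetical helper, not part of the original test plan.
# $1: space-separated source IPs; remaining arguments: destination ports.
remove_reject_rules() {
    ips="$1"
    shift
    for ip in $ips; do
        for port in "$@"; do
            # -D deletes the rule whose specification matches the one used with -I
            $IPTABLES -D ufw-before-input -s "$ip" -p tcp --dport "$port" -j REJECT
        done
    done
}

# Same IP lists and port sets as in the insertion loops; empty lists are a no-op.
remove_reject_rules "${DROP_PKTS_FROM_OVN_CENTRAL:-}" 6641 6643 6644 16642
remove_reject_rules "${DROP_PKTS_FROM_NEUTRON_SERVER:-}" 6641 16642
```

  Setting IPTABLES=echo before running this prints the delete commands
  instead of executing them, which may be useful for checking them against
  the output of iptables -S ufw-before-input.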
  
  [Where problems could occur]
  
  The fix passed the upstream zuul gates (tempest tests etc) and the patch
- just adds reconnection tries to ovsdbapp. So not expecting to introduce
- any regressions.
+ just adds reconnection tries to ovsdbapp. With the fix, a reconnection
+ is attempted every 4 minutes (3 min connection timeout + 1 min sleep)
+ until the connection succeeds. I don't expect this change to introduce
+ any regressions.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1895727

Title:
  OpenSSL.SSL.SysCallError: (111, 'ECONNREFUSED') and Connection thread
  stops

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1895727/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
