** Description changed:

  We are using Openstack Rocky as well as rabbitmq 3.7.4 in our
  production.
  
  Occasionally I see many lines like the following in the log:
  
  2020-06-11 02:03:06.753 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:03:21.754 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:03:36.755 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:03:51.756 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:04:06.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:04:21.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:04:36.758 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:04:51.759 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  
  The heartbeat interval is 60s and the rate is 2. Although the log
  complains about missed heartbeats, the rabbitmq server seems to be
  running fine, and messages are received and processed successfully.
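
The warnings above arrive 15 seconds apart, which is consistent with a check interval derived from the two settings mentioned. A minimal sketch, assuming the driver waits heartbeat_timeout_threshold / heartbeat_rate / 2 seconds between heartbeat checks; the function name is hypothetical and not part of oslo.messaging:

```python
def heartbeat_wait_timeout(timeout_threshold, rate):
    """Seconds between heartbeat checks.

    Assumption: interval = threshold / rate / 2, which matches the
    15-second spacing of the warnings for threshold=60 and rate=2.
    """
    return float(timeout_threshold) / float(rate) / 2.0


# Values from this report: 60s threshold, rate 2.
print(heartbeat_wait_timeout(60, 2))  # 15.0
```

Under the same assumption, a 20s threshold would give a 5-second check interval.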
+ 
+ ***************************************************
+ 
+ SRU Details
+ -----------
+ 
+ [Impact]
+ AMQP messages are sometimes dropped, resulting in resource creation errors (this happened twice in a week on one environment).
+ Catching the ConnectionForced AMQP exception and re-establishing the connection immediately remediates the issue.
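
The idea behind the fix can be illustrated roughly as follows. This is only a sketch, not the actual patch: ConnectionForced here is a local stand-in for amqp.exceptions.ConnectionForced, and FakeConnection, heartbeat_once and RECOVERABLE_ERRORS are hypothetical names introduced for the example.

```python
import logging

LOG = logging.getLogger("heartbeat_sketch")


class ConnectionForced(Exception):
    """Local stand-in for amqp.exceptions.ConnectionForced."""


class FakeConnection:
    """Hypothetical connection whose first heartbeat check fails."""

    def __init__(self):
        self.reconnects = 0
        self._fail_next = True

    def heartbeat_check(self):
        if self._fail_next:
            self._fail_next = False
            raise ConnectionForced("Too many heartbeats missed")

    def ensure_connection(self):
        self.reconnects += 1


# Errors treated as recoverable: reconnect at once instead of only
# logging a warning and retrying on the same broken connection.
RECOVERABLE_ERRORS = (ConnectionForced, ConnectionError)


def heartbeat_once(conn):
    """Run one heartbeat check; return True if a reconnect was triggered."""
    try:
        conn.heartbeat_check()
        return False
    except RECOVERABLE_ERRORS as exc:
        LOG.info("A recoverable connection/channel error occurred, "
                 "trying to reconnect: %s", exc)
        conn.ensure_connection()
        return True
```

With this shape, a forced close leads to a single reconnect rather than a warning repeated on every check.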
+ 
+ [Test Case]
+ Reproducing the issue is tricky. The following steps might help reproduce it.
+ 
+ 1. Deploy OpenStack
+     (If the stsstack-bundles project is used, run: ./generate-bundle.sh -s bionic -r stein -n ddmi:stsstack --run)
+ 2. On nova-cloud-controller, change heartbeat_timeout_threshold to 20s in nova.conf and restart nova-api:
+ 
+ [oslo_messaging_rabbit]
+ heartbeat_timeout_threshold = 20
+ 
+ systemctl restart apache2.service
+ 
+ 3. Create and delete instances continuously
+ 
+ ./tools/instance_launch.sh 10 cirros  # command on stsstack-bundles
+ openstack server list -c ID -f value | xargs openstack server delete
+ 
+ 4. On the rabbitmq server, drop packets from nova-api to rabbitmq, then allow them again, at random intervals
+ sudo iptables -A INPUT -p tcp --dport 5672 -s 10.5.1.55 -j DROP
+ sudo iptables -D INPUT 1
+ 
+ 5. Repeat steps 3 and 4 until you see the following message in the nova-api log:
+ WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: amqp.exceptions.ConnectionForced: Too many heartbeats missed
+ 
+ 6. Install the fixed python-oslo.messaging package on nova-cloud-controller and restart the apache service.
+ 
+ 7. Repeat steps 3 and 4 and check the nova-api log for the following INFO message:
+ INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Too many heartbeats missed
+ 
+ Because the above test case does not reproduce the issue reliably, as
+ an additional measure, continuous integration tests for
+ nova-cloud-controller will be run against the packages in -proposed.
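
The log checks in steps 5 and 7 could be scripted along these lines. This is a rough sketch: the function name and result keys are hypothetical, and the match strings are copied from the messages quoted in the steps (including the "heartbeart" spelling that appears in the log).

```python
UNFIXED = "Unexpected error during heartbeart thread processing"
FIXED = "A recoverable connection/channel error occurred, trying to reconnect"
REASON = "Too many heartbeats missed"


def count_heartbeat_messages(lines):
    """Count pre-fix WARNING and post-fix INFO heartbeat messages."""
    counts = {"unfixed": 0, "fixed": 0}
    for line in lines:
        if REASON not in line:
            continue
        if UNFIXED in line:
            counts["unfixed"] += 1
        elif FIXED in line:
            counts["fixed"] += 1
    return counts


sample = [
    "WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during "
    "heartbeart thread processing, retrying...: "
    "amqp.exceptions.ConnectionForced: Too many heartbeats missed",
    "INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable "
    "connection/channel error occurred, trying to reconnect: "
    "Too many heartbeats missed",
]
print(count_heartbeat_messages(sample))
```

To use it on a real deployment, pass an open file object for the nova-api log instead of the sample list. After the fixed package is installed, new entries should only increase the "fixed" count.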
+ 
+ [Regression Potential]
+ I do not foresee any regression potential, as the patch only handles one additional exception and reconnects to the AMQP server immediately.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1883038

Title:
  Excessive number of ConnectionForced: Too many heartbeats missed in
  logs

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1883038/+subscriptions

