** Description changed:

+ [Impact]
+ 
+ When the heartbeat connection times out, the timeout is not treated as a
+ recoverable error and no reconnect is attempted via ensure_connection().
+ This leaves the heartbeat thread attempting to reconnect to the same
+ host over and over again.
+ 
+ [Test Case]
+ 
+ * deploy openstack
+   bzr branch lp:openstack-charm-testing
+   cd openstack-charm-testing
+   juju deployer -c default.yaml -d -v artful-pike
+   juju add-unit rabbitmq-server
+ * Force a timeout using iptables on a rabbitmq-server node
+   sudo iptables -I INPUT -p tcp --dport 5672 -j DROP
+ 
+ Expected result:
+ once the timeout happens, the heartbeat thread reconnects (picking a new rabbit host if needed).
+ 
+ Actual result:
+ the heartbeat thread is left in a loop (connect, socket closed, retry, connect...)
+ 
+ [Regression Potential]
+ 
+ Without this patch, when the heartbeat connection times out the thread
+ does not attempt to connect to the next configured rabbit host. The risk
+ is that in situations where the daemons using this library currently
+ manage to reconnect to the same host (e.g. the disconnection from the
+ host lasts only a few seconds), with this change they will reconnect to
+ the next host, so users may see connections flapping between two (or
+ more) rabbit hosts.
+ 
+ [Other Info]
  I have a rabbitmq cluster of 3 nodes
  
  root@47704165d2bb:/# rabbitmqctl cluster_status
  Cluster status of node rabbit@47704165d2bb ...
  [{nodes,[{disc,[rabbit@0482398a286e,rabbit@3709521b608a,
-                 rabbit@47704165d2bb]}]},
-  {running_nodes,[rabbit@0482398a286e,rabbit@3709521b608a,rabbit@47704165d2bb]},
-  {cluster_name,<<"rabbit@47704165d2bb">>},
-  {partitions,[]},
-  {alarms,[{rabbit@0482398a286e,[]},
-           {rabbit@3709521b608a,[]},
-           {rabbit@47704165d2bb,[]}]}]
- root@47704165d2bb:/# rabbitmqctl list_policies      
+                 rabbit@47704165d2bb]}]},
+  {running_nodes,[rabbit@0482398a286e,rabbit@3709521b608a,rabbit@47704165d2bb]},
+  {cluster_name,<<"rabbit@47704165d2bb">>},
+  {partitions,[]},
+  {alarms,[{rabbit@0482398a286e,[]},
+           {rabbit@3709521b608a,[]},
+           {rabbit@47704165d2bb,[]}]}]
+ root@47704165d2bb:/# rabbitmqctl list_policies
  Listing policies ...
  /       ha-all  all     ^ha\\.  {"ha-mode":"all"}       0
- 
  
  My oslo.messaging client configuration:
  [oslo_messaging_rabbit]
  rabbit_hosts=120.0.0.56:5671,120.0.0.57:5671,120.0.0.55:5671
  rabbit_userid=cloud
  rabbit_password=cloud
  rabbit_ha_queues=True
  rabbit_retry_interval=1
  rabbit_retry_backoff=2
  rabbit_max_retries=0
  rabbit_durable_queues=False
  
  When I run "service rabbitmq-server stop" on one node to simulate a
  failure, I get the following error logs, and the consumer can't fail
  over from the bad node. It keeps reconnecting to the failed node forever
  instead of trying the other nodes. "kombu_failover_strategy" is left at
  its default value of "round-robin".
  
- 
  2009-01-13 18:32:42.785 17 ERROR oslo.messaging._drivers.impl_rabbit [-] [4e976d46-ceee-4617-b9be-5e4821990738] AMQP server 120.0.0.56:5671 closed the connection. Check login credentials: Socket closed
  2009-01-13 18:32:43.819 17 ERROR oslo.messaging._drivers.impl_rabbit [-] Unable to connect to AMQP server on 120.0.0.56:5671 after None tries: Socket closed
  2009-01-13 18:32:43.819 17 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...
  2009-01-13 18:32:58.874 17 ERROR oslo.messaging._drivers.impl_rabbit [-] [4e976d46-ceee-4617-b9be-5e4821990738] AMQP server 120.0.0.56:5671 closed the connection. Check login credentials: Socket closed
  2009-01-13 18:32:59.907 17 ERROR oslo.messaging._drivers.impl_rabbit [-] Unable to connect to AMQP server on 120.0.0.56:5671 after None tries: Socket closed
  2009-01-13 18:32:59.907 17 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...
  
- 
  Can anyone help me? Thanks!
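
The expected behaviour described above (a heartbeat timeout is treated as
recoverable and the thread moves on to the next configured rabbit host) can
be illustrated with a small toy model. This is only a sketch under stated
assumptions, not the oslo.messaging code or the actual patch: the class
HeartbeatSketch, the probe fake_check, and this simplified
ensure_connection() body are hypothetical stand-ins, and only the host list
is taken from the configuration in the report.

```python
import itertools
import socket

# Host list copied from the rabbit_hosts setting in the report.
HOSTS = ["120.0.0.56:5671", "120.0.0.57:5671", "120.0.0.55:5671"]


class HeartbeatSketch:
    """Toy model of a heartbeat loop that fails over between hosts."""

    # Errors treated as recoverable. Listing socket.timeout here is the
    # whole point of the sketch (assumption: a timeout should trigger the
    # same failover path as a closed connection).
    RECOVERABLE = (socket.timeout, ConnectionError)

    def __init__(self, hosts, check):
        self._hosts = itertools.cycle(hosts)  # round-robin over hosts
        self._check = check                   # hypothetical heartbeat probe
        self.current = next(self._hosts)

    def ensure_connection(self):
        """Stand-in for a real reconnect: select the next configured host."""
        self.current = next(self._hosts)

    def beat(self):
        """Run one heartbeat; on a recoverable error, fail over."""
        try:
            self._check(self.current)
            return True
        except self.RECOVERABLE:
            self.ensure_connection()
            return False


def fake_check(host):
    """Hypothetical probe: the first host always times out."""
    if host == "120.0.0.56:5671":
        raise socket.timeout("timed out")


hb = HeartbeatSketch(HOSTS, fake_check)
hb.beat()          # times out on 120.0.0.56:5671 and fails over
print(hb.current)  # 120.0.0.57:5671
```

If socket.timeout were left out of RECOVERABLE in this model, the exception
would propagate out of beat() and current would stay on the first host,
which mirrors the retry loop against a single host seen in the logs above.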

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1657444

Title:
  Can't failover when rabbit_hosts is configured as 3 hosts
