Reviewed: https://review.opendev.org/c/openstack/oslo.messaging/+/819142
Committed: https://opendev.org/openstack/oslo.messaging/commit/7b3968d9b012e873a9b393fcefa578c46fca18c6
Submitter: "Zuul (22348)"
Branch: master

commit 7b3968d9b012e873a9b393fcefa578c46fca18c6
Author: Balazs Gibizer <[email protected]>
Date:   Tue Nov 23 16:58:05 2021 +0100

    [rabbit] use retry parameters during notification sending

    The rabbit backend now applies the [oslo_messaging_notifications]retry,
    [oslo_messaging_rabbit]rabbit_retry_interval, rabbit_retry_backoff and
    rabbit_interval_max configuration parameters when it tries to establish
    the connection to the message bus during notification sending.

    This patch also clarifies the differences between the behavior of the
    kafka and the rabbit drivers in this regard.

    Closes-Bug: #1917645
    Change-Id: Id4ccafc95314c86ae918336e42cca64a6acd4d94

** Changed in: oslo.messaging
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1917645

Title:
Nova can't create instances if RabbitMQ notification cluster is down

Status in OpenStack Compute (nova):
Confirmed
Status in oslo.messaging:
Fix Released

Bug description:
We use independent RabbitMQ clusters for each OpenStack project, for Nova
Cells, and for notifications. Recently, I noticed in our test
infrastructure that if the RabbitMQ cluster for notifications has an
outage, Nova can't create new instances. Other operations may hang as
well.

Failing to send a notification or to connect to the notification RabbitMQ
cluster shouldn't prevent new instances from being created. (If blocking
is actually a use case for some deployments, the operator should be able
to configure it.)

Tested against the master branch.

If the notification RabbitMQ is stopped, nova-scheduler gets stuck when
creating an instance, with:

```
Mar 01 21:16:28 devstack nova-scheduler[18384]: DEBUG nova.scheduler.request_filter [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Request filter 'accelerators_filter' took 0.0 seconds {{(pid=18384) wrapper /opt/stack/nova/nova/scheduler/request_filter.py:46}}
Mar 01 21:16:32 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 2.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:35 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 4.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:42 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 6.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:51 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 8.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:17:02 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 10.0 seconds): OSError: [Errno 113] EHOSTUNREACH
(...)
```

Because the notification RabbitMQ cluster is down, Nova gets stuck in:
https://github.com/openstack/nova/blob/5b66caab870558b8a7f7b662c01587b959ad3d41/nova/scheduler/filter_scheduler.py#L85

because oslo.messaging never gives up:
https://github.com/openstack/oslo.messaging/blob/5aa645b38b4c1cf08b00e687eb6c7c4b8a0211fc/oslo_messaging/_drivers/impl_rabbit.py#L736

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1917645/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
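
For readers following the fix, here is a minimal sketch of how a bounded retry count changes the behavior reported in this bug. It is an illustrative model only, not the oslo.messaging or kombu implementation: the function `retry_delays` and its parameters `interval_start`, `interval_step` and `interval_max` are hypothetical names, and the values in the usage lines are chosen simply to reproduce the 2, 4, 6, 8, 10 second delays visible in the log. The assumption that a negative `retry` means "retry forever" is also part of the sketch.

```python
import itertools


def retry_delays(retry, interval_start, interval_step, interval_max):
    """Yield the delay (in seconds) before each reconnection attempt.

    Hypothetical model: delays grow linearly by `interval_step`, are
    capped at `interval_max`, and stop after `retry` attempts. A negative
    or None `retry` is assumed to mean "never give up".
    """
    if retry is None or retry < 0:
        attempts = itertools.count()  # unbounded: the caller blocks forever
    else:
        attempts = range(retry)       # bounded: the caller eventually gets an error
    for n in attempts:
        yield min(interval_start + n * interval_step, interval_max)


# Bounded retry: five attempts, then the sender can give up.
bounded = list(retry_delays(5, interval_start=2.0, interval_step=2.0, interval_max=30.0))

# Unbounded retry: take only the first 20 delays to show the cap.
unbounded_head = list(
    itertools.islice(
        retry_delays(-1, interval_start=2.0, interval_step=2.0, interval_max=30.0), 20
    )
)
```

With a bounded count the generator is exhausted and the caller can report a failure and move on. With an unbounded count it never ends, which corresponds to the hang described in the report.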

