** Tags added: sts

** Description changed:

+ [Impact]
+ 
+ 
+ Affected
+ Bionic
+ Not affected
+ Focal
+ 
+ [Test Case]
+ TBD
+ 
+ 
+ [Where problems could occur]
+ TBD
+ 
+ [Others]
+ 
+ 
+ // original description
+ 
  Input:
- - OpenStack Pike cluster with ~500 nodes
- - DVR enabled in neutron
- - Lots of messages
+  - OpenStack Pike cluster with ~500 nodes
+  - DVR enabled in neutron
+  - Lots of messages
  
  Scenario: failover of one rabbit node in a cluster
  
  Issue: after failed rabbit node gets back online some rpc communications appear broken
  
  Logs from rabbit:
  
  =ERROR REPORT==== 10-Aug-2018::17:24:37 ===
  Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
  operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
  
  Investigation:
  After rabbit node gets back online it gets many new connections immediately and fails to synchronize exchanges for some reason (number of exchanges in that cluster was ~1600), on that node it stays low and not increasing.
  
  Workaround: let the recovered node synchronize all exchanges - forbid new connections with iptables rules for some time after failed node gets online (30 sec)
  
  Proposal: do not create new exchanges (use default) for all direct messages - this also fixes the issue.
  
  Is there a good reason for creating new exchanges for direct messages?

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1789177

Title:
  RabbitMQ fails to synchronize exchanges under high load