It turns out a number of people are hitting https://bugs.launchpad.net/oslo.messaging/+bug/1436769 (I tripped over it this morning as well).
Under a currently unknown set of conditions, you can get into a heartbeat loop with oslo.messaging 1.8.1, which basically shuts down the RPC bus because every service is heartbeat looping 100% of the time.

I had py-amqp < 1.4.0, and 1.4.0 seems to have a bug fix for one of the issues here. However, after chatting with silent in IRC this morning, it sounded like the safer option might be to disable the rabbit heartbeat by default, because this sort of heartbeat storm can kill the entire OpenStack environment, and it is not really clear how you recover from it.

All of which is recorded in the bug.

Proposed actions are to do both of:

- oslo.messaging release with heartbeats off by default (simulates 1.8.0 behavior before the heartbeat code landed)
- oslo.messaging requiring py-amqp >= 1.4.0, so that if you enable the heartbeating, at least you are protected from the known bug

This would still let operators use the feature; we'd consider it experimental until we're sure there aren't any other dragons hidden in there. I think the goal would be to make it default on again for Marmoset.

-Sean

--
Sean Dague
http://dague.net

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
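
For anyone who wants to check a node against the py-amqp >= 1.4.0 floor proposed above, here is a minimal sketch. It is not part of oslo.messaging. The helper names (parse_version, amqp_is_safe, installed_amqp_version) are made up for illustration, and it assumes py-amqp is installed under the distribution name "amqp". Pre-release suffixes such as "rc1" are ignored, so treat the result as a rough guide only.

```python
import re
from importlib import metadata

# Minimum py-amqp version proposed in the message above.
MIN_AMQP = (1, 4, 0)


def parse_version(text):
    """Return the leading numeric part of a version string as a tuple of ints.

    Short versions are padded with zeros to three components, so "1.4"
    compares equal to "1.4.0". Any suffix such as "rc1" is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("no numeric version found in %r" % (text,))
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def amqp_is_safe(version_text, minimum=MIN_AMQP):
    """True if version_text is at or above the proposed minimum."""
    return parse_version(version_text) >= minimum


def installed_amqp_version():
    """Return the installed version string, or None if it is not installed.

    Assumption: the distribution is named "amqp".
    """
    try:
        return metadata.version("amqp")
    except metadata.PackageNotFoundError:
        return None
```

Calling amqp_is_safe(installed_amqp_version()) on a node where the library is present reports whether that node meets the floor; a None result from installed_amqp_version() should be handled first.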