It turns out a number of people are hitting this bug:
https://bugs.launchpad.net/oslo.messaging/+bug/1436769 (I tripped over
it this morning as well).

Under a currently unknown set of conditions, you can get into a heartbeat
loop with oslo.messaging 1.8.1, which basically shuts down the RPC bus
because every service is heartbeat looping 100% of the time.

I had py-amqp < 1.4.0, and 1.4.0 seems to have a bug fix for one of the
issues here.

However, after chatting with silent in IRC this morning it sounded like
the safer option might be to disable the rabbit heartbeat by default,
because this sort of heartbeat storm can kill the entire OpenStack
environment, and it is not really clear how you recover from it.

All of which is recorded in the bug.

Proposed actions are to do both of:

- oslo.messaging release with heartbeats off by default (simulates 1.8.0
behavior before the heartbeat code landed)
- oslo.messaging requiring py-amqp >= 1.4.0, so that if you enable the
heartbeating, at least you are protected from the known bug
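
For anyone who wants to check an existing install against the second
item, here is a minimal sketch of the version comparison. The helper
names (parse_version, heartbeat_is_safe) are made up for illustration
and are not part of oslo.messaging or py-amqp; the only thing taken from
the proposal is the 1.4.0 minimum.

```python
import re

# Minimum py-amqp version taken from the proposal: py-amqp >= 1.4.0
MIN_SAFE_AMQP = (1, 4, 0)


def parse_version(text):
    """Turn '1.4.0' into (1, 4, 0), keeping only leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def heartbeat_is_safe(amqp_version, minimum=MIN_SAFE_AMQP):
    """Return True if amqp_version meets the proposed minimum.

    Hypothetical helper: it only covers the one known bug, and it
    ignores pre-release suffixes such as 'rc1'.
    """
    version = parse_version(amqp_version)
    # Pad short versions so '1.4' compares equal to '1.4.0'.
    version += (0,) * (len(minimum) - len(version))
    return version >= minimum
```

You would pass it whatever version string your environment reports for
py-amqp, e.g. heartbeat_is_safe("1.3.3") is False and
heartbeat_is_safe("1.4.0") is True.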

This would still let operators use the feature, but we'd consider it
experimental until we're sure there aren't any other dragons hidden in
there. I think the goal would be to make it default on again for Marmoset.

        -Sean

-- 
Sean Dague
http://dague.net

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
