Hey folks,
Apologies if any of this has been discussed on the list already.  I've tried to 
check everything ahead of time.

We recently had two bugs combine to hit us in some of our regions as we rolled 
out some new code.  The result was rabbit servers not accepting connections 
and/or crashing with OOM errors.  I wanted to pass them along because I know 
from the Large Deployments Team that more and more folks are using cells to 
manage larger regions.  Here are the specific bugs:

Cells doesn't properly track RabbitMQ connection pools:
https://review.openstack.org/#/c/152667/

Oslo messaging bug in version 1.5.1 that leaks channels:
Upstream bug: https://bugs.launchpad.net/oslo.messaging/+bug/1406629
Upstream fix: 
https://review.openstack.org/#/c/145232/9/oslo_messaging/_drivers/impl_rabbit.py


We are deploying patches for both in our problem areas now and will roll them 
out to the rest of the fleet in the immediate future, but this gave us quite a 
run for our money last week.  I wanted to share in case anyone else is chasing 
these issues or might hit them after an upcoming code update.

Thanks!
Matt
_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators