Ah, that set of commands sounds pretty nice to run periodically,

Sounds like a useful script that could be placed in the ops tools repo (I forget where that repo lives, but I'm pretty sure it exists?).
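
Something like the below is what I'm picturing, wrapping the rabbitmqctl
commands Matt lists further down (just a rough sketch, not an existing script;
the threshold, the -q flag usage and the bounce logic are my assumptions):

```
#!/bin/bash
# Hypothetical periodic check (sketch only): bounce the mgmt plugin when the
# rabbit_mgmt_db process holds more than LIMIT_BYTES of memory.
LIMIT_BYTES=${LIMIT_BYTES:-2000000000}  # assumed threshold, tune per host

# Ask the rabbit_mgmt_db process how much memory it currently holds (bytes).
used=$(rabbitmqctl -q eval \
  'erlang:process_info(global:whereis_name(rabbit_mgmt_db), memory).' \
  | grep -o '[0-9]\+' | head -1)

if [ -n "$used" ] && [ "$used" -gt "$LIMIT_BYTES" ]; then
  echo "rabbit_mgmt_db using ${used} bytes; restarting rabbitmq_management"
  rabbitmqctl eval 'application:stop(rabbitmq_management).'
  rabbitmqctl eval 'application:start(rabbitmq_management).'
fi
```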

Another oddity is that this issue seems to go away when we don't run cross-release; do you see that as well?

Another hypothesis was that the following fix may be triggering part of this: https://bugs.launchpad.net/oslo.messaging/+bug/1495568

That is, if we have some queues being set up as auto-delete and some being set up with expiry, perhaps the combination of the two causes more work for the management database, so it eventually falls behind and falls over.
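
To help confirm that, something like the following should show which queues are
auto-delete and which carry an expiry argument or policy (just a suggestion,
not something we've scripted yet):

```
# List each queue's auto_delete flag and its arguments; x-expires shows up in
# the arguments column when per-queue expiry is set.
rabbitmqctl list_queues name auto_delete arguments

# Expiry applied via a policy (an "expires" definition) shows up here instead.
rabbitmqctl list_policies
```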

Matt Fischer wrote:
Yes! This happens often, but I'd not call it a crash; the mgmt db just
gets behind and then eats all the memory. We've started monitoring it and
have runbooks on how to bounce just the mgmt db. Here are my notes on that:

Restart the rabbitmq mgmt plugin - this seems to clear the memory usage:

rabbitmqctl eval 'application:stop(rabbitmq_management).'
rabbitmqctl eval 'application:start(rabbitmq_management).'

Run GC on rabbit_mgmt_db:
rabbitmqctl eval '(erlang:garbage_collect(global:whereis_name(rabbit_mgmt_db)))'

Status of rabbit_mgmt_db:
rabbitmqctl eval 'sys:get_status(global:whereis_name(rabbit_mgmt_db)).'

How much memory the RabbitMQ mgmt DB is using:
/usr/sbin/rabbitmqctl status | grep mgmt_db

Unfortunately I couldn't confirm that an upgrade would fix this for sure, and
any settings changes to reduce the number of monitored events also require a
restart of the cluster. The other issue with an upgrade for us is the
ancient version of Erlang shipped with trusty. When we upgrade to Xenial
we'll upgrade Erlang and RabbitMQ and hope it goes away. I'll probably also
tweak the settings on retention of events then too.
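
(For reference, the settings I mean live in rabbitmq.config; this is only a
sketch of the sort of change, not values we actually run, and both knobs need
a broker restart to take effect:)

```
%% Sketch only: collect stats less often and keep fewer/shorter samples so
%% the mgmt db has less to track.
[
 {rabbit, [
   {collect_statistics_interval, 30000}   %% ms; the default is 5000
 ]},
 {rabbitmq_management, [
   {sample_retention_policies, [
     {global,   [{605, 5}, {3600, 60}]},  %% trimmed from the defaults
     {basic,    [{605, 5}, {3600, 60}]},
     {detailed, [{10, 5}]}
   ]}
 ]}
].
```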

Also for the record the GC doesn't seem to help at all.

On Jul 5, 2016 11:05 AM, "Joshua Harlow" <harlo...@fastmail.com> wrote:

    Hi ops and dev-folks,

    We over at godaddy (running rabbitmq with openstack) have been
    hitting an issue that has been causing the `rabbit_mgmt_db` to consume
    nearly all of the process's memory (after a given amount of time).

    We've been thinking that this bug (or bugs?) may have existed for a
    while, and that our dual-version path (where we upgrade the control plane
    and then slowly/eventually upgrade the compute nodes to the same
    version) has somehow triggered this memory-leaking bug/issue, since
    it has happened most prominently on our cloud that was running
    nova-compute at kilo and the other services at liberty (and thus using
    the versioned-objects code path more frequently due to needing
    translations of objects).

    The rabbit we are running is 3.4.0 on CentOS Linux release 7.2.1511
    with kernel 3.10.0-327.4.4.el7.x86_64 (do note that upgrading to
    3.6.2 seems to make the issue go away):

    # rpm -qa | grep rabbit

    rabbitmq-server-3.4.0-1.noarch

    The logs that seem relevant:

    ```
    **********************************************************
    *** Publishers will be blocked until this alarm clears ***
    **********************************************************

    =INFO REPORT==== 1-Jul-2016::16:37:46 ===
    accepting AMQP connection <0.23638.342> (127.0.0.1:51932 -> 127.0.0.1:5671)

    =INFO REPORT==== 1-Jul-2016::16:37:47 ===
    vm_memory_high_watermark clear. Memory used:29910180640 allowed:47126781542
    ```
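
    (For reference: that banner is RabbitMQ's memory high-watermark alarm; the
    threshold currently in effect can be checked with something like the
    following, treating it as an assumed helper invocation:)

    ```
    # Show the configured high watermark (fraction of RAM) and the computed
    # absolute limit that trips the alarm above.
    rabbitmqctl status | grep -A1 vm_memory_high_watermark
    ```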

    This happens quite often; the crashes have been affecting our cloud
    over the weekend (which made some dev/ops folks not so happy, especially
    due to the July 4th mini-vacation).

    Looking to see if anyone else has seen anything similar?

    For those interested, this is the upstream bug/mail thread where I'm also
    trying to get confirmation from the upstream users/devs (it also has
    Erlang crash dumps attached/linked):

    https://groups.google.com/forum/#!topic/rabbitmq-users/FeBK7iXUcLg

    Thanks,

    -Josh
