Re: [openstack-dev] [Fuel][FFE] Disabling HA for RPC queues in RabbitMQ

Dmitry Mescheryakov Wed, 02 Dec 2015 04:09:44 -0800

2015-12-02 12:48 GMT+03:00 Sergii Golovatiuk <sgolovat...@mirantis.com>:


> Hi,
>
>
> On Tue, Dec 1, 2015 at 11:34 PM, Peter Lemenkov <lemen...@gmail.com>
> wrote:
>
>> Hello All!
>>
>> Well, the side effects (or any other effects) are quite obvious and
>> predictable: this will decrease the availability of RPC queues a bit.
>> That's for sure.
>>
>
> Imagine the case when a user creates a VM instance and some Nova messages
> are lost. I am not sure we want half-created instances. Who is going to
> clean them up? Since we do not have results of destructive tests, I vote -2
> for an FFE for this feature.
>

Sergii, actually the messaging layer cannot guarantee that this will not
happen even if all messages are preserved. Consider the following
scenario:

 * nova-scheduler (or conductor?) sends a request to nova-compute to spawn
   a VM;
 * nova-compute receives the message and spawns the VM;
 * for some reason (RabbitMQ unavailable, nova-compute lagging),
   nova-compute does not respond within the timeout (1 minute, I think);
 * nova-scheduler does not get a response within 1 minute and marks the VM
   with Error status.

In that scenario no message was lost, but we still have a half-spawned VM,
and it is up to Nova to handle the error and do the cleanup.

Such issues already happen here and there when something glitches. For
instance, our favorite MessagingTimeout exception could be caused by such a
scenario. Specifically, in that example, when nova-scheduler times out
waiting for a reply, it will throw exactly that exception.
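To make the timeout scenario concrete, here is a minimal Python simulation. It is not Nova or oslo.messaging code: `MessagingTimeout`, `compute_worker`, `rpc_call` and `spawn` are hypothetical names, and in-process queues stand in for RabbitMQ. The request is delivered and processed, yet the caller times out, so the caller's record ("ERROR") and the compute side ("ACTIVE") diverge without any message being lost.

```python
import queue
import threading
import time


class MessagingTimeout(Exception):
    """Hypothetical stand-in for the timeout error an RPC caller sees."""


def compute_worker(requests, replies, vms, delay):
    """Simulated nova-compute: spawns the VM, then replies after `delay` s."""
    vm_id = requests.get()
    vms[vm_id] = "ACTIVE"   # the VM really gets spawned
    time.sleep(delay)       # compute lags (or the broker is unavailable)
    replies.put(vm_id)      # the reply is sent, but possibly too late


def rpc_call(requests, replies, vm_id, timeout):
    """Simulated caller: send a request, wait up to `timeout` s for a reply."""
    requests.put(vm_id)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise MessagingTimeout(f"no reply for {vm_id} within {timeout}s")


def spawn(vm_id, timeout, delay):
    """Return (state the caller records, state on the compute side)."""
    requests, replies = queue.Queue(), queue.Queue()
    vms = {}   # what compute actually did
    db = {}    # what the scheduler/conductor believes
    worker = threading.Thread(
        target=compute_worker, args=(requests, replies, vms, delay))
    worker.start()
    try:
        rpc_call(requests, replies, vm_id, timeout)
        db[vm_id] = "ACTIVE"
    except MessagingTimeout:
        db[vm_id] = "ERROR"     # no message was lost, yet we mark an error
    worker.join()
    return db[vm_id], vms.get(vm_id)


print(spawn("vm-1", timeout=0.1, delay=0.3))  # ('ERROR', 'ACTIVE')
print(spawn("vm-2", timeout=1.0, delay=0.0))  # ('ACTIVE', 'ACTIVE')
```

Mirroring queues would not change the first outcome: the inconsistency comes from the timeout, so the service has to detect and clean it up either way.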

My point is simple: let's increase our architecture's scalability by 2-3
times at the cost of _maybe_ causing more errors for users during failover.
The failover time itself should not get worse (I will test this), and errors
should be handled correctly by services anyway.
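For context, with classic mirrored queues RabbitMQ enables HA per queue through a policy whose pattern is a regular expression over queue names, for example `rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all"}'`. Disabling HA for RPC queues therefore amounts to choosing a pattern that RPC queue names do not match. The Python sketch below only checks which names two patterns select; the queue names and both patterns are assumptions for illustration, not the contents of the Fuel patch.

```python
import re

# Illustrative queue names in the style oslo.messaging's RabbitMQ driver
# uses (an assumption; actual names depend on the deployment).
QUEUES = [
    "compute",              # RPC topic queue
    "compute.node-1",       # RPC queue for one server
    "compute_fanout_3f2a",  # RPC fanout queue
    "reply_9c1d",           # RPC reply queue
    "notifications.info",   # notification queue
]

# Hypothetical policy patterns. A RabbitMQ policy applies to queues whose
# names match its regular expression, hence the explicit "^" anchors.
HA_ALL = r"^(?!amq\.).*"                # mirror every non-amq.* queue
HA_NOTIFICATIONS = r"^notifications\."  # mirror notification queues only


def mirrored(pattern, names):
    """Return the queue names a policy with `pattern` would mirror."""
    return [name for name in names if re.search(pattern, name)]


print(mirrored(HA_ALL, QUEUES))            # all five queues
print(mirrored(HA_NOTIFICATIONS, QUEUES))  # ['notifications.info']
```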


>> However, Dmitry's guess is that the increase in overall messaging
>> backplane stability (RabbitMQ won't fail as often in some cases) would
>> compensate for this change. This issue is very real: speaking for myself,
>> I've seen awful cluster performance degradation when a failing RabbitMQ
>> node was killed by some watchdog application (or, even worse, wasn't
>> killed at all). One of these incidents happened quite recently, and I'd
>> love to see them less frequently.
>>
>> That said, I'm uncertain about the stability impact of this change, yet
>> I see reasoning behind it that is worth discussing.
>>
>> 2015-12-01 20:53 GMT+01:00 Sergii Golovatiuk <sgolovat...@mirantis.com>:
>> > Hi,
>> >
>> > -1 for an FFE for disabling HA for RPC queues, as we do not know all the
>> > side effects in HA scenarios.
>> >
>> > On Tue, Dec 1, 2015 at 7:34 PM, Dmitry Mescheryakov
>> > <dmescherya...@mirantis.com> wrote:
>> >>
>> >> Folks,
>> >>
>> >> I would like to request a feature freeze exception for disabling HA for
>> >> RPC queues in RabbitMQ [1].
>> >>
>> >> As I already wrote in another thread [2], I've conducted tests which
>> >> clearly show the benefit we will get from this change. The change itself
>> >> is a very small patch [3]. The only thing I want to do before proposing
>> >> to merge it is to run destructive tests against it to make sure we do
>> >> not have a regression here. That should take just several days, so if
>> >> there are no other objections, we will be able to merge the change in a
>> >> week or two.
>> >>
>> >> Thanks,
>> >>
>> >> Dmitry
>> >>
>> >> [1] https://review.openstack.org/247517
>> >> [2] http://lists.openstack.org/pipermail/openstack-dev/2015-December/081006.html
>> >> [3] https://review.openstack.org/249180
>> >>
>> >>
>> >> __________________________________________________________________________
>> >> OpenStack Development Mailing List (not for usage questions)
>> >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >>
>> >
>>
>>
>>
>> --
>> With best regards, Peter Lemenkov.
