I would add on top of that: Dmitry said that HA queues also increase the probability of message duplication under certain scenarios (besides that, they are ~10x slower). Would OpenStack services tolerate a duplicated RPC request? From what I've learned so far - no. Also, with cluster_partition_handling=autoheal (which we currently have), messages may be lost during failover just as with non-HA queues. Honestly, I believe there is no difference between HA and non-HA queues in RPC-layer fault tolerance, given the way we use RabbitMQ.
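To illustrate the duplication point: as far as I know, oslo.messaging RPC does not deduplicate deliveries, so a message redelivered after an HA-queue failover simply runs the handler again. Tolerating duplicates would need something like the guard below on the consumer side - a purely illustrative sketch, not existing oslo.messaging behavior:

    # Purely illustrative: a dedup guard that oslo.messaging RPC does NOT
    # provide. Without it, a redelivered RPC message just runs the handler
    # again, repeating its side effects (allocations, state transitions, ...).
    seen_message_ids = set()

    def dispatch(message_id, handler, *args, **kwargs):
        if message_id in seen_message_ids:
            return  # duplicate delivery after failover - drop it
        seen_message_ids.add(message_id)
        return handler(*args, **kwargs)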
Thank you, Konstantin.

> On Dec 2, 2015, at 4:05 AM, Dmitry Mescheryakov <dmescherya...@mirantis.com> wrote:
>
> 2015-12-02 12:48 GMT+03:00 Sergii Golovatiuk <sgolovat...@mirantis.com>:
> Hi,
>
> On Tue, Dec 1, 2015 at 11:34 PM, Peter Lemenkov <lemen...@gmail.com> wrote:
> Hello All!
>
> Well, side-effects (or any other effects) are quite obvious and
> predictable - this will decrease the availability of RPC queues a bit.
> That's for sure.
>
> Imagine the case when a user creates a VM instance and some Nova messages
> are lost. I am not sure we want half-created instances. Who is going to
> clean them up? Since we do not have results of destructive tests, I vote
> -2 for FFE for this feature.
>
> Sergii, actually the messaging layer cannot guarantee that this will not
> happen even if all messages are preserved. Assume the following scenario:
>
> * nova-scheduler (or conductor?) sends a request to nova-compute to spawn a VM
> * nova-compute receives the message and spawns the VM
> * for some reason (RabbitMQ unavailable, nova-compute lagged) nova-compute
>   does not respond within the timeout (1 minute, I think)
> * nova-scheduler does not get a response within 1 minute and marks the VM
>   with Error status
>
> In that scenario no message was lost, but we still have a half-spawned VM,
> and it is up to Nova to handle the error and do the cleanup in that case.
>
> Such issues already happen here and there when something glitches. For
> instance, our favorite MessagingTimeout exception can be caused by exactly
> this scenario: when nova-scheduler times out waiting for a reply, it throws
> precisely that exception.
>
> My point is simple - let's increase our architecture's scalability by 2-3
> times at the cost of _maybe_ causing more errors for users during failover.
> The failover time itself should not get worse (to be tested by me), and
> errors should be correctly handled by the services anyway.
>
> However, Dmitry's guess is that the overall increase in messaging backplane
> stability (RabbitMQ won't fail as often in some cases) would compensate
> for this change. This issue is very much real - speaking for myself, I've
> seen awful cluster performance degradation when a failing RabbitMQ node
> was killed by some watchdog application (or, even worse, wasn't killed at
> all). One of these issues happened quite recently, and I'd love to see
> them less frequently.
>
> That said, I'm uncertain about the stability impact of this change, yet I
> see reasoning worth discussing behind it.
>
> 2015-12-01 20:53 GMT+01:00 Sergii Golovatiuk <sgolovat...@mirantis.com>:
> > Hi,
> >
> > -1 for FFE for disabling HA for RPC queues, as we do not know all the
> > side effects in HA scenarios.
> >
> > On Tue, Dec 1, 2015 at 7:34 PM, Dmitry Mescheryakov
> > <dmescherya...@mirantis.com> wrote:
> >>
> >> Folks,
> >>
> >> I would like to request a feature freeze exception for disabling HA
> >> for RPC queues in RabbitMQ [1].
> >>
> >> As I already wrote in another thread [2], I've conducted tests which
> >> clearly show the benefit we will get from that change. The change
> >> itself is a very small patch [3]. The only thing I want to do before
> >> proposing to merge this change is to run destructive tests against it,
> >> in order to make sure that we do not have a regression here. That
> >> should take just several days, so if there are no other objections, we
> >> will be able to merge the change within a week or two.
> >>
> >> Thanks,
> >>
> >> Dmitry
> >>
> >> [1] https://review.openstack.org/247517
> >> [2] http://lists.openstack.org/pipermail/openstack-dev/2015-December/081006.html
> >> [3] https://review.openstack.org/249180
>
> --
> With best regards, Peter Lemenkov.
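To make Dmitry's timeout scenario above concrete, here is roughly what the caller side looks like with oslo.messaging. This is only a sketch: the topic, method name and arguments are made up, while the call/timeout mechanics and the MessagingTimeout exception are the real oslo.messaging API:

    import oslo_messaging
    from oslo_config import cfg

    # In a real service the transport settings come from its config files.
    transport = oslo_messaging.get_transport(cfg.CONF)
    # Illustrative target/method names, not the actual Nova internals.
    target = oslo_messaging.Target(topic='compute')
    client = oslo_messaging.RPCClient(transport, target)

    try:
        # call() blocks until a reply arrives or the timeout expires.
        client.prepare(timeout=60).call({}, 'spawn_vm', instance_id='abc')
    except oslo_messaging.MessagingTimeout:
        # No reply within 60 seconds. The server may still have spawned
        # the VM - the caller cannot tell, so the operation is marked as
        # failed even though no message was lost.
        pass

Note that once the timeout fires, the caller has no way to know whether the server processed the request - which is exactly why preserving every message does not by itself prevent half-created instances.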
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev