Your test simulated controller split brain and you want the cluster to stay up?

Sounds strange to me...

/Hans

> -----Original Message-----
> From: praveen malviya [mailto:[email protected]]
> Sent: 20 August 2013 13:31
> To: Hans Feldt
> Cc: Suryanarayana Garlapati; [email protected]
> Subject: Re: [devel] [PATCH 1 of 1] amfd: exit at MDS quiesced event in wrong state [#516]
> 
> 
> On 20-Aug-13 2:58 PM, Hans Feldt wrote:
> > On 08/19/2013 04:33 PM, Suryanarayana Garlapati wrote:
> >> From my perspective, this should be done on the Active only, and the
> >> standby should just drop it. At least we will then have the standby,
> >> which gets promoted to active and continues to provide the service.
> >> We should not be performing a cluster reset.
> >>
> >> Thoughts?
> > The AMF code in question (event handling for the MDS callback QUIESCED)
> > can only be invoked when AMFD is in state QUIESCED and in SWITCH-OVER
> > state. See
> > http://devel.opensaf.org/~hafe/AMF/ControllerSwitchover.png
> >
> > It is not designed to be invoked in STANDBY state. Besides,
> > STANDBY->QUIESCED is not a valid transition.
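
To make the transition rule above concrete, here is a minimal sketch of a role transition check. The Role enum and is_valid_transition() are hypothetical names and not part of the AMFD code. Only the rule that STANDBY->QUIESCED is invalid comes from this thread; the remaining entries are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified role model. The real AMFD keeps its role in
// its own control block; this enum exists only for illustration.
enum class Role { Active, Standby, Quiesced };

// Returns true if the role change is allowed in this sketch.
// Only "Standby -> Quiesced is invalid" is taken from the thread;
// every other entry is an assumption about a controller switch-over.
inline bool is_valid_transition(Role from, Role to) {
    switch (from) {
    case Role::Active:
        return to == Role::Quiesced;  // active gives up its role first
    case Role::Quiesced:
        return to == Role::Standby || to == Role::Active;  // complete or roll back
    case Role::Standby:
        return to == Role::Active;    // promotion only, never to Quiesced
    }
    return false;
}
```

With such a check, a QUIESCED event arriving on a standby would be rejected before any role-change handling runs.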
> I applied the patch and simulated TIPC flickering by executing the command
> "tipc-config -bd=eth:eth0; tipc-config -be=eth:eth0".
> Both AVDs become active and then both run avd_mds_qsd_role_evh(). Finally,
> the cluster is reset.
> I think one AVD should remain active to keep the cluster up and running.
> 
> Thanks,
> Praveen
> > /Hans
> >
> >>
> >> On Friday 16 August 2013 07:03 PM, Hans Feldt wrote:
> >>>    osaf/services/saf/avsv/avd/avd_role.cc |  9 +++++++++
> >>>    1 files changed, 9 insertions(+), 0 deletions(-)
> >>>
> >>>
> >>> MDS can force an active vdest into quiesced state (see docs). The reasons
> >>> for this happening are unclear. The logic in avd_mds_qsd_role_evh() can
> >>> only handle this event in the context of a controller switch-over.
> >>> Otherwise it could, e.g., hang in a call to IMM, which eventually times
> >>> out and calls abort(), generating a core dump.
> >>>
> >>> Instead, exit the amfd process when this event happens outside the
> >>> controller switch-over state. amfnd will failfast reboot the node when it
> >>> detects this.
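
The decision described in the commit message above can be separated from its side effect, which makes it easy to test in isolation. This is only a sketch: QuiescedAction and decide_on_mds_quiesced() are hypothetical names, and the boolean parameter stands in for the cb->swap_switch check used in the patch.

```cpp
#include <cassert>

// Hypothetical result type: what the handler should do with the event.
enum class QuiescedAction { ContinueSwitchOver, ExitProcess };

// Pure decision function. The parameter models "a controller
// switch-over is in progress" (cb->swap_switch in the patch).
// Outside a switch-over the event is treated as invalid and the
// process should exit, so that amfnd can failfast reboot the node.
inline QuiescedAction decide_on_mds_quiesced(bool switch_over_in_progress) {
    return switch_over_in_progress ? QuiescedAction::ContinueSwitchOver
                                   : QuiescedAction::ExitProcess;
}
```

The caller would log and call _exit() only when the result is ExitProcess, keeping the exit itself out of the code under test.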
> >>>
> >>> diff --git a/osaf/services/saf/avsv/avd/avd_role.cc b/osaf/services/saf/avsv/avd/avd_role.cc
> >>> --- a/osaf/services/saf/avsv/avd/avd_role.cc
> >>> +++ b/osaf/services/saf/avsv/avd/avd_role.cc
> >>> @@ -569,6 +569,15 @@ void avd_mds_qsd_role_evh(AVD_CL_CB *cb,
> >>>        TRACE_ENTER();
> >>> +    /* Only accept this event in controller switch-over state, in other
> >>> +     * states it is invalid and indicates severe cluster problems.
> >>> +     */
> >>> +    if (cb->swap_switch == SA_FALSE) {
> >>> +        LOG_NO("%s: MDS unexpectedly changed role to QUIESCED", __FUNCTION__);
> >>> +        LOG_CR("Controller split brain detected, exiting");
> >>> +        _exit(EXIT_FAILURE); // should never get here...
> >>> +    }
> >>> +
> >>>        /* Give up IMM OI implementer role */
> >>>        if ((rc = immutil_saImmOiImplementerClear(cb->immOiHandle)) != SA_AIS_OK) {
> >>>            LOG_ER("FAILOVER Active --> Quiesced FAILED, ImplementerClear failed %u", rc);
> >>>
> >>> ------------------------------------------------------------------------------
> >>> Get 100% visibility into Java/.NET code with AppDynamics Lite!
> >>> It's a free troubleshooting tool designed for production.
> >>> Get down to code-level detail for bottlenecks, with <2% overhead.
> >>> Download for free and get started troubleshooting in minutes.
> >>> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
> >>> _______________________________________________
> >>> Opensaf-devel mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> >>
> >>

