Your test simulated controller split brain and you want the cluster to stay up?
Sounds strange to me...
/Hans

> -----Original Message-----
> From: praveen malviya [mailto:[email protected]]
> Sent: 20 August 2013 13:31
> To: Hans Feldt
> Cc: Suryanarayana Garlapati; [email protected]
> Subject: Re: [devel] [PATCH 1 of 1] amfd: exit at MDS quiesced event in wrong state [#516]
>
>
> On 20-Aug-13 2:58 PM, Hans Feldt wrote:
> > On 08/19/2013 04:33 PM, Suryanarayana Garlapati wrote:
> >> In my perspective, this should be done at the Active only and standby
> >> should just drop it. At least we will have the
> >> standby which gets promoted to active and continues to provide the
> >> service. We should not be performing a cluster reset.
> >>
> >> Thoughts?
> > The AMF code in question (event handling for MDS callback QUIESCED) can
> > only be invoked in AMFD state QUIESCED AND
> > SWITCH-OVER state. See
> > http://devel.opensaf.org/~hafe/AMF/ControllerSwitchover.png
> >
> > It is not designed to be invoked in STANDBY state. Besides
> > STANDBY->QUIESCED is not a valid transition.
> Applied the patch and simulated TIPC flickering by executing command
> "tipc-config -bd=eth:eth0; tipc-config -be=eth:eth0".
> Both AVD become active and then get avd_mds_qsd_role_evh(). Finally
> cluster is reset.
> I think one AVD should remain active to keep the cluster up and running.
>
> Thanks,
> Praveen
> > /Hans
> >
> >>
> >> On Friday 16 August 2013 07:03 PM, Hans Feldt wrote:
> >>> osaf/services/saf/avsv/avd/avd_role.cc | 9 +++++++++
> >>> 1 files changed, 9 insertions(+), 0 deletions(-)
> >>>
> >>>
> >>> MDS can force an active vdest into quiesced state (see docs). Reasons for this
> >>> happening are unclear. The logic avd_mds_qsd_role_evh() can only handle this
> >>> event in context of a controller switch-over. Otherwise it could e.g. hang in
> >>> using IMM which eventually times out and calls abort() generating a core
> >>> dump.
> >>>
> >>> Instead exit the amfd process when this event happens in non controller
> >>> switch-over state. amfnd will failfast reboot the node when it detects
> >>> this.
> >>>
> >>> diff --git a/osaf/services/saf/avsv/avd/avd_role.cc b/osaf/services/saf/avsv/avd/avd_role.cc
> >>> --- a/osaf/services/saf/avsv/avd/avd_role.cc
> >>> +++ b/osaf/services/saf/avsv/avd/avd_role.cc
> >>> @@ -569,6 +569,15 @@ void avd_mds_qsd_role_evh(AVD_CL_CB *cb,
> >>>      TRACE_ENTER();
> >>> +    /* Only accept this event in controller switch-over state, in other
> >>> +     * states it is invalid and indicates severe cluster problems.
> >>> +     */
> >>> +    if (cb->swap_switch == SA_FALSE) {
> >>> +        LOG_NO("%s: MDS unexpectedly changed role to QUIESCED", __FUNCTION__);
> >>> +        LOG_CR("Controller split brain detected, exiting");
> >>> +        _exit(EXIT_FAILURE); // should never get here...
> >>> +    }
> >>> +
> >>>      /* Give up IMM OI implementer role */
> >>>      if ((rc = immutil_saImmOiImplementerClear(cb->immOiHandle)) != SA_AIS_OK) {
> >>>          LOG_ER("FAILOVER Active --> Quiesced FAILED, ImplementerClear failed %u", rc);
> >>>
> >>> _______________________________________________
> >>> Opensaf-devel mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
