Hi Praveen,

I think the scope of the patch should be to fix the following test case:
- an active node hung at shutdown
- a new active controller comes up

Expected result:
- The AMF on the hung node should force itself out of the cluster (by an exit, a forced reboot, etc.)

Thanks,
Mathi.
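Mathi's expected result can be phrased as a check on the cluster after the new active has come up: the hung node is gone and exactly one controller is still active. The sketch below is a hypothetical model, not OpenSAF code (`Controller`, `resolve_split_brain()` and `active_members()` are illustrative names). It assumes the hung node can still react when it learns about the new active, for example by exiting or forcing a reboot.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the test case; these types are illustrative and
// are not part of OpenSAF.
enum class Role { Active, Standby, Quiesced };

struct Controller {
    int node_id;
    Role role;
    bool hung_at_shutdown;   // old active that got stuck while shutting down
    bool in_cluster = true;  // cleared once the node has exited or rebooted
};

// True if some other healthy cluster member claims the active role.
bool healthy_active_elsewhere(const std::vector<Controller>& nodes, int self_id) {
    for (const Controller& c : nodes) {
        if (c.node_id != self_id && c.in_cluster && !c.hung_at_shutdown &&
            c.role == Role::Active) {
            return true;
        }
    }
    return false;
}

// Expected result: a hung active that sees a new active forces itself out.
// The healthy new active is left untouched.
void resolve_split_brain(std::vector<Controller>& nodes) {
    for (Controller& c : nodes) {
        if (c.in_cluster && c.role == Role::Active && c.hung_at_shutdown &&
            healthy_active_elsewhere(nodes, c.node_id)) {
            c.in_cluster = false;  // stands in for an exit or forced reboot
        }
    }
}

// Controllers that are still members and still claim the active role.
int active_members(const std::vector<Controller>& nodes) {
    int n = 0;
    for (const Controller& c : nodes) {
        if (c.in_cluster && c.role == Role::Active) ++n;
    }
    return n;
}
```

Note that a hung active with no healthy peer stays in the model, so the check only removes a node when another controller can take over.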
> -----Original Message-----
> From: praveen malviya
> Sent: Tuesday, August 20, 2013 5:01 PM
> To: Hans Feldt
> Cc: [email protected]
> Subject: Re: [devel] [PATCH 1 of 1] amfd: exit at MDS quiesced event in
> wrong state [#516]
>
>
> On 20-Aug-13 2:58 PM, Hans Feldt wrote:
> > On 08/19/2013 04:33 PM, Suryanarayana Garlapati wrote:
> >> In my view, this should be done on the active only, and the standby
> >> should just drop it. That way we will at least have the standby, which
> >> gets promoted to active and continues to provide the service. We should
> >> not be performing a cluster reset.
> >>
> >> Thoughts?
> > The AMF code in question (event handling for MDS callback QUIESCED)
> > can only be invoked in AMFD state QUIESCED and SWITCH-OVER state. See
> > http://devel.opensaf.org/~hafe/AMF/ControllerSwitchover.png
> >
> > It is not designed to be invoked in STANDBY state. Besides,
> > STANDBY->QUIESCED is not a valid transition.
> Applied the patch and simulated TIPC flickering by executing the command
> "tipc-config -bd=eth:eth0; tipc-config -be=eth:eth0".
> Both AVDs become active and then get avd_mds_qsd_role_evh(). Finally, the
> cluster is reset.
> I think one AVD should remain active to keep the cluster up and running.
>
> Thanks,
> Praveen
> > /Hans
> >
> >>
> >> On Friday 16 August 2013 07:03 PM, Hans Feldt wrote:
> >>> osaf/services/saf/avsv/avd/avd_role.cc | 9 +++++++++
> >>> 1 files changed, 9 insertions(+), 0 deletions(-)
> >>>
> >>>
> >>> MDS can force an active vdest into quiesced state (see docs).
> >>> The reason this happens is unclear. The logic in
> >>> avd_mds_qsd_role_evh() can only handle this event in the context of a
> >>> controller switch-over. Otherwise it could, e.g., hang while using IMM,
> >>> which eventually times out and calls abort(), generating a core dump.
> >>>
> >>> Instead, exit the amfd process when this event happens outside a
> >>> controller switch-over. amfnd will failfast reboot the node when it
> >>> detects this.
> >>>
> >>> diff --git a/osaf/services/saf/avsv/avd/avd_role.cc
> >>> b/osaf/services/saf/avsv/avd/avd_role.cc
> >>> --- a/osaf/services/saf/avsv/avd/avd_role.cc
> >>> +++ b/osaf/services/saf/avsv/avd/avd_role.cc
> >>> @@ -569,6 +569,15 @@ void avd_mds_qsd_role_evh(AVD_CL_CB *cb,
> >>>  	TRACE_ENTER();
> >>> +	/* Only accept this event in controller switch-over state, in other
> >>> +	 * states it is invalid and indicates severe cluster problems.
> >>> +	 */
> >>> +	if (cb->swap_switch == SA_FALSE) {
> >>> +		LOG_NO("%s: MDS unexpectedly changed role to QUIESCED", __FUNCTION__);
> >>> +		LOG_CR("Controller split brain detected, exiting");
> >>> +		_exit(EXIT_FAILURE); // should never get here...
> >>> +	}
> >>> +
> >>>  	/* Give up IMM OI implementer role */
> >>>  	if ((rc = immutil_saImmOiImplementerClear(cb->immOiHandle)) != SA_AIS_OK) {
> >>>  		LOG_ER("FAILOVER Active --> Quiesced FAILED, ImplementerClear failed %u", rc);
> >>>
> >>> _______________________________________________
> >>> Opensaf-devel mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> >>
> >>
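Praveen's result follows from the guard in the patch once both controllers are active at the same time. The sketch below isolates that guard; `Amfd`, `on_mds_quiesced()` and `survivors()` are hypothetical names, `swap_switch` stands in for `cb->swap_switch`, and the exit is recorded in a flag instead of calling `_exit()`. It assumes neither node is in a controller switch-over when the MDS QUIESCED callback arrives, which is what the TIPC flicker test appears to produce.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the part of AVD_CL_CB that the patch reads.
struct Amfd {
    bool swap_switch;     // models cb->swap_switch: true only during a switch-over
    bool exited = false;  // records the _exit(EXIT_FAILURE) path instead of taking it
};

// Models only the guard the patch adds at the top of avd_mds_qsd_role_evh().
void on_mds_quiesced(Amfd& cb) {
    if (!cb.swap_switch) {
        // Real patch: LOG_NO(...), LOG_CR("Controller split brain detected, exiting"),
        // then _exit(EXIT_FAILURE); amfnd is expected to failfast reboot the node.
        cb.exited = true;
        return;
    }
    // Switch-over path: the real handler goes on to give up the IMM OI
    // implementer role; nothing further is modelled here.
}

// Controllers whose amfd is still running after every node saw the callback.
int survivors(const std::vector<Amfd>& avds) {
    int n = 0;
    for (const Amfd& a : avds) {
        if (!a.exited) ++n;
    }
    return n;
}
```

With two such controllers, both take the exit path and no amfd survives, which is consistent with the cluster reset Praveen observed.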
