We don't support "split brain" over MDS/TIPC in OpenSAF. So no, I suggest that the cluster should not go on and keep running, since the consistency/integrity of the cluster can no longer be guaranteed. Instead of trying to "support" the impossible, we should fail fast at the cluster level here and give as clear an error indication as possible as to why. This is so the deployer can fix the cause of the flicker (when it is not a test).
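To make the fail-fast policy concrete, here is a minimal C++ sketch written as a pure decision function so it can be tested in isolation. Everything in it is hypothetical and not OpenSAF code: `ControllerRole`, `RoleChangeDecision`, `decide_quiesced_event()` and `handle_quiesced_event()` are illustrative names, and `in_switchover` stands in for the `cb->swap_switch` flag the patch checks. Following Hans's description, it treats the QUIESCED callback as legitimate only when amfd is already QUIESCED during a controller switch-over. That is slightly stricter than the patch, which tests only `swap_switch`.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical model of the amfd HA role; these are not OpenSAF's real types.
enum class ControllerRole { Active, Standby, Quiesced };

struct RoleChangeDecision {
  bool fail_fast;      // true: the caller must stop instead of carrying on
  std::string reason;  // the error indication to give the deployer
};

// Decide how to react when MDS reports that this controller is now QUIESCED.
// Per the thread, the event is only expected while amfd is QUIESCED as part
// of a controller switch-over. Anything else means the cluster state can no
// longer be trusted, so the decision is to fail fast.
RoleChangeDecision decide_quiesced_event(ControllerRole current, bool in_switchover) {
  if (current == ControllerRole::Quiesced && in_switchover) {
    return {false, ""};  // normal switch-over path: carry on with the quiesce logic
  }
  if (current == ControllerRole::Standby) {
    return {true, "STANDBY->QUIESCED is not a valid transition"};
  }
  return {true, "MDS unexpectedly changed role to QUIESCED outside a controller switch-over"};
}

// Hypothetical caller: log one clear reason and exit, so that the node
// supervisor (amfnd in OpenSAF) can fail-fast reboot the node.
void handle_quiesced_event(ControllerRole current, bool in_switchover) {
  const RoleChangeDecision d = decide_quiesced_event(current, in_switchover);
  if (!d.fail_fast) {
    return;
  }
  assert(!d.reason.empty());  // every fail-fast decision must explain itself
  std::cerr << "Controller split brain suspected: " << d.reason << ", exiting" << std::endl;
  std::_Exit(EXIT_FAILURE);  // like _exit(): skips atexit handlers and static destructors
}
```

Keeping the decision separate from the exit call means the policy can be unit-tested without killing the test process, while the caller still gives the deployer a single explicit log line before the node goes down.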
/AndersBj

praveen malviya wrote:
> On 20-Aug-13 2:58 PM, Hans Feldt wrote:
>
>> On 08/19/2013 04:33 PM, Suryanarayana Garlapati wrote:
>>
>>> From my perspective, this should be done on the active only, and the
>>> standby should just drop it. At least we will then have the standby,
>>> which gets promoted to active and continues to provide the service.
>>> We should not be performing a cluster reset.
>>>
>>> Thoughts?
>>>
>> The AMF code in question (event handling for the MDS callback QUIESCED) can
>> only be invoked in AMFD state QUIESCED AND in SWITCH-OVER state. See
>> http://devel.opensaf.org/~hafe/AMF/ControllerSwitchover.png
>>
>> It is not designed to be invoked in STANDBY state. Besides, STANDBY->QUIESCED
>> is not a valid transition.
>>
> I applied the patch and simulated TIPC flickering by executing the command
> "tipc-config -bd=eth:eth0; tipc-config -be=eth:eth0".
> Both AVDs become active, and then avd_mds_qsd_role_evh() is invoked on both.
> Finally the cluster is reset.
> I think one AVD should remain active to keep the cluster up and running.
>
> Thanks,
> Praveen
>
>> /Hans
>>
>>
>>> On Friday 16 August 2013 07:03 PM, Hans Feldt wrote:
>>>
>>>> osaf/services/saf/avsv/avd/avd_role.cc | 9 +++++++++
>>>> 1 files changed, 9 insertions(+), 0 deletions(-)
>>>>
>>>>
>>>> MDS can force an active vdest into quiesced state (see docs). The reasons
>>>> for this happening are unclear. The logic in avd_mds_qsd_role_evh() can
>>>> only handle this event in the context of a controller switch-over.
>>>> Otherwise it could, e.g., hang using IMM, which eventually times out and
>>>> calls abort(), generating a core dump.
>>>>
>>>> Instead, exit the amfd process when this event happens outside the
>>>> controller switch-over state. amfnd will failfast reboot the node when it
>>>> detects this.
>>>>
>>>> diff --git a/osaf/services/saf/avsv/avd/avd_role.cc b/osaf/services/saf/avsv/avd/avd_role.cc
>>>> --- a/osaf/services/saf/avsv/avd/avd_role.cc
>>>> +++ b/osaf/services/saf/avsv/avd/avd_role.cc
>>>> @@ -569,6 +569,15 @@ void avd_mds_qsd_role_evh(AVD_CL_CB *cb,
>>>>  	TRACE_ENTER();
>>>> +	/* Only accept this event in controller switch-over state, in other
>>>> +	 * states it is invalid and indicates severe cluster problems.
>>>> +	 */
>>>> +	if (cb->swap_switch == SA_FALSE) {
>>>> +		LOG_NO("%s: MDS unexpectedly changed role to QUIESCED", __FUNCTION__);
>>>> +		LOG_CR("Controller split brain detected, exiting");
>>>> +		_exit(EXIT_FAILURE); // should never get here...
>>>> +	}
>>>> +
>>>>  	/* Give up IMM OI implementer role */
>>>>  	if ((rc = immutil_saImmOiImplementerClear(cb->immOiHandle)) != SA_AIS_OK) {
>>>>  		LOG_ER("FAILOVER Active --> Quiesced FAILED, ImplementerClear failed %u", rc);
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Get 100% visibility into Java/.NET code with AppDynamics Lite!
>>>> It's a free troubleshooting tool designed for production.
>>>> Get down to code-level detail for bottlenecks, with <2% overhead.
>>>> Download for free and get started troubleshooting in minutes.
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Opensaf-devel mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>>>>
>>>
>>
