Hi Praveen,

I think the scope of the patch should be to fix the following case.
Test case:
- an active node hangs at shutdown
- a new active controller comes up
Expected result:
- The AMF on the hung node should force itself out of the cluster (by an exit,
a forced reboot, etc.)
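
To make the expected result concrete, here is a minimal sketch of the decision I have in mind. It is not OpenSAF code: ControllerState, QuiescedAction and on_mds_quiesced() are hypothetical names used only for illustration, and switch_over_in_progress merely plays the role that cb->swap_switch has in the patch.

```cpp
// Hypothetical stand-ins, not OpenSAF types: a simplified view of what the
// quiesced-event handler would look at and what it could decide.
enum class QuiescedAction { ContinueSwitchOver, ExitProcess };

struct ControllerState {
    // Assumption: true only while an ordered controller switch-over is running
    // (the role cb->swap_switch plays in the patch).
    bool switch_over_in_progress;
};

// Decide what to do when MDS reports the QUIESCED role change.
// Outside a controller switch-over the event is treated as invalid, so the
// process should take itself out of the cluster instead of carrying on.
QuiescedAction on_mds_quiesced(const ControllerState& state) {
    return state.switch_over_in_progress ? QuiescedAction::ContinueSwitchOver
                                         : QuiescedAction::ExitProcess;
}
```

In the real handler the ExitProcess branch would correspond to logging and calling _exit(), after which amfnd is expected to failfast reboot the node, as described in the patch comment.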
Thanks,
Mathi.

> -----Original Message-----
> From: praveen malviya
> Sent: Tuesday, August 20, 2013 5:01 PM
> To: Hans Feldt
> Cc: [email protected]
> Subject: Re: [devel] [PATCH 1 of 1] amfd: exit at MDS quiesced event in
> wrong state [#516]
> 
> 
> On 20-Aug-13 2:58 PM, Hans Feldt wrote:
> > On 08/19/2013 04:33 PM, Suryanarayana Garlapati wrote:
> >> From my perspective, this should be done on the active only, and the
> >> standby should just drop it. At least we will then have the standby, which
> >> gets promoted to active and continues to provide the service. We should
> >> not be performing a cluster reset.
> >>
> >> Thoughts?
> > The AMF code in question (event handling for MDS callback QUIESCED)
> > can only be invoked in AMFD state QUIESCED and SWITCH-OVER state. See
> > http://devel.opensaf.org/~hafe/AMF/ControllerSwitchover.png
> >
> > It is not designed to be invoked in STANDBY state. Besides,
> > STANDBY->QUIESCED is not a valid transition.
> I applied the patch and simulated TIPC flickering by executing the command
> "tipc-config -bd=eth:eth0; tipc-config -be=eth:eth0".
> Both AVDs become active and then both get avd_mds_qsd_role_evh(). Finally
> the cluster is reset.
> I think one AVD should remain active to keep the cluster up and running.
> 
> Thanks,
> Praveen
> > /Hans
> >
> >>
> >> On Friday 16 August 2013 07:03 PM, Hans Feldt wrote:
> >>>    osaf/services/saf/avsv/avd/avd_role.cc |  9 +++++++++
> >>>    1 files changed, 9 insertions(+), 0 deletions(-)
> >>>
> >>>
> >>> MDS can force an active vdest into quiesced state (see docs).
> >>> The reasons for this happening are unclear. The logic in
> >>> avd_mds_qsd_role_evh() can only handle this event in the context of a
> >>> controller switch-over. Otherwise it could e.g. hang in using IMM, which
> >>> eventually times out and calls abort(), generating a core dump.
> >>>
> >>> Instead, exit the amfd process when this event happens outside the
> >>> controller switch-over state. amfnd will failfast reboot the node when it
> >>> detects this.
> >>>
> >>> diff --git a/osaf/services/saf/avsv/avd/avd_role.cc b/osaf/services/saf/avsv/avd/avd_role.cc
> >>> --- a/osaf/services/saf/avsv/avd/avd_role.cc
> >>> +++ b/osaf/services/saf/avsv/avd/avd_role.cc
> >>> @@ -569,6 +569,15 @@ void avd_mds_qsd_role_evh(AVD_CL_CB *cb,
> >>>        TRACE_ENTER();
> >>> +    /* Only accept this event in controller switch-over state, in other
> >>> +     * states it is invalid and indicates severe cluster problems.
> >>> +     */
> >>> +    if (cb->swap_switch == SA_FALSE) {
> >>> +        LOG_NO("%s: MDS unexpectedly changed role to QUIESCED", __FUNCTION__);
> >>> +        LOG_CR("Controller split brain detected, exiting");
> >>> +        _exit(EXIT_FAILURE); // should never get here...
> >>> +    }
> >>> +
> >>>        /* Give up IMM OI implementer role */
> >>>        if ((rc = immutil_saImmOiImplementerClear(cb->immOiHandle)) != SA_AIS_OK) {
> >>>            LOG_ER("FAILOVER Active --> Quiesced FAILED, ImplementerClear failed %u", rc);
> >>>
> >>> _______________________________________________
> >>> Opensaf-devel mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> >>
> >>
> 
> 
