Hi Nagu,

This patch only covers a corner case, where failover happens between 
AVD_INIT_DONE and AVD_APP_STATE. We still have to reboot the node if an 
out-of-cold-sync situation happens, so I think we still have to keep 
that sentence in the Compliance Table.

Thanks,
Minh
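
For readers following the thread, here is a minimal sketch of how the narrowed check (`== AVD_INIT_DONE`) differs from the first version (`>= AVD_INIT_DONE`). It is not the AMFD code: the enum, the `TimerAction` type and both function names are hypothetical, and the only thing taken from the thread is that AVD_APP_STATE orders after AVD_INIT_DONE.

```cpp
#include <cassert>

// Hypothetical stand-in for the AMFD init states. Only the ordering
// AVD_INIT_DONE < AVD_APP_STATE comes from the thread; EARLIER_STATE is a
// placeholder for every state before AVD_INIT_DONE.
enum InitState { EARLIER_STATE, AVD_INIT_DONE, AVD_APP_STATE };

// What the failover path decides to do with the cluster startup timer.
enum class TimerAction { kNone, kGenerateExpiryEvent, kStartTimer };

// First version of the patch: acts in AVD_INIT_DONE and in AVD_APP_STATE.
TimerAction original_check(InitState state, bool su_instantiation_done) {
  if (state >= AVD_INIT_DONE) {
    return su_instantiation_done ? TimerAction::kGenerateExpiryEvent
                                 : TimerAction::kStartTimer;
  }
  return TimerAction::kNone;
}

// Proposed version: acts only in AVD_INIT_DONE, so a role change that
// happens in AVD_APP_STATE no longer touches the timer.
TimerAction narrowed_check(InitState state, bool su_instantiation_done) {
  if (state == AVD_INIT_DONE) {
    return su_instantiation_done ? TimerAction::kGenerateExpiryEvent
                                 : TimerAction::kStartTimer;
  }
  return TimerAction::kNone;
}
```

The two versions agree in AVD_INIT_DONE and differ only in AVD_APP_STATE, which is the state the upgrade concern in this thread is about.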

On 16/02/17 17:24, Nagendra Kumar wrote:
>
> Hi Minh,
>
> Good catch !! Yes, please push, but as such we have documented in 
> Compliance Table that “Before the timer expiry, failover and 
> switchover are not supported.”
>
> Thanks
>
> -Nagu
>
> *From:*minh chau [mailto:minh.c...@dektech.com.au]
> *Sent:* 16 February 2017 07:36
> *To:* Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya; 
> gary....@dektech.com.au
> *Cc:* opensaf-devel@lists.sourceforge.net
> *Subject:* Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless 
> sync before standby AMFD comes up [#2162]
>
> Hi Nagu,
>
> Thanks for the reminder. There is one change in the patch that could 
> affect upgrade too:
>
> +                // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
> +                // If AVD_INIT_DONE, there was a SC failover during cluster
> +                // instantiation phase in cluster (after all NCS SU is assigned)
> +                // If AVD_APP_STATE, this should be come from 2N-MW SI swap
> *+                if (cb->init_state >= AVD_INIT_DONE) {*
> +                    if (cluster_su_instantiation_done(cb, nullptr) == true) {
> +                        cluster_startup_expiry_event_generate(cb);
> +                    } else {
> +                        m_AVD_CLINIT_TMR_START(cb);
> +                    }
> +                }
>
> So I would like to restrict it to AVD_INIT_DONE only, like this:
>
> +                // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
> +                // If AVD_INIT_DONE, there was a SC failover during cluster
> +                // instantiation phase in cluster (after all NCS SU is assigned)
> *+                if (cb->init_state == AVD_INIT_DONE) {*
> +                    if (cluster_su_instantiation_done(cb, nullptr) == true) {
> +                        cluster_startup_expiry_event_generate(cb);
> +                    } else {
> +                        m_AVD_CLINIT_TMR_START(cb);
> +                    }
> +                }
>
> If you agree, I can push the patches with new change.
>
> Thanks,
> Minh
>
> On 15/02/17 15:13, Nagendra Kumar wrote:
>
>     Yes, ack for both the patches. I assume you would have tested upgrade
>     scenarios.
>
>     Thanks
>
>     -Nagu
>
>         -----Original Message-----
>         From: minh chau [mailto:minh.c...@dektech.com.au]
>         Sent: 15 February 2017 08:52
>         To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>         gary....@dektech.com.au
>         Cc: opensaf-devel@lists.sourceforge.net
>         Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>         sync before standby AMFD comes up [#2162]
>
>         Hi Nagu,
>
>         The #2162 has two patches. I think your ack is for [PATCH 2 of 2]
>         AMFND: Fix SC failover during headless sync before standby AMFD
>         comes up [#2162].
>
>         Does the other one ([PATCH 1 of 2] AMFD: Fix SC failover during
>         headless sync at INIT_DONE state [#2162]) look ok?
>
>         Thanks,
>
>         Minh
>
>         On 14/02/17 20:40, Nagendra Kumar wrote:
>
>             Ack.
>
>             Tested the scenarios.
>
>             Thanks
>
>             -Nagu
>
>                 -----Original Message-----
>                 From: minh chau [mailto:minh.c...@dektech.com.au]
>                 Sent: 23 January 2017 16:24
>                 To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>                 gary....@dektech.com.au
>                 Cc: opensaf-devel@lists.sourceforge.net
>                 Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during
>                 headless sync before standby AMFD comes up [#2162]
>
>                 Hi Nagu,
>
>                 I am checking the logs now.
>
>                 Thanks, Minh
>
>                 On 23/01/17 17:47, Nagendra Kumar wrote:
>
>                     The logs (Logs-tc.rar) attached in the ticket.
>
>                     Thanks
>
>                     -Nagu
>
>                         -----Original Message-----
>                         From: minh chau [mailto:minh.c...@dektech.com.au]
>                         Sent: 16 January 2017 05:47
>                         To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>                         gary....@dektech.com.au
>                         Cc: opensaf-devel@lists.sourceforge.net
>                         Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during
>                         headless sync before standby AMFD comes up [#2162]
>
>                         Hi Nagu,
>
>                         I misunderstood your point, and now I get it.
>
>                         In my test I see it works as expected: SU2 becomes Act and
>                         there is no assignment for SU1. I guess in your test the
>                         cluster initiation timer was somehow not started on SC2
>                         (new active); there could be a missing case in the patch.
>
>                         Could you please share the trace with me?
>
>                         Thanks,
>
>                         Minh
>
>                         On 13/01/17 21:48, Nagendra Kumar wrote:
>
>                             Hi Minh,
>
>                                 Please check my response inlined with [Nagu].
>
>                             Thanks
>
>                             -Nagu
>
>                                 -----Original Message-----
>                                 From: minh chau [mailto:minh.c...@dektech.com.au]
>                                 Sent: 13 January 2017 03:53
>                                 To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>                                 gary....@dektech.com.au
>                                 Cc: opensaf-devel@lists.sourceforge.net
>                                 Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during
>                                 headless sync before standby AMFD comes up [#2162]
>
>                                 Hi Nagu,
>
>                                 Thanks for reviewing, please see comments 
> inline.
>
>                                 Thanks,
>
>                                 Minh
>
>                                 On 12/01/17 21:48, Nagendra Kumar wrote:
>
>                                     Hi Minh,
>
>                                     Though I am not able to simulate the problem, I tested as below:
>
>                                     1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act
>                                        and SU2 on PL-4 as Standby.
>                                     2. Stop SC1 and SC2 and then stop PL-3.
>                                     3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete,
>                                        stop SC1. SC2 becomes Act.
>
>                                 [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped,
>                                 then only SU2 has active assignment
>
>                             [Nagu]: PL-3 is stopped in step #2.
>
>                                     In this case, SC-2 contains both SU1(Act) and SU2(Standby)
>                                     assignments. Ideally, SU2 assignments should have been Act
>                                     and there shouldn't be SU1 assignment.
>
>                                 [M]: This seems to be another test where SU1 and SU2 are hosted
>                                 on SC2, then both SU1 and SU2 should get assignment
>
>                             [Nagu]: I mean to say command 'amf-state siass' run on SC-1
>                             displays both SU1 and SU2 assignments. SU1 and SU2 are hosted
>                             on PL-3 and PL-4 respectively. This is similar test case,
>                             which is mentioned in the ticket?
>
>                                     safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>                                         saAmfSISUHAState=ACTIVE(1)
>                                         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>                                     safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>                                         saAmfSISUHAState=STANDBY(2)
>                                         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
>                                     Please check.
>
>                                     Thanks
>
>                                     -Nagu
>
>                                         -----Original Message-----
>                                         From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>                                         Sent: 08 November 2016 08:53
>                                         To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
>                                         gary....@dektech.com.au; minh.c...@dektech.com.au
>                                         Cc: opensaf-devel@lists.sourceforge.net
>                                         Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>                                         sync before standby AMFD comes up [#2162]
>
>                                          osaf/services/saf/amf/amfnd/di.cc   |  7 +++++--
>                                          osaf/services/saf/amf/amfnd/susm.cc |  6 ++++++
>                                          2 files changed, 11 insertions(+), 2 deletions(-)
>
>                                         This case of SC failover causes the new active AMFD to get stuck
>                                         in node_up messages.
>
>                                         Say the first active controller is SC1, which goes down during
>                                         headless sync. The amfnd on SC2 therefore receives mds_down of
>                                         AVD, and both is_avd_down and amfd_sync_required are set to true.
>                                         When SC2 takes over the active role, amfnd on SC2 receives mds_up,
>                                         but only is_avd_down is set to false; the variable
>                                         amfd_sync_required remains true. When amfnd on SC2 finishes
>                                         initiating the middleware SUs, it needs to send the su_oper
>                                         message to AMFD, but the message is not sent out because
>                                         amfd_sync_required is still true.
>
>                                         In this SC failover scenario, amfd_sync_required needs to be set
>                                         to false when amfnd on SC2 receives the su_pres message for
>                                         middleware SUs. That means amfnd on the active controller does
>                                         not need to wait for the set_leds message to be informed that
>                                         cluster initiation is done, so amfnd can send su_oper messages
>                                         to AMFD. This logic also aligns with the normal headless
>                                         scenario, where amfnd on the active controller has
>                                         amfd_sync_required initially marked as false because no
>                                         middleware SUs are initiated. When amfd_sync_required is true,
>                                         it means all middleware SUs on that amfnd were initiated and
>                                         assigned before headless, so amfnd needs to wait for cluster
>                                         initiation after headless.
>
>                                         diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
>                                         --- a/osaf/services/saf/amf/amfnd/di.cc
>                                         +++ b/osaf/services/saf/amf/amfnd/di.cc
>                                         @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>                                                         if (avnd_diq_rec_add(cb, &msg) == nullptr) {
>                                                              rc = NCSCC_RC_FAILURE;
>                                                         }
>                                         -          LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
>                                         +          LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
>                                         +               " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>                                                    } else {
>                                                         // We are in normal cluster, send msg to director
>                                                         msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
>                                         @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>                                                                 rc = NCSCC_RC_FAILURE;
>                                                            }
>                                                            m_AVND_SU_ALL_SI_RESET(su);
>                                         -             LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
>                                         +                LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
>                                         +                        " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>                                         +
>                                                       } else {
>                                                            // We are in normal cluster, send msg to director
>                                                            msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>                                         diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
>                                         --- a/osaf/services/saf/amf/amfnd/susm.cc
>                                         +++ b/osaf/services/saf/amf/amfnd/susm.cc
>                                         @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>                                                                   goto done;
>                                                         }
>                                                    } else { /* => instantiate the su */
>                                         +          // Do not need to wait for headless sync if there is no application SUs
>                                         +          // initiated. This is known because here we are receiving su_pres message
>                                         +          // for NCS SUs
>                                         +          if (su->is_ncs == true)
>                                         +               cb->amfd_sync_required = false;
>                                         +
>                                                         AVND_EVT *evt_ir = 0;
>                                                         TRACE("Sending to Imm thread.");
>                                                         evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, &info->su_name, 0, 0);
>
