Hi Praveen,

I think we need to step back a bit to the non-headless feature. The cluster init timer expiry ensures that all nodes have their MW SUs assigned and that their node state is PRESENT. It is the unique entry point to the non-ncs SU assignment phase. We also need to keep this principle in headless for #1725.

    ...
    else {
      // this node is already up
      avd_node_state_set(avnd, AVD_AVND_STATE_PRESENT);
      avd_node_oper_state_set(avnd, SA_AMF_OPERATIONAL_ENABLED);
      avnd->veteran = true;
      // Update readiness state of all SUs which are waiting for node
      // oper state
      ...

      // At this point, one node has become PRESENT, its MW SUs should be synced
      // We can do:
      m_AVD_CLINIT_TMR_START(cb);
      /* Check if all SUs are in 'in-service' cluster-wide, if so start assignments */
      if ((cb->amf_init_tmr.is_active == true) &&
          (cluster_su_instantiation_done(cb, nullptr) == true)) {
        avd_stop_tmr(cb, &cb->amf_init_tmr);
        cluster_startup_expiry_event_generate(cb);
      }
      goto node_joined;
    }
    ...
    }
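To make the intent concrete, here is a minimal standalone sketch of that gating idea. It is not amfd code: `Node`, `Cluster`, and the three functions are hypothetical names that only model the flags discussed in this thread, and it assumes a veteran node's MW SUs are already in service when its node_up is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model. None of these types exist in OpenSAF.
enum class NodeState { Absent, Syncing, Present };

struct Node {
  NodeState state = NodeState::Absent;
  bool mw_sus_in_service = false;  // middleware SUs are in service
  bool veteran = false;            // node survived the headless period
};

struct Cluster {
  std::vector<Node> nodes;
  bool init_timer_active = false;  // models cb->amf_init_tmr.is_active
  bool startup_expired = false;    // the single entry point has already run
  int set_leds_sent = 0;           // number of set_leds messages "sent"
};

// True when every node that has started joining is PRESENT with MW SUs ready.
bool all_joined_nodes_ready(const Cluster& c) {
  for (const Node& n : c.nodes) {
    if (n.state == NodeState::Absent) continue;  // never reported node_up
    if (n.state != NodeState::Present || !n.mw_sus_in_service) return false;
  }
  return true;
}

// Single entry point to the application SU assignment phase.
void on_cluster_startup_expiry(Cluster& c) {
  assert(!c.startup_expired);  // must run at most once
  c.init_timer_active = false;
  c.startup_expired = true;
  for (const Node& n : c.nodes) {
    if (n.veteran && n.state == NodeState::Present) ++c.set_leds_sent;
  }
}

// Handles node_up from a veteran node: mark it PRESENT, make sure the
// cluster init timer is running, and expire it early once all joined
// nodes are ready.
void on_veteran_node_up(Cluster& c, std::size_t i) {
  Node& n = c.nodes[i];
  n.state = NodeState::Present;
  n.veteran = true;
  n.mw_sus_in_service = true;  // assumption: MW SUs were synced already
  if (!c.startup_expired && !c.init_timer_active) c.init_timer_active = true;
  if (c.init_timer_active && all_joined_nodes_ready(c)) {
    on_cluster_startup_expiry(c);
  }
}
```

With two nodes still syncing, the first node_up only starts the timer; the second one triggers the early expiry and both veterans get set_leds exactly once.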
Also, we should send set_leds only when the cluster init timer expires. Do you think this would work?

Thanks,
Minh

On 25/08/16 17:03, praveen malviya wrote:
> Hi Minh,
>
> One minor correction is still needed.
> The node_up event comes very early. If at least one node_up event has
> come from every amfnd, AMFD stops the node sync timer very early, even
> before the cluster timer has started:
>
>     if (rc_node_up == sync_nd_size) {
>       if (cb->node_sync_tmr.is_active) {
>         avd_stop_tmr(cb, &cb->node_sync_tmr);
>         TRACE("stop NodeSync timer");
>       }
>       cb->all_nodes_synced = true;
>
> But AMFD does not process all these node_up events, because it is not in
> INIT state by this time:
>
>     if ((n2d_msg->msg_info.n2d_node_up.node_id != cb->node_id_avd) &&
>         (cb->init_state < AVD_INIT_DONE)) {
>       TRACE("invalid init state (%u), node %x",
>             cb->init_state,
>             n2d_msg->msg_info.n2d_node_up.node_id);
>       goto done;
>     }
>
> When AMFD moves to INIT state it starts the cluster startup timer. If
> its value is very low, it will expire, see that the node sync timer is
> not running, and send the led messages. By then some nodes may still be
> in syncing state. So the amfnds that are already synced will be sending
> assignment messages while other amfnds, which may host some SUs, are
> still syncing.
>
> We need to bring two things to the same level:
> 1) When the led set message is sent to the amfnds, all amfnds should be
>    in PRESENT state. This means their SUs are enabled and amfnd can
>    process assignments.
> 2) All other amfnds which have not joined by this time will be rebooted.
>
> Problem: how to hold amfnds back from sending assignment events from the
> buffer until all of them are in PRESENT state and the non-PRESENT ones
> will never join the cluster. Neither the node sync timer nor the cluster
> startup timer ensures that all amfnds are synced and that the non-synced
> ones will be rebooted.
>
> One idea: what if we do not stop the node sync timer when
> "if (rc_node_up == sync_nd_size)" is hit, and mark
> node_sync_window_closed true only when the timer really expires in
> avd_node_sync_tmr_evh(), as is done today? What are the implications of
> that? AMFD will still have to make sure that nodes sending a fresh
> node_up event are rebooted. Then, if the cluster timer expires early, it
> will see the node sync timer still running and will not send led set.
> But will all the payloads really move to PRESENT state within 10
> seconds? I am relying on the chosen value of 10 seconds.
>
> Thanks,
> Praveen
>
> On 24-Aug-16 5:35 PM, praveen malviya wrote:
>> Hi Minh,
>>
>> Any assignment message should be processed only after both the cluster
>> timer expiry and the node sync timer expiry. The bug fix patch
>> 1725_02_V2_bugfix_resend_buffer_in_set_leds.diff honors the cluster
>> timer expiry but not the node sync timer.
>> After node sync timer expiry, delayed payloads will be rebooted, and if
>> these payloads host any SU/SUSIs, they will be deleted. So the admin op
>> will finish gracefully.
>> I think the for loop can be added in both timers' expiry events, with a
>> check on the expiry of the other timer:
>>
>> diff --git a/osaf/services/saf/amf/amfd/cluster.cc b/osaf/services/saf/amf/amfd/cluster.cc
>> --- a/osaf/services/saf/amf/amfd/cluster.cc
>> +++ b/osaf/services/saf/amf/amfd/cluster.cc
>> @@ -74,12 +74,13 @@ void avd_cluster_tmr_init_evh(AVD_CL_CB
>>          m_AVSV_SEND_CKPT_UPDT_ASYNC_UPDT(cb, cb, AVSV_CKPT_AVD_CB_CONFIG);
>>
>>          // Resend set_leds to veteran node
>> -
>> -        for (std::map<std::string, AVD_AVND *>::const_iterator it = node_name_db->begin();
>> -                        it != node_name_db->end(); it++) {
>> -                node = it->second;
>> -                if (node->veteran)
>> -                        avd_snd_set_leds_msg(cb, node);
>> +        if (cb->node_sync_tmr.is_active == false) {
>> +                for (std::map<std::string, AVD_AVND *>::const_iterator it = node_name_db->begin();
>> +                                it != node_name_db->end(); it++) {
>> +                        node = it->second;
>> +                        if (node->veteran)
>> +                                avd_snd_set_leds_msg(cb, node);
>> +                }
>>          }
>>
>>          /* call the realignment routine for each of the SGs in the
>> @@ -143,6 +144,17 @@ void avd_node_sync_tmr_evh(AVD_CL_CB *cb
>>          // Setting true here to indicate the node sync window has closed
>>          // Further node up message will be treated specially
>>          cb->node_sync_window_closed = true;
>> +        // Resend set_leds to veteran node
>> +        if (cb->amf_init_tmr.is_active == false) {
>> +                AVD_AVND *node = nullptr;
>> +                for (std::map<std::string, AVD_AVND *>::const_iterator it = node_name_db->begin();
>> +                                it != node_name_db->end(); it++) {
>> +                        node = it->second;
>> +                        if (node->veteran)
>> +                                avd_snd_set_leds_msg(cb, node);
>> +                }
>> +        }
>> +
>>
>>          TRACE_LEAVE();
>>  }
>>
>> Thanks,
>> Praveen
>>
>> On 24-Aug-16 4:58 PM, Nagendra Kumar wrote:
>>> Below are the assignments after the test case (SU2 has the standby
>>> assignment):
>>>
>>> PM_SC-1:/home/nagu/views/staging-1725 # /etc/init.d/opensafd status
>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>>>         saAmfSISUHAState=STANDBY(2)
>>> safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>> safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>         saAmfSISUHAState=STANDBY(2)
>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>>
>>> Thanks
>>> -Nagu
>>>
>>>> -----Original Message-----
>>>> From: Nagendra Kumar
>>>> Sent: 24 August 2016 16:55
>>>> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [devel] [PATCH 2 of 2] AMFND: Admin operation
>>>> continuation if csi callback completes during headless
>>>> [#1725 part 1] V1
>>>>
>>>> Hi Minh,
>>>> With 1725_phase_1_V2.tgz, the TC in the email below has failed.
>>>> Please find the traces attached, along with the configuration, in
>>>> the ticket.
>>>>
>>>> Thanks

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel