I don't have any good ideas right now. The old check isn't reliable, but we can think about it for later.
It should be a rare case anyway, and 10s is probably not that noticeable when you consider start up times of other apps. > On 30 Oct 2018, at 6:29 pm, Minh Hon Chau <minh.c...@dektech.com.au> wrote: > > Hi Nagu, > > It's fine with me too, or Gary knows how to exclude the wait of SC-2. > > Thanks > > Minh > >> On 30/10/18 6:14 pm, Nagendra Kumar wrote: >> Hi Minh, >> I thought, it would be rare. But if you find that it is breaking >> your existing functionality(backward compatibility) i.e. delaying the >> Cluster Startup i.e. earlier it used to take say 2 seconds, now it takes 10 >> seconds with one controller, then Gary can retain the older code. It is fine >> with me. >> >> Thanks >> -Nagu >> >> -----Original Message----- >> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] >> Sent: 30 October 2018 12:40 >> To: Nagendra Kumar; 'Gary Lee'; hans.nordeb...@ericsson.com >> Cc: opensaf-devel@lists.sourceforge.net >> Subject: Re: [PATCH 1/1] amfd: ensure node_sync_window_closed is set [#2946] >> >> Hi Nagu, >> >> The subsequent procedures will be delayed, including the assignments >> too, to wait for SC2 to join, it is nearly 10 secs I guess. The headless >> sync does not need to wait for standby amfd, so here the code was not >> expecting to wait for SC-2. I think this scenario is rare? >> >> Thanks >> >> Minh >> >> >>> On 30/10/18 4:43 pm, Nagendra Kumar wrote: >>> Hi Minh, >>> I had noticed that point while review. But, if both SCs have gone >>> down, then expected is both should join. >>> If only one SC starts, then yes timeout will happen. Do you see any major >>> implications than assignments delay, which I think should be fine because, >>> the expected delay is waiting for SC-2 to join? >>> >>> Thanks >>> -Nagu >>> >>> -----Original Message----- >>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] >>> Sent: 30 October 2018 02:41 >>> To: Nagendra Kumar; 'Gary Lee'; hans.nordeb...@ericsson.com >>> Cc: opensaf-devel@lists.sourceforge.net >>> Subject: Re: [PATCH 1/1] amfd: ensure node_sync_window_closed is set [#2946] >>> >>> Hi Gary, Nagu >>> >>> One notice you may know from the patch. >>> >>> If we have two SCs cluster, go headless, only start SC1, now the >>> headless sync will be always timeout to wait for SC2 up. >>> >>> Thanks >>> >>> Minh >>> >>>> On 29/10/18 7:19 pm, Nagendra Kumar wrote: >>>> Hi Gary, >>>> Great simplification!. Ack. >>>> >>>> Thanks >>>> -Nagu >>>> >>>> -----Original Message----- >>>> From: Gary Lee [mailto:gary....@dektech.com.au] >>>> Sent: 29 October 2018 12:36 >>>> To: minh.c...@dektech.com.au; hans.nordeb...@ericsson.com; Nagendra Kumar >>>> Cc: opensaf-devel@lists.sourceforge.net; Gary Lee >>>> Subject: [PATCH 1/1] amfd: ensure node_sync_window_closed is set [#2946] >>>> >>>> If all nodes are synced after headless, the timer is stopped >>>> but node_sync_window_closed is never set to true. >>>> >>>> Later on, if a node becomes split from the main network and >>>> rejoins, it will send a headless sync to amfd. >>>> >>>> amfd will go into a never ending loop of processing the message, >>>> putting back into the queue, etc. >>>> >>>> When the node sync timer is stopped, ensure node_sync_window_closed >>>> is set. >>>> >>>> Also modify avd_count_node_up() not to count standby SC. >>>> Sometimes a node_up from the standby SC arrives before mds up, >>>> and the stadnby SC is incorrectly included in the node sync >>>> count. Then a legitimate node_up from a PL is not accepted >>>> because node_sync_window_closed is prematurely set. >>>> --- >>>> src/amf/amfd/ndfsm.cc | 28 +++------------------------- >>>> 1 file changed, 3 insertions(+), 25 deletions(-) >>>> >>>> diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc >>>> index edc993988..375c5c7b1 100644 >>>> --- a/src/amf/amfd/ndfsm.cc >>>> +++ b/src/amf/amfd/ndfsm.cc >>>> @@ -165,34 +165,12 @@ done: >>>> * >>>> >>>> **************************************************************************/ >>>> uint32_t avd_count_sync_node_size(AVD_CL_CB *cb) { >>>> - uint32_t twon_ncs_su_count = 0; >>>> uint32_t count = 0; >>>> TRACE_ENTER(); >>>> - for (const auto &value : *node_name_db) { >>>> - AVD_AVND *avnd = value.second; >>>> - osafassert(avnd); >>>> - for (const auto &su : avnd->list_of_ncs_su) { >>>> - if (su->sg_of_su->sg_redundancy_model == SA_AMF_2N_REDUNDANCY_MODEL) >>>> { >>>> - twon_ncs_su_count++; >>>> - continue; >>>> - } >>>> - } >>>> - } >>>> - // cluster can have 1 SC or more SCs which hosting 2N Opensaf SU >>>> - // so twon_ncs_su_count at least is 1 >>>> - osafassert(twon_ncs_su_count > 0); >>>> - >>>> - if (twon_ncs_su_count == 1) { >>>> - // 1 SC, the rest of nodes could be in sync from headless >>>> - count = node_name_db->size() - 1; >>>> - } else { >>>> - // >=2 SCs, the rest of nodes could be in sync except active/standby >>>> SC >>>> - count = node_name_db->size() - 2; >>>> - } >>>> + count = node_name_db->size() - 1; >>>> TRACE("sync node size:%d", count); >>>> - TRACE_LEAVE(); >>>> return count; >>>> } >>>> >>>> /*************************************************************************** >>>> ** >>>> @@ -218,8 +196,7 @@ uint32_t avd_count_node_up(AVD_CL_CB *cb) { >>>> for (const auto &value : *node_name_db) { >>>> node = value.second; >>>> if (node->node_up_msg_count > 0 && >>>> - node->node_info.nodeId != cb->node_id_avd && >>>> - node->node_info.nodeId != cb->node_id_avd_other) >>>> + node->node_info.nodeId != cb->node_id_avd) >>>> ++received_count; >>>> } >>>> TRACE("Number of node director(s) that director received node_up >>>> msg:%u", >>>> @@ -329,6 +306,7 @@ void avd_node_up_evh(AVD_CL_CB *cb, AVD_EVT *evt) { >>>> if (cb->node_sync_tmr.is_active) { >>>> avd_stop_tmr(cb, &cb->node_sync_tmr); >>>> TRACE("stop NodeSync timer"); >>>> + cb->node_sync_window_closed = true; >>>> } >>>> cb->all_nodes_synced = true; >>>> LOG_NO("Received node_up_msg from all nodes"); >> _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel