Hi Minh,
         I thought it would be rare. But if you find that it breaks existing 
behaviour (backward compatibility) by delaying cluster startup, e.g. it used 
to take about 2 seconds and now takes 10 seconds with one controller, then 
Gary can retain the older code. That is fine with me.

Thanks
-Nagu

-----Original Message-----
From: Minh Hon Chau [mailto:[email protected]] 
Sent: 30 October 2018 12:40
To: Nagendra Kumar; 'Gary Lee'; [email protected]
Cc: [email protected]
Subject: Re: [PATCH 1/1] amfd: ensure node_sync_window_closed is set [#2946]

Hi Nagu,

The subsequent procedures, including the assignments, will be delayed 
while waiting for SC-2 to join; I guess that is nearly 10 seconds. The 
headless sync does not need to wait for the standby amfd, so the old code 
did not expect to wait for SC-2. I think this scenario is rare?

Thanks

Minh


On 30/10/18 4:43 pm, Nagendra Kumar wrote:
> Hi Minh,
>          I had noticed that point during review. But if both SCs have gone 
> down, then both are expected to join.
> If only one SC starts, then yes, the timeout will happen. Do you see any major 
> implications other than the assignment delay? I think that should be fine, 
> because the expected delay is the wait for SC-2 to join.
>
> Thanks
> -Nagu
>
> -----Original Message-----
> From: Minh Hon Chau [mailto:[email protected]]
> Sent: 30 October 2018 02:41
> To: Nagendra Kumar; 'Gary Lee'; [email protected]
> Cc: [email protected]
> Subject: Re: [PATCH 1/1] amfd: ensure node_sync_window_closed is set [#2946]
>
> Hi Gary, Nagu
>
> One thing to note about the patch, which you may already know.
>
> If we have a two-SC cluster, go headless, and then start only SC1, the
> headless sync will now always time out waiting for SC2 to come up.
>
> Thanks
>
> Minh
>
> On 29/10/18 7:19 pm, Nagendra Kumar wrote:
>> Hi Gary,
>>             Great simplification! Ack.
>>
>> Thanks
>> -Nagu
>>
>> -----Original Message-----
>> From: Gary Lee [mailto:[email protected]]
>> Sent: 29 October 2018 12:36
>> To: [email protected]; [email protected]; Nagendra Kumar
>> Cc: [email protected]; Gary Lee
>> Subject: [PATCH 1/1] amfd: ensure node_sync_window_closed is set [#2946]
>>
>> If all nodes are synced after headless, the timer is stopped
>> but node_sync_window_closed is never set to true.
>>
>> Later on, if a node becomes split from the main network and
>> rejoins, it will send a headless sync to amfd.
>>
>> amfd will go into a never ending loop of processing the message,
>> putting back into the queue, etc.
>>
>> When the node sync timer is stopped, ensure node_sync_window_closed
>> is set.
>>
>> Also modify avd_count_node_up() not to count standby SC.
>> Sometimes a node_up from the standby SC arrives before mds up,
>> and the standby SC is incorrectly included in the node sync
>> count. Then a legitimate node_up from a PL is not accepted
>> because node_sync_window_closed is prematurely set.
>> ---
>>    src/amf/amfd/ndfsm.cc | 28 +++-------------------------
>>    1 file changed, 3 insertions(+), 25 deletions(-)
>>
>> diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
>> index edc993988..375c5c7b1 100644
>> --- a/src/amf/amfd/ndfsm.cc
>> +++ b/src/amf/amfd/ndfsm.cc
>> @@ -165,34 +165,12 @@ done:
>>     *
>>     **************************************************************************/
>>    uint32_t avd_count_sync_node_size(AVD_CL_CB *cb) {
>> -  uint32_t twon_ncs_su_count = 0;
>>      uint32_t count = 0;
>>      TRACE_ENTER();
>>    
>> -  for (const auto &value : *node_name_db) {
>> -    AVD_AVND *avnd = value.second;
>> -    osafassert(avnd);
>> -    for (const auto &su : avnd->list_of_ncs_su) {
>> -      if (su->sg_of_su->sg_redundancy_model == SA_AMF_2N_REDUNDANCY_MODEL) {
>> -        twon_ncs_su_count++;
>> -        continue;
>> -      }
>> -    }
>> -  }
>> -  // cluster can have 1 SC or more SCs which hosting 2N Opensaf SU
>> -  // so twon_ncs_su_count at least is 1
>> -  osafassert(twon_ncs_su_count > 0);
>> -
>> -  if (twon_ncs_su_count == 1) {
>> -    // 1 SC, the rest of nodes could be in sync from headless
>> -    count = node_name_db->size() - 1;
>> -  } else {
>> -    // >=2 SCs, the rest of nodes could be in sync except active/standby SC
>> -    count = node_name_db->size() - 2;
>> -  }
>> +  count = node_name_db->size() - 1;
>>    
>>      TRACE("sync node size:%d", count);
>> -  TRACE_LEAVE();
>>      return count;
>>    }
>>    
>>    /*****************************************************************************
>> @@ -218,8 +196,7 @@ uint32_t avd_count_node_up(AVD_CL_CB *cb) {
>>      for (const auto &value : *node_name_db) {
>>        node = value.second;
>>        if (node->node_up_msg_count > 0 &&
>> -        node->node_info.nodeId != cb->node_id_avd &&
>> -        node->node_info.nodeId != cb->node_id_avd_other)
>> +        node->node_info.nodeId != cb->node_id_avd)
>>          ++received_count;
>>      }
>>      TRACE("Number of node director(s) that director received node_up msg:%u",
>> @@ -329,6 +306,7 @@ void avd_node_up_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
>>          if (cb->node_sync_tmr.is_active) {
>>            avd_stop_tmr(cb, &cb->node_sync_tmr);
>>            TRACE("stop NodeSync timer");
>> +        cb->node_sync_window_closed = true;
>>          }
>>          cb->all_nodes_synced = true;
>>          LOG_NO("Received node_up_msg from all nodes");
>
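
The scenario discussed in this thread can be modelled with a small
standalone sketch. It is not the amfd code: Node, ClusterCb, NodeDb and
try_close_sync_window() are hypothetical stand-ins, and the >= comparison
between the two counts is an assumption about how avd_node_up_evh() uses
them. It only illustrates that, with the patched counting, a cluster with
two SCs in its node database does not reach the expected sync count until
SC-2 sends node_up, so the headless sync falls back to the timer.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for AVD_AVND; only the fields used by the counting.
struct Node {
  uint32_t node_id;
  uint32_t node_up_msg_count;
};

// Hypothetical stand-in for AVD_CL_CB.
struct ClusterCb {
  uint32_t node_id_avd = 0;            // node id of the active controller
  bool node_sync_tmr_active = true;    // models cb->node_sync_tmr.is_active
  bool node_sync_window_closed = false;
  bool all_nodes_synced = false;
};

using NodeDb = std::map<std::string, Node>;

// Patched behaviour: every node except the active SC is expected to sync.
uint32_t count_sync_node_size(const NodeDb &db) {
  return static_cast<uint32_t>(db.size()) - 1;
}

// Patched behaviour: only the active SC is excluded from the node_up count,
// so a standby SC that has not joined keeps the count below the target.
uint32_t count_node_up(const ClusterCb &cb, const NodeDb &db) {
  uint32_t received = 0;
  for (const auto &entry : db) {
    const Node &node = entry.second;
    if (node.node_up_msg_count > 0 && node.node_id != cb.node_id_avd) {
      ++received;
    }
  }
  return received;
}

// Assumed caller logic: once all expected node_up messages have arrived,
// stop the timer and (the fix in the patch) mark the sync window as closed.
// Returns false if the sync must keep waiting for the timer.
bool try_close_sync_window(ClusterCb &cb, const NodeDb &db) {
  if (count_node_up(cb, db) < count_sync_node_size(db)) return false;
  if (cb.node_sync_tmr_active) {
    cb.node_sync_tmr_active = false;
    cb.node_sync_window_closed = true;
  }
  cb.all_nodes_synced = true;
  return true;
}
```

With a node database holding SC-1 (active), SC-2, PL-3 and PL-4, where
only SC-2 has not sent node_up, count_sync_node_size() is 3 and
count_node_up() is 2, so try_close_sync_window() returns false and the
window would only close when the NodeSync timer expires.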


