Re: [devel] [PATCH 2 of 2] AMFND: Admin operation continuation if csi callback completes during headless [#1725 part 1] V1

praveen malviya Thu, 25 Aug 2016 00:04:29 -0700

Hi Minh,

One minor correction is still needed.
The node_up event comes very early. Once at least one node_up event has 
arrived from every amfnd, AMFD stops the node sync timer, even before 
the cluster startup timer has started:
    if (rc_node_up == sync_nd_size) {
            if (cb->node_sync_tmr.is_active) {
                    avd_stop_tmr(cb, &cb->node_sync_tmr);
                    TRACE("stop NodeSync timer");
            }
            cb->all_nodes_synced = true;


But AMFD does not process all of these node_up events, because it is not 
yet in INIT state at that point:
    if ((n2d_msg->msg_info.n2d_node_up.node_id != cb->node_id_avd) &&
        (cb->init_state < AVD_INIT_DONE)) {
            TRACE("invalid init state (%u), node %x",
                    cb->init_state, n2d_msg->msg_info.n2d_node_up.node_id);
            goto done;
    }

When AMFD moves to INIT state it starts the cluster startup timer. If 
that timer is set very low, it expires, finds the node sync timer not 
running, and AMFD sends the set_leds messages. At that point some nodes 
may still be syncing. So amfnds that have already synced will start 
sending their assignment messages while other amfnds, which may host 
some SUs, are still syncing.

We need to bring two things to the same level:
1) When the set_leds message is sent to the amfnds, all amfnds should be 
in PRESENT state, meaning their SUs are enabled and the amfnd can 
process assignments.
2) Any amfnd that has not joined by this time will be rebooted.

Problem: how do we hold the amfnds back from sending buffered assignment 
events until all of them are in PRESENT state and it is certain that the 
non-PRESENT ones will never join the cluster? Neither the node sync 
timer nor the cluster startup timer ensures that all amfnds are synced 
and that the non-synced ones will be rebooted.

One idea: what if we do not stop the node sync timer when "if 
(rc_node_up == sync_nd_size)" is hit, and set node_sync_window_closed to 
true only when the timer really expires, as is done in 
avd_node_sync_tmr_evh()? What are the implications of that? AMFD will 
still have to make sure that nodes sending a fresh node_up event are 
rebooted. Then, if the cluster startup timer expires early, it will see 
the node sync timer still running and will not send set_leds. But will 
all the payloads really move to PRESENT state within 10 seconds? I am 
relying on the chosen value of 10 seconds.

Thanks,
Praveen




On 24-Aug-16 5:35 PM, praveen malviya wrote:
> Hi Minh,
>
> Any assignment message should be processed after cluster timer expiry
> and node sync timer expiry. The bug fix patch
> 1725_02_V2_bugfix_resend_buffer_in_set_leds.diff honors cluster timer
> expiry but not node sync timer.
> After node sync timer expiry, delayed payloads will be rebooted and if
> these payloads host any SU/SUSIs, they will be deleted. So admin op will
> finish gracefully.
> I think a for loop can be added in both timers' expiry events, with a
> check on the expiry of the other timer:
>
> diff --git a/osaf/services/saf/amf/amfd/cluster.cc
> b/osaf/services/saf/amf/amfd/cluster.cc
> --- a/osaf/services/saf/amf/amfd/cluster.cc
> +++ b/osaf/services/saf/amf/amfd/cluster.cc
> @@ -74,12 +74,13 @@ void avd_cluster_tmr_init_evh(AVD_CL_CB
>         m_AVSV_SEND_CKPT_UPDT_ASYNC_UPDT(cb, cb, AVSV_CKPT_AVD_CB_CONFIG);
>
>         // Resend set_leds to veteran node
> -
> -       for (std::map<std::string, AVD_AVND *>::const_iterator it =
> node_name_db->begin();
> -                       it != node_name_db->end(); it++) {
> -               node = it->second;
> -               if (node->veteran)
> -                       avd_snd_set_leds_msg(cb, node);
> +       if (cb->node_sync_tmr.is_active == false) {
> +               for (std::map<std::string, AVD_AVND *>::const_iterator
> it = node_name_db->begin();
> +                               it != node_name_db->end(); it++) {
> +                       node = it->second;
> +                       if (node->veteran)
> +                               avd_snd_set_leds_msg(cb, node);
> +               }
>         }
>
>         /* call the realignment routine for each of the SGs in the
> @@ -143,6 +144,17 @@ void avd_node_sync_tmr_evh(AVD_CL_CB *cb
>         // Setting true here to indicate the node sync window has closed
>         // Further node up message will be treated specially
>         cb->node_sync_window_closed = true;
> +        // Resend set_leds to veteran node
> +        if (cb->amf_init_tmr.is_active == false) {
> +               AVD_AVND *node = nullptr;
> +                for (std::map<std::string, AVD_AVND *>::const_iterator
> it = node_name_db->begin();
> +                                it != node_name_db->end(); it++) {
> +                        node = it->second;
> +                        if (node->veteran)
> +                                avd_snd_set_leds_msg(cb, node);
> +                }
> +        }
> +
>
>         TRACE_LEAVE();
>  }
>
>
>
> Thanks,
> Praveen
> On 24-Aug-16 4:58 PM, Nagendra Kumar wrote:
>> The below is the assignments after the test case (SU2 has standby
>> assignment):
>>
>> PM_SC-1:/home/nagu/views/staging-1725 # /etc/init.d/opensafd  status
>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>>
>>         saAmfSISUHAState=STANDBY(2)
>> safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>>
>>         saAmfSISUHAState=ACTIVE(1)
>> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
>>
>>         saAmfSISUHAState=ACTIVE(1)
>> safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
>>
>>         saAmfSISUHAState=ACTIVE(1)
>> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>>
>>         saAmfSISUHAState=ACTIVE(1)
>> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>         saAmfSISUHAState=STANDBY(2)
>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>         saAmfSISUHAState=ACTIVE(1)
>>
>> Thanks
>> -Nagu
>>
>>> -----Original Message-----
>>> From: Nagendra Kumar
>>> Sent: 24 August 2016 16:55
>>> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Praveen Malviya;
>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [devel] [PATCH 2 of 2] AMFND: Admin operation
>>> continuation if
>>> csi callback completes during headless [#1725 part 1] V1
>>>
>>> Hi Minh,
>>>     With 1725_phase_1_V2.tgz, the below email TC has failed. Please
>>> find the traces attached along with the configuration in the ticket.
>>>
>>> Thanks
>

------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
