Re: [devel] [PATCH 2 of 2] AMFND: Admin operation continuation if csi callback completes during headless [#1725 part 1] V1

minh chau Thu, 25 Aug 2016 03:31:50 -0700

Hi Praveen,

I think we need to step back a bit and look at the non-headless behavior.
Expiry of the cluster init timer ensures that all nodes have their MW SUs
assigned and are in the PRESENT state. It is the single entry point to the
non-NCS SU assignment phase.
We need to keep this principle in the headless case for #1725 as well.
...
         else {
             // this node is already up
             avd_node_state_set(avnd, AVD_AVND_STATE_PRESENT);
             avd_node_oper_state_set(avnd, SA_AMF_OPERATIONAL_ENABLED);
             avnd->veteran = true;
             // Update readiness state of all SUs which are waiting for node
             // oper state
             ...
             // At this point, once the node has become PRESENT, its MW SUs
             // should be synced. We can do:
             m_AVD_CLINIT_TMR_START(cb);
             /* Check if all SUs are in 'in-service' cluster-wide, if so
              * start assignments */
             if ((cb->amf_init_tmr.is_active == true) &&
                 (cluster_su_instantiation_done(cb, nullptr) == true)) {
                 avd_stop_tmr(cb, &cb->amf_init_tmr);
                 cluster_startup_expiry_event_generate(cb);
             }
             goto node_joined;
         }
...
}


Also, we only need to send set_leds when the cluster init timer expires.
Do you think this would work?
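A minimal C++ sketch of this proposal, under a simplified model: `Node`, `ClusterStartup` and their members are hypothetical stand-ins (not AMFD types), and the comments only map them loosely onto m_AVD_CLINIT_TMR_START, cluster_su_instantiation_done(), cluster_startup_expiry_event_generate() and avd_snd_set_leds_msg(). It shows the cluster init timer expiry as the single entry point to the assignment phase, reached either when the timer really fires or when it is cut short because every SU is already in service.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an AMFD node record (not AVD_AVND).
struct Node {
  Node(std::string n, bool ready) : name(std::move(n)), sus_in_service(ready) {}
  std::string name;
  bool present = false;  // AVD_AVND_STATE_PRESENT in the real code
  bool veteran = false;  // node that kept running through headless
  bool sus_in_service;   // all SUs hosted on this node are instantiated
};

class ClusterStartup {
 public:
  explicit ClusterStartup(std::vector<Node> nodes) : nodes_(std::move(nodes)) {}

  // "This node is already up" branch: mark it PRESENT, make sure the init
  // timer is running, and cut the timer short if every SU is in service.
  void on_node_present(std::size_t i) {
    nodes_[i].present = true;
    nodes_[i].veteran = true;
    if (assignment_phase_) return;   // entry point already passed
    init_timer_active_ = true;       // ~ m_AVD_CLINIT_TMR_START(cb)
    if (all_sus_in_service()) {      // ~ cluster_su_instantiation_done()
      init_timer_active_ = false;    // ~ avd_stop_tmr(cb, &cb->amf_init_tmr)
      on_init_timer_expiry();        // ~ cluster_startup_expiry_event_generate(cb)
    }
  }

  // Single entry point to the assignment phase: the only place set_leds is sent.
  void on_init_timer_expiry() {
    if (assignment_phase_) return;
    assignment_phase_ = true;
    init_timer_active_ = false;
    for (const Node& n : nodes_)
      if (n.veteran) leds_sent_.push_back(n.name);  // ~ avd_snd_set_leds_msg()
  }

  bool init_timer_active() const { return init_timer_active_; }
  const std::vector<std::string>& leds_sent() const { return leds_sent_; }

 private:
  bool all_sus_in_service() const {
    for (const Node& n : nodes_)
      if (!n.sus_in_service) return false;
    return true;
  }

  std::vector<Node> nodes_;
  bool init_timer_active_ = false;
  bool assignment_phase_ = false;
  std::vector<std::string> leds_sent_;
};
```

In this model a node that becomes PRESENT after the entry point has passed does not trigger a second round of set_leds; whether the real timer macro behaves that way after expiry is an assumption here.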

Thanks,
Minh

On 25/08/16 17:03, praveen malviya wrote:
> Hi Minh,
>
> One minor correction is still needed.
> The node_up event arrives very early. Once at least one node_up event has
> arrived from every AMFND, AMFD stops the node sync timer very early, even
> before the cluster timer has started:
>   if (rc_node_up == sync_nd_size) {
>           if (cb->node_sync_tmr.is_active) {
>                   avd_stop_tmr(cb, &cb->node_sync_tmr);
>                   TRACE("stop NodeSync timer");
>           }
>           cb->all_nodes_synced = true;
>
> But AMFD does not process these node_up events (from nodes other than its
> own), because it is not yet in the INIT state at that time:
>
>   if ((n2d_msg->msg_info.n2d_node_up.node_id != cb->node_id_avd) &&
>       (cb->init_state < AVD_INIT_DONE)) {
>           TRACE("invalid init state (%u), node %x",
>                 cb->init_state, n2d_msg->msg_info.n2d_node_up.node_id);
>           goto done;
>   }
>
> When AMFD moves to the INIT state, it starts the cluster startup timer. If
> this timer is very short, it will expire, find the node sync timer not
> running, and send the led messages. At that point some nodes may still be
> syncing. So already-synced AMFNDs will start sending assignment messages
> while other AMFNDs, which may host some SUs, are still syncing.
>
> We need to bring two things to the same point:
> 1) When the led set message is sent to the AMFNDs, all AMFNDs should be in
> the PRESENT state, meaning their SUs are enabled and each AMFND can
> process assignments.
> 2) All other AMFNDs that have not joined by this time will be rebooted.
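The gap between the timer-based check and the two conditions above can be sketched in C++ as two predicates. `NodeState`, `send_leds_timer_only()` and `send_leds_safe()` are hypothetical; the real AMFD node states and timer fields differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-AMFND states for the sketch (not AVD_AVND_STATE_*).
enum class NodeState { Syncing, Present, RebootOrdered };

// Timer-only check: once the node sync timer has been stopped (possibly
// early, as soon as rc_node_up == sync_nd_size), cluster timer expiry
// is allowed to send set_leds.
bool send_leds_timer_only(bool node_sync_tmr_active) {
  return !node_sync_tmr_active;
}

// Check covering both conditions: every AMFND is either PRESENT (can process
// assignments) or has been ordered to reboot (will never join this cluster).
bool send_leds_safe(const std::vector<NodeState>& nodes) {
  for (NodeState s : nodes)
    if (s == NodeState::Syncing) return false;
  return true;
}
```

With the node sync timer stopped early, the timer-only check already allows set_leds while one AMFND is still syncing; the second predicate only allows it once that node is PRESENT or ordered to reboot.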
>
> Problem: how do we hold AMFNDs back from sending assignment events from
> their buffers until all of them are in the PRESENT state, and ensure that
> non-PRESENT ones never join the cluster? Neither the node sync timer nor
> the cluster startup timer ensures that all AMFNDs are synced and that
> unsynced ones will be rebooted.
>
> One idea: what if we do not stop the node sync timer when
> "if (rc_node_up == sync_nd_size)" is hit, and instead set
> node_sync_window_closed to true only when the timer really expires in
> avd_node_sync_tmr_evh(), as is already done there? What are the
> implications? AMFD will still have to make sure that nodes sending a fresh
> node_up event are rebooted. With this, if the cluster timer expires early,
> it will see the node sync timer running and will not send led set. But will
> all the payloads really move to the PRESENT state within 10 seconds? Here I
> am relying on the chosen value of 10 seconds.
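A sketch of this idea in C++, assuming a simplified model where `NodeSyncWindow` and `NodeUpAction` are hypothetical and every node_up arriving after the window closes is answered with a reboot (the real handling of late node_up messages may differ). Reaching sync_nd_size only records that all nodes have synced; the timer keeps running until its real expiry, whose length (10 seconds in the discussion above) is a configuration choice.

```cpp
#include <cassert>
#include <cstddef>

enum class NodeUpAction { Accept, Reboot };

// Hypothetical model: reaching sync_nd_size no longer stops the node sync
// timer; the window closes only in the real expiry handler.
class NodeSyncWindow {
 public:
  explicit NodeSyncWindow(std::size_t sync_nd_size) : sync_nd_size_(sync_nd_size) {}

  NodeUpAction on_node_up() {
    if (window_closed_) return NodeUpAction::Reboot;  // fresh node_up after the window
    if (++rc_node_up_ == sync_nd_size_)
      all_nodes_synced_ = true;  // record it, but keep the timer running
    return NodeUpAction::Accept;
  }

  // ~ avd_node_sync_tmr_evh(): the only place the window is closed.
  void on_timer_expiry() {
    timer_active_ = false;
    window_closed_ = true;
  }

  bool timer_active() const { return timer_active_; }
  bool all_nodes_synced() const { return all_nodes_synced_; }

 private:
  std::size_t sync_nd_size_;
  std::size_t rc_node_up_ = 0;
  bool all_nodes_synced_ = false;
  bool timer_active_ = true;  // assumed started when AMFD comes up
  bool window_closed_ = false;
};
```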
>
> Thanks,
> Praveen
>
>
>
>
> On 24-Aug-16 5:35 PM, praveen malviya wrote:
>> Hi Minh,
>>
>> Any assignment message should be processed only after both the cluster
>> timer and the node sync timer have expired. The bug fix patch
>> 1725_02_V2_bugfix_resend_buffer_in_set_leds.diff honors cluster timer
>> expiry but not node sync timer expiry.
>> After the node sync timer expires, delayed payloads will be rebooted, and
>> if these payloads host any SUs/SUSIs, those will be deleted. So the admin
>> op will finish gracefully.
>> I think the for loop can be added to both timers' expiry event handlers,
>> with a check on the expiry of the other timer:
>>
>> diff --git a/osaf/services/saf/amf/amfd/cluster.cc b/osaf/services/saf/amf/amfd/cluster.cc
>> --- a/osaf/services/saf/amf/amfd/cluster.cc
>> +++ b/osaf/services/saf/amf/amfd/cluster.cc
>> @@ -74,12 +74,13 @@ void avd_cluster_tmr_init_evh(AVD_CL_CB
>>         m_AVSV_SEND_CKPT_UPDT_ASYNC_UPDT(cb, cb, AVSV_CKPT_AVD_CB_CONFIG);
>>
>>         // Resend set_leds to veteran node
>> -
>> -       for (std::map<std::string, AVD_AVND *>::const_iterator it = node_name_db->begin();
>> -                       it != node_name_db->end(); it++) {
>> -               node = it->second;
>> -               if (node->veteran)
>> -                       avd_snd_set_leds_msg(cb, node);
>> +       if (cb->node_sync_tmr.is_active == false) {
>> +               for (std::map<std::string, AVD_AVND *>::const_iterator it = node_name_db->begin();
>> +                               it != node_name_db->end(); it++) {
>> +                       node = it->second;
>> +                       if (node->veteran)
>> +                               avd_snd_set_leds_msg(cb, node);
>> +               }
>>         }
>>
>>         /* call the realignment routine for each of the SGs in the
>> @@ -143,6 +144,17 @@ void avd_node_sync_tmr_evh(AVD_CL_CB *cb
>>         // Setting true here to indicate the node sync window has closed
>>         // Further node up message will be treated specially
>>         cb->node_sync_window_closed = true;
>> +        // Resend set_leds to veteran node
>> +        if (cb->amf_init_tmr.is_active == false) {
>> +               AVD_AVND *node = nullptr;
>> +                for (std::map<std::string, AVD_AVND *>::const_iterator it = node_name_db->begin();
>> +                                it != node_name_db->end(); it++) {
>> +                        node = it->second;
>> +                        if (node->veteran)
>> +                                avd_snd_set_leds_msg(cb, node);
>> +                }
>> +        }
>> +
>>
>>         TRACE_LEAVE();
>>  }
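The effect of the two hunks is a join of two events: whichever timer expires last sends set_leds. A hypothetical C++ model (`Amfd`, `Node` and the handler names are illustrative, not the AMFD code), assuming each handler clears its own timer flag before checking the other one:

```cpp
#include <cassert>
#include <map>
#include <string>

struct Node {
  bool veteran;
  int leds_sent;  // number of set_leds messages this node received
};

struct Amfd {
  bool amf_init_tmr_active = true;
  bool node_sync_tmr_active = true;
  std::map<std::string, Node> node_name_db;

  void resend_set_leds() {
    for (auto& entry : node_name_db)
      if (entry.second.veteran) ++entry.second.leds_sent;  // ~ avd_snd_set_leds_msg()
  }
  // ~ avd_cluster_tmr_init_evh()
  void cluster_tmr_expired() {
    amf_init_tmr_active = false;
    if (!node_sync_tmr_active) resend_set_leds();
  }
  // ~ avd_node_sync_tmr_evh()
  void node_sync_tmr_expired() {
    node_sync_tmr_active = false;
    if (!amf_init_tmr_active) resend_set_leds();
  }
  // ~ avd_stop_tmr(cb, &cb->node_sync_tmr) when rc_node_up == sync_nd_size:
  // the flag goes false, but no set_leds is sent from here.
  void node_sync_tmr_stopped() { node_sync_tmr_active = false; }
};
```

If the node sync timer is stopped instead of expiring, node_sync_tmr_expired() never runs in this model and only the cluster timer path can send set_leds, which is the case raised in the 25 Aug reply.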
>>
>>
>>
>> Thanks,
>> Praveen
>> On 24-Aug-16 4:58 PM, Nagendra Kumar wrote:
>>> Below are the assignments after the test case (SU2 has a standby
>>> assignment):
>>>
>>> PM_SC-1:/home/nagu/views/staging-1725 # /etc/init.d/opensafd status
>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>>>         saAmfSISUHAState=STANDBY(2)
>>> safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>> safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>         saAmfSISUHAState=STANDBY(2)
>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>>
>>> Thanks
>>> -Nagu
>>>
>>>> -----Original Message-----
>>>> From: Nagendra Kumar
>>>> Sent: 24 August 2016 16:55
>>>> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [devel] [PATCH 2 of 2] AMFND: Admin operation continuation if csi callback completes during headless [#1725 part 1] V1
>>>>
>>>> Hi Minh,
>>>>     With 1725_phase_1_V2.tgz, the test case in the email below has
>>>> failed. Please find the traces and the configuration attached to the ticket.
>>>>
>>>> Thanks
>>
>

------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
