Hi Minh,

In the description of the ticket there is a log which is :
"
Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned 
'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to 
'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO 
avnd_di_susi_resp_send() deferred as AMF director is offline
"
Last line in above log means AMFND was sending the message when it new 
about SC absence state. I think this issue is already fixed during #1725 
and this published patch is not required. Why? After led set message 
amfnd will anyway send this message.

The logs that I have attached can be ignored. I was simulating the bug 
on different assumptions.

One question regarding the patch:
If the goal is to fix the issue when the message is being sent and 
system has become SC-less. In this situation, then avnd_mds_send() will 
return, most probably,  a failure as MDS will not find the destination. 
In mds failure case,  rec->no_retries will not be incremented and will 
remain zero. Now AMFND will process down of SC and it will call 
avnd_diq_del(). In this function, since no_retries is zero for this 
message(first message),  the message will be deleted.


Thanks,
Praveen


On 18-May-17 9:14 AM, minh chau wrote:
> Hi Praveen,
> 
> Do you have any idea why @is_avd_down was false that made amfnd to send 
> susi_resp at 12:37:20.453974?
> It should be true by the end of avnd_evt_mds_avd_dn_evh() at 
> 12:37:16.741518, is it right?
> 
> Thanks,
> Minh
> On 17/05/17 21:31, minh chau wrote:
>> Hi Praveen,
>>
>> Thanks for looking at the issue.
>> Here is what I am observing
>>
>> amfnd-PL3 received NCSMDS_DOWN indicating no active amfd
>>
>> May 17 12:37:16.741308 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0629] 
>> >> avnd_evt_mds_avd_dn_evh
>> May 17 12:37:16.741342 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0651] 
>> WA AMF director unexpectedly crashed
>> May 17 12:37:16.741354 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0701] 
>> TR Delete all pending messages to be sent to AMFD
>> May 17 12:37:16.741379 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] 
>> NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
>> May 17 12:37:16.741405 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] 
>> NO Checking 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending 
>> messages
>> May 17 12:37:16.741430 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] 
>> NO Checking 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' for pending 
>> messages
>> May 17 12:37:16.741505 osafamfnd [8141:8141:src/amf/amfnd/tmr.cc:0083] 
>> TR SC absence timer started
>> May 17 12:37:16.741518 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0742] 
>> << avnd_evt_mds_avd_dn_evh
>>
>> But a bit later, susi got assigned, and amfnd-PL3 did send this susi 
>> response (it should not send out and buffer it, since the @is_avd_down 
>> should be true)
>>
>> May 17 12:37:20.453974 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0866] 
>> >> avnd_di_susi_resp_send: Sending Resp 
>> su=safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, 
>> si=safSi=AmfDemo,safApp=AmfDemo1, curr_state=3, prv_state=1
>> ...
>> May 17 12:37:20.454083 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1482] 
>> >> avnd_mds_send: Msg type '1'
>> May 17 12:37:20.454244 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1537] 
>> ER ncsmds_api for 0 FAILED, dest=0
>>
>> When SC1 restarted, amfd received the very first messages from PL3 
>> starting with msg_id=1 (it should be starting from 0)
>>
>> May 17 12:37:28.398633 osafamfd 
>> [7686:7686:src/amf/amfd/ndproc.cc:0330] NO Receive message with event 
>> type:12, msg_type:31, from node:2030f, msg_id:1
>> May 17 12:37:28.413018 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0334] 
>> NO Received node_up_msg from all nodes
>> May 17 12:37:28.413069 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0254] 
>> NO Received node_up from 2030f: msg_id 2
>>
>> Looks to me something should not happen inside 
>> avnd_evt_mds_avd_dn_evh(). In this avnd_evt_mds_avd_dn_evh(), 
>> @is_avd_down should be true, the msg counter should be reset to 0, but 
>> I do see the SC absence timer started. I couldn't figure how it 
>> happened for now
>>
>> Thanks,
>> Minh
>>
>> On 17/05/17 20:03, praveen malviya wrote:
>>> What I see is avnd_diq_del() is called as soon as system becomes 
>>> headless. This will delete all pending messages. But when component 
>>> will respond during SCs absence a new message will be generated and 
>>> buffered.
>>> For node_up AMFD will ack the message, but amfnd calls 
>>> avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process().
>>> We need to call avnd_diq_del() in ack message so that msg_id gets 
>>> updated.
>>> Further looking into it..
>>>
>>>
>>> Thanks.
>>> Praveen
>>>
>>>
>>>
>>> On 17-May-17 1:50 PM, praveen malviya wrote:
>>>> Hi Minh,
>>>>
>>>> While testing this, I am observing that amfd is dropping the assignment
>>>> message because of message id mismatch:
>>>> May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171]
>>>>   >> avd_su_si_assign_evh: id:1, node:2030f, act:5,
>>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0
>>>> ....
>>>> ....
>>>> May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075]
>>>> WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f 
>>>> should be 3
>>>> May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777]
>>>> << avd_su_si_assign_evh
>>>>
>>>> I am also looking into this. For your reference I had attached amfd and
>>>> amfnd traces from SC-1 and PL-3 respectively in the ticket.
>>>> I am working with one controller and one payload.
>>>>
>>>>
>>>> Thanks
>>>> Praveen
>>>>
>>>> On 15-May-17 1:06 PM, Minh Chau wrote:
>>>>> When amfnd-payload responds susi assignment response just before 
>>>>> both SC
>>>>> go down, and that response message does not come to director. 
>>>>> Therefore,
>>>>> the status of that assignment could be seen as "modifying" in IMM. 
>>>>> When
>>>>> SC comes back, active amfd will be waiting for that response forever.
>>>>>
>>>>> Patch checks if a susi assignment response is sent but not-ack just 
>>>>> before
>>>>> both SC come down, amfnd-payload will buffer it in a way as a susi get
>>>>> assigned during SC absence
>>>>> ---
>>>>>    src/amf/amfnd/di.cc | 53 
>>>>> +++++++++++++++++++++++++++++++++++++++++++++--------
>>>>>    1 file changed, 45 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
>>>>> index e06b9260d..3776a09dc 100644
>>>>> --- a/src/amf/amfnd/di.cc
>>>>> +++ b/src/amf/amfnd/di.cc
>>>>> @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, 
>>>>> uint32_t mid) {
>>>>>      Notes         : None.
>>>>> ******************************************************************************/
>>>>>  
>>>>>
>>>>>    void avnd_diq_del(AVND_CB *cb) {
>>>>> -  AVND_DND_MSG_LIST *rec = 0;
>>>>>    -  do {
>>>>> -    /* pop the record */
>>>>> -    m_AVND_DIQ_REC_POP(cb, rec);
>>>>> -    if (!rec) break;
>>>>> +  if ((cb->dnd_list.head != nullptr)) {
>>>>> +    AVND_DND_MSG_LIST *rec = 0;
>>>>> +    bool found = true;
>>>>> +    while (found) {
>>>>> +      found = false;
>>>>> +      for (rec = cb->dnd_list.head; rec != nullptr;
>>>>> +           rec = rec->next) {
>>>>> +        osafassert(rec->msg.type == AVND_MSG_AVD);
>>>>> +        // delete all pending messages that haven't been sent out
>>>>> +        if (rec->no_retries == 0) {
>>>>> +          m_AVND_DIQ_REC_POP(cb, rec);
>>>>> +          avnd_diq_rec_del(cb, rec);
>>>>> +          break;
>>>>> +        } else {
>>>>> +          // Assignment response had been sent, but not ack 
>>>>> because last
>>>>> +          // controller go down, reset msg_id and will be resent 
>>>>> later
>>>>> +          if (rec->msg.info.avd->msg_type == 
>>>>> AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) {
>>>>> +            if 
>>>>> (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) {
>>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0;
>>>>> +              found = true;
>>>>> +              LOG_NO(
>>>>> +                  "Found not-ack su_si_assign msg for SU:'%s', "
>>>>> +                  "SI:'%s', ha_state:'%u', msg_act:'%u', 
>>>>> single_csi:'%u', "
>>>>> +                  "error:'%u', msg_id:'%u'",
>>>>> + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
>>>>> + .n2d_su_si_assign.su_name),
>>>>> + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
>>>>> + .n2d_su_si_assign.si_name),
>>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
>>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act,
>>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign
>>>>> +                      .single_csi,
>>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign.error,
>>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id);
>>>>> +            }
>>>>> +          } else {
>>>>> +            // delete other messages for now
>>>>> +            m_AVND_DIQ_REC_POP(cb, rec);
>>>>> +            avnd_diq_rec_del(cb, rec);
>>>>> +            break;
>>>>> +          }
>>>>> +        }
>>>>>    -    /* delete the record */
>>>>> -    avnd_diq_rec_del(cb, rec);
>>>>> -  } while (1);
>>>>> +      }
>>>>> +    }
>>>>> +  }
>>>>>         return;
>>>>>    }
>>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>  
>>>>
>>>> Check out the vibrant tech community on one of the world's most
>>>> engaging tech sites, Slashdot.org! 
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__sdm.link_slashdot&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=Lehk1PZKwfDQtYJXNyUKbPAqrw5O--SlPRAF9DIEps4&m=KwsqvdArvOJV5IkAidvFxTT0JBVpgHVYUwJOsjK9dt4&s=Luyb_FCgTEXSpVle_diQMuhKxVmmm6cmv5VA03k0Zu8&e=
>>>>  
>>>>
>>>> _______________________________________________
>>>> Opensaf-devel mailing list
>>>> Opensaf-devel@lists.sourceforge.net
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.sourceforge.net_lists_listinfo_opensaf-2Ddevel&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=Lehk1PZKwfDQtYJXNyUKbPAqrw5O--SlPRAF9DIEps4&m=KwsqvdArvOJV5IkAidvFxTT0JBVpgHVYUwJOsjK9dt4&s=e4sg0J1cdg4VnTqeWPDrNZlPv2BuIuFj4Dk7JACxgx8&e=
>>>>  
>>>>
>>>>
>>>
>>
> 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to