Hi Praveen,

Thanks for looking at the issue.
Here is what I am observing.

amfnd-PL3 received NCSMDS_DOWN, indicating that there is no active amfd:

May 17 12:37:16.741308 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0629] >> avnd_evt_mds_avd_dn_evh
May 17 12:37:16.741342 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0651] WA AMF director unexpectedly crashed
May 17 12:37:16.741354 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0701] TR Delete all pending messages to be sent to AMFD
May 17 12:37:16.741379 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
May 17 12:37:16.741405 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] NO Checking 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending messages
May 17 12:37:16.741430 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] NO Checking 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' for pending messages
May 17 12:37:16.741505 osafamfnd [8141:8141:src/amf/amfnd/tmr.cc:0083] TR SC absence timer started
May 17 12:37:16.741518 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0742] << avnd_evt_mds_avd_dn_evh

But a bit later the susi got assigned, and amfnd-PL3 did send the susi 
response. It should have buffered the response instead of sending it, 
since @is_avd_down should be true at that point:

May 17 12:37:20.453974 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0866] >> avnd_di_susi_resp_send: Sending Resp su=safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, si=safSi=AmfDemo,safApp=AmfDemo1, curr_state=3, prv_state=1
...
May 17 12:37:20.454083 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1482] >> avnd_mds_send: Msg type '1'
May 17 12:37:20.454244 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1537] ER ncsmds_api for 0 FAILED, dest=0
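
To be explicit about the behaviour I expected at this point, here is a 
rough sketch. The types and names (Msg, NodeDirector, send_susi_resp) are 
simplified stand-ins made up for illustration, not the real AVND_CB or 
the real send path; the only things taken from amfnd are the idea of 
@is_avd_down and of a send message counter.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a node-to-director message record.
struct Msg {
  std::string su;
  std::string si;
  uint32_t msg_id = 0;  // 0 means "not numbered yet"
};

// Hypothetical, trimmed-down stand-in for the amfnd control block.
struct NodeDirector {
  bool is_avd_down = false;  // set when NCSMDS_DOWN for amfd is received
  uint32_t snd_msg_id = 0;   // last id handed out towards amfd
  std::deque<Msg> buffered;  // responses held back while headless
  std::deque<Msg> sent;      // stands in for the MDS send path
};

// Send a susi response, or buffer it when no director is reachable.
// Returns true only if the message actually went out.
bool send_susi_resp(NodeDirector &nd, Msg msg) {
  if (nd.is_avd_down) {
    msg.msg_id = 0;  // to be numbered when it is replayed later
    nd.buffered.push_back(std::move(msg));
    return false;
  }
  msg.msg_id = ++nd.snd_msg_id;
  nd.sent.push_back(std::move(msg));
  return true;
}
```

With is_avd_down set, a response for SU1 would end up in the buffer and 
the counter would stay untouched, which is not what the trace above shows.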

When SC1 restarted, the very first message amfd received from PL3 
carried msg_id=1 (it should start from 0):

May 17 12:37:28.398633 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0330] NO Receive message with event type:12, msg_type:31, from node:2030f, msg_id:1
May 17 12:37:28.413018 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0334] NO Received node_up_msg from all nodes
May 17 12:37:28.413069 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0254] NO Received node_up from 2030f: msg_id 2
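
For context, the "invalid msg id 1 ... should be 3" warning in your trace 
quoted below reads to me like an expected-next-id comparison on the amfd 
side. The sketch below is only my assumption of how such a check behaves, 
inferred from that log line; NodeEntry and accept_msg are hypothetical 
names and this is not the actual avd_msg_sanity_chk() code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-node bookkeeping on the director side.
struct NodeEntry {
  uint32_t rcv_msg_id = 0;  // id of the last message accepted from the node
};

// Accept a message only if it carries the next expected id.
// A mismatch means the message is dropped, as in the WA log line.
bool accept_msg(NodeEntry &node, uint32_t msg_id) {
  if (msg_id != node.rcv_msg_id + 1) return false;
  node.rcv_msg_id = msg_id;
  return true;
}
```

Under that assumption, once ids 1 and 2 have been accepted, a second 
message carrying id 1 is rejected because 3 is expected, which matches the 
dropped assignment response.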

It looks to me like something unexpected happens inside 
avnd_evt_mds_avd_dn_evh(). In this function, @is_avd_down should be set 
to true and the msg counter should be reset to 0. I do see the SC absence 
timer started, so the handler ran that far, yet the response was still 
sent and the msg_id did not restart. I haven't figured out how that 
happened yet.
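
Roughly, the post-condition I have in mind for the handler is sketched 
below. HeadlessState, on_avd_down and headless_state_consistent are names 
I made up for illustration; the real handler works on AVND_CB and does 
more than this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down view of the amfnd state touched by the handler.
struct HeadlessState {
  bool is_avd_down = false;
  uint32_t snd_msg_id = 0;
  bool sc_absence_tmr_running = false;
};

// What I expect the NCSMDS_DOWN handler to leave behind.
void on_avd_down(HeadlessState &st) {
  st.is_avd_down = true;
  st.snd_msg_id = 0;
  st.sc_absence_tmr_running = true;
}

// True if the state matches the expectation above: a running SC absence
// timer implies the director is marked down and the counter is reset.
bool headless_state_consistent(const HeadlessState &st) {
  if (!st.sc_absence_tmr_running) return true;
  return st.is_avd_down && st.snd_msg_id == 0;
}
```

A check like headless_state_consistent() placed in a trace point after the 
handler might help narrow down where the state diverges.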

Thanks,
Minh

On 17/05/17 20:03, praveen malviya wrote:
> What I see is that avnd_diq_del() is called as soon as the system 
> becomes headless. This deletes all pending messages. But when a 
> component responds during the SCs' absence, a new message is generated 
> and buffered.
> For node_up, AMFD acks the message, but amfnd calls 
> avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process().
> We need to call avnd_diq_del() when processing the ack so that msg_id 
> gets updated.
> Looking into it further.
>
>
> Thanks.
> Praveen
>
>
>
> On 17-May-17 1:50 PM, praveen malviya wrote:
>> Hi Minh,
>>
>> While testing this, I am observing that amfd is dropping the assignment
>> message because of message id mismatch:
>> May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171] >> avd_su_si_assign_evh: id:1, node:2030f, act:5, 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0
>> ....
>> ....
>> May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075] WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f should be 3
>> May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777] << avd_su_si_assign_evh
>>
>> I am also looking into this. For your reference, I have attached the
>> amfd and amfnd traces from SC-1 and PL-3 respectively to the ticket.
>> I am working with one controller and one payload.
>>
>>
>> Thanks
>> Praveen
>>
>> On 15-May-17 1:06 PM, Minh Chau wrote:
>>> If amfnd-payload sends a susi assignment response just before both SCs
>>> go down and that response does not reach the director, the status of
>>> that assignment can be seen as "modifying" in IMM. When an SC comes
>>> back, the active amfd will wait for that response forever.
>>>
>>> The patch checks whether a susi assignment response was sent but not
>>> acked just before both SCs went down. If so, amfnd-payload buffers it
>>> in the same way as a susi that gets assigned during SC absence.
>>> ---
>>>    src/amf/amfnd/di.cc | 53 +++++++++++++++++++++++++++++++++++++++++++++--------
>>>    1 file changed, 45 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
>>> index e06b9260d..3776a09dc 100644
>>> --- a/src/amf/amfnd/di.cc
>>> +++ b/src/amf/amfnd/di.cc
>>> @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t mid) {
>>>      Notes         : None.
>>> ******************************************************************************/
>>>    void avnd_diq_del(AVND_CB *cb) {
>>> -  AVND_DND_MSG_LIST *rec = 0;
>>>
>>> -  do {
>>> -    /* pop the record */
>>> -    m_AVND_DIQ_REC_POP(cb, rec);
>>> -    if (!rec) break;
>>> +  if ((cb->dnd_list.head != nullptr)) {
>>> +    AVND_DND_MSG_LIST *rec = 0;
>>> +    bool found = true;
>>> +    while (found) {
>>> +      found = false;
>>> +      for (rec = cb->dnd_list.head; rec != nullptr;
>>> +           rec = rec->next) {
>>> +        osafassert(rec->msg.type == AVND_MSG_AVD);
>>> +        // delete all pending messages that haven't been sent out
>>> +        if (rec->no_retries == 0) {
>>> +          m_AVND_DIQ_REC_POP(cb, rec);
>>> +          avnd_diq_rec_del(cb, rec);
>>> +          break;
>>> +        } else {
>>> +          // Assignment response had been sent, but not ack because last
>>> +          // controller go down, reset msg_id and will be resent later
>>> +          if (rec->msg.info.avd->msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) {
>>> +            if (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) {
>>> +              rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0;
>>> +              found = true;
>>> +              LOG_NO(
>>> +                  "Found not-ack su_si_assign msg for SU:'%s', "
>>> +                  "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', "
>>> +                  "error:'%u', msg_id:'%u'",
>>> +                  osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
>>> +                      .n2d_su_si_assign.su_name),
>>> +                  osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
>>> +                      .n2d_su_si_assign.si_name),
>>> +                  rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
>>> +                  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act,
>>> +                  rec->msg.info.avd->msg_info.n2d_su_si_assign
>>> +                      .single_csi,
>>> +                  rec->msg.info.avd->msg_info.n2d_su_si_assign.error,
>>> +                  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id);
>>> +            }
>>> +          } else {
>>> +            // delete other messages for now
>>> +            m_AVND_DIQ_REC_POP(cb, rec);
>>> +            avnd_diq_rec_del(cb, rec);
>>> +            break;
>>> +          }
>>> +        }
>>>
>>> -    /* delete the record */
>>> -    avnd_diq_rec_del(cb, rec);
>>> -  } while (1);
>>> +      }
>>> +    }
>>> +  }
>>>
>>>    return;
>>>    }
>>>
>>
>


_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
