Hi Praveen,
Do you have any idea why @is_avd_down was false that made amfnd to send
susi_resp at 12:37:20.453974?
It should be true by the end of avnd_evt_mds_avd_dn_evh() at
12:37:16.741518, is it right?
Thanks,
Minh
On 17/05/17 21:31, minh chau wrote:
> Hi Praveen,
>
> Thanks for looking at the issue.
> Here is what I am observing
>
> amfnd-PL3 received NCSMDS_DOWN indicating no active amfd
>
> May 17 12:37:16.741308 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0629]
> >> avnd_evt_mds_avd_dn_evh
> May 17 12:37:16.741342 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0651]
> WA AMF director unexpectedly crashed
> May 17 12:37:16.741354 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0701]
> TR Delete all pending messages to be sent to AMFD
> May 17 12:37:16.741379 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709]
> NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
> May 17 12:37:16.741405 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709]
> NO Checking 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending
> messages
> May 17 12:37:16.741430 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709]
> NO Checking 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' for pending
> messages
> May 17 12:37:16.741505 osafamfnd [8141:8141:src/amf/amfnd/tmr.cc:0083]
> TR SC absence timer started
> May 17 12:37:16.741518 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0742]
> << avnd_evt_mds_avd_dn_evh
>
> But a bit later, susi got assigned, and amfnd-PL3 did send this susi
> response (it should not send out and buffer it, since the @is_avd_down
> should be true)
>
> May 17 12:37:20.453974 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0866]
> >> avnd_di_susi_resp_send: Sending Resp
> su=safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1,
> si=safSi=AmfDemo,safApp=AmfDemo1, curr_state=3, prv_state=1
> ...
> May 17 12:37:20.454083 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1482]
> >> avnd_mds_send: Msg type '1'
> May 17 12:37:20.454244 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1537]
> ER ncsmds_api for 0 FAILED, dest=0
>
> When SC1 restarted, amfd received the very first messages from PL3
> starting with msg_id=1 (it should be starting from 0)
>
> May 17 12:37:28.398633 osafamfd
> [7686:7686:src/amf/amfd/ndproc.cc:0330] NO Receive message with event
> type:12, msg_type:31, from node:2030f, msg_id:1
> May 17 12:37:28.413018 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0334]
> NO Received node_up_msg from all nodes
> May 17 12:37:28.413069 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0254]
> NO Received node_up from 2030f: msg_id 2
>
> Looks to me something should not happen inside
> avnd_evt_mds_avd_dn_evh(). In this avnd_evt_mds_avd_dn_evh(),
> @is_avd_down should be true, the msg counter should be reset to 0, but
> I do see the SC absence timer started. I couldn't figure how it
> happened for now
>
> Thanks,
> Minh
>
> On 17/05/17 20:03, praveen malviya wrote:
>> What I see is avnd_diq_del() is called as soon as system becomes
>> headless. This will delete all pending messages. But when component
>> will respond during SCs absence a new message will be generated and
>> buffered.
>> For node_up AMFD will ack the message, but amfnd calls
>> avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process().
>> We need to call avnd_diq_del() in ack message so that msg_id gets
>> updated.
>> Further looking into it..
>>
>>
>> Thanks.
>> Praveen
>>
>>
>>
>> On 17-May-17 1:50 PM, praveen malviya wrote:
>>> Hi Minh,
>>>
>>> While testing this, I am observing that amfd is dropping the assignment
>>> message because of message id mismatch:
>>> May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171]
>>> >> avd_su_si_assign_evh: id:1, node:2030f, act:5,
>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0
>>> ....
>>> ....
>>> May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075]
>>> WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f
>>> should be 3
>>> May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777]
>>> << avd_su_si_assign_evh
>>>
>>> I am also looking into this. For your reference I had attached amfd and
>>> amfnd traces from SC-1 and PL-3 respectively in the ticket.
>>> I am working with one controller and one payload.
>>>
>>>
>>> Thanks
>>> Praveen
>>>
>>> On 15-May-17 1:06 PM, Minh Chau wrote:
>>>> When amfnd-payload responds susi assignment response just before
>>>> both SC
>>>> go down, and that response message does not come to director.
>>>> Therefore,
>>>> the status of that assignment could be seen as "modifying" in IMM.
>>>> When
>>>> SC comes back, active amfd will be waiting for that response forever.
>>>>
>>>> Patch checks if a susi assignment response is sent but not-ack just
>>>> before
>>>> both SC come down, amfnd-payload will buffer it in a way as a susi get
>>>> assigned during SC absence
>>>> ---
>>>> src/amf/amfnd/di.cc | 53
>>>> +++++++++++++++++++++++++++++++++++++++++++++--------
>>>> 1 file changed, 45 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
>>>> index e06b9260d..3776a09dc 100644
>>>> --- a/src/amf/amfnd/di.cc
>>>> +++ b/src/amf/amfnd/di.cc
>>>> @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb,
>>>> uint32_t mid) {
>>>> Notes : None.
>>>> ******************************************************************************/
>>>>
>>>>
>>>> void avnd_diq_del(AVND_CB *cb) {
>>>> - AVND_DND_MSG_LIST *rec = 0;
>>>> - do {
>>>> - /* pop the record */
>>>> - m_AVND_DIQ_REC_POP(cb, rec);
>>>> - if (!rec) break;
>>>> + if ((cb->dnd_list.head != nullptr)) {
>>>> + AVND_DND_MSG_LIST *rec = 0;
>>>> + bool found = true;
>>>> + while (found) {
>>>> + found = false;
>>>> + for (rec = cb->dnd_list.head; rec != nullptr;
>>>> + rec = rec->next) {
>>>> + osafassert(rec->msg.type == AVND_MSG_AVD);
>>>> + // delete all pending messages that haven't been sent out
>>>> + if (rec->no_retries == 0) {
>>>> + m_AVND_DIQ_REC_POP(cb, rec);
>>>> + avnd_diq_rec_del(cb, rec);
>>>> + break;
>>>> + } else {
>>>> + // Assignment response had been sent, but not ack
>>>> because last
>>>> + // controller go down, reset msg_id and will be resent
>>>> later
>>>> + if (rec->msg.info.avd->msg_type ==
>>>> AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) {
>>>> + if
>>>> (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) {
>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0;
>>>> + found = true;
>>>> + LOG_NO(
>>>> + "Found not-ack su_si_assign msg for SU:'%s', "
>>>> + "SI:'%s', ha_state:'%u', msg_act:'%u',
>>>> single_csi:'%u', "
>>>> + "error:'%u', msg_id:'%u'",
>>>> + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
>>>> + .n2d_su_si_assign.su_name),
>>>> + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
>>>> + .n2d_su_si_assign.si_name),
>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act,
>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign
>>>> + .single_csi,
>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign.error,
>>>> + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id);
>>>> + }
>>>> + } else {
>>>> + // delete other messages for now
>>>> + m_AVND_DIQ_REC_POP(cb, rec);
>>>> + avnd_diq_rec_del(cb, rec);
>>>> + break;
>>>> + }
>>>> + }
>>>> - /* delete the record */
>>>> - avnd_diq_rec_del(cb, rec);
>>>> - } while (1);
>>>> + }
>>>> + }
>>>> + }
>>>> return;
>>>> }
>>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org!
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__sdm.link_slashdot&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=Lehk1PZKwfDQtYJXNyUKbPAqrw5O--SlPRAF9DIEps4&m=KwsqvdArvOJV5IkAidvFxTT0JBVpgHVYUwJOsjK9dt4&s=Luyb_FCgTEXSpVle_diQMuhKxVmmm6cmv5VA03k0Zu8&e=
>>> _______________________________________________
>>> Opensaf-devel mailing list
>>> [email protected]
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.sourceforge.net_lists_listinfo_opensaf-2Ddevel&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=Lehk1PZKwfDQtYJXNyUKbPAqrw5O--SlPRAF9DIEps4&m=KwsqvdArvOJV5IkAidvFxTT0JBVpgHVYUwJOsjK9dt4&s=e4sg0J1cdg4VnTqeWPDrNZlPv2BuIuFj4Dk7JACxgx8&e=
>>>
>>>
>>>
>>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel