Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]

2017-05-19 Thread minh chau
Hi Praveen,

I have sent out V2:
- If we don't care about no_retries in avnd_diq_del(), then I think checking 
the MDS failure code can be skipped
- Add a check for the su_oper msg, since amfd also waits for this message 
to orchestrate recovery
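
A minimal sketch of the rule described in the two points above, using a simplified queue model. The names `MsgType`, `PendingMsg`, `must_survive_director_down` and `purge_on_director_down` are hypothetical and only for illustration; the real change lives in avnd_diq_del() and works on the amfnd pending list, which is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Hypothetical, simplified message kinds (not the real AVSV message enum).
enum class MsgType { kNodeUp, kSuSiAssignResp, kSuOperState, kOther };

// Hypothetical stand-in for one record in amfnd's pending-message queue.
struct PendingMsg {
  MsgType type;
  uint32_t msg_id;
  uint32_t no_retries;
};

// The two kinds of message the director is said to wait for after it returns.
inline bool must_survive_director_down(const PendingMsg& m) {
  return m.type == MsgType::kSuSiAssignResp || m.type == MsgType::kSuOperState;
}

// Drop everything else. no_retries is deliberately not consulted, so a failed
// or successful earlier send makes no difference. Survivors get msg_id reset
// to 0 so that they can be renumbered when they are resent.
inline void purge_on_director_down(std::list<PendingMsg>& queue) {
  queue.remove_if(
      [](const PendingMsg& m) { return !must_survive_director_down(m); });
  for (PendingMsg& m : queue) m.msg_id = 0;
}
```

With this rule, a susi response whose first send failed (no_retries still 0) is kept just like one that was sent but never acked.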

Thanks,
Minh

On 18/05/17 19:38, praveen malviya wrote:
> Hi Minh,
>
> I have analysed the traces you attached.
> Based on them I was able to test the patch. When MDS returns success, the 
> patch works fine.
> A minor correction is needed when MDS returns failure.
> I think the susi message should be kept regardless of the number of 
> retries in avnd_diq_del().
> Thanks
> Praveen
>
>
> On 18-May-17 12:41 PM, minh chau wrote:
>> Hi Praveen,
>>
>> Some comments in line with [Minh]
>>
>> thanks,
>> Minh
>>
>> On 18/05/17 14:54, praveen malviya wrote:
>>> Hi Minh,
>>>
>>> In the description of the ticket there is a log which is :
>>> "
>>> Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned 
>>> 'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to 
>>> 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
>>> Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO 
>>> avnd_di_susi_resp_send() deferred as AMF director is offline
>>> "
>>> Last line in above log means AMFND was sending the message when it 
>>> knew about SC absence state. I think this issue is already fixed 
>>> during #1725 and this published patch is not required. Why? After 
>>> led set message amfnd will anyway send this message.
>> [Minh] I have reproduced the problem and attached to ticket for your 
>> reference.
>> Some outlined logs:
>> The step is stopping SC1, SC2.
>> In SC2, amfd sent susi assignment req to amfnd-PL3
>> May 18 16:32:03.633226 osafamfd [245:245:src/amf/amfd/sgproc.cc:2444] 
>> >> avd_sg_su_si_mod_snd: 
>> 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', state 1
>>
>> In PL3, amfnd completed this susi req, and sent susi resp 
>> successfully but it did not reach to amfd-SC2
>> May 18 16:32:03.641156 osafamfnd [186:186:src/amf/amfnd/su.cc:0373] 
>> >> avnd_evt_avd_info_su_si_assign_evh: 
>> 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon'
>> May 18 16:32:03.641744 osafamfnd [186:186:src/amf/amfnd/di.cc:0866] 
>> >> avnd_di_susi_resp_send: Sending Resp 
>> su=safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon, 
>> si=safSi=AmfDemoTwon,safApp=AmfDemoTwon, curr_state=1, prv_state=2
>>
>> amfnd-PL3 is notified NCSMDS_DOWN, amfnd deleted all pending msg 
>> waiting for ack
>> May 18 16:32:05.568471 osafamfnd [186:186:src/amf/amfnd/di.cc:0629] 
>> >> avnd_evt_mds_avd_dn_evh
>> May 18 16:32:05.568492 osafamfnd [186:186:src/amf/amfnd/di.cc:0651] 
>> WA AMF director unexpectedly crashed
>> May 18 16:32:05.568495 osafamfnd [186:186:src/amf/amfnd/di.cc:0701] 
>> TR Delete all pending messages to be sent to AMFD
>> May 18 16:32:05.568498 osafamfnd [186:186:src/amf/amfnd/di.cc:1353] 
>> >> avnd_diq_rec_del
>> May 18 16:32:05.568503 osafamfnd [186:186:src/amf/amfnd/di.cc:1369] 
>> << avnd_diq_rec_del
>>
>> When SC restarts, amfd-SC1 thinks this assignment is still in progress, 
>> so it waits forever
>> May 18 16:32:28.954967 osafamfd [257:257:src/amf/amfd/su.cc:2588] >> 
>> any_susi_fsm_in: SU:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
>> check_fsm:5
>> May 18 16:32:28.954975 osafamfd [257:257:src/amf/amfd/su.cc:2593] TR 
>> SUSI:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon,safSi=AmfDemoTwon,safApp=AmfDemoTwon', 
>> fsm:'5'
>> May 18 16:32:28.954982 osafamfd [257:257:src/amf/amfd/su.cc:2596] TR 
>> Found
>> May 18 16:32:28.954989 osafamfd [257:257:src/amf/amfd/su.cc:2599] << 
>> any_susi_fsm_in
>> May 18 16:32:28.954996 osafamfd [257:257:src/amf/amfd/sg.cc:2340] << 
>> any_assignment_in_progress
>>
>> This problem is very close to the one you mentioned and fixed in 
>> #1725. In #1725, amfnd already knows that amfd is down, so it buffers the 
>> msg. In #2105, amfnd sends the msg just before it detects that amfd is down.
>>>
>>> The logs that I have attached can be ignored. I was simulating the 
>>> bug on different assumptions.
>>>
>>> One question regarding the patch:
>>> If the goal is to fix the case where the message is being sent when the 
>>> system has already become SC-less, then avnd_mds_send() will most 
>>> probably return a failure, as MDS will not find the destination. In the 
>>> MDS failure case, rec->no_retries will not be incremented and will 
>>> remain zero. AMFND will then process the SC down event and call 
>>> avnd_diq_del(). In this function, since no_retries is zero for this 
>>> message (the first message), the message will be deleted.
>>>
>> [Minh]: Thanks, it's good to handle failure code returned from MDS. I 
>> will update the patch
>>>
>>> Thanks,
>>> Praveen
>>>
>>>
>>> On 18-May-17 9:14 AM, minh chau wrote:
>>>> Hi Praveen,
>>>>
>>>> Do you have any idea why @is_avd_down was false, which made amfnd 
>>>> send susi_resp at 12:37:20.453974?
>>>> It should be true by the end of avnd_evt_mds_avd_dn_evh() at 
>>>> 12:37:16.741518, right?
>>>>
>>>> Thanks,
>>>> Minh
>>>> On 17/05/17 

Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]

2017-05-18 Thread praveen malviya
Hi Minh,

I have analysed the traces you attached.
Based on them I was able to test the patch. When MDS returns success, the 
patch works fine.
A minor correction is needed when MDS returns failure.
I think the susi message should be kept regardless of the number of 
retries in avnd_diq_del().
Thanks
Praveen


On 18-May-17 12:41 PM, minh chau wrote:
> Hi Praveen,
> 
> Some comments in line with [Minh]
> 
> thanks,
> Minh
> 
> On 18/05/17 14:54, praveen malviya wrote:
>> Hi Minh,
>>
>> In the description of the ticket there is a log which is :
>> "
>> Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned 
>> 'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to 
>> 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
>> Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO 
>> avnd_di_susi_resp_send() deferred as AMF director is offline
>> "
>> Last line in above log means AMFND was sending the message when it knew 
>> about SC absence state. I think this issue is already fixed during 
>> #1725 and this published patch is not required. Why? After led set 
>> message amfnd will anyway send this message.
> [Minh] I have reproduced the problem and attached to ticket for your 
> reference.
> Some outlined logs:
> The step is stopping SC1, SC2.
> In SC2, amfd sent susi assignment req to amfnd-PL3
> May 18 16:32:03.633226 osafamfd [245:245:src/amf/amfd/sgproc.cc:2444] >> 
> avd_sg_su_si_mod_snd: 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
> state 1
> 
> In PL3, amfnd completed this susi req, and sent susi resp successfully 
> but it did not reach to amfd-SC2
> May 18 16:32:03.641156 osafamfnd [186:186:src/amf/amfnd/su.cc:0373] >> 
> avnd_evt_avd_info_su_si_assign_evh: 
> 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon'
> May 18 16:32:03.641744 osafamfnd [186:186:src/amf/amfnd/di.cc:0866] >> 
> avnd_di_susi_resp_send: Sending Resp 
> su=safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon, 
> si=safSi=AmfDemoTwon,safApp=AmfDemoTwon, curr_state=1, prv_state=2
> 
> amfnd-PL3 is notified NCSMDS_DOWN, amfnd deleted all pending msg waiting 
> for ack
> May 18 16:32:05.568471 osafamfnd [186:186:src/amf/amfnd/di.cc:0629] >> 
> avnd_evt_mds_avd_dn_evh
> May 18 16:32:05.568492 osafamfnd [186:186:src/amf/amfnd/di.cc:0651] WA 
> AMF director unexpectedly crashed
> May 18 16:32:05.568495 osafamfnd [186:186:src/amf/amfnd/di.cc:0701] TR 
> Delete all pending messages to be sent to AMFD
> May 18 16:32:05.568498 osafamfnd [186:186:src/amf/amfnd/di.cc:1353] >> 
> avnd_diq_rec_del
> May 18 16:32:05.568503 osafamfnd [186:186:src/amf/amfnd/di.cc:1369] << 
> avnd_diq_rec_del
> 
> When SC restarts, amfd-SC1 thinks this assignment is still in progress, so 
> it waits forever
> May 18 16:32:28.954967 osafamfd [257:257:src/amf/amfd/su.cc:2588] >> 
> any_susi_fsm_in: SU:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
> check_fsm:5
> May 18 16:32:28.954975 osafamfd [257:257:src/amf/amfd/su.cc:2593] TR 
> SUSI:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon,safSi=AmfDemoTwon,safApp=AmfDemoTwon', 
> fsm:'5'
> May 18 16:32:28.954982 osafamfd [257:257:src/amf/amfd/su.cc:2596] TR Found
> May 18 16:32:28.954989 osafamfd [257:257:src/amf/amfd/su.cc:2599] << 
> any_susi_fsm_in
> May 18 16:32:28.954996 osafamfd [257:257:src/amf/amfd/sg.cc:2340] << 
> any_assignment_in_progress
> 
> This problem is very close to the one you mentioned and fixed in #1725. 
> In #1725, amfnd already knows that amfd is down, so it buffers the msg. In 
> #2105, amfnd sends the msg just before it detects that amfd is down.
>>
>> The logs that I have attached can be ignored. I was simulating the bug 
>> on different assumptions.
>>
>> One question regarding the patch:
>> If the goal is to fix the case where the message is being sent when the 
>> system has already become SC-less, then avnd_mds_send() will most 
>> probably return a failure, as MDS will not find the destination. In the 
>> MDS failure case, rec->no_retries will not be incremented and will 
>> remain zero. AMFND will then process the SC down event and call 
>> avnd_diq_del(). In this function, since no_retries is zero for this 
>> message (the first message), the message will be deleted.
>>
> [Minh]: Thanks, it's good to handle failure code returned from MDS. I 
> will update the patch
>>
>> Thanks,
>> Praveen
>>
>>
>> On 18-May-17 9:14 AM, minh chau wrote:
>>> Hi Praveen,
>>>
>>> Do you have any idea why @is_avd_down was false, which made amfnd 
>>> send susi_resp at 12:37:20.453974?
>>> It should be true by the end of avnd_evt_mds_avd_dn_evh() at 
>>> 12:37:16.741518, right?
>>>
>>> Thanks,
>>> Minh
>>> On 17/05/17 21:31, minh chau wrote:
>>>> Hi Praveen,
>>>>
>>>> Thanks for looking at the issue.
>>>> Here is what I am observing
>>>>
>>>> amfnd-PL3 received NCSMDS_DOWN indicating no active amfd
>>>>
>>>> May 17 12:37:16.741308 osafamfnd 
>>>> [8141:8141:src/amf/amfnd/di.cc:0629] >> avnd_evt_mds_avd_dn_evh
>>>> May 17 12:37:16.741342 osafamfnd 
>>>> [8141:8141:src/amf/amfnd/di.cc:0651] WA AMF director unexpectedly 

Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]

2017-05-18 Thread minh chau
Hi Praveen,

Some comments in line with [Minh]

thanks,
Minh

On 18/05/17 14:54, praveen malviya wrote:
> Hi Minh,
>
> In the description of the ticket there is a log which is :
> "
> Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned 
> 'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to 
> 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
> Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO 
> avnd_di_susi_resp_send() deferred as AMF director is offline
> "
> Last line in above log means AMFND was sending the message when it knew 
> about SC absence state. I think this issue is already fixed during 
> #1725 and this published patch is not required. Why? After led set 
> message amfnd will anyway send this message.
[Minh] I have reproduced the problem and attached to ticket for your 
reference.
Some outlined logs:
The step is stopping SC1, SC2.
In SC2, amfd sent susi assignment req to amfnd-PL3
May 18 16:32:03.633226 osafamfd [245:245:src/amf/amfd/sgproc.cc:2444] >> 
avd_sg_su_si_mod_snd: 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
state 1

In PL3, amfnd completed this susi req, and sent susi resp successfully 
but it did not reach to amfd-SC2
May 18 16:32:03.641156 osafamfnd [186:186:src/amf/amfnd/su.cc:0373] >> 
avnd_evt_avd_info_su_si_assign_evh: 
'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon'
May 18 16:32:03.641744 osafamfnd [186:186:src/amf/amfnd/di.cc:0866] >> 
avnd_di_susi_resp_send: Sending Resp 
su=safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon, 
si=safSi=AmfDemoTwon,safApp=AmfDemoTwon, curr_state=1, prv_state=2

amfnd-PL3 is notified NCSMDS_DOWN, amfnd deleted all pending msg waiting 
for ack
May 18 16:32:05.568471 osafamfnd [186:186:src/amf/amfnd/di.cc:0629] >> 
avnd_evt_mds_avd_dn_evh
May 18 16:32:05.568492 osafamfnd [186:186:src/amf/amfnd/di.cc:0651] WA 
AMF director unexpectedly crashed
May 18 16:32:05.568495 osafamfnd [186:186:src/amf/amfnd/di.cc:0701] TR 
Delete all pending messages to be sent to AMFD
May 18 16:32:05.568498 osafamfnd [186:186:src/amf/amfnd/di.cc:1353] >> 
avnd_diq_rec_del
May 18 16:32:05.568503 osafamfnd [186:186:src/amf/amfnd/di.cc:1369] << 
avnd_diq_rec_del

When SC restarts, amfd-SC1 thinks this assignment is still in progress, so 
it waits forever
May 18 16:32:28.954967 osafamfd [257:257:src/amf/amfd/su.cc:2588] >> 
any_susi_fsm_in: SU:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
check_fsm:5
May 18 16:32:28.954975 osafamfd [257:257:src/amf/amfd/su.cc:2593] TR 
SUSI:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon,safSi=AmfDemoTwon,safApp=AmfDemoTwon', 
fsm:'5'
May 18 16:32:28.954982 osafamfd [257:257:src/amf/amfd/su.cc:2596] TR Found
May 18 16:32:28.954989 osafamfd [257:257:src/amf/amfd/su.cc:2599] << 
any_susi_fsm_in
May 18 16:32:28.954996 osafamfd [257:257:src/amf/amfd/sg.cc:2340] << 
any_assignment_in_progress

This problem is very close to the one you mentioned and fixed in #1725. 
In #1725, amfnd already knows that amfd is down, so it buffers the msg. In 
#2105, amfnd sends the msg just before it detects that amfd is down.
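
The difference between the two tickets can be sketched with a small model. This is only an illustration under stated assumptions: `NodeModel` and its members are hypothetical names, not the real AVND_CB, and the "delete on down" step mirrors only what the trace above shows (pending messages deleted when NCSMDS_DOWN arrives).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the node director's view of its outgoing messages.
struct NodeModel {
  bool is_avd_down = false;
  std::vector<std::string> awaiting_ack;  // sent out, ack not yet received
  std::vector<std::string> buffered;      // held back until an SC returns

  // #1725 path: once the director is known to be down, buffer the response.
  // Otherwise hand it to the transport and wait for the ack.
  void respond(const std::string& susi) {
    if (is_avd_down) {
      buffered.push_back(susi);
    } else {
      awaiting_ack.push_back(susi);
    }
  }

  // Behaviour seen in the trace: on the down event, everything that is
  // still waiting for an ack is deleted.
  void on_director_down() {
    is_avd_down = true;
    awaiting_ack.clear();
  }

  // True if the response is still available for the returning director.
  bool can_resend(const std::string& susi) const {
    return std::find(buffered.begin(), buffered.end(), susi) != buffered.end();
  }
};
```

In this model a response passed to `respond()` before `on_director_down()` is lost (the #2105 case), while one passed afterwards is buffered (the #1725 case).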
>
> The logs that I have attached can be ignored. I was simulating the bug 
> on different assumptions.
>
> One question regarding the patch:
> If the goal is to fix the case where the message is being sent when the 
> system has already become SC-less, then avnd_mds_send() will most 
> probably return a failure, as MDS will not find the destination. In the 
> MDS failure case, rec->no_retries will not be incremented and will 
> remain zero. AMFND will then process the SC down event and call 
> avnd_diq_del(). In this function, since no_retries is zero for this 
> message (the first message), the message will be deleted.
>
[Minh]: Thanks, it's good to handle failure code returned from MDS. I 
will update the patch
>
> Thanks,
> Praveen
>
>
> On 18-May-17 9:14 AM, minh chau wrote:
>> Hi Praveen,
>>
>> Do you have any idea why @is_avd_down was false, which made amfnd 
>> send susi_resp at 12:37:20.453974?
>> It should be true by the end of avnd_evt_mds_avd_dn_evh() at 
>> 12:37:16.741518, right?
>>
>> Thanks,
>> Minh
>> On 17/05/17 21:31, minh chau wrote:
>>> Hi Praveen,
>>>
>>> Thanks for looking at the issue.
>>> Here is what I am observing
>>>
>>> amfnd-PL3 received NCSMDS_DOWN indicating no active amfd
>>>
>>> May 17 12:37:16.741308 osafamfnd 
>>> [8141:8141:src/amf/amfnd/di.cc:0629] >> avnd_evt_mds_avd_dn_evh
>>> May 17 12:37:16.741342 osafamfnd 
>>> [8141:8141:src/amf/amfnd/di.cc:0651] WA AMF director unexpectedly 
>>> crashed
>>> May 17 12:37:16.741354 osafamfnd 
>>> [8141:8141:src/amf/amfnd/di.cc:0701] TR Delete all pending messages 
>>> to be sent to AMFD
>>> May 17 12:37:16.741379 osafamfnd 
>>> [8141:8141:src/amf/amfnd/di.cc:0709] NO Checking 
>>> 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
>>> May 17 12:37:16.741405 osafamfnd 
>>> [8141:8141:src/amf/amfnd/di.cc:0709] NO Checking 
>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending messages
>>> May 17 12:37:16.741430 osafamfnd 
>>>

Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]

2017-05-17 Thread praveen malviya
Hi Minh,

In the description of the ticket there is a log which is :
"
Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned 
'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to 
'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO 
avnd_di_susi_resp_send() deferred as AMF director is offline
"
Last line in above log means AMFND was sending the message when it knew 
about SC absence state. I think this issue is already fixed during #1725 
and this published patch is not required. Why? After led set message 
amfnd will anyway send this message.

The logs that I have attached can be ignored. I was simulating the bug 
on different assumptions.

One question regarding the patch:
If the goal is to fix the case where the message is being sent when the 
system has already become SC-less, then avnd_mds_send() will most 
probably return a failure, as MDS will not find the destination. 
In the MDS failure case, rec->no_retries will not be incremented and will 
remain zero. AMFND will then process the SC down event and call 
avnd_diq_del(). In this function, since no_retries is zero for this 
message (the first message), the message will be deleted.
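
The scenario in the paragraph above can be sketched as follows. This is a simplified model, not the amfnd code: `Rec`, `try_send` and `purge_unsent` are hypothetical names, and the only behaviour taken from the thread is that no_retries is not incremented when the send fails and that the patch deletes records whose no_retries is zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a queued message record.
struct Rec {
  uint32_t msg_id;
  uint32_t no_retries;
};

// The attempt is counted only when the transport reports success, so a
// failed first send leaves no_retries at zero.
inline void try_send(Rec& rec, bool transport_ok) {
  if (transport_ok) ++rec.no_retries;
}

// Rule from the first version of the patch: records that were never sent
// out (no_retries == 0) are deleted. Returns the surviving records.
inline std::vector<Rec> purge_unsent(const std::vector<Rec>& queue) {
  std::vector<Rec> kept;
  for (const Rec& r : queue) {
    if (r.no_retries != 0) kept.push_back(r);
  }
  return kept;
}
```

If `try_send()` is called with `transport_ok == false` for a susi response and `purge_unsent()` runs afterwards, that response is not among the survivors, which is the loss described above.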


Thanks,
Praveen


On 18-May-17 9:14 AM, minh chau wrote:
> Hi Praveen,
> 
> Do you have any idea why @is_avd_down was false, which made amfnd send 
> susi_resp at 12:37:20.453974?
> It should be true by the end of avnd_evt_mds_avd_dn_evh() at 
> 12:37:16.741518, right?
> 
> Thanks,
> Minh
> On 17/05/17 21:31, minh chau wrote:
>> Hi Praveen,
>>
>> Thanks for looking at the issue.
>> Here is what I am observing
>>
>> amfnd-PL3 received NCSMDS_DOWN indicating no active amfd
>>
>> May 17 12:37:16.741308 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0629] 
>> >> avnd_evt_mds_avd_dn_evh
>> May 17 12:37:16.741342 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0651] 
>> WA AMF director unexpectedly crashed
>> May 17 12:37:16.741354 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0701] 
>> TR Delete all pending messages to be sent to AMFD
>> May 17 12:37:16.741379 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] 
>> NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
>> May 17 12:37:16.741405 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] 
>> NO Checking 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending 
>> messages
>> May 17 12:37:16.741430 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] 
>> NO Checking 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' for pending 
>> messages
>> May 17 12:37:16.741505 osafamfnd [8141:8141:src/amf/amfnd/tmr.cc:0083] 
>> TR SC absence timer started
>> May 17 12:37:16.741518 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0742] 
>> << avnd_evt_mds_avd_dn_evh
>>
>> But a bit later, the susi got assigned, and amfnd-PL3 did send this susi 
>> response (it should have buffered it instead of sending it, since 
>> @is_avd_down should be true)
>>
>> May 17 12:37:20.453974 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0866] 
>> >> avnd_di_susi_resp_send: Sending Resp 
>> su=safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, 
>> si=safSi=AmfDemo,safApp=AmfDemo1, curr_state=3, prv_state=1
>> ...
>> May 17 12:37:20.454083 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1482] 
>> >> avnd_mds_send: Msg type '1'
>> May 17 12:37:20.454244 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1537] 
>> ER ncsmds_api for 0 FAILED, dest=0
>>
>> When SC1 restarted, amfd received the very first messages from PL3 
>> starting with msg_id=1 (it should be starting from 0)
>>
>> May 17 12:37:28.398633 osafamfd 
>> [7686:7686:src/amf/amfd/ndproc.cc:0330] NO Receive message with event 
>> type:12, msg_type:31, from node:2030f, msg_id:1
>> May 17 12:37:28.413018 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0334] 
>> NO Received node_up_msg from all nodes
>> May 17 12:37:28.413069 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0254] 
>> NO Received node_up from 2030f: msg_id 2
>>
>> It looks to me like something unexpected happened inside 
>> avnd_evt_mds_avd_dn_evh(). In this function, @is_avd_down should be set 
>> to true and the msg counter should be reset to 0. Neither seems to have 
>> taken effect, but I do see the SC absence timer started. I haven't 
>> figured out how this happened yet
>>
>> Thanks,
>> Minh
>>
>> On 17/05/17 20:03, praveen malviya wrote:
>>> What I see is that avnd_diq_del() is called as soon as the system 
>>> becomes headless. This deletes all pending messages. But when a 
>>> component responds during SC absence, a new message is generated and 
>>> buffered.
>>> For node_up, AMFD will ack the message, but amfnd calls 
>>> avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process().
>>> We need to call avnd_diq_del() when processing the ack so that msg_id 
>>> gets updated.
>>> Still looking into it.
>>>
>>>
>>> Thanks.
>>> Praveen
>>>
>>>
>>>
>>> On 17-May-17 1:50 PM, praveen malviya wrote:
>>>> Hi Minh,
>>>>
>>>> While testing this, I am observing that amfd is dropping the assignment
>>>> message because of message id mismatch:
>>>> May 17 12:37:39.522117 osafamfd [7686:7686:src/

Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]

2017-05-17 Thread minh chau
Hi Praveen,

Do you have any idea why @is_avd_down was false, which made amfnd send 
susi_resp at 12:37:20.453974?
It should be true by the end of avnd_evt_mds_avd_dn_evh() at 
12:37:16.741518, right?

Thanks,
Minh
On 17/05/17 21:31, minh chau wrote:
> Hi Praveen,
>
> Thanks for looking at the issue.
> Here is what I am observing
>
> amfnd-PL3 received NCSMDS_DOWN indicating no active amfd
>
> May 17 12:37:16.741308 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0629] 
> >> avnd_evt_mds_avd_dn_evh
> May 17 12:37:16.741342 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0651] 
> WA AMF director unexpectedly crashed
> May 17 12:37:16.741354 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0701] 
> TR Delete all pending messages to be sent to AMFD
> May 17 12:37:16.741379 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] 
> NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
> May 17 12:37:16.741405 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] 
> NO Checking 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending 
> messages
> May 17 12:37:16.741430 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] 
> NO Checking 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' for pending 
> messages
> May 17 12:37:16.741505 osafamfnd [8141:8141:src/amf/amfnd/tmr.cc:0083] 
> TR SC absence timer started
> May 17 12:37:16.741518 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0742] 
> << avnd_evt_mds_avd_dn_evh
>
> But a bit later, the susi got assigned, and amfnd-PL3 did send this susi 
> response (it should have buffered it instead of sending it, since 
> @is_avd_down should be true)
>
> May 17 12:37:20.453974 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0866] 
> >> avnd_di_susi_resp_send: Sending Resp 
> su=safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, 
> si=safSi=AmfDemo,safApp=AmfDemo1, curr_state=3, prv_state=1
> ...
> May 17 12:37:20.454083 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1482] 
> >> avnd_mds_send: Msg type '1'
> May 17 12:37:20.454244 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1537] 
> ER ncsmds_api for 0 FAILED, dest=0
>
> When SC1 restarted, amfd received the very first messages from PL3 
> starting with msg_id=1 (it should be starting from 0)
>
> May 17 12:37:28.398633 osafamfd 
> [7686:7686:src/amf/amfd/ndproc.cc:0330] NO Receive message with event 
> type:12, msg_type:31, from node:2030f, msg_id:1
> May 17 12:37:28.413018 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0334] 
> NO Received node_up_msg from all nodes
> May 17 12:37:28.413069 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0254] 
> NO Received node_up from 2030f: msg_id 2
>
> It looks to me like something unexpected happened inside 
> avnd_evt_mds_avd_dn_evh(). In this function, @is_avd_down should be set 
> to true and the msg counter should be reset to 0. Neither seems to have 
> taken effect, but I do see the SC absence timer started. I haven't 
> figured out how this happened yet
>
> Thanks,
> Minh
>
> On 17/05/17 20:03, praveen malviya wrote:
>> What I see is that avnd_diq_del() is called as soon as the system 
>> becomes headless. This deletes all pending messages. But when a 
>> component responds during SC absence, a new message is generated and 
>> buffered.
>> For node_up, AMFD will ack the message, but amfnd calls 
>> avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process().
>> We need to call avnd_diq_del() when processing the ack so that msg_id 
>> gets updated.
>> Still looking into it.
>>
>>
>> Thanks.
>> Praveen
>>
>>
>>
>> On 17-May-17 1:50 PM, praveen malviya wrote:
>>> Hi Minh,
>>>
>>> While testing this, I am observing that amfd is dropping the assignment
>>> message because of message id mismatch:
>>> May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171]
>>>   >> avd_su_si_assign_evh: id:1, node:2030f, act:5,
>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0
>>> 
>>> 
>>> May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075]
>>> WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f 
>>> should be 3
>>> May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777]
>>> << avd_su_si_assign_evh
>>>
>>> I am also looking into this. For your reference I had attached amfd and
>>> amfnd traces from SC-1 and PL-3 respectively in the ticket.
>>> I am working with one controller and one payload.
>>>
>>>
>>> Thanks
>>> Praveen
>>>
>>> On 15-May-17 1:06 PM, Minh Chau wrote:
>>>> When amfnd-payload responds susi assignment response just before 
>>>> both SC
>>>> go down, and that response message does not come to director. 
>>>> Therefore,
>>>> the status of that assignment could be seen as "modifying" in IMM. 
>>>> When
>>>> SC comes back, active amfd will be waiting for that response forever.
>>>>
>>>> Patch checks if a susi assignment response is sent but not-ack just 
>>>> before
>>>> both SC come down, amfnd-payload will buffer it in a way as a susi get
>>>> assigned during SC absence
>>>> ---
>>>> src/amf/amfnd/di.cc | 53 
>>>> +
>>>> 1 file changed, 45 insertions

Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]

2017-05-17 Thread minh chau
Hi Praveen,

Thanks for looking at the issue.
Here is what I am observing

amfnd-PL3 received NCSMDS_DOWN indicating no active amfd

May 17 12:37:16.741308 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0629] >> 
avnd_evt_mds_avd_dn_evh
May 17 12:37:16.741342 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0651] WA 
AMF director unexpectedly crashed
May 17 12:37:16.741354 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0701] TR 
Delete all pending messages to be sent to AMFD
May 17 12:37:16.741379 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] NO 
Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
May 17 12:37:16.741405 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] NO 
Checking 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending messages
May 17 12:37:16.741430 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] NO 
Checking 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' for pending messages
May 17 12:37:16.741505 osafamfnd [8141:8141:src/amf/amfnd/tmr.cc:0083] 
TR SC absence timer started
May 17 12:37:16.741518 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0742] << 
avnd_evt_mds_avd_dn_evh

But a bit later, the susi got assigned, and amfnd-PL3 did send this susi 
response (it should have buffered it instead of sending it, since 
@is_avd_down should be true)

May 17 12:37:20.453974 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0866] >> 
avnd_di_susi_resp_send: Sending Resp 
su=safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, 
si=safSi=AmfDemo,safApp=AmfDemo1, curr_state=3, prv_state=1
...
May 17 12:37:20.454083 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1482] 
 >> avnd_mds_send: Msg type '1'
May 17 12:37:20.454244 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1537] 
ER ncsmds_api for 0 FAILED, dest=0

When SC1 restarted, amfd received the very first messages from PL3 
starting with msg_id=1 (it should be starting from 0)

May 17 12:37:28.398633 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0330] 
NO Receive message with event type:12, msg_type:31, from node:2030f, 
msg_id:1
May 17 12:37:28.413018 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0334] 
NO Received node_up_msg from all nodes
May 17 12:37:28.413069 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0254] 
NO Received node_up from 2030f: msg_id 2

It looks to me like something unexpected happened inside 
avnd_evt_mds_avd_dn_evh(). In this function, @is_avd_down should be set 
to true and the msg counter should be reset to 0. Neither seems to have 
taken effect, but I do see the SC absence timer started. I haven't 
figured out how this happened yet

Thanks,
Minh

On 17/05/17 20:03, praveen malviya wrote:
> What I see is that avnd_diq_del() is called as soon as the system 
> becomes headless. This deletes all pending messages. But when a 
> component responds during SC absence, a new message is generated and 
> buffered.
> For node_up, AMFD will ack the message, but amfnd calls 
> avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process().
> We need to call avnd_diq_del() when processing the ack so that msg_id 
> gets updated.
> Still looking into it.
>
>
> Thanks.
> Praveen
>
>
>
> On 17-May-17 1:50 PM, praveen malviya wrote:
>> Hi Minh,
>>
>> While testing this, I am observing that amfd is dropping the assignment
>> message because of message id mismatch:
>> May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171]
>>   >> avd_su_si_assign_evh: id:1, node:2030f, act:5,
>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0
>> 
>> 
>> May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075]
>> WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f 
>> should be 3
>> May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777]
>> << avd_su_si_assign_evh
>>
>> I am also looking into this. For your reference I had attached amfd and
>> amfnd traces from SC-1 and PL-3 respectively in the ticket.
>> I am working with one controller and one payload.
>>
>>
>> Thanks
>> Praveen
>>
>> On 15-May-17 1:06 PM, Minh Chau wrote:
>>> When amfnd-payload responds susi assignment response just before 
>>> both SC
>>> go down, and that response message does not come to director. 
>>> Therefore,
>>> the status of that assignment could be seen as "modifying" in IMM. When
>>> SC comes back, active amfd will be waiting for that response forever.
>>>
>>> Patch checks if a susi assignment response is sent but not-ack just 
>>> before
>>> both SC come down, amfnd-payload will buffer it in a way as a susi get
>>> assigned during SC absence
>>> ---
>>>src/amf/amfnd/di.cc | 53 
>>> +
>>>1 file changed, 45 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
>>> index e06b9260d..3776a09dc 100644
>>> --- a/src/amf/amfnd/di.cc
>>> +++ b/src/amf/amfnd/di.cc
>>> @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, 
>>> uint32_t mid) {
>>>  Notes : None.
>>> **/
>>>void avnd_diq_del(AVND_CB *cb) {
>>> - 

Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]

2017-05-17 Thread praveen malviya
What I see is that avnd_diq_del() is called as soon as the system becomes 
headless. This deletes all pending messages. But when a component 
responds during SC absence, a new message is generated and buffered.
For node_up, AMFD will ack the message, but amfnd calls 
avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process().
We need to call avnd_diq_del() when processing the ack so that msg_id gets 
updated.
Still looking into it.
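
The message id mismatch reported in the quoted trace below ("invalid msg id 1 ... should be 3") can be illustrated with a small model of an in-order check. This is an assumption inferred from that log line, not the actual avd_msg_sanity_chk() code; `DirectorRx` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the director's per-node receive counter.
struct DirectorRx {
  uint32_t last_rcv_id = 0;

  // Accept only the next id in sequence; anything else is dropped and the
  // counter is left unchanged.
  bool accept(uint32_t msg_id) {
    if (msg_id != last_rcv_id + 1) return false;
    last_rcv_id = msg_id;
    return true;
  }

  // The id the director would accept next.
  uint32_t expected() const { return last_rcv_id + 1; }
};
```

In this model, after two messages with ids 1 and 2 have been accepted, a buffered assignment response that still carries id 1 is rejected and the expected id is 3, which matches the numbers in the trace.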


Thanks.
Praveen



On 17-May-17 1:50 PM, praveen malviya wrote:
> Hi Minh,
> 
> While testing this, I am observing that amfd is dropping the assignment
> message because of message id mismatch:
> May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171]
>   >> avd_su_si_assign_evh: id:1, node:2030f, act:5,
> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0
> 
> 
> May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075]
> WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f should be 3
> May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777]
> << avd_su_si_assign_evh
> 
> I am also looking into this. For your reference I had attached amfd and
> amfnd traces from SC-1 and PL-3 respectively in the ticket.
> I am working with one controller and one payload.
> 
> 
> Thanks
> Praveen
> 
> On 15-May-17 1:06 PM, Minh Chau wrote:
>> When amfnd-payload sends a susi assignment response just before both SCs
>> go down, that response may never reach the director. As a result,
>> the status of that assignment can remain "modifying" in IMM. When
>> an SC comes back, the active amfd will wait for that response forever.
>>
>> The patch checks whether a susi assignment response was sent but not acked
>> just before both SCs went down; if so, amfnd-payload buffers it in the same
>> way as a susi assigned during SC absence.
>> ---
>>src/amf/amfnd/di.cc | 53 
>> +
>>1 file changed, 45 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
>> index e06b9260d..3776a09dc 100644
>> --- a/src/amf/amfnd/di.cc
>> +++ b/src/amf/amfnd/di.cc
>> @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t mid) {
>>  Notes : None.
>>
>> **/
>>void avnd_diq_del(AVND_CB *cb) {
>> -  AVND_DND_MSG_LIST *rec = 0;
>>
>> -  do {
>> -/* pop the record */
>> -m_AVND_DIQ_REC_POP(cb, rec);
>> -if (!rec) break;
>> +  if ((cb->dnd_list.head != nullptr)) {
>> +AVND_DND_MSG_LIST *rec = 0;
>> +bool found = true;
>> +while (found) {
>> +  found = false;
>> +  for (rec = cb->dnd_list.head; rec != nullptr;
>> +   rec = rec->next) {
>> +osafassert(rec->msg.type == AVND_MSG_AVD);
>> +// delete all pending messages that haven't been sent out
>> +if (rec->no_retries == 0) {
>> +  m_AVND_DIQ_REC_POP(cb, rec);
>> +  avnd_diq_rec_del(cb, rec);
>> +  break;
>> +} else {
>> +  // Assignment response had been sent, but not ack because last
>> +  // controller go down, reset msg_id and will be resent later
>> +  if (rec->msg.info.avd->msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) {
>> +if (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) {
>> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0;
>> +  found = true;
>> +  LOG_NO(
>> +  "Found not-ack su_si_assign msg for SU:'%s', "
>> +  "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', "
>> +  "error:'%u', msg_id:'%u'",
>> +  osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
>> + .n2d_su_si_assign.su_name),
>> +  osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
>> + .n2d_su_si_assign.si_name),
>> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
>> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act,
>> +  rec->msg.info.avd->msg_info.n2d_su_si_assign
>> +  .single_csi,
>> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.error,
>> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id);
>> +}
>> +  } else {
>> +// delete other messages for now
>> +m_AVND_DIQ_REC_POP(cb, rec);
>> +avnd_diq_rec_del(cb, rec);
>> +break;
>> +  }
>> +}
>>
>> -/* delete the record */
>> -avnd_diq_rec_del(cb, rec);
>> -  } while (1);
>> +  }
>> +}
>> +  }
>>
>>  return;
>>}
>>
> 

Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]

2017-05-17 Thread praveen malviya
Hi Minh,

While testing this, I am observing that amfd is dropping the assignment 
message because of a message id mismatch:
May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171] 
 >> avd_su_si_assign_evh: id:1, node:2030f, act:5, 
'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0


May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075] 
WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f should be 3
May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777] 
<< avd_su_si_assign_evh

I am also looking into this. For your reference, I have attached amfd and 
amfnd traces from SC-1 and PL-3 respectively to the ticket.
I am working with one controller and one payload.
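
The warning above suggests that amfd accepts a message from a node only when its id matches the next expected value (here 3, while the resent message carried 1). A minimal sketch of such an in-order check follows, assuming the expected id is simply the last accepted id plus one. NodeRxState and AcceptInOrder are hypothetical names, not the actual avd_msg_sanity_chk() code.

```cpp
#include <cstdint>

// Hypothetical per-node receive state kept by the director.
struct NodeRxState {
  uint32_t last_rcv_msg_id = 0;
};

// Accept a message only if its id is exactly one past the last accepted
// id; otherwise leave the state untouched so the message is dropped.
bool AcceptInOrder(NodeRxState& node, uint32_t msg_id) {
  if (msg_id != node.last_rcv_msg_id + 1) return false;
  node.last_rcv_msg_id = msg_id;
  return true;
}
```

Under this assumption, a buffered response that still carries a stale id is rejected until the node and the director agree on the counter again.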


Thanks
Praveen

On 15-May-17 1:06 PM, Minh Chau wrote:
> When amfnd-payload sends a susi assignment response just before both SCs
> go down, that response may never reach the director. As a result,
> the status of that assignment can remain "modifying" in IMM. When
> an SC comes back, the active amfd will wait for that response forever.
> 
> The patch checks whether a susi assignment response was sent but not acked
> just before both SCs went down; if so, amfnd-payload buffers it in the same
> way as a susi assigned during SC absence.
> ---
>   src/amf/amfnd/di.cc | 53 
> +
>   1 file changed, 45 insertions(+), 8 deletions(-)
> 
> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
> index e06b9260d..3776a09dc 100644
> --- a/src/amf/amfnd/di.cc
> +++ b/src/amf/amfnd/di.cc
> @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t mid) {
> Notes : None.
>   
> **/
>   void avnd_diq_del(AVND_CB *cb) {
> -  AVND_DND_MSG_LIST *rec = 0;
>   
> -  do {
> -/* pop the record */
> -m_AVND_DIQ_REC_POP(cb, rec);
> -if (!rec) break;
> +  if ((cb->dnd_list.head != nullptr)) {
> +AVND_DND_MSG_LIST *rec = 0;
> +bool found = true;
> +while (found) {
> +  found = false;
> +  for (rec = cb->dnd_list.head; rec != nullptr;
> +   rec = rec->next) {
> +osafassert(rec->msg.type == AVND_MSG_AVD);
> +// delete all pending messages that haven't been sent out
> +if (rec->no_retries == 0) {
> +  m_AVND_DIQ_REC_POP(cb, rec);
> +  avnd_diq_rec_del(cb, rec);
> +  break;
> +} else {
> +  // Assignment response had been sent, but not ack because last
> +  // controller go down, reset msg_id and will be resent later
> +  if (rec->msg.info.avd->msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) {
> +if (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) {
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0;
> +  found = true;
> +  LOG_NO(
> +  "Found not-ack su_si_assign msg for SU:'%s', "
> +  "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', "
> +  "error:'%u', msg_id:'%u'",
> +  osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
> + .n2d_su_si_assign.su_name),
> +  osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
> + .n2d_su_si_assign.si_name),
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act,
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign
> +  .single_csi,
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.error,
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id);
> +}
> +  } else {
> +// delete other messages for now
> +m_AVND_DIQ_REC_POP(cb, rec);
> +avnd_diq_rec_del(cb, rec);
> +break;
> +  }
> +}
>   
> -/* delete the record */
> -avnd_diq_rec_del(cb, rec);
> -  } while (1);
> +  }
> +}
> +  }
>   
> return;
>   }
> 

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]

2017-05-16 Thread Gary Lee
Hi Minh

Ack (review only)

On 15/5/17, 5:36 pm, "Minh Chau" wrote:

When amfnd-payload sends a susi assignment response just before both SCs
go down, that response may never reach the director. As a result,
the status of that assignment can remain "modifying" in IMM. When
an SC comes back, the active amfd will wait for that response forever.

The patch checks whether a susi assignment response was sent but not acked
just before both SCs went down; if so, amfnd-payload buffers it in the same
way as a susi assigned during SC absence.
---
 src/amf/amfnd/di.cc | 53 
+
 1 file changed, 45 insertions(+), 8 deletions(-)

diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
index e06b9260d..3776a09dc 100644
--- a/src/amf/amfnd/di.cc
+++ b/src/amf/amfnd/di.cc
@@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t mid) {
   Notes : None.
 
**/
 void avnd_diq_del(AVND_CB *cb) {
-  AVND_DND_MSG_LIST *rec = 0;
 
-  do {
-    /* pop the record */
-    m_AVND_DIQ_REC_POP(cb, rec);
-    if (!rec) break;
+  if ((cb->dnd_list.head != nullptr)) {
+    AVND_DND_MSG_LIST *rec = 0;
+    bool found = true;
+    while (found) {
+      found = false;
+      for (rec = cb->dnd_list.head; rec != nullptr;
+           rec = rec->next) {
+        osafassert(rec->msg.type == AVND_MSG_AVD);
+        // delete all pending messages that haven't been sent out
+        if (rec->no_retries == 0) {
+          m_AVND_DIQ_REC_POP(cb, rec);
+          avnd_diq_rec_del(cb, rec);
+          break;
+        } else {
+          // Assignment response had been sent, but not ack because last
+          // controller go down, reset msg_id and will be resent later
+          if (rec->msg.info.avd->msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) {
+            if (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) {
+              rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0;
+              found = true;
+              LOG_NO(
+                  "Found not-ack su_si_assign msg for SU:'%s', "
+                  "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', "
+                  "error:'%u', msg_id:'%u'",
+                  osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
+                                                 .n2d_su_si_assign.su_name),
+                  osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
+                                                 .n2d_su_si_assign.si_name),
+                  rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
+                  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act,
+                  rec->msg.info.avd->msg_info.n2d_su_si_assign
+                      .single_csi,
+                  rec->msg.info.avd->msg_info.n2d_su_si_assign.error,
+                  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id);
+            }
+          } else {
+            // delete other messages for now
+            m_AVND_DIQ_REC_POP(cb, rec);
+            avnd_diq_rec_del(cb, rec);
+            break;
+          }
+        }
 
-    /* delete the record */
-    avnd_diq_rec_del(cb, rec);
-  } while (1);
+      }
+    }
+  }
 
   return;
 }
-- 
2.11.0







Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]

2017-05-15 Thread praveen malviya
Hi Minh,

I am reviewing this patch.

Thanks,
Praveen

On 15-May-17 1:06 PM, Minh Chau wrote:
> When amfnd-payload sends a susi assignment response just before both SCs
> go down, that response may never reach the director. As a result,
> the status of that assignment can remain "modifying" in IMM. When
> an SC comes back, the active amfd will wait for that response forever.
> 
> The patch checks whether a susi assignment response was sent but not acked
> just before both SCs went down; if so, amfnd-payload buffers it in the same
> way as a susi assigned during SC absence.
> ---
>   src/amf/amfnd/di.cc | 53 
> +
>   1 file changed, 45 insertions(+), 8 deletions(-)
> 
> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
> index e06b9260d..3776a09dc 100644
> --- a/src/amf/amfnd/di.cc
> +++ b/src/amf/amfnd/di.cc
> @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t mid) {
> Notes : None.
>   
> **/
>   void avnd_diq_del(AVND_CB *cb) {
> -  AVND_DND_MSG_LIST *rec = 0;
>   
> -  do {
> -/* pop the record */
> -m_AVND_DIQ_REC_POP(cb, rec);
> -if (!rec) break;
> +  if ((cb->dnd_list.head != nullptr)) {
> +AVND_DND_MSG_LIST *rec = 0;
> +bool found = true;
> +while (found) {
> +  found = false;
> +  for (rec = cb->dnd_list.head; rec != nullptr;
> +   rec = rec->next) {
> +osafassert(rec->msg.type == AVND_MSG_AVD);
> +// delete all pending messages that haven't been sent out
> +if (rec->no_retries == 0) {
> +  m_AVND_DIQ_REC_POP(cb, rec);
> +  avnd_diq_rec_del(cb, rec);
> +  break;
> +} else {
> +  // Assignment response had been sent, but not ack because last
> +  // controller go down, reset msg_id and will be resent later
> +  if (rec->msg.info.avd->msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) {
> +if (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) {
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0;
> +  found = true;
> +  LOG_NO(
> +  "Found not-ack su_si_assign msg for SU:'%s', "
> +  "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', "
> +  "error:'%u', msg_id:'%u'",
> +  osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
> + .n2d_su_si_assign.su_name),
> +  osaf_extended_name_borrow(&rec->msg.info.avd->msg_info
> + .n2d_su_si_assign.si_name),
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act,
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign
> +  .single_csi,
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.error,
> +  rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id);
> +}
> +  } else {
> +// delete other messages for now
> +m_AVND_DIQ_REC_POP(cb, rec);
> +avnd_diq_rec_del(cb, rec);
> +break;
> +  }
> +}
>   
> -/* delete the record */
> -avnd_diq_rec_del(cb, rec);
> -  } while (1);
> +  }
> +}
> +  }
>   
> return;
>   }
> 
