Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]
Hi Praveen, I have sent out V2: - If we don't care no_retries in avnd_diq_del(), then I think checking mds failure code can be ignored - Add checking for su_oper msg, since amfd also waits for this message to orchestrate recovery Thanks, Minh On 18/05/17 19:38, praveen malviya wrote: > Hi Minh, > > I had analysed the traces you attached. > Based on that I am able to test that. When MDS returns success patch > works fine. > Minor correction is needed when MDS return failure. > I think susi message should be kept independent of no. of tries in > avnd_diq_del(). > Thanks > Praveen > > > On 18-May-17 12:41 PM, minh chau wrote: >> Hi Praveen, >> >> Some comments in line with [Minh] >> >> thanks, >> Minh >> >> On 18/05/17 14:54, praveen malviya wrote: >>> Hi Minh, >>> >>> In the description of the ticket there is a log which is : >>> " >>> Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned >>> 'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to >>> 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' >>> Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO >>> avnd_di_susi_resp_send() deferred as AMF director is offline >>> " >>> Last line in above log means AMFND was sending the message when it >>> new about SC absence state. I think this issue is already fixed >>> during #1725 and this published patch is not required. Why? After >>> led set message amfnd will anyway send this message. >> [Minh] I have reproduced the problem and attached to ticket for your >> reference. >> Some outlined logs: >> The step is stopping SC1, SC2. >> In SC2, amfd sent susi assignment req to amfnd-PL3 >> May 18 16:32:03.633226 osafamfd [245:245:src/amf/amfd/sgproc.cc:2444] >> >> avd_sg_su_si_mod_snd: >> 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', state 1 >> >> In PL3, amfnd completed this susi req, and sent susi resp >> successfully but it did not reach to amfd-SC2 >> May 18 16:32:03.641156 osafamfnd [186:186:src/amf/amfnd/su.cc:0373] >> >> avnd_evt_avd_info_su_si_assign_evh: >> 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon' >> May 18 16:32:03.641744 osafamfnd [186:186:src/amf/amfnd/di.cc:0866] >> >> avnd_di_susi_resp_send: Sending Resp >> su=safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon, >> si=safSi=AmfDemoTwon,safApp=AmfDemoTwon, curr_state=1, prv_state=2 >> >> amfnd-PL3 is notified NCSMDS_DOWN, amfnd deleted all pending msg >> waiting for ack >> May 18 16:32:05.568471 osafamfnd [186:186:src/amf/amfnd/di.cc:0629] >> >> avnd_evt_mds_avd_dn_evh >> May 18 16:32:05.568492 osafamfnd [186:186:src/amf/amfnd/di.cc:0651] >> WA AMF director unexpectedly crashed >> May 18 16:32:05.568495 osafamfnd [186:186:src/amf/amfnd/di.cc:0701] >> TR Delete all pending messages to be sent to AMFD >> May 18 16:32:05.568498 osafamfnd [186:186:src/amf/amfnd/di.cc:1353] >> >> avnd_diq_rec_del >> May 18 16:32:05.568503 osafamfnd [186:186:src/amf/amfnd/di.cc:1369] >> << avnd_diq_rec_del >> >> When SC restarts, amfd-SC1 thinks this assignment being in progress, >> so it waits and waits forever >> May 18 16:32:28.954967 osafamfd [257:257:src/amf/amfd/su.cc:2588] >> >> any_susi_fsm_in: SU:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', >> check_fsm:5 >> May 18 16:32:28.954975 osafamfd [257:257:src/amf/amfd/su.cc:2593] TR >> SUSI:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon,safSi=AmfDemoTwon,safApp=AmfDemoTwon', >> >> fsm:'5' >> May 18 16:32:28.954982 osafamfd [257:257:src/amf/amfd/su.cc:2596] TR >> Found >> May 18 16:32:28.954989 osafamfd [257:257:src/amf/amfd/su.cc:2599] << >> any_susi_fsm_in >> May 18 16:32:28.954996 osafamfd [257:257:src/amf/amfd/sg.cc:2340] << >> any_assignment_in_progress >> >> This problem is very close to the one you mentioned and fixed in >> #1725. In #1725, amfnd surely knows amfd down, so amfnd buffers msg. >> In #2105, amfnd sends msg out just before amfnd detects amfd being down. >>> >>> The logs that I have attached can be ignored. I was simulating the >>> bug on different assumptions. >>> >>> One question regarding the patch: >>> If the goal is to fix the issue when the message is being sent and >>> system has become SC-less. In this situation, then avnd_mds_send() >>> will return, most probably, a failure as MDS will not find the >>> destination. In mds failure case, rec->no_retries will not be >>> incremented and will remain zero. Now AMFND will process down of SC >>> and it will call avnd_diq_del(). In this function, since no_retries >>> is zero for this message(first message), the message will be deleted. >>> >> [Minh]: Thanks, it's good to handle failure code returned from MDS. I >> will update the patch >>> >>> Thanks, >>> Praveen >>> >>> >>> On 18-May-17 9:14 AM, minh chau wrote: Hi Praveen, Do you have any idea why @is_avd_down was false that made amfnd to send susi_resp at 12:37:20.453974? It should be true by the end of avnd_evt_mds_avd_dn_evh() at 12:37:16.741518, is it right? Thanks, Minh On 17/05/17
Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]
Hi Minh, I had analysed the traces you attached. Based on that I am able to test that. When MDS returns success patch works fine. Minor correction is needed when MDS return failure. I think susi message should be kept independent of no. of tries in avnd_diq_del(). Thanks Praveen On 18-May-17 12:41 PM, minh chau wrote: > Hi Praveen, > > Some comments in line with [Minh] > > thanks, > Minh > > On 18/05/17 14:54, praveen malviya wrote: >> Hi Minh, >> >> In the description of the ticket there is a log which is : >> " >> Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned >> 'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to >> 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' >> Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO >> avnd_di_susi_resp_send() deferred as AMF director is offline >> " >> Last line in above log means AMFND was sending the message when it new >> about SC absence state. I think this issue is already fixed during >> #1725 and this published patch is not required. Why? After led set >> message amfnd will anyway send this message. > [Minh] I have reproduced the problem and attached to ticket for your > reference. > Some outlined logs: > The step is stopping SC1, SC2. > In SC2, amfd sent susi assignment req to amfnd-PL3 > May 18 16:32:03.633226 osafamfd [245:245:src/amf/amfd/sgproc.cc:2444] >> > avd_sg_su_si_mod_snd: 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', > state 1 > > In PL3, amfnd completed this susi req, and sent susi resp successfully > but it did not reach to amfd-SC2 > May 18 16:32:03.641156 osafamfnd [186:186:src/amf/amfnd/su.cc:0373] >> > avnd_evt_avd_info_su_si_assign_evh: > 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon' > May 18 16:32:03.641744 osafamfnd [186:186:src/amf/amfnd/di.cc:0866] >> > avnd_di_susi_resp_send: Sending Resp > su=safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon, > si=safSi=AmfDemoTwon,safApp=AmfDemoTwon, curr_state=1, prv_state=2 > > amfnd-PL3 is notified NCSMDS_DOWN, amfnd deleted all pending msg waiting > for ack > May 18 16:32:05.568471 osafamfnd [186:186:src/amf/amfnd/di.cc:0629] >> > avnd_evt_mds_avd_dn_evh > May 18 16:32:05.568492 osafamfnd [186:186:src/amf/amfnd/di.cc:0651] WA > AMF director unexpectedly crashed > May 18 16:32:05.568495 osafamfnd [186:186:src/amf/amfnd/di.cc:0701] TR > Delete all pending messages to be sent to AMFD > May 18 16:32:05.568498 osafamfnd [186:186:src/amf/amfnd/di.cc:1353] >> > avnd_diq_rec_del > May 18 16:32:05.568503 osafamfnd [186:186:src/amf/amfnd/di.cc:1369] << > avnd_diq_rec_del > > When SC restarts, amfd-SC1 thinks this assignment being in progress, so > it waits and waits forever > May 18 16:32:28.954967 osafamfd [257:257:src/amf/amfd/su.cc:2588] >> > any_susi_fsm_in: SU:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', > check_fsm:5 > May 18 16:32:28.954975 osafamfd [257:257:src/amf/amfd/su.cc:2593] TR > SUSI:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon,safSi=AmfDemoTwon,safApp=AmfDemoTwon', > > fsm:'5' > May 18 16:32:28.954982 osafamfd [257:257:src/amf/amfd/su.cc:2596] TR Found > May 18 16:32:28.954989 osafamfd [257:257:src/amf/amfd/su.cc:2599] << > any_susi_fsm_in > May 18 16:32:28.954996 osafamfd [257:257:src/amf/amfd/sg.cc:2340] << > any_assignment_in_progress > > This problem is very close to the one you mentioned and fixed in #1725. > In #1725, amfnd surely knows amfd down, so amfnd buffers msg. In #2105, > amfnd sends msg out just before amfnd detects amfd being down. >> >> The logs that I have attached can be ignored. I was simulating the bug >> on different assumptions. >> >> One question regarding the patch: >> If the goal is to fix the issue when the message is being sent and >> system has become SC-less. In this situation, then avnd_mds_send() >> will return, most probably, a failure as MDS will not find the >> destination. In mds failure case, rec->no_retries will not be >> incremented and will remain zero. Now AMFND will process down of SC >> and it will call avnd_diq_del(). In this function, since no_retries is >> zero for this message(first message), the message will be deleted. >> > [Minh]: Thanks, it's good to handle failure code returned from MDS. I > will update the patch >> >> Thanks, >> Praveen >> >> >> On 18-May-17 9:14 AM, minh chau wrote: >>> Hi Praveen, >>> >>> Do you have any idea why @is_avd_down was false that made amfnd to >>> send susi_resp at 12:37:20.453974? >>> It should be true by the end of avnd_evt_mds_avd_dn_evh() at >>> 12:37:16.741518, is it right? >>> >>> Thanks, >>> Minh >>> On 17/05/17 21:31, minh chau wrote: Hi Praveen, Thanks for looking at the issue. Here is what I am observing amfnd-PL3 received NCSMDS_DOWN indicating no active amfd May 17 12:37:16.741308 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0629] >> avnd_evt_mds_avd_dn_evh May 17 12:37:16.741342 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0651] WA AMF director unexpectedly
Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]
Hi Praveen, Some comments in line with [Minh] thanks, Minh On 18/05/17 14:54, praveen malviya wrote: > Hi Minh, > > In the description of the ticket there is a log which is : > " > Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned > 'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to > 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' > Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO > avnd_di_susi_resp_send() deferred as AMF director is offline > " > Last line in above log means AMFND was sending the message when it new > about SC absence state. I think this issue is already fixed during > #1725 and this published patch is not required. Why? After led set > message amfnd will anyway send this message. [Minh] I have reproduced the problem and attached to ticket for your reference. Some outlined logs: The step is stopping SC1, SC2. In SC2, amfd sent susi assignment req to amfnd-PL3 May 18 16:32:03.633226 osafamfd [245:245:src/amf/amfd/sgproc.cc:2444] >> avd_sg_su_si_mod_snd: 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', state 1 In PL3, amfnd completed this susi req, and sent susi resp successfully but it did not reach to amfd-SC2 May 18 16:32:03.641156 osafamfnd [186:186:src/amf/amfnd/su.cc:0373] >> avnd_evt_avd_info_su_si_assign_evh: 'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon' May 18 16:32:03.641744 osafamfnd [186:186:src/amf/amfnd/di.cc:0866] >> avnd_di_susi_resp_send: Sending Resp su=safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon, si=safSi=AmfDemoTwon,safApp=AmfDemoTwon, curr_state=1, prv_state=2 amfnd-PL3 is notified NCSMDS_DOWN, amfnd deleted all pending msg waiting for ack May 18 16:32:05.568471 osafamfnd [186:186:src/amf/amfnd/di.cc:0629] >> avnd_evt_mds_avd_dn_evh May 18 16:32:05.568492 osafamfnd [186:186:src/amf/amfnd/di.cc:0651] WA AMF director unexpectedly crashed May 18 16:32:05.568495 osafamfnd [186:186:src/amf/amfnd/di.cc:0701] TR Delete all pending messages to be sent to AMFD May 18 16:32:05.568498 osafamfnd [186:186:src/amf/amfnd/di.cc:1353] >> avnd_diq_rec_del May 18 16:32:05.568503 osafamfnd [186:186:src/amf/amfnd/di.cc:1369] << avnd_diq_rec_del When SC restarts, amfd-SC1 thinks this assignment being in progress, so it waits and waits forever May 18 16:32:28.954967 osafamfd [257:257:src/amf/amfd/su.cc:2588] >> any_susi_fsm_in: SU:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon', check_fsm:5 May 18 16:32:28.954975 osafamfd [257:257:src/amf/amfd/su.cc:2593] TR SUSI:'safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon,safSi=AmfDemoTwon,safApp=AmfDemoTwon', fsm:'5' May 18 16:32:28.954982 osafamfd [257:257:src/amf/amfd/su.cc:2596] TR Found May 18 16:32:28.954989 osafamfd [257:257:src/amf/amfd/su.cc:2599] << any_susi_fsm_in May 18 16:32:28.954996 osafamfd [257:257:src/amf/amfd/sg.cc:2340] << any_assignment_in_progress This problem is very close to the one you mentioned and fixed in #1725. In #1725, amfnd surely knows amfd down, so amfnd buffers msg. In #2105, amfnd sends msg out just before amfnd detects amfd being down. > > The logs that I have attached can be ignored. I was simulating the bug > on different assumptions. > > One question regarding the patch: > If the goal is to fix the issue when the message is being sent and > system has become SC-less. In this situation, then avnd_mds_send() > will return, most probably, a failure as MDS will not find the > destination. In mds failure case, rec->no_retries will not be > incremented and will remain zero. Now AMFND will process down of SC > and it will call avnd_diq_del(). In this function, since no_retries is > zero for this message(first message), the message will be deleted. > [Minh]: Thanks, it's good to handle failure code returned from MDS. I will update the patch > > Thanks, > Praveen > > > On 18-May-17 9:14 AM, minh chau wrote: >> Hi Praveen, >> >> Do you have any idea why @is_avd_down was false that made amfnd to >> send susi_resp at 12:37:20.453974? >> It should be true by the end of avnd_evt_mds_avd_dn_evh() at >> 12:37:16.741518, is it right? >> >> Thanks, >> Minh >> On 17/05/17 21:31, minh chau wrote: >>> Hi Praveen, >>> >>> Thanks for looking at the issue. >>> Here is what I am observing >>> >>> amfnd-PL3 received NCSMDS_DOWN indicating no active amfd >>> >>> May 17 12:37:16.741308 osafamfnd >>> [8141:8141:src/amf/amfnd/di.cc:0629] >> avnd_evt_mds_avd_dn_evh >>> May 17 12:37:16.741342 osafamfnd >>> [8141:8141:src/amf/amfnd/di.cc:0651] WA AMF director unexpectedly >>> crashed >>> May 17 12:37:16.741354 osafamfnd >>> [8141:8141:src/amf/amfnd/di.cc:0701] TR Delete all pending messages >>> to be sent to AMFD >>> May 17 12:37:16.741379 osafamfnd >>> [8141:8141:src/amf/amfnd/di.cc:0709] NO Checking >>> 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages >>> May 17 12:37:16.741405 osafamfnd >>> [8141:8141:src/amf/amfnd/di.cc:0709] NO Checking >>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending messages >>> May 17 12:37:16.741430 osafamfnd >>>
Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]
Hi Minh, In the description of the ticket there is a log which is : " Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned 'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO avnd_di_susi_resp_send() deferred as AMF director is offline " Last line in above log means AMFND was sending the message when it new about SC absence state. I think this issue is already fixed during #1725 and this published patch is not required. Why? After led set message amfnd will anyway send this message. The logs that I have attached can be ignored. I was simulating the bug on different assumptions. One question regarding the patch: If the goal is to fix the issue when the message is being sent and system has become SC-less. In this situation, then avnd_mds_send() will return, most probably, a failure as MDS will not find the destination. In mds failure case, rec->no_retries will not be incremented and will remain zero. Now AMFND will process down of SC and it will call avnd_diq_del(). In this function, since no_retries is zero for this message(first message), the message will be deleted. Thanks, Praveen On 18-May-17 9:14 AM, minh chau wrote: > Hi Praveen, > > Do you have any idea why @is_avd_down was false that made amfnd to send > susi_resp at 12:37:20.453974? > It should be true by the end of avnd_evt_mds_avd_dn_evh() at > 12:37:16.741518, is it right? > > Thanks, > Minh > On 17/05/17 21:31, minh chau wrote: >> Hi Praveen, >> >> Thanks for looking at the issue. >> Here is what I am observing >> >> amfnd-PL3 received NCSMDS_DOWN indicating no active amfd >> >> May 17 12:37:16.741308 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0629] >> >> avnd_evt_mds_avd_dn_evh >> May 17 12:37:16.741342 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0651] >> WA AMF director unexpectedly crashed >> May 17 12:37:16.741354 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0701] >> TR Delete all pending messages to be sent to AMFD >> May 17 12:37:16.741379 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] >> NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages >> May 17 12:37:16.741405 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] >> NO Checking 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending >> messages >> May 17 12:37:16.741430 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] >> NO Checking 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' for pending >> messages >> May 17 12:37:16.741505 osafamfnd [8141:8141:src/amf/amfnd/tmr.cc:0083] >> TR SC absence timer started >> May 17 12:37:16.741518 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0742] >> << avnd_evt_mds_avd_dn_evh >> >> But a bit later, susi got assigned, and amfnd-PL3 did send this susi >> response (it should not send out and buffer it, since the @is_avd_down >> should be true) >> >> May 17 12:37:20.453974 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0866] >> >> avnd_di_susi_resp_send: Sending Resp >> su=safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, >> si=safSi=AmfDemo,safApp=AmfDemo1, curr_state=3, prv_state=1 >> ... >> May 17 12:37:20.454083 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1482] >> >> avnd_mds_send: Msg type '1' >> May 17 12:37:20.454244 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1537] >> ER ncsmds_api for 0 FAILED, dest=0 >> >> When SC1 restarted, amfd received the very first messages from PL3 >> starting with msg_id=1 (it should be starting from 0) >> >> May 17 12:37:28.398633 osafamfd >> [7686:7686:src/amf/amfd/ndproc.cc:0330] NO Receive message with event >> type:12, msg_type:31, from node:2030f, msg_id:1 >> May 17 12:37:28.413018 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0334] >> NO Received node_up_msg from all nodes >> May 17 12:37:28.413069 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0254] >> NO Received node_up from 2030f: msg_id 2 >> >> Looks to me something should not happen inside >> avnd_evt_mds_avd_dn_evh(). In this avnd_evt_mds_avd_dn_evh(), >> @is_avd_down should be true, the msg counter should be reset to 0, but >> I do see the SC absence timer started. I couldn't figure how it >> happened for now >> >> Thanks, >> Minh >> >> On 17/05/17 20:03, praveen malviya wrote: >>> What I see is avnd_diq_del() is called as soon as system becomes >>> headless. This will delete all pending messages. But when component >>> will respond during SCs absence a new message will be generated and >>> buffered. >>> For node_up AMFD will ack the message, but amfnd calls >>> avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process(). >>> We need to call avnd_diq_del() in ack message so that msg_id gets >>> updated. >>> Further looking into it.. >>> >>> >>> Thanks. >>> Praveen >>> >>> >>> >>> On 17-May-17 1:50 PM, praveen malviya wrote: Hi Minh, While testing this, I am observing that amfd is dropping the assignment message because of message id mismatch: May 17 12:37:39.522117 osafamfd [7686:7686:src/
Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]
Hi Praveen, Do you have any idea why @is_avd_down was false that made amfnd to send susi_resp at 12:37:20.453974? It should be true by the end of avnd_evt_mds_avd_dn_evh() at 12:37:16.741518, is it right? Thanks, Minh On 17/05/17 21:31, minh chau wrote: > Hi Praveen, > > Thanks for looking at the issue. > Here is what I am observing > > amfnd-PL3 received NCSMDS_DOWN indicating no active amfd > > May 17 12:37:16.741308 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0629] > >> avnd_evt_mds_avd_dn_evh > May 17 12:37:16.741342 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0651] > WA AMF director unexpectedly crashed > May 17 12:37:16.741354 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0701] > TR Delete all pending messages to be sent to AMFD > May 17 12:37:16.741379 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] > NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages > May 17 12:37:16.741405 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] > NO Checking 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending > messages > May 17 12:37:16.741430 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] > NO Checking 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' for pending > messages > May 17 12:37:16.741505 osafamfnd [8141:8141:src/amf/amfnd/tmr.cc:0083] > TR SC absence timer started > May 17 12:37:16.741518 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0742] > << avnd_evt_mds_avd_dn_evh > > But a bit later, susi got assigned, and amfnd-PL3 did send this susi > response (it should not send out and buffer it, since the @is_avd_down > should be true) > > May 17 12:37:20.453974 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0866] > >> avnd_di_susi_resp_send: Sending Resp > su=safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, > si=safSi=AmfDemo,safApp=AmfDemo1, curr_state=3, prv_state=1 > ... > May 17 12:37:20.454083 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1482] > >> avnd_mds_send: Msg type '1' > May 17 12:37:20.454244 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1537] > ER ncsmds_api for 0 FAILED, dest=0 > > When SC1 restarted, amfd received the very first messages from PL3 > starting with msg_id=1 (it should be starting from 0) > > May 17 12:37:28.398633 osafamfd > [7686:7686:src/amf/amfd/ndproc.cc:0330] NO Receive message with event > type:12, msg_type:31, from node:2030f, msg_id:1 > May 17 12:37:28.413018 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0334] > NO Received node_up_msg from all nodes > May 17 12:37:28.413069 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0254] > NO Received node_up from 2030f: msg_id 2 > > Looks to me something should not happen inside > avnd_evt_mds_avd_dn_evh(). In this avnd_evt_mds_avd_dn_evh(), > @is_avd_down should be true, the msg counter should be reset to 0, but > I do see the SC absence timer started. I couldn't figure how it > happened for now > > Thanks, > Minh > > On 17/05/17 20:03, praveen malviya wrote: >> What I see is avnd_diq_del() is called as soon as system becomes >> headless. This will delete all pending messages. But when component >> will respond during SCs absence a new message will be generated and >> buffered. >> For node_up AMFD will ack the message, but amfnd calls >> avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process(). >> We need to call avnd_diq_del() in ack message so that msg_id gets >> updated. >> Further looking into it.. >> >> >> Thanks. >> Praveen >> >> >> >> On 17-May-17 1:50 PM, praveen malviya wrote: >>> Hi Minh, >>> >>> While testing this, I am observing that amfd is dropping the assignment >>> message because of message id mismatch: >>> May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171] >>> >> avd_su_si_assign_evh: id:1, node:2030f, act:5, >>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0 >>> >>> >>> May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075] >>> WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f >>> should be 3 >>> May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777] >>> << avd_su_si_assign_evh >>> >>> I am also looking into this. For your reference I had attached amfd and >>> amfnd traces from SC-1 and PL-3 respectively in the ticket. >>> I am working with one controller and one payload. >>> >>> >>> Thanks >>> Praveen >>> >>> On 15-May-17 1:06 PM, Minh Chau wrote: When amfnd-payload responds susi assignment response just before both SC go down, and that response message does not come to director. Therefore, the status of that assignment could be seen as "modifying" in IMM. When SC comes back, active amfd will be waiting for that response forever. Patch checks if a susi assignment response is sent but not-ack just before both SC come down, amfnd-payload will buffer it in a way as a susi get assigned during SC absence --- src/amf/amfnd/di.cc | 53 + 1 file changed, 45 insertions
Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]
Hi Praveen, Thanks for looking at the issue. Here is what I am observing amfnd-PL3 received NCSMDS_DOWN indicating no active amfd May 17 12:37:16.741308 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0629] >> avnd_evt_mds_avd_dn_evh May 17 12:37:16.741342 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0651] WA AMF director unexpectedly crashed May 17 12:37:16.741354 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0701] TR Delete all pending messages to be sent to AMFD May 17 12:37:16.741379 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages May 17 12:37:16.741405 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] NO Checking 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' for pending messages May 17 12:37:16.741430 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0709] NO Checking 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' for pending messages May 17 12:37:16.741505 osafamfnd [8141:8141:src/amf/amfnd/tmr.cc:0083] TR SC absence timer started May 17 12:37:16.741518 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0742] << avnd_evt_mds_avd_dn_evh But a bit later, susi got assigned, and amfnd-PL3 did send this susi response (it should not send out and buffer it, since the @is_avd_down should be true) May 17 12:37:20.453974 osafamfnd [8141:8141:src/amf/amfnd/di.cc:0866] >> avnd_di_susi_resp_send: Sending Resp su=safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, si=safSi=AmfDemo,safApp=AmfDemo1, curr_state=3, prv_state=1 ... May 17 12:37:20.454083 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1482] >> avnd_mds_send: Msg type '1' May 17 12:37:20.454244 osafamfnd [8141:8141:src/amf/amfnd/mds.cc:1537] ER ncsmds_api for 0 FAILED, dest=0 When SC1 restarted, amfd received the very first messages from PL3 starting with msg_id=1 (it should be starting from 0) May 17 12:37:28.398633 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0330] NO Receive message with event type:12, msg_type:31, from node:2030f, msg_id:1 May 17 12:37:28.413018 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0334] NO Received node_up_msg from all nodes May 17 12:37:28.413069 osafamfd [7686:7686:src/amf/amfd/ndfsm.cc:0254] NO Received node_up from 2030f: msg_id 2 Looks to me something should not happen inside avnd_evt_mds_avd_dn_evh(). In this avnd_evt_mds_avd_dn_evh(), @is_avd_down should be true, the msg counter should be reset to 0, but I do see the SC absence timer started. I couldn't figure how it happened for now Thanks, Minh On 17/05/17 20:03, praveen malviya wrote: > What I see is avnd_diq_del() is called as soon as system becomes > headless. This will delete all pending messages. But when component > will respond during SCs absence a new message will be generated and > buffered. > For node_up AMFD will ack the message, but amfnd calls > avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process(). > We need to call avnd_diq_del() in ack message so that msg_id gets > updated. > Further looking into it.. > > > Thanks. > Praveen > > > > On 17-May-17 1:50 PM, praveen malviya wrote: >> Hi Minh, >> >> While testing this, I am observing that amfd is dropping the assignment >> message because of message id mismatch: >> May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171] >> >> avd_su_si_assign_evh: id:1, node:2030f, act:5, >> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0 >> >> >> May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075] >> WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f >> should be 3 >> May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777] >> << avd_su_si_assign_evh >> >> I am also looking into this. For your reference I had attached amfd and >> amfnd traces from SC-1 and PL-3 respectively in the ticket. >> I am working with one controller and one payload. >> >> >> Thanks >> Praveen >> >> On 15-May-17 1:06 PM, Minh Chau wrote: >>> When amfnd-payload responds susi assignment response just before >>> both SC >>> go down, and that response message does not come to director. >>> Therefore, >>> the status of that assignment could be seen as "modifying" in IMM. When >>> SC comes back, active amfd will be waiting for that response forever. >>> >>> Patch checks if a susi assignment response is sent but not-ack just >>> before >>> both SC come down, amfnd-payload will buffer it in a way as a susi get >>> assigned during SC absence >>> --- >>>src/amf/amfnd/di.cc | 53 >>> + >>>1 file changed, 45 insertions(+), 8 deletions(-) >>> >>> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc >>> index e06b9260d..3776a09dc 100644 >>> --- a/src/amf/amfnd/di.cc >>> +++ b/src/amf/amfnd/di.cc >>> @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, >>> uint32_t mid) { >>> Notes : None. >>> **/ >>>void avnd_diq_del(AVND_CB *cb) { >>> -
Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]
What I see is avnd_diq_del() is called as soon as system becomes headless. This will delete all pending messages. But when component will respond during SCs absence a new message will be generated and buffered. For node_up AMFD will ack the message, but amfnd calls avnd_diq_rec_del() (not avnd_diq_del()) in avnd_di_msg_ack_process(). We need to call avnd_diq_del() in ack message so that msg_id gets updated. Further looking into it.. Thanks. Praveen On 17-May-17 1:50 PM, praveen malviya wrote: > Hi Minh, > > While testing this, I am observing that amfd is dropping the assignment > message because of message id mismatch: > May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171] > >> avd_su_si_assign_evh: id:1, node:2030f, act:5, > 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0 > > > May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075] > WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f should be 3 > May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777] > << avd_su_si_assign_evh > > I am also looking into this. For your reference I had attached amfd and > amfnd traces from SC-1 and PL-3 respectively in the ticket. > I am working with one controller and one payload. > > > Thanks > Praveen > > On 15-May-17 1:06 PM, Minh Chau wrote: >> When amfnd-payload responds susi assignment response just before both SC >> go down, and that response message does not come to director. Therefore, >> the status of that assignment could be seen as "modifying" in IMM. When >> SC comes back, active amfd will be waiting for that response forever. >> >> Patch checks if a susi assignment response is sent but not-ack just before >> both SC come down, amfnd-payload will buffer it in a way as a susi get >> assigned during SC absence >> --- >>src/amf/amfnd/di.cc | 53 >> + >>1 file changed, 45 insertions(+), 8 deletions(-) >> >> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc >> index e06b9260d..3776a09dc 100644 >> --- a/src/amf/amfnd/di.cc >> +++ b/src/amf/amfnd/di.cc >> @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t >> mid) { >> Notes : None. >> >> **/ >>void avnd_diq_del(AVND_CB *cb) { >> - AVND_DND_MSG_LIST *rec = 0; >> >> - do { >> -/* pop the record */ >> -m_AVND_DIQ_REC_POP(cb, rec); >> -if (!rec) break; >> + if ((cb->dnd_list.head != nullptr)) { >> +AVND_DND_MSG_LIST *rec = 0; >> +bool found = true; >> +while (found) { >> + found = false; >> + for (rec = cb->dnd_list.head; rec != nullptr; >> + rec = rec->next) { >> +osafassert(rec->msg.type == AVND_MSG_AVD); >> +// delete all pending messages that haven't been sent out >> +if (rec->no_retries == 0) { >> + m_AVND_DIQ_REC_POP(cb, rec); >> + avnd_diq_rec_del(cb, rec); >> + break; >> +} else { >> + // Assignment response had been sent, but not ack because last >> + // controller go down, reset msg_id and will be resent later >> + if (rec->msg.info.avd->msg_type == >> AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) { >> +if (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) { >> + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0; >> + found = true; >> + LOG_NO( >> + "Found not-ack su_si_assign msg for SU:'%s', " >> + "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', " >> + "error:'%u', msg_id:'%u'", >> + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info >> + .n2d_su_si_assign.su_name), >> + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info >> + .n2d_su_si_assign.si_name), >> + rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state, >> + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act, >> + rec->msg.info.avd->msg_info.n2d_su_si_assign >> + .single_csi, >> + rec->msg.info.avd->msg_info.n2d_su_si_assign.error, >> + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id); >> +} >> + } else { >> +// delete other messages for now >> +m_AVND_DIQ_REC_POP(cb, rec); >> +avnd_diq_rec_del(cb, rec); >> +break; >> + } >> +} >> >> -/* delete the record */ >> -avnd_diq_rec_del(cb, rec); >> - } while (1); >> + } >> +} >> + } >> >> return; >>} >> > > -- > Check out the vibrant tech community on one of the world's most > eng
Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]
Hi Minh, While testing this, I am observing that amfd is dropping the assignment message because of message id mismatch: May 17 12:37:39.522117 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1171] >> avd_su_si_assign_evh: id:1, node:2030f, act:5, 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', '', ha:3, err:1, single:0 May 17 12:37:39.522404 osafamfd [7686:7686:src/amf/amfd/ndproc.cc:0075] WA avd_msg_sanity_chk: invalid msg id 1, msg type 5, from 2030f should be 3 May 17 12:37:39.522418 osafamfd [7686:7686:src/amf/amfd/sgproc.cc:1777] << avd_su_si_assign_evh I am also looking into this. For your reference I had attached amfd and amfnd traces from SC-1 and PL-3 respectively in the ticket. I am working with one controller and one payload. Thanks Praveen On 15-May-17 1:06 PM, Minh Chau wrote: > When amfnd-payload responds susi assignment response just before both SC > go down, and that response message does not come to director. Therefore, > the status of that assignment could be seen as "modifying" in IMM. When > SC comes back, active amfd will be waiting for that response forever. > > Patch checks if a susi assignment response is sent but not-ack just before > both SC come down, amfnd-payload will buffer it in a way as a susi get > assigned during SC absence > --- > src/amf/amfnd/di.cc | 53 > + > 1 file changed, 45 insertions(+), 8 deletions(-) > > diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc > index e06b9260d..3776a09dc 100644 > --- a/src/amf/amfnd/di.cc > +++ b/src/amf/amfnd/di.cc > @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t > mid) { > Notes : None. > > **/ > void avnd_diq_del(AVND_CB *cb) { > - AVND_DND_MSG_LIST *rec = 0; > > - do { > -/* pop the record */ > -m_AVND_DIQ_REC_POP(cb, rec); > -if (!rec) break; > + if ((cb->dnd_list.head != nullptr)) { > +AVND_DND_MSG_LIST *rec = 0; > +bool found = true; > +while (found) { > + found = false; > + for (rec = cb->dnd_list.head; rec != nullptr; > + rec = rec->next) { > +osafassert(rec->msg.type == AVND_MSG_AVD); > +// delete all pending messages that haven't been sent out > +if (rec->no_retries == 0) { > + m_AVND_DIQ_REC_POP(cb, rec); > + avnd_diq_rec_del(cb, rec); > + break; > +} else { > + // Assignment response had been sent, but not ack because last > + // controller go down, reset msg_id and will be resent later > + if (rec->msg.info.avd->msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) > { > +if (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) { > + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0; > + found = true; > + LOG_NO( > + "Found not-ack su_si_assign msg for SU:'%s', " > + "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', " > + "error:'%u', msg_id:'%u'", > + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info > + .n2d_su_si_assign.su_name), > + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info > + .n2d_su_si_assign.si_name), > + rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state, > + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act, > + rec->msg.info.avd->msg_info.n2d_su_si_assign > + .single_csi, > + rec->msg.info.avd->msg_info.n2d_su_si_assign.error, > + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id); > +} > + } else { > +// delete other messages for now > +m_AVND_DIQ_REC_POP(cb, rec); > +avnd_diq_rec_del(cb, rec); > +break; > + } > +} > > -/* delete the record */ > -avnd_diq_rec_del(cb, rec); > - } while (1); > + } > +} > + } > > return; > } > -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]
Hi Minh Ack (review only) On 15/5/17, 5:36 pm, "Minh Chau" wrote: When amfnd-payload responds susi assignment response just before both SC go down, and that response message does not come to director. Therefore, the status of that assignment could be seen as "modifying" in IMM. When SC comes back, active amfd will be waiting for that response forever. Patch checks if a susi assignment response is sent but not-ack just before both SC come down, amfnd-payload will buffer it in a way as a susi get assigned during SC absence --- src/amf/amfnd/di.cc | 53 + 1 file changed, 45 insertions(+), 8 deletions(-) diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc index e06b9260d..3776a09dc 100644 --- a/src/amf/amfnd/di.cc +++ b/src/amf/amfnd/di.cc @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t mid) { Notes : None. **/ void avnd_diq_del(AVND_CB *cb) { - AVND_DND_MSG_LIST *rec = 0; - do { -/* pop the record */ -m_AVND_DIQ_REC_POP(cb, rec); -if (!rec) break; + if ((cb->dnd_list.head != nullptr)) { +AVND_DND_MSG_LIST *rec = 0; +bool found = true; +while (found) { + found = false; + for (rec = cb->dnd_list.head; rec != nullptr; + rec = rec->next) { +osafassert(rec->msg.type == AVND_MSG_AVD); +// delete all pending messages that haven't been sent out +if (rec->no_retries == 0) { + m_AVND_DIQ_REC_POP(cb, rec); + avnd_diq_rec_del(cb, rec); + break; +} else { + // Assignment response had been sent, but not ack because last + // controller go down, reset msg_id and will be resent later + if (rec->msg.info.avd->msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) { +if (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) { + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0; + found = true; + LOG_NO( + "Found not-ack su_si_assign msg for SU:'%s', " + "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', " + "error:'%u', msg_id:'%u'", + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info + .n2d_su_si_assign.su_name), + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info + .n2d_su_si_assign.si_name), + rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state, + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act, + rec->msg.info.avd->msg_info.n2d_su_si_assign + .single_csi, + rec->msg.info.avd->msg_info.n2d_su_si_assign.error, + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id); +} + } else { +// delete other messages for now +m_AVND_DIQ_REC_POP(cb, rec); +avnd_diq_rec_del(cb, rec); +break; + } +} -/* delete the record */ -avnd_diq_rec_del(cb, rec); - } while (1); + } +} + } return; } -- 2.11.0 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] amfnd: Buffered not-ack susi assignment response after both SC go down [#2105]
Hi Minh, I am reviewing this patch. Thanks, Praveen On 15-May-17 1:06 PM, Minh Chau wrote: > When amfnd-payload responds susi assignment response just before both SC > go down, and that response message does not come to director. Therefore, > the status of that assignment could be seen as "modifying" in IMM. When > SC comes back, active amfd will be waiting for that response forever. > > Patch checks if a susi assignment response is sent but not-ack just before > both SC come down, amfnd-payload will buffer it in a way as a susi get > assigned during SC absence > --- > src/amf/amfnd/di.cc | 53 > + > 1 file changed, 45 insertions(+), 8 deletions(-) > > diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc > index e06b9260d..3776a09dc 100644 > --- a/src/amf/amfnd/di.cc > +++ b/src/amf/amfnd/di.cc > @@ -1282,16 +1282,53 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t > mid) { > Notes : None. > > **/ > void avnd_diq_del(AVND_CB *cb) { > - AVND_DND_MSG_LIST *rec = 0; > > - do { > -/* pop the record */ > -m_AVND_DIQ_REC_POP(cb, rec); > -if (!rec) break; > + if ((cb->dnd_list.head != nullptr)) { > +AVND_DND_MSG_LIST *rec = 0; > +bool found = true; > +while (found) { > + found = false; > + for (rec = cb->dnd_list.head; rec != nullptr; > + rec = rec->next) { > +osafassert(rec->msg.type == AVND_MSG_AVD); > +// delete all pending messages that haven't been sent out > +if (rec->no_retries == 0) { > + m_AVND_DIQ_REC_POP(cb, rec); > + avnd_diq_rec_del(cb, rec); > + break; > +} else { > + // Assignment response had been sent, but not ack because last > + // controller go down, reset msg_id and will be resent later > + if (rec->msg.info.avd->msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG) > { > +if (rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id != 0) { > + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0; > + found = true; > + LOG_NO( > + "Found not-ack su_si_assign msg for SU:'%s', " > + "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', " > + "error:'%u', msg_id:'%u'", > + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info > + .n2d_su_si_assign.su_name), > + osaf_extended_name_borrow(&rec->msg.info.avd->msg_info > + .n2d_su_si_assign.si_name), > + rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state, > + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act, > + rec->msg.info.avd->msg_info.n2d_su_si_assign > + .single_csi, > + rec->msg.info.avd->msg_info.n2d_su_si_assign.error, > + rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id); > +} > + } else { > +// delete other messages for now > +m_AVND_DIQ_REC_POP(cb, rec); > +avnd_diq_rec_del(cb, rec); > +break; > + } > +} > > -/* delete the record */ > -avnd_diq_rec_del(cb, rec); > - } while (1); > + } > +} > + } > > return; > } > -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel