Attached are the AMFD and AMFND traces after reproducing the problem on
changeset 7657 (before the #1839 changeset). When AMFND receives the removal of
assignments, it instantiates an uninstantiated component in
avnd_err_su_repair() while su-failover escalation is in progress, which violates
the su-failover recovery principles.


Attachments:

- [1839.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/b42140cb/4709/attachment/1839.xml) (10.7 kB; text/xml)
- [osafamfd](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/b42140cb/4709/attachment/osafamfd) (95.6 kB; application/octet-stream)
- [osafamfnd_before_1789](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/b42140cb/4709/attachment/osafamfnd_before_1789) (94.1 kB; application/octet-stream)


---

**[tickets:#1863] amfnd: amfnd tries to repair su in su-failover recovery without AMFD request.**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Mon Jun 06, 2016 11:16 AM UTC by Praveen
**Last Updated:** Mon Jun 06, 2016 11:17 AM UTC
**Owner:** Praveen


AMFND calls avnd_err_su_repair() to repair the SU while su-failover recovery is
in progress.
This happens during an SU lock operation when a component holding a quiesced
assignment faults with su-failover recovery. AMFND launches cleanup of the
components because of the su-failover. In the meantime, AMFND receives the
removal of assignments and, as part of oper done, deletes the SUSI and calls
avnd_err_su_repair(). Inside this function AMFND tries to instantiate
UNINSTANTIATED comps. In the reported case no component is started, because the
component is in TERMINATING state. However, the function resets the SU_FAILOVER
flag introduced in #1839. Once AMFND clears the flag, it loses the context of
the su-failover escalation: when the first comp is cleaned up, AMFND
instantiates it. AMFND also never informs AMFD about the su-failover escalation,
so the lock operation times out.
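The effect of clearing the flag can be sketched with a small model. This is a hypothetical simplification, not OpenSAF code: `Presence`, `Comp`, `Su`, `su_failover`, `su_repair_unguarded()`, `su_repair_guarded()` and `on_cleanup_done()` are illustrative names only, and the guard in `su_repair_guarded()` is an assumption about one possible approach, not the fix committed for this ticket.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of the AMFND state involved in #1863.
// None of these names are OpenSAF types; they only mirror the ticket text.
enum class Presence {
  kUninstantiated = 1,  // SA_AMF_PRESENCE_UNINSTANTIATED
  kInstantiating = 2,   // SA_AMF_PRESENCE_INSTANTIATING
  kTerminating = 4,     // SA_AMF_PRESENCE_TERMINATING ("pres.st=4" in the traces)
};

struct Comp {
  std::string name;
  Presence pres;
};

struct Su {
  bool su_failover = false;  // stands in for the SU_FAILOVER flag added by #1839
  std::vector<Comp> comps;
};

// Mirrors the reported repair path: it clears the flag and instantiates every
// UNINSTANTIATED comp. Returns the number of comps started.
int su_repair_unguarded(Su &su) {
  su.su_failover = false;  // the su-failover escalation context is lost here
  int started = 0;
  for (Comp &c : su.comps) {
    if (c.pres == Presence::kUninstantiated) {
      c.pres = Presence::kInstantiating;
      ++started;
    }
  }
  return started;  // TERMINATING comps are skipped, as in the reported case
}

// Hypothetical guard (assumption): while su-failover escalation is in
// progress, leave the SU alone and let AMFD decide about repair.
int su_repair_guarded(Su &su) {
  if (su.su_failover) return 0;
  return su_repair_unguarded(su);
}

// Cleanup of comp i has completed. With the flag set, the comp stays
// UNINSTANTIATED until AMFD requests repair; with the flag cleared, the comp
// is instantiated again, which is the behaviour observed in this ticket.
void on_cleanup_done(Su &su, std::size_t i) {
  Comp &c = su.comps.at(i);
  c.pres = su.su_failover ? Presence::kUninstantiated : Presence::kInstantiating;
}
```

Replaying the reported sequence with `su_repair_unguarded()` leaves the TERMINATING comp untouched but clears `su_failover`, so the later `on_cleanup_done()` instantiates the comp. With `su_repair_guarded()` the flag survives and the comp stays UNINSTANTIATED after cleanup.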

The problem also exists before the fix of #1839: in the same scenario AMFND
calls avnd_err_su_repair() to repair the SU. If a component is found in
UNINSTANTIATED state, this can lead to its instantiation. This happens when
AMFND receives the removal of assignments after the cleanup of at least one comp
has completed.
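The ordering dependence described above can be illustrated with a minimal sketch. It is a hypothetical model, not OpenSAF code: it assumes that a completed cleanup leaves the comp UNINSTANTIATED and that the pre-#1839 repair path instantiates every UNINSTANTIATED comp; the function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the pre-#1839 race; there is no SU_FAILOVER flag yet.
enum class Presence { kUninstantiated = 1, kInstantiating = 2, kTerminating = 4 };

// su-failover escalation: cleanup is launched for every comp of the SU.
std::vector<Presence> start_su_failover_cleanup(std::size_t n_comps) {
  return std::vector<Presence>(n_comps, Presence::kTerminating);
}

// Repair triggered by the removal of assignments: every UNINSTANTIATED comp is
// instantiated, TERMINATING comps are left alone. Returns how many started.
int su_repair(std::vector<Presence> &comps) {
  int started = 0;
  for (Presence &p : comps) {
    if (p == Presence::kUninstantiated) {
      p = Presence::kInstantiating;
      ++started;
    }
  }
  return started;
}

// Replays the sequence: `cleaned_before_removal` cleanups finish first, then
// the removal of assignments arrives and su_repair() runs.
int comps_started_by_repair(std::size_t n_comps, std::size_t cleaned_before_removal) {
  std::vector<Presence> comps = start_su_failover_cleanup(n_comps);
  for (std::size_t i = 0; i < cleaned_before_removal && i < comps.size(); ++i) {
    comps[i] = Presence::kUninstantiated;  // cleanup of comp i has completed
  }
  return su_repair(comps);
}
```

In this model, `comps_started_by_repair(2, 0)` starts nothing because both comps are still TERMINATING, while `comps_started_by_repair(2, 1)` starts one comp, which matches the window described in the ticket.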

Steps to reproduce:
1) Set the recovery policy to su-failover and bring up the AMF demo. Do not
enable auto-repair.
2) Lock the active SU and make sure that the comp faults after responding to the
quiesced assignment.
3) The component gets instantiated without a repair admin operation, and the
lock operation times out.

AMFND traces after fix of #1839:
Jun  6 14:42:22.731996 osafamfnd [9327:sidb.cc:0737] T1 SU-SI record deleted, SU= safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1
Jun  6 14:42:22.732012 osafamfnd [9327:sidb.cc:0785] << avnd_su_si_del: 1
Jun  6 14:42:22.732028 osafamfnd [9327:err.cc:1071] >> avnd_err_su_repair
Jun  6 14:42:22.732042 osafamfnd [9327:susm.cc:1408] TR 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' not terminated, pres.st=4
Jun  6 14:42:22.732056 osafamfnd [9327:clc.cc:0764] >> avnd_comp_clc_fsm_run: Comp 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', Ev '1'
Jun  6 14:42:22.732070 osafamfnd [9327:clc.cc:0854] T1 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1':Entering CLC FSM: presence state:'SA_AMF_PRESENCE_TERMINATING(4)', Event:'AVND_COMP_CLC_PRES_FSM_EV_INST'
Jun  6 14:42:22.732084 osafamfnd [9327:clc.cc:0868] T1 Exited CLC FSM
Jun  6 14:42:22.732096 osafamfnd [9327:clc.cc:0870] T1 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1':FSM Enter presence state: 'SA_AMF_PRESENCE_TERMINATING(4)':FSM Exit presence state:SA_AMF_PRESENCE_TERMINATING(4)
Jun  6 14:42:22.732109 osafamfnd [9327:clc.cc:0889] << avnd_comp_clc_fsm_run: 1
Jun  6 14:42:22.732120 osafamfnd [9327:err.cc:1129] << avnd_err_su_repair: retval=1
Jun  6 14:42:22.732132 osafamfnd [9327:susm.cc:0255] >> avnd_su_siq_prc: SU 'safSu=SU1,safSg=AmfDem

AMFND traces before fix of #1839:
Jun  6 16:16:18.947878 osafamfnd [31308:err.cc:1064] >> avnd_err_su_repair
Jun  6 16:16:18.947890 osafamfnd [31308:susm.cc:1408] TR 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' not terminated, pres.st=4
Jun  6 16:16:18.947903 osafamfnd [31308:clc.cc:0764] >> avnd_comp_clc_fsm_run: Comp 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', Ev '1'
Jun  6 16:16:18.947916 osafamfnd [31308:clc.cc:0854] T1 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1':Entering CLC FSM: presence state:'SA_AMF_PRESENCE_TERMINATING(4)', Event:'AVND_COMP_CLC_PRES_FSM_EV_INST'
Jun  6 16:16:18.947929 osafamfnd [31308:clc.cc:0868] T1 Exited CLC FSM
Jun  6 16:16:18.947940 osafamfnd [31308:clc.cc:0870] T1 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1':FSM Enter presence state: 'SA_AMF_PRESENCE_TERMINATING(4)':FSM Exit presence state:SA_AMF_PRESENCE_TERMINATING(4)
Jun  6 16:16:18.947952 osafamfnd [31308:clc.cc:0889] << avnd_comp_clc_fsm_run: 1
Jun  6 16:16:18.947982 osafamfnd [31308:err.cc:1120] << avnd_err_su_repair: retval=1
Jun  6 16:16:18.948015 osafamfnd [31308:susm.cc:0255] >> avnd_su_siq_prc: SU 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Jun  6 16:16:18.948027 osafamfnd [31308:susm.cc:0260] << avnd_su_siq_prc
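In both traces avnd_err_su_repair() feeds Ev '1' (AVND_COMP_CLC_PRES_FSM_EV_INST) to the CLC FSM of a comp whose presence state is TERMINATING(4), and the FSM exits in the same state, so nothing is started. The sketch below only illustrates how to read those lines. The presence-state values follow the SA Forum AMF specification (`SaAmfPresenceStateT`); `on_inst_event()` is a hypothetical simplification that models just the two cases discussed in this ticket and returns every other state unchanged, which is not OpenSAF's real transition table.

```cpp
#include <cassert>
#include <string>

// Presence-state values as defined by the SA Forum AMF specification;
// "pres.st=4" in the traces is TERMINATING.
enum class PresenceState : int {
  kUninstantiated = 1,
  kInstantiating = 2,
  kInstantiated = 3,
  kTerminating = 4,
  kRestarting = 5,
  kInstantiationFailed = 6,
  kTerminationFailed = 7,
};

std::string presence_name(PresenceState s) {
  switch (s) {
    case PresenceState::kUninstantiated: return "SA_AMF_PRESENCE_UNINSTANTIATED";
    case PresenceState::kInstantiating: return "SA_AMF_PRESENCE_INSTANTIATING";
    case PresenceState::kInstantiated: return "SA_AMF_PRESENCE_INSTANTIATED";
    case PresenceState::kTerminating: return "SA_AMF_PRESENCE_TERMINATING";
    case PresenceState::kRestarting: return "SA_AMF_PRESENCE_RESTARTING";
    case PresenceState::kInstantiationFailed: return "SA_AMF_PRESENCE_INSTANTIATION_FAILED";
    case PresenceState::kTerminationFailed: return "SA_AMF_PRESENCE_TERMINATION_FAILED";
  }
  return "UNKNOWN";
}

// Formats a state the way the clc.cc trace lines do, e.g. "..._TERMINATING(4)".
std::string presence_label(PresenceState s) {
  return presence_name(s) + "(" + std::to_string(static_cast<int>(s)) + ")";
}

// Hypothetical, simplified handling of the INST event (Ev '1').
PresenceState on_inst_event(PresenceState s) {
  if (s == PresenceState::kUninstantiated) {
    return PresenceState::kInstantiating;  // an actual instantiation starts
  }
  return s;  // e.g. TERMINATING: no effect, as both traces show
}
```

Applied to the traced state, `on_inst_event(PresenceState::kTerminating)` stays TERMINATING, while the same event on an UNINSTANTIATED comp would start instantiation, which is the case the ticket warns about.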



---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/
