Preliminary Analysis:
    Before the payloads were stopped in step 3, the assignment status shown in 
step 2 was:
    SU1 standby for: SI1, SI2, SI3 and SI4.
    SU2 active for: SI1, SI2, SI3 and SI4.
    SU3 active for: SI1, SI2 and SI3.
    So the SG was already in an improper state before the payloads were 
stopped. When PL-4 was stopped, AMFD failed over the active SU2 on PL-4 to SU1, 
honoring SI dependencies. After a successful failover, and before marking the 
SG stable, AMFD checks the assignments and expects every SUSI in a given HA 
state to belong to the same SU. Since SU3 was also active, it asserted because 
of the SU mismatch:
     sg_2n_fsm.cc:534: avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed.
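
To make the violated condition concrete, here is a minimal sketch of the check, assuming simplified stand-in types. `SuSi`, `HaState`, `single_su_for_state` and `reported_assignments` are hypothetical names for illustration only and are not the actual AMFD data structures or the code in sg_2n_fsm.cc. It only expresses the rule that, in a 2N SG, all assignments in one HA state must belong to a single SU, using the assignments listed above as input.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for AMFD's SU/SI assignment records.
enum class HaState { Active, Standby };

struct SuSi {
  std::string su;  // service unit holding the assignment
  std::string si;  // service instance being assigned
  HaState state;   // HA state of this assignment
};

// Returns true when every assignment in `state` belongs to one and the same SU
// (trivially true when there is no assignment in that state).
bool single_su_for_state(const std::vector<SuSi>& susis, HaState state) {
  const std::string* first_su = nullptr;
  for (const SuSi& a : susis) {
    if (a.state != state) continue;
    if (first_su == nullptr) {
      first_su = &a.su;
    } else if (*first_su != a.su) {
      return false;  // two different SUs share the same HA state
    }
  }
  return true;
}

// Assignments as summarised in the analysis, before the payloads were stopped.
std::vector<SuSi> reported_assignments() {
  std::vector<SuSi> out;
  for (const char* si : {"SI1", "SI2", "SI3", "SI4"}) {
    out.push_back({"SU1", si, HaState::Standby});
    out.push_back({"SU2", si, HaState::Active});
  }
  for (const char* si : {"SI1", "SI2", "SI3"}) {
    out.push_back({"SU3", si, HaState::Active});
  }
  return out;
}
```

With this input the standby side is consistent (only SU1), while the active side is not (SU2 and SU3), which is the kind of mismatch the assertion reports.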

When the payloads were stopped, SC-2 was active. Before that, SC-1 was active; 
I am analyzing the SC-1 logs for what led to two active SUs in the 2N model. 
That is the root cause of the problem.


---

** [tickets:#1794] AMF : amfd crashed on both controllers, after opensafd is 
stopped on appl hosted  payloads **

**Status:** assigned
**Milestone:** 5.0.RC2
**Created:** Fri Apr 29, 2016 06:48 AM UTC by Srikanth R
**Last Updated:** Mon May 02, 2016 05:01 AM UTC
**Owner:** Praveen
**Attachments:**

- [1794.tgz](https://sourceforge.net/p/opensaf/tickets/1794/attachment/1794.tgz) (3.8 MB; application/x-compressed-tar)


Changeset : 7436 5.0.FC
Setup : 5 nodes cluster with 3 payloads.
Application : 2n red model , 3 SUs with 4 SIs ( si-si dep configured )
PL-3 is hosting SU1 and SU3 and PL-4 is hosting SU2.

Issue : AMFD crashed on both controllers after opensafd was stopped on the 
payloads hosting the application.

Steps performed :

-> After deploying the application, a lot of AMF-related operations were 
performed.

-> After that, the status was as follows, with SU1 (deployed on PL-3) standby 
and SU2 (deployed on PL-4) active.

safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN
        saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN
        saAmfSISUHAState=STANDBY(2)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed6,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN
        saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
        saAmfSISUHAState=STANDBY(2)
safSISU=safSu=PL-5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed5,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN
        saAmfSISUHAState=ACTIVE(1)


-> Then opensafd was stopped on the payloads PL-5 and PL-4, one after another.

-> amfd on the active controller crashed after opensafd was stopped on PL-4.

Apr 28 16:47:54 CONTROLLER-2 osafamfd[12188]: NO Node 'PL-4' left the cluster
Apr 28 16:47:54 CONTROLLER-2 osafamfd[12188]: sg_2n_fsm.cc:534: avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed.
Apr 28 16:47:54 CONTROLLER-2 osafamfnd[12198]: WA AMF director unexpectedly crashed

Note, this issue is not reproducible just by bringing up the application and 
performing the above steps.

