Preliminary Analysis:
Before the payloads were stopped in step 3, the assignment status shown in
step 2 was:
SU1 standby for: SI1, SI2, SI3 and SI4.
SU2 active for: SI1, SI2, SI3 and SI4.
SU3 active for: SI1, SI2, SI3 and SI4.
So the SG was already in an improper state before the payloads were stopped.
When PL-4 was stopped, AMFD failed over the active SU2 on PL-4 to SU1,
honoring the SI dependencies. After a successful fail-over, and before marking
the SG stable, AMFD checks the assignments and expects every SUSI in a given
HA state to belong to the same SU. Since SU3 was also active, it asserted
because of the SU mismatch:
sg_2n_fsm.cc:534: avd_sg_2n_act_susi: Assertion 'a_susi_1->su ==
a_susi_2->su' failed.
When the payloads were stopped, SC-2 was active; before that, SC-1 was active.
I am analyzing the SC-1 logs to find what led to two active SUs in the 2N
model. That is the root cause of the problem.
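
The consistency check described above can be illustrated with a small
sketch. This is not the AMFD code: the function names and the tuple layout
are hypothetical, and the assignments are copied from the status dump in the
ticket below. It only shows why the failover from SU2 to SU1 cannot leave
the SG consistent while SU3 is still active.

```python
from collections import defaultdict

def sus_per_ha_state(assignments):
    """Map each HA state to the set of SUs holding an assignment in it."""
    by_state = defaultdict(set)
    for su, _si, ha_state in assignments:
        by_state[ha_state].add(su)
    return dict(by_state)

def find_2n_violations(assignments):
    """Return {ha_state: sorted SUs} for HA states held by more than one SU."""
    return {state: sorted(sus)
            for state, sus in sus_per_ha_state(assignments).items()
            if len(sus) > 1}

SIS = ("SI1", "SI2", "SI3", "SI4")

# State before the payloads were stopped (from the status dump).
BEFORE = ([("SU1", si, "STANDBY") for si in SIS]
          + [("SU2", si, "ACTIVE") for si in SIS]
          + [("SU3", si, "ACTIVE") for si in SIS])

# Hypothetical state after PL-4 leaves: SU2 is gone, SU1 is made active.
AFTER = ([("SU1", si, "ACTIVE") for si in SIS]
         + [("SU3", si, "ACTIVE") for si in SIS])

print(find_2n_violations(BEFORE))  # {'ACTIVE': ['SU2', 'SU3']}
print(find_2n_violations(AFTER))   # {'ACTIVE': ['SU1', 'SU3']}
```

In both states more than one SU holds active assignments, which is the
condition the assertion in avd_sg_2n_act_susi rejects.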
---
** [tickets:#1794] AMF : amfd crashed on both controllers, after opensafd is
stopped on appl hosted payloads **
**Status:** assigned
**Milestone:** 5.0.RC2
**Created:** Fri Apr 29, 2016 06:48 AM UTC by Srikanth R
**Last Updated:** Mon May 02, 2016 05:01 AM UTC
**Owner:** Praveen
**Attachments:**
-
[1794.tgz](https://sourceforge.net/p/opensaf/tickets/1794/attachment/1794.tgz)
(3.8 MB; application/x-compressed-tar)
Changeset : 7436 5.0.FC
Setup : 5 nodes cluster with 3 payloads.
Application : 2n red model , 3 SUs with 4 SIs ( si-si dep configured )
PL-3 is hosting SU1 and SU3 and PL-4 is hosting SU2.
Issue : AMFD crashed on both controllers after opensafd was stopped on the
payloads hosting the application.
Steps performed :
-> After deploying the application, a lot of AMF-related operations were
performed.
-> After that, the opensafd status was as follows, where SU1 deployed on PL-3
is standby and SU2 deployed on PL-4 is active.
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed6,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=PL-5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed5,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
-> Then stopped opensafd on payloads PL-5 and PL-4, one after another.
-> amfd on the active controller crashed after opensafd was stopped on PL-4.
Apr 28 16:47:54 CONTROLLER-2 osafamfd[12188]: NO Node 'PL-4' left the cluster
Apr 28 16:47:54 CONTROLLER-2 osafamfd[12188]: sg_2n_fsm.cc:534:
avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed.
Apr 28 16:47:54 CONTROLLER-2 osafamfnd[12198]: WA AMF director unexpectedly
crashed
Note, this issue is not reproducible just by bringing up the application and
performing the above steps.
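
Since the issue is hard to reproduce, it may help to scan a saved status dump
for the bad state before stopping any payload. The following is a minimal
sketch, assuming the dump has the layout shown in the steps above (a
safSISU= line followed by its saAmfSISUHAState= line). The regular
expressions and function names are hypothetical and not part of OpenSAF.

```python
import re
from collections import defaultdict

# Assumed layout: "safSISU=<SU DN with escaped commas>,<SI DN>".
SISU_RE = re.compile(r"^safSISU=(?P<su>(?:\\,|[^,])+),(?P<si>safSi=.+)$")
STATE_RE = re.compile(r"^saAmfSISUHAState=(?P<state>[A-Z]+)\(\d+\)$")

def parse_dump(lines):
    """Yield (su_dn, si_dn, ha_state) from alternating DN / HA-state lines."""
    pending = None
    for line in lines:
        line = line.strip()
        m = SISU_RE.match(line)
        if m:
            # Unescape the commas inside the SU DN.
            pending = (m["su"].replace("\\,", ","), m["si"])
            continue
        m = STATE_RE.match(line)
        if m and pending:
            yield (*pending, m["state"])
            pending = None

def active_sus_per_sg(lines):
    """Map each SG DN to the set of SUs that hold an ACTIVE assignment."""
    result = defaultdict(set)
    for su_dn, _si_dn, state in parse_dump(lines):
        if state == "ACTIVE":
            su, sg_dn = su_dn.split(",", 1)  # "safSu=...", "safSg=...,safApp=..."
            result[sg_dn].add(su)
    return dict(result)

SAMPLE = r"""
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
""".splitlines()

for sg_dn, sus in active_sus_per_sg(SAMPLE).items():
    print(sg_dn, sorted(sus))
```

For a 2N SG, more than one SU in the resulting set indicates the improper
state. SGs of other redundancy models, such as the OpenSAF NoRed SG, are
expected to list several active SUs, so the caller has to filter by model.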