- **Milestone**: 4.6.2 --> 4.7.2
---
** [tickets:#1566] Cluster reset happened during switchover due to AMF director
heart beat timeout.**
**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Oct 24, 2015 06:25 AM UTC by Ritu Raj
**Last Updated:** Mon Nov 02, 2015 09:03 AM UTC
**Owner:** nobody
Changeset: 6901
70 nodes configured with PBE
Application: Nway configured on all the nodes
Issues Observed:
> Cluster reset happened during switchover due to AMF director heart beat
> timeout.
Steps Performed:
* AMF (Nway) application brought up on the nodes.
* Some operations are performed on Nway application hosted on PL-65 to PL-68.
* Stopped opensaf on the nodes PL-65 to PL-68.
* Two switchover performed on Cluster. First switchover succeded without any
issue. During second switchover old standby controller (SC-2) rebooted when
it is being promoted to ACTIVE state.
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafimmd[2505]: WA IMMD not re-electing coord
for switch-over (si-swap) coord at (2020f)
........
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafimmnd[2516]: NO Implementer (applier)
connected: 130 (@OpenSafImmReplicatorA) <10675, 2020f>
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO Assigned
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: ER
safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery
is:nodeFailfast
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: Component faulted: recovery is node failfast,
OwnNodeId = 131599, SupervisionTime = 60
Oct 22 15:45:10 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node;
timeout=60
* After SC-2 went for reboot, SC-1 tried to become active during witch AMF
director heart beat timeout and cluster reset happened.
Oct 22 15:54:53 SLES-64BIT-SLOT1 osafamfd[2557]: NO
'safRankedSu=safSu=dummy_NWay_1Norm_4\,safSg=SG_dummy_n\,safApp=N_6,safSi=dummy_NWay_1Norm_6,safApp=N_6'
Oct 22 15:54:53 SLES-64BIT-SLOT1 osafamfnd[2567]: ER AMF director heart beat
timeout, generating core for amfd
Oct 22 15:54:54 SLES-64BIT-SLOT1 osafamfnd[2567]: Rebooting OpenSAF NodeId =
131343 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131343,
SupervisionTime = 60
Oct 22 15:54:54 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node;
timeout=60
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: WA MDS Send Failed
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: WA Error code 2 returned for
message type 16 - ignoring
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: NO Implementer locally
disconnected. Marking it as doomed 136 <4871, 2010f> (@safAmfService2010f)
* Traces are not availbale
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at
https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets