- **status**: review --> fixed
- **Part**: - --> d
- **Comment**:
branch: opensaf-4.5.x
parent: 6590:d775d8fb7951
user: Nagendra Kumar<[email protected]>
date: Wed May 27 16:14:56 2015 +0530
summary: amfd: try again if ImplementerClear times out [#1360]
changeset: 6594:447615c80905
branch: opensaf-4.6.x
parent: 6591:05d5ba64ae8a
user: Nagendra Kumar<[email protected]>
date: Wed May 27 16:15:39 2015 +0530
summary: amfd: try again if ImplementerClear times out [#1360]
changeset: 6595:e163e5eebcb7
tag: tip
parent: 6592:17406d1e43d3
user: Nagendra Kumar<[email protected]>
date: Wed May 27 16:15:47 2015 +0530
summary: amfd: try again if ImplementerClear times out [#1360]
[staging:49e69a]
[staging:447615]
[staging:e163e5]
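
The changeset summary says amfd now tries again when ImplementerClear times out. In the SA Forum AIS error enumeration the value 5 is SA_AIS_ERR_TIMEOUT, which matches the "ImplementerClear failed 5" in the ticket title. The following is only a minimal sketch of such a retry loop, not the actual patch: `retry_on_timeout`, the local `AisError` enum, the attempt limit and the `wait` callback are hypothetical names chosen for illustration, and retrying on TRY_AGAIN (6) as well is an assumption.

```cpp
#include <cassert>
#include <functional>

// Subset of the SaAisErrorT values; a local stand-in so the sketch compiles
// without the OpenSAF headers.
enum AisError { AIS_OK = 1, AIS_ERR_TIMEOUT = 5, AIS_ERR_TRY_AGAIN = 6 };

// Hypothetical helper: call `op` until it returns something other than
// TIMEOUT / TRY_AGAIN, or until `max_attempts` calls have been made.
// `wait` runs between attempts (for example a short sleep); it defaults to
// a no-op so the sketch stays deterministic.
AisError retry_on_timeout(const std::function<AisError()>& op,
                          int max_attempts,
                          const std::function<void()>& wait = [] {}) {
  AisError rc = AIS_ERR_TIMEOUT;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    rc = op();
    if (rc != AIS_ERR_TIMEOUT && rc != AIS_ERR_TRY_AGAIN) {
      break;  // success or a non-retryable error: stop here
    }
    if (attempt < max_attempts) {
      wait();  // back off before the next attempt
    }
  }
  return rc;  // last result; still TIMEOUT if every attempt timed out
}
```

In this sketch a caller would still have to decide what to do when the last attempt also times out, instead of asserting on the first failure as in the log of this ticket.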
---
**[tickets:#1360] Quiesced controller failed to promote back to Active (amfd
asserted: ImplementerClear failed 5)**
**Status:** fixed
**Milestone:** 4.5.2
**Created:** Thu Apr 30, 2015 07:24 AM UTC by Srikanth R
**Last Updated:** Tue May 12, 2015 08:50 AM UTC
**Owner:** Nagendra Kumar
Changeset: 6490
Issue: Quiesced controller failed to promote back to Active (amfd asserted:
ImplementerClear failed 5)
-> opensafd was brought up on 5 nodes in the cluster.
-> This issue was observed while verifying #707.
-> Invoked the middleware si-swap operation (SC-1 was the active and SC-2 the
standby).
-> Rebooted the old standby controller SC-2 using the reboot -f command.
-> The quiesced controller SC-1 failed to promote back to active.
Apr 30 12:07:44 CONTROLLER-1 osafamfd[2214]: NO safSi=SC-2N,safApp=OpenSAF Swap
initiated
Apr 30 12:07:44 CONTROLLER-1 osafamfnd[2224]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Apr 30 12:07:44 CONTROLLER-1 osafimmnd[2158]: NO Implementer locally
disconnected. Marking it as doomed 312 <496, 2010f> (safSmfService)
Apr 30 12:07:44 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 307
<328, 2010f> (safMsgGrpService)
.....
Apr 30 12:07:46 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 319
(safCheckPointService) <0, 2020f>
Apr 30 12:07:46 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 320
(safLckService) <0, 2020f>
Apr 30 12:07:46 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 321
(safEvtService) <0, 2020f>
Apr 30 12:07:46 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 322
(safClmService) <0, 2020f>
Apr 30 12:07:47 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 323
(safSmfService) <0, 2020f>
Apr 30 12:07:47 CONTROLLER-1 osafamfnd[2224]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Apr 30 12:07:47 CONTROLLER-1 osafamfnd[2224]: NO Assigned
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Apr 30 12:07:47 CONTROLLER-1 osafntfimcnd[2876]: NO exiting on signal 15
Apr 30 12:07:47 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 313
<503, 2010f> (@OpenSafImmReplicatorA)
Apr 30 12:07:47 CONTROLLER-1 osafamfd[2214]: NO Controller switch over initiated
Apr 30 12:07:47 CONTROLLER-1 osafamfd[2214]: NO ROLE SWITCH Active --> Quiesced
Apr 30 12:07:47 CONTROLLER-1 osafrded[2129]: NO RDE role set to QUIESCED
Apr 30 12:07:47 CONTROLLER-1 osafimmnd[2158]: NO Implementer (applier)
connected: 324 (@OpenSafImmReplicatorA) <608, 2010f>
Apr 30 12:07:47 CONTROLLER-1 osafntfimcnd[3025]: NO Started
Apr 30 12:07:50 CONTROLLER-1 osaffmd[2138]: NO Node Down event for node id
2020f:
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: WA Director Service in NOACTIVE
state - fevs replies pending:1 fevs highest processed:10375
Apr 30 12:07:50 CONTROLLER-1 kernel: [ 464.308117] TIPC: Resetting link
<1.1.1:eth0-1.1.2:eth3>, peer not responding
Apr 30 12:07:50 CONTROLLER-1 kernel: [ 464.308126] TIPC: Lost link
<1.1.1:eth0-1.1.2:eth3> on network plane A
Apr 30 12:07:50 CONTROLLER-1 kernel: [ 464.308132] TIPC: Lost contact with
<1.1.2>
Apr 30 12:07:50 CONTROLLER-1 osaffmd[2138]: NO Current role: QUIESCED
Apr 30 12:07:50 CONTROLLER-1 osaffmd[2138]: Rebooting OpenSAF NodeId = 131599
EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131343,
SupervisionTime = 60
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: WA IMMND DOWN on active controller
f2 detected at standby immd!! f1. Possible failover
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: WA IMMD lost contact with peer
IMMD (NCSMDS_RED_DOWN)
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: NO Skipping re-send of fevs
message 10374 since it has recently been resent.
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: NO Skipping re-send of fevs
message 10375 since it has recently been resent.
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: WA DISCARD DUPLICATE FEVS
message:10374
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: WA Error code 2 returned for
message type 57 - ignoring
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: WA DISCARD DUPLICATE FEVS
message:10375
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: WA Error code 2 returned for
message type 57 - ignoring
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Global discard node received
for nodeId:2020f pid:2642
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 316
<0, 2020f(down)> (MsgQueueService131599)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 314
<0, 2020f(down)> (@safAmfService2020f)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 318
<0, 2020f(down)> (safLogService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 322
<0, 2020f(down)> (safClmService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 317
<0, 2020f(down)> (safMsgGrpService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 319
<0, 2020f(down)> (safCheckPointService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 320
<0, 2020f(down)> (safLckService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 321
<0, 2020f(down)> (safEvtService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 323
<0, 2020f(down)> (safSmfService)
Apr 30 12:07:50 CONTROLLER-1 opensaf_reboot: Rebooting remote node in the
absence of PLM is outside the scope of OpenSAF
Apr 30 12:07:50 CONTROLLER-1 osaffmd[2138]: NO Controller Failover: Setting
role to ACTIVE
Apr 30 12:07:50 CONTROLLER-1 osafrded[2129]: NO RDE role set to ACTIVE
Apr 30 12:07:50 CONTROLLER-1 osafntfd[2181]: NO ACTIVE request
Apr 30 12:07:50 CONTROLLER-1 osaflogd[2168]: NO ACTIVE request
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: NO ACTIVE request
Apr 30 12:07:50 CONTROLLER-1 osafclmd[2195]: NO ACTIVE request
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: NO ellect_coord invoke from
rda_callback ACTIVE
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: NO Coord re-elected, resides at
2010f
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO This IMMND re-elected coord
redundantly, failover ?
Apr 30 12:07:50 CONTROLLER-1 osaflogd[2168]: WA read_logsv_configuration(). All
attributes could not be read
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 325
(safLogService) <3, 2010f>
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 326
(safClmService) <5, 2010f>
Apr 30 12:07:58 CONTROLLER-1 osafamfd[2214]: ER FAILOVER Active --> Quiesced
FAILED, ImplementerClear failed 5
Apr 30 12:07:58 CONTROLLER-1 osafamfd[2214]: role.cc:671: avd_mds_qsd_role_evh:
Assertion '0' failed.
Apr 30 12:07:58 CONTROLLER-1 osafamfnd[2224]: ER AMF director unexpectedly
crashed
Apr 30 12:07:58 CONTROLLER-1 osafamfnd[2224]: Rebooting OpenSAF NodeId = 131343
EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received,
OwnNodeId = 131343, SupervisionTime = 60
Apr 30 12:07:58 CONTROLLER-1 opensaf_reboot: Rebooting local node; timeout=60
Apr 30 12:07:58 CONTROLLER-1 osafimmnd[2158]: NO Implementer locally
disconnected. Marking it as doomed 306 <10, 2010f> (safAmfService)
Apr 30 12:07:58 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 306
<10, 2010f> (safAmfService)
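
The log above prints node ids in two notations: the "Rebooting OpenSAF NodeId" lines use decimal (131599, 131343), while the osafimmnd lines use hex (2020f, 2010f). A small sketch for converting between the two when correlating lines; `node_id_hex` is a hypothetical helper written for this note, not an OpenSAF function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a decimal node id as lowercase hex without a
// prefix, the form in which the osafimmnd log lines print it.
std::string node_id_hex(std::uint32_t node_id) {
  char buf[16];
  std::snprintf(buf, sizeof buf, "%x", static_cast<unsigned>(node_id));
  return std::string(buf);
}
```

With this, 131599 maps to 2020f (the peer controller for which Node Down was received) and 131343 maps to 2010f (OwnNodeId on CONTROLLER-1).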
Below is the backtrace:
#0 0x00007ff43242eb55 in raise () from /lib64/libc.so.6
#1 0x00007ff432430131 in abort () from /lib64/libc.so.6
#2 0x00007ff43421637a in __osafassert_fail () from
/usr/lib64/libopensaf_core.so.0
#3 0x00000000004428ba in avd_mds_qsd_role_evh(cl_cb_tag*, avd_evt_tag*) ()
#4 0x00000000004333ce in process_event(cl_cb_tag*, avd_evt_tag*) () at
main.cc:810
#5 0x000000000040786a in main () at main.cc:710
AMF traces and syslog are attached. The IMM traces are very large; the
relevant portions from that time can be shared if required.