- **status**: assigned --> accepted


---

**[tickets:#1360] Quiesced controller failed to promote back to Active (amfd asserted: ImplementerClear failed 5)**

**Status:** accepted
**Milestone:** 4.5.2
**Created:** Thu Apr 30, 2015 07:24 AM UTC by Srikanth R
**Last Updated:** Mon May 04, 2015 11:14 AM UTC
**Owner:** Nagendra Kumar

Changeset : 6490

Issue : Quiesced controller failed to promote back to Active (amfd asserted: ImplementerClear failed 5)



-> Opensafd is brought up on 5 nodes in the cluster.

-> This issue is observed while verifying #707.

-> Invoked the middleware si-swap operation (SC-1 was the active and SC-2 the standby).

-> Rebooted the old standby controller SC-2 using the reboot -f command.

-> The quiesced controller SC-1 failed to promote back to active.

Apr 30 12:07:44 CONTROLLER-1 osafamfd[2214]: NO safSi=SC-2N,safApp=OpenSAF Swap 
initiated
Apr 30 12:07:44 CONTROLLER-1 osafamfnd[2224]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Apr 30 12:07:44 CONTROLLER-1 osafimmnd[2158]: NO Implementer locally 
disconnected. Marking it as doomed 312 <496, 2010f> (safSmfService)
Apr 30 12:07:44 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 307 
<328, 2010f> (safMsgGrpService)
.....
Apr 30 12:07:46 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 319 
(safCheckPointService) <0, 2020f>
Apr 30 12:07:46 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 320 
(safLckService) <0, 2020f>
Apr 30 12:07:46 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 321 
(safEvtService) <0, 2020f>
Apr 30 12:07:46 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 322 
(safClmService) <0, 2020f>
Apr 30 12:07:47 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 323 
(safSmfService) <0, 2020f>
Apr 30 12:07:47 CONTROLLER-1 osafamfnd[2224]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Apr 30 12:07:47 CONTROLLER-1 osafamfnd[2224]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Apr 30 12:07:47 CONTROLLER-1 osafntfimcnd[2876]: NO exiting on signal 15
Apr 30 12:07:47 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 313 
<503, 2010f> (@OpenSafImmReplicatorA)
Apr 30 12:07:47 CONTROLLER-1 osafamfd[2214]: NO Controller switch over initiated
Apr 30 12:07:47 CONTROLLER-1 osafamfd[2214]: NO ROLE SWITCH Active --> Quiesced
Apr 30 12:07:47 CONTROLLER-1 osafrded[2129]: NO RDE role set to QUIESCED
Apr 30 12:07:47 CONTROLLER-1 osafimmnd[2158]: NO Implementer (applier) 
connected: 324 (@OpenSafImmReplicatorA) <608, 2010f>
Apr 30 12:07:47 CONTROLLER-1 osafntfimcnd[3025]: NO Started
Apr 30 12:07:50 CONTROLLER-1 osaffmd[2138]: NO Node Down event for node id 
2020f:
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: WA Director Service in NOACTIVE 
state - fevs replies pending:1 fevs highest processed:10375
Apr 30 12:07:50 CONTROLLER-1 kernel: [  464.308117] TIPC: Resetting link 
<1.1.1:eth0-1.1.2:eth3>, peer not responding
Apr 30 12:07:50 CONTROLLER-1 kernel: [  464.308126] TIPC: Lost link 
<1.1.1:eth0-1.1.2:eth3> on network plane A
Apr 30 12:07:50 CONTROLLER-1 kernel: [  464.308132] TIPC: Lost contact with 
<1.1.2>
Apr 30 12:07:50 CONTROLLER-1 osaffmd[2138]: NO Current role: QUIESCED
Apr 30 12:07:50 CONTROLLER-1 osaffmd[2138]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131343, 
SupervisionTime = 60
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: WA IMMND DOWN on active controller 
f2 detected at standby immd!! f1. Possible failover
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: WA IMMD lost contact with peer 
IMMD (NCSMDS_RED_DOWN)
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: NO Skipping re-send of fevs 
message 10374 since it has recently been resent.
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: NO Skipping re-send of fevs 
message 10375 since it has recently been resent.
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: WA DISCARD DUPLICATE FEVS 
message:10374
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: WA Error code 2 returned for 
message type 57 - ignoring
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: WA DISCARD DUPLICATE FEVS 
message:10375
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: WA Error code 2 returned for 
message type 57 - ignoring
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Global discard node received 
for nodeId:2020f pid:2642
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 316 
<0, 2020f(down)> (MsgQueueService131599)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 314 
<0, 2020f(down)> (@safAmfService2020f)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 318 
<0, 2020f(down)> (safLogService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 322 
<0, 2020f(down)> (safClmService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 317 
<0, 2020f(down)> (safMsgGrpService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 319 
<0, 2020f(down)> (safCheckPointService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 320 
<0, 2020f(down)> (safLckService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 321 
<0, 2020f(down)> (safEvtService)
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 323 
<0, 2020f(down)> (safSmfService)
Apr 30 12:07:50 CONTROLLER-1 opensaf_reboot: Rebooting remote node in the 
absence of PLM is outside the scope of OpenSAF
Apr 30 12:07:50 CONTROLLER-1 osaffmd[2138]: NO Controller Failover: Setting 
role to ACTIVE
Apr 30 12:07:50 CONTROLLER-1 osafrded[2129]: NO RDE role set to ACTIVE
Apr 30 12:07:50 CONTROLLER-1 osafntfd[2181]: NO ACTIVE request
Apr 30 12:07:50 CONTROLLER-1 osaflogd[2168]: NO ACTIVE request
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: NO ACTIVE request
Apr 30 12:07:50 CONTROLLER-1 osafclmd[2195]: NO ACTIVE request
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: NO ellect_coord invoke from 
rda_callback ACTIVE
Apr 30 12:07:50 CONTROLLER-1 osafimmd[2148]: NO Coord re-elected, resides at 
2010f
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO This IMMND re-elected coord 
redundantly, failover ?
Apr 30 12:07:50 CONTROLLER-1 osaflogd[2168]: WA read_logsv_configuration(). All 
attributes could not be read
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 325 
(safLogService) <3, 2010f>
Apr 30 12:07:50 CONTROLLER-1 osafimmnd[2158]: NO Implementer connected: 326 
(safClmService) <5, 2010f>
Apr 30 12:07:58 CONTROLLER-1 osafamfd[2214]: ER FAILOVER Active --> Quiesced 
FAILED, ImplementerClear failed 5
Apr 30 12:07:58 CONTROLLER-1 osafamfd[2214]: role.cc:671: avd_mds_qsd_role_evh: 
Assertion '0' failed.
Apr 30 12:07:58 CONTROLLER-1 osafamfnd[2224]: ER AMF director unexpectedly 
crashed
Apr 30 12:07:58 CONTROLLER-1 osafamfnd[2224]: Rebooting OpenSAF NodeId = 131343 
EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, 
OwnNodeId = 131343, SupervisionTime = 60
Apr 30 12:07:58 CONTROLLER-1 opensaf_reboot: Rebooting local node; timeout=60
Apr 30 12:07:58 CONTROLLER-1 osafimmnd[2158]: NO Implementer locally 
disconnected. Marking it as doomed 306 <10, 2010f> (safAmfService)
Apr 30 12:07:58 CONTROLLER-1 osafimmnd[2158]: NO Implementer disconnected 306 
<10, 2010f> (safAmfService)
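
When reading the syslog above, note that the same node appears in two notations: osaffmd and osafamfnd print decimal node IDs (NodeId = 131599, OwnNodeId = 131343), while osafimmnd and osafimmd print hexadecimal ones (2020f, 2010f). A minimal sketch for correlating the two follows; the helper name `node_id_hex` is hypothetical and not part of OpenSAF.

```python
def node_id_hex(node_id: int) -> str:
    """Render a decimal node ID in the lowercase hex form seen in IMM log lines."""
    return format(node_id, "x")


# Decimal values taken from the osaffmd / osafamfnd reboot messages in the log.
own_node_id = 131343   # OwnNodeId on CONTROLLER-1
peer_node_id = 131599  # NodeId of the rebooted peer controller

print(node_id_hex(own_node_id))   # 2010f
print(node_id_hex(peer_node_id))  # 2020f
```

So 2010f in the IMM messages is the local controller (CONTROLLER-1) and 2020f is the peer that went down.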


The backtrace is given below:


#0  0x00007ff43242eb55 in raise () from /lib64/libc.so.6
#1  0x00007ff432430131 in abort () from /lib64/libc.so.6
#2  0x00007ff43421637a in __osafassert_fail () from 
/usr/lib64/libopensaf_core.so.0
#3  0x00000000004428ba in avd_mds_qsd_role_evh(cl_cb_tag*, avd_evt_tag*) ()
#4  0x00000000004333ce in process_event(cl_cb_tag*, avd_evt_tag*) () at 
main.cc:810
#5  0x000000000040786a in main () at main.cc:710
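
The assertion in frame #3 follows the "ImplementerClear failed 5" error logged at 12:07:58. Assuming the trailing number is a SaAisErrorT value as defined by the SA Forum AIS specification, 5 corresponds to SA_AIS_ERR_TIMEOUT. The sketch below decodes such a message; the enum lists only the first few AIS codes, and `decode_failure` is a hypothetical helper for log reading, not an OpenSAF function.

```python
import re
from enum import IntEnum


class SaAisErrorT(IntEnum):
    """Partial list of SA Forum AIS return codes (only the first values)."""
    SA_AIS_OK = 1
    SA_AIS_ERR_LIBRARY = 2
    SA_AIS_ERR_VERSION = 3
    SA_AIS_ERR_INIT = 4
    SA_AIS_ERR_TIMEOUT = 5
    SA_AIS_ERR_TRY_AGAIN = 6


def decode_failure(message: str):
    """Extract '<operation> failed <code>' and name the code if it is known."""
    match = re.search(r"(\w+) failed (\d+)", message)
    if match is None:
        return None
    operation, code = match.group(1), int(match.group(2))
    try:
        name = SaAisErrorT(code).name
    except ValueError:
        # Code is outside the partial table above.
        name = f"unknown({code})"
    return operation, name


line = "ER FAILOVER Active --> Quiesced FAILED, ImplementerClear failed 5"
print(decode_failure(line))  # ('ImplementerClear', 'SA_AIS_ERR_TIMEOUT')
```

Under that assumption, the ImplementerClear call made by amfd timed out, which fits the 8 second gap in the log between the role change to ACTIVE (12:07:50) and the failure (12:07:58).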



AMF traces and syslog are attached. The IMM traces are very large; if required, the relevant portion covering the time of the failure can be shared.




