- **status**: unassigned --> accepted
- **assigned_to**: Nagendra Kumar
- **Milestone**: future --> 4.4.2
---
**[tickets:#371] Standby controller went for reboot when opensaf on payload is
stopped during controller switchover**
**Status:** accepted
**Milestone:** 4.4.2
**Created:** Fri May 31, 2013 03:52 AM UTC by Nagendra Kumar
**Last Updated:** Fri May 31, 2013 03:52 AM UTC
**Owner:** Nagendra Kumar
Migrated from http://devel.opensaf.org/ticket/2318
Changeset: 3032.
Setup: SLES11 64-bit PC setups.
Scenario:
A controller switchover is triggered and, while it is in progress, opensafd is
stopped on the payload. The standby controller then went down, and an amfd
crash was observed on SC-2.
The failure is intermittent: it was observed twice in about 10 attempts.
Snippet of /var/log/messages on the standby controller:
=========================================================
Nov 16 19:10:48 linux-tf4k osafamfd[8227]: Controller switch over initiated
Nov 16 19:10:48 linux-tf4k osafimmnd[8173]: Implementer disconnected 39 <276,
20200> (safEvtService)
Nov 16 19:10:48 linux-tf4k osafimmnd[8173]: implementer for class
'SaSmfCampaign' is released => class extent is UNSAFE
Nov 16 19:10:48 linux-tf4k osafimmnd[8173]: Implementer disconnected 35 <284,
20200> (safMsgGrpService)
Nov 16 19:10:48 linux-tf4k osafimmnd[8173]: Implementer disconnected 34 <1,
20200> (safLogService)
Nov 16 19:10:48 linux-tf4k osafimmnd[8173]: implementer for class
'OpenSafSmfConfig' is released => class extent is UNSAFE
Nov 16 19:10:48 linux-tf4k osafimmnd[8173]: Implementer disconnected 33 <280,
20200> (safSmfService)
Nov 16 19:10:48 linux-tf4k osafimmnd[8173]: Implementer disconnected 42 <0,
20300> (MsgQueueService131840)
Nov 16 19:10:48 linux-tf4k osaflckd[8319]: Event from unknown glnd: node_id
131840
Nov 16 19:10:48 linux-tf4k osafimmnd[8173]: Global discard node received for
nodeId:20300 pid:30275
Nov 16 19:10:48 linux-tf4k osafamfd[8227]:
safAmfNode=PL-3,safAmfCluster=myAmfCluster OperState ENABLED => DISABLED
Nov 16 19:10:48 linux-tf4k osafamfd[8227]:
safSu=PL-3,safSg=NoRed,safApp=OpenSAF OperState ENABLED => DISABLED
Nov 16 19:10:49 linux-tf4k osafamfd[8227]:
safSu=PL-3,safSg=NoRed,safApp=OpenSAF PresenceState INSTANTIATED =>
UNINSTANTIATED
Nov 16 19:10:49 linux-tf4k osafamfd[8227]:
safSu=PL-3,safSg=NoRed,safApp=OpenSAF ReadinessState IN_SERVICE =>
OUT_OF_SERVICE
Nov 16 19:10:49 linux-tf4k osafimmnd[8173]: Implementer disconnected 36 <2,
20200> (safClmService)
Nov 16 19:10:49 linux-tf4k osafamfd[8227]: safSi=NoRed3,safApp=OpenSAF
AssignmentState FULLY_ASSIGNED => UNASSIGNED
Nov 16 19:10:49 linux-tf4k osafntfd[8198]: Failed to log an alarm or security
alarm notification (6)
Nov 16 19:10:49 linux-tf4k osafimmnd[8173]: Implementer disconnected 37 <277,
20200> (safLckService)
Nov 16 19:10:49 linux-tf4k osafimmnd[8173]: Implementer disconnected 38 <285,
20200> (safCheckPointService)
Nov 16 19:10:49 linux-tf4k osafimmnd[8173]: Director Service in NOACTIVE state
Nov 16 19:10:49 linux-tf4k osafimmd[8163]: Received IMMD service event
Nov 16 19:10:49 linux-tf4k osafimmd[8163]: Received IMMD service event
Nov 16 19:10:49 linux-tf4k osafdtmd[8130]: DTM:dtm_comm_socket_recv() failed rc
: 73
Nov 16 19:10:49 linux-tf4k osafamfd[8227]: Node not a member
Nov 16 19:10:49 linux-tf4k osafclmd[8208]: safNode=PL-3,safCluster=myClmCluster
LEFT, init view=9, cluster view=10
Nov 16 19:10:49 linux-tf4k osafclmd[8208]: clms_node_exit_ntf failed 2
Nov 16 19:10:49 linux-tf4k osafclmd[8208]: saImmOiRtObjectUpdate FAILED 9,
'safNode=PL-3,safCluster=myClmCluster'
Nov 16 19:10:49 linux-tf4k osafclmd[8208]: saImmOiRtObjectUpdate FAILED 9,
'safCluster=myClmCluster'
Nov 16 19:10:50 linux-tf4k osafamfd[8227]: safSi=SC-2N,safApp=OpenSAF
AssignmentState UNASSIGNED => PARTIALLY_ASSIGNED
Nov 16 19:10:50 linux-tf4k osafamfd[8227]: sendStateChangeNotificationAvd:
saNtfNotificationSend Failed (6)
Nov 16 19:10:50 linux-tf4k osafamfd[8227]: sendAlarmNotificationAvd:
saNtfNotificationSend Failed (6)
Nov 16 19:10:50 linux-tf4k osafimmnd[8173]: Director Service Is NEWACTIVE state
Nov 16 19:10:50 linux-tf4k osafimmd[8163]: Received IMMD service event
Nov 16 19:10:50 linux-tf4k osafimmd[8163]: Received IMMD service event
Nov 16 19:10:50 linux-tf4k osafimmnd[8173]: Implementer connected: 43
(safSmfService) <0, 20100>
Nov 16 19:10:50 linux-tf4k osafimmnd[8173]: Implementer connected: 44
(safLogService) <0, 20100>
Nov 16 19:10:50 linux-tf4k osafimmnd[8173]: Implementer connected: 45
(safMsgGrpService) <0, 20100>
Nov 16 19:10:50 linux-tf4k osafmsgnd[8325]: Deferred mqa event list head NULL
Nov 16 19:10:50 linux-tf4k osafimmnd[8173]: Implementer connected: 46
(safCheckPointService) <0, 20100>
Nov 16 19:10:50 linux-tf4k osafimmnd[8173]: Implementer connected: 47
(safLckService) <0, 20100>
Nov 16 19:10:50 linux-tf4k osafimmnd[8173]: Implementer connected: 48
(safEvtService) <0, 20100>
Nov 16 19:10:50 linux-tf4k osafimmnd[8173]: Implementer connected: 49
(safClmService) <0, 20100>
Nov 16 19:10:50 linux-tf4k osafamfd[8227]: Node not a member
Nov 16 19:10:51 linux-tf4k osafamfd[8227]: safSi=SC-2N,safApp=OpenSAF Swap done
Nov 16 19:10:51 linux-tf4k osafamfd[8227]: ROLE SWITCH Active --> Quiesced
Nov 16 19:10:51 linux-tf4k osafrded[8145]: rde_rde_set_role: role set to 3
Nov 16 19:10:51 linux-tf4k osafamfnd[8238]: avnd_mds_set_vdest_role returned
failure, role:3
Nov 16 19:10:51 linux-tf4k osafamfnd[8238]: avnd_mds_set_vdest_role returned
failure, role:3
Nov 16 19:10:52 linux-tf4k osafimmnd[8173]: Implementer disconnected 41 <7,
20200> (safAmfService)
Nov 16 19:10:52 linux-tf4k osafimmnd[8173]: Implementer (applier) connected: 50
(@safAmfService20200) <7, 20200>
Nov 16 19:10:52 linux-tf4k osafimmnd[8173]: Implementer disconnected 40 <0,
20100> (@safAmfService20100)
Nov 16 19:10:52 linux-tf4k osafimmnd[8173]: Implementer connected: 51
(safAmfService) <0, 20100>
Nov 16 19:10:53 linux-tf4k osafamfd[8227]: Switching Quiesced --> StandBy
Nov 16 19:10:53 linux-tf4k osafrded[8145]: rde_rde_set_role: role set to 2
Nov 16 19:10:53 linux-tf4k osafamfd[8227]: Controller switch over done
Nov 16 19:10:53 linux-tf4k osafamfd[8227]: avd_ckpt_dec.c:3117:
avsv_decode_warm_sync_rsp: Assertion '0' failed.
Nov 16 19:10:53 linux-tf4k osafamfnd[8238]: Rebooting OpenSAF NodeId = 131584
EE Name = , Reason: AMF director unexpectedly crasched
Nov 16 19:10:54 linux-tf4k osafamfnd[8238]: Terminating all AMF components
Nov 16 19:10:54 linux-tf4k osafimmnd[8173]: Implementer disconnected 50 <7,
20200> (@safAmfService20200)
Nov 16 19:10:54 linux-tf4k osaflckd[8319]: Event from unknown glnd: node_id
131584
=========================================
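Because the records in the snippet above are wrapped across lines, the amfd role transitions and the assertion are easy to miss. The following is a minimal Python sketch for pulling them out of such an excerpt. The function names (`unwrap`, `amfd_events`) and the list of marker strings are assumptions made for illustration; they are not part of OpenSAF.

```python
import re

# A record in this excerpt starts with a "Mon DD HH:MM:SS" timestamp;
# any other line is treated as a continuation of the previous record.
RECORD_START = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} ")

# "<timestamp> <host> <process>[<pid>]: <message>"
RECORD = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} [\d:]{8}) (?P<host>\S+) "
    r"(?P<proc>[\w-]+)\[(?P<pid>\d+)\]: ?(?P<msg>.*)$"
)

# Hypothetical set of substrings considered interesting for this ticket.
MARKERS = ("Assertion", "ROLE SWITCH", "Switching", "switch over")


def unwrap(lines):
    """Join wrapped continuation lines back onto the record they belong to."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def amfd_events(lines):
    """Return (timestamp, message) pairs for osafamfd records matching MARKERS."""
    events = []
    for record in unwrap(lines):
        match = RECORD.match(record)
        if match is None or match.group("proc") != "osafamfd":
            continue
        msg = match.group("msg")
        if any(marker in msg for marker in MARKERS):
            events.append((match.group("ts"), msg))
    return events
```

Passing the lines of the snippet to `amfd_events` should list the switchover messages followed by the `avsv_decode_warm_sync_rsp` assertion, which makes the ordering of the role change and the crash easier to see.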
GDB backtrace of the amfd crash:
==================================
Core was generated by `/usr/lib64/opensaf/osafamfd --tracemask=0xffffffff'.
Program terminated with signal 6, Aborted.
#0 0x00007f5219036645 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f5219036645 in raise () from /lib64/libc.so.6
#1 0x00007f5219037c33 in abort () from /lib64/libc.so.6
#2 0x00007f521a645ed5 in osafassert_fail (file=0x4947c3 "avd_ckpt_dec.c",
line=3117, func=0x495550 "avsv_decode_warm_sync_rsp",
assertion=0x4948a1 "0") at sysf_def.c:399
#3 0x0000000000412b88 in avsv_decode_warm_sync_rsp (cb=0x6b4b60,
dec=0x7fff22acba58) at avd_ckpt_dec.c:3117
#4 0x000000000040bcff in avsv_mbcsv_process_dec_cb (cb=0x6b4b60,
arg=0x7fff22acba40) at avd_chkop.c:465
#5 0x000000000040b5aa in avsv_mbcsv_cb (arg=0x7fff22acba40) at avd_chkop.c:169
#6 0x00007f521a65ebd9 in ncs_mbscv_rcv_decode (peer=0x6f3cd0, evt=0x730c40) at
mbcsv_act.c:393
#7 0x00007f521a65f553 in ncs_mbcsv_rcv_warm_sync_resp_cmplt (peer=0x6f3cd0,
evt=0x730c40) at mbcsv_act.c:711
#8 0x00007f521a666f35 in mbcsv_process_events (rcvd_evt=0x730c40,
mbcsv_hdl=4293918753) at mbcsv_pr_evts.c:168
#9 0x00007f521a66716b in mbcsv_hdl_dispatch_all (mbcsv_hdl=4293918753,
mbx=4288675841) at mbcsv_pr_evts.c:272
#10 0x00007f521a66086d in mbcsv_process_dispatch_request (arg=0x7fff22acbc20)
at mbcsv_api.c:423
#11 0x00007f521a6601d9 in ncs_mbcsv_svc (arg=0x7fff22acbc20) at mbcsv_api.c:162
#12 0x000000000040c557 in avsv_mbcsv_dispatch (cb=0x6b4b60, flag=2) at
avd_chkop.c:831
#13 0x000000000043a4fe in avd_main_proc () at avd_proc.c:515
#14 0x0000000000408bac in main (argc=2, argv=0x7fff22acbdb8) at amfd_main.c:47
================================================
The amfd, amfnd, and immnd logs can be provided on request.