Hi Guys,
I tried to reproduce the issue on the latest OpenSAF release but did not succeed.
I have a setup of 2 controllers and 1 payload with the headless feature enabled.
I also uploaded the 2N application with 3 SUs on 3 different nodes.
I performed the split-brain scenario to check for the issue.
In my case, after the split brain, the nodes rejoined successfully and the application
started successfully on the controllers.
The reported issue is not observed in the latest release.
So, can I close this ticket?
Thanks
Mohan (www.GetHighAvailability.com)
---
** [tickets:#2074] amfd asserted on rebooted controllers continuously after
split brain scenario (headless)**
**Status:** accepted
**Milestone:** future
**Created:** Tue Sep 27, 2016 12:14 PM UTC by Srikanth R
**Last Updated:** Mon Oct 10, 2022 12:06 PM UTC
**Owner:** Mohan Kanakam
Setup :
SLES 11 Physical machine
Changeset :7997 5.1 FC
2 controllers and 2 payloads with headless feature enabled.
2N application with 3 SUs. (AmfDemo).
Issue :
After an initial split-brain scenario, amfd asserted on the controllers
continuously, on every reboot.
Steps performed :
-> Initially brought up four nodes, and all nodes joined the cluster.
-> Brought up the 2N application with SUs hosted on SC-1, SC-2 and PL-3
successfully.
-> Performed some operations on the AMF objects, then left the cluster idle.
-> After 2 weeks, an MDS down event was generated on both controllers because
of momentary cable unplugging, which resulted in a split-brain scenario.
Sep 24 21:36:40 SLES-SLOT1 osafimmd[2729]: NO MDS event from svc_id 25
(change:3, dest:565214187380752)
Sep 24 21:36:40 SLES-SLOT1 kernel: [1297950.833811] TIPC: Established link
<1.1.1:em1-1.1.2:em1> on network plane A
Sep 24 21:36:40 SLES-SLOT1 osafrded[2710]: Rebooting OpenSAF NodeId = 0 EE Name
= No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131343,
SupervisionTime = 60
Sep 26 00:00:01 SLES-SLOT2 osafrded[2715]: NO Got peer info request from node
0x2010f with role ACTIVE
Sep 26 00:00:01 SLES-SLOT2 osafrded[2715]: Rebooting OpenSAF NodeId = 0 EE Name
= No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131599,
SupervisionTime = 60
-> As the headless feature is enabled, the payloads did not reboot.
-> Once the controllers rejoined the payloads, amfd asserted on the rebooted
controllers and the controllers rebooted again.
Sep 24 21:39:27 SLES-SLOT1 osafamfd[2772]: NO Received node_up from 2010f:
msg_id 1
Sep 24 21:39:27 SLES-SLOT1 osafamfd[2772]: siass.cc:953: avd_susi_recreate:
Assertion 'su' failed.
Sep 24 21:39:27 SLES-SLOT1 osafamfnd[2782]: WA AMF director unexpectedly crashed
Sep 24 21:39:27 SLES-SLOT1 osafamfnd[2782]: WA AMF director unexpectedly crashed
Sep 24 21:39:27 SLES-SLOT1 osafamfnd[2782]: Rebooting OpenSAF NodeId = 131343
EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received,
OwnNodeId = 131343, SupervisionTime = 60
Below is the backtrace :
#0 0x00007f1d28510b55 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f1d28512131 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007f1d2a397197 in __osafassert_fail (__file=0x517c15 "siass.cc",
__line=953, __func=0x518250
<avd_susi_recreate(avsv_n2d_nd_sisu_state_msg_info_tag*)::__FUNCTION__>
"avd_susi_recreate",
__assertion=0x517d01 "su") at sysf_def.c:281
No locals.
#3 0x00000000004c56a5 in avd_susi_recreate (info=0x7f1d20008ec8) at
siass.cc:953
su = 0x0
__FUNCTION__ = "avd_susi_recreate"
susi = 0x0
node = 0x7bfdf0
susi_state = 0x0
su_state = 0x7f1d200055a0
__PRETTY_FUNCTION__ = "SaAisErrorT
avd_susi_recreate(AVSV_N2D_ND_SISU_STATE_MSG_INFO*)"
#4 0x0000000000459943 in avd_process_state_info_queue (cb=0x75cba0
<_control_block>) at ndfsm.cc:78
n2d_msg = 0x7f1d20008ec0
i = 0
queue_size = 4
queue_evt = 0x7a9b60
act_amfnd_node_up_count = 1
found_state_info = true
__FUNCTION__ = "avd_process_state_info_queue"
#5 0x000000000045a50f in avd_node_up_evh (cb=0x75cba0 <_control_block>,
evt=0x7f1d20008880) at ndfsm.cc:363
avnd = 0x7bf380
n2d_msg = 0x7f1d20004b30
rc = 1
sync_nd_size = 4
act_nd = true
__FUNCTION__ = "avd_node_up_evh"
#6 0x0000000000453d78 in process_event (cb_now=0x75cba0 <_control_block>,
evt=0x7f1d20008880) at main.cc:768
__FUNCTION__ = "process_event"
#7 0x0000000000453a9b in main_loop () at main.cc:689
pollretval = 1
cb = 0x75cba0 <_control_block>
evt = 0x7f1d20008880
mbx_fd = {raise_obj = 11, rmv_obj = 12}
error = SA_AIS_OK
polltmo = -1
term_fd = 17
__FUNCTION__ = "main_loop"
#8 0x0000000000454017 in main (argc=2, argv=0x7fff50cd9958) at main.cc:841
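The backtrace shows amfd inside avd_susi_recreate() while draining the state information queued from the payload after node_up (avd_process_state_info_queue, ndfsm.cc:78); the local `su` is null at siass.cc:953, so the assertion aborts the director. The sketch below is a simplified, hypothetical model of that failure mode, not the OpenSAF code: `Su`, `SisuStateInfo`, `RecreateResult` and `recreate_susi` are invented names, and the SU database is assumed to be a plain map keyed by SU name. Unlike the real code, it reports an unknown SU to the caller instead of asserting.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the director's SU record; NOT the OpenSAF type.
struct Su {
  std::string name;
};

// Hypothetical stand-in for one SU-SI entry reported by a payload's amfnd
// after node_up (loosely modelled on AVSV_N2D_ND_SISU_STATE_MSG_INFO).
struct SisuStateInfo {
  std::string su_name;
  std::string si_name;
};

enum class RecreateResult { kOk, kUnknownSu };

// Hypothetical: rebuilds one SU-SI assignment from reported state.
// In the ticket, the equivalent SU lookup yielded null and
// "Assertion 'su' failed." aborted amfd; this sketch returns an
// error to the caller instead of aborting.
RecreateResult recreate_susi(const std::map<std::string, Su>& su_db,
                             const SisuStateInfo& info) {
  const auto it = su_db.find(info.su_name);
  if (it == su_db.end()) {
    // The reported SU is unknown to this director: the case that asserted.
    return RecreateResult::kUnknownSu;
  }
  // A real implementation would recreate the assignment for it->second here.
  return RecreateResult::kOk;
}
```

Whether amfd should skip, reject, or otherwise reconcile such state is a design decision for the actual fix; the suggested recovery in this ticket avoids the mismatch by rebooting the payloads instead.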
Suggested recovery :
During a split-brain scenario, payloads should be ordered to reboot even when
the headless feature is enabled.
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets