- Description has changed:
Diff:
~~~~
--- old
+++ new
@@ -10,13 +10,28 @@
 Before the split, PL-3 is active for a 2N SG. PL-4 is standby.
+~~~
+2018-08-30 19:06:53.913 PL-3 osafamfnd[204]: NO Assigning 'safSi=A,safApp=AmfDemo' ACTIVE to 'safSu=1,safSg=1,safApp=AmfDemo'
+2018-08-30 19:06:53.944 PL-3 osafamfnd[204]: NO Assigned 'safSi=A,safApp=AmfDemo' ACTIVE to 'safSu=1,safSg=1,safApp=AmfDemo'
+
+2018-08-30 19:06:54.094 PL-4 osafamfnd[204]: NO Assigning 'safSi=A,safApp=AmfDemo' STANDBY to 'safSu=2,safSg=1,safApp=AmfDemo'
+2018-08-30 19:06:54.128 PL-4 osafamfnd[204]: NO Assigned 'safSi=A,safApp=AmfDemo' STANDBY to 'safSu=2,safSg=1,safApp=AmfDemo'
+~~~
+
 During the split, SC-2 may assign PL-4 to be active.
+
+2018-08-30 19:07:04.299 PL-4 osafamfnd[204]: NO Assigning 'safSi=A,safApp=AmfDemo' ACTIVE to 'safSu=2,safSg=1,safApp=AmfDemo'
 After the network merges, SC-1 and SC-2 may both reboot after they detect split brain.
-`2018-08-30 13:30:15.010 SC-1 osafrded[178]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131343, SupervisionTime = 60`
+~~~
+2018-08-30 19:07:05.003 SC-1 osafrded[178]: NO Got peer info response from node 0x2020f with role ACTIVE
+2018-08-30 19:07:05.003 SC-1 osafrded[178]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131343, SupervisionTime = 60
-`2018-08-30 13:30:15.008 SC-2 osafrded[178]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131599, SupervisionTime = 60`
+2018-08-30 19:07:04.999 SC-2 osafrded[180]: NO Got peer info response from node 0x2010f with role ACTIVE
+2018-08-30 19:07:05.001 SC-2 osafrded[180]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131599, SupervisionTime = 60
+~~~
+
 Then PL-3 and PL-4 will sync these duplicated active assignments to AMFD, and cause an assertion in AMFD.
~~~~
---
** [tickets:#2920] amfd: cyclic SC reboot after split network**
**Status:** unassigned
**Milestone:** 5.18.08
**Created:** Thu Aug 30, 2018 03:33 AM UTC by Gary Lee
**Last Updated:** Thu Aug 30, 2018 09:33 AM UTC
**Owner:** nobody
After a split network event, both SCs can reboot endlessly due to this
assertion:
~~~
2018-08-29 18:05:34.689 SC-2 osafamfd[263]: src/amf/amfd/sg_2n_fsm.cc:596: avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed.
2018-08-29 18:05:34.695 SC-2 osafamfnd[273]: ER AMFD has unexpectedly crashed. Rebooting node
~~~
To reproduce, enable SC absence and split the network into two partitions:
Partition 1 contains SC-1 and PL-3.
Partition 2 contains SC-2, PL-4, and PL-5.
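One way to create the split on a lab cluster is to firewall the nodes of the
other partition on each node. This is a hedged sketch, not from the ticket:
the peer addresses are assumptions, and it defaults to a dry run that only
prints the iptables commands.

```shell
#!/bin/sh
# Run on SC-1 and PL-3, listing the partition-2 nodes; run the mirror-image
# list on SC-2, PL-4 and PL-5. The IP addresses below are assumptions for a
# lab setup; substitute the real node addresses.
PEERS="192.168.0.2 192.168.0.4 192.168.0.5"   # SC-2 PL-4 PL-5 (assumed IPs)
IPT="echo iptables"     # dry run; change to IPT="iptables" to really apply
for peer in $PEERS; do
    $IPT -A INPUT  -s "$peer" -j DROP    # drop inbound traffic from the peer
    $IPT -A OUTPUT -d "$peer" -j DROP    # drop outbound traffic to the peer
done
# Merging the network again is the reverse: delete the rules (iptables -D ...)
```

Dropping traffic in both directions matters; a one-way block can produce a
different (asymmetric) failure than the clean partition described above.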
Before the split, PL-3 is active for a 2N SG. PL-4 is standby.
~~~
2018-08-30 19:06:53.913 PL-3 osafamfnd[204]: NO Assigning 'safSi=A,safApp=AmfDemo' ACTIVE to 'safSu=1,safSg=1,safApp=AmfDemo'
2018-08-30 19:06:53.944 PL-3 osafamfnd[204]: NO Assigned 'safSi=A,safApp=AmfDemo' ACTIVE to 'safSu=1,safSg=1,safApp=AmfDemo'
2018-08-30 19:06:54.094 PL-4 osafamfnd[204]: NO Assigning 'safSi=A,safApp=AmfDemo' STANDBY to 'safSu=2,safSg=1,safApp=AmfDemo'
2018-08-30 19:06:54.128 PL-4 osafamfnd[204]: NO Assigned 'safSi=A,safApp=AmfDemo' STANDBY to 'safSu=2,safSg=1,safApp=AmfDemo'
~~~
During the split, SC-2 may assign PL-4 to be active.
~~~
2018-08-30 19:07:04.299 PL-4 osafamfnd[204]: NO Assigning 'safSi=A,safApp=AmfDemo' ACTIVE to 'safSu=2,safSg=1,safApp=AmfDemo'
~~~
After the network merges, SC-1 and SC-2 may both reboot after they detect
split brain.
~~~
2018-08-30 19:07:05.003 SC-1 osafrded[178]: NO Got peer info response from node 0x2020f with role ACTIVE
2018-08-30 19:07:05.003 SC-1 osafrded[178]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131343, SupervisionTime = 60
2018-08-30 19:07:04.999 SC-2 osafrded[180]: NO Got peer info response from node 0x2010f with role ACTIVE
2018-08-30 19:07:05.001 SC-2 osafrded[180]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131599, SupervisionTime = 60
~~~
PL-3 and PL-4 then sync their duplicate active assignments to AMFD, which
triggers an assertion failure in AMFD.
~~~
2018-08-30 19:08:43.974 SC-1 osafamfd[267]: NO Perform absent failover for failed SU:safSu=1,safSg=1,safApp=AmfDemo
2018-08-30 19:08:43.975 SC-1 osafamfd[267]: src/amf/amfd/sg_2n_fsm.cc:596: avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed.
2018-08-30 19:08:43.981 SC-1 osafamfnd[282]: ER AMFD has unexpectedly crashed. Rebooting node
2018-08-30 19:08:43.982 SC-1 osafamfnd[282]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: AMFD has unexpectedly crashed. Rebooting node, OwnNodeId = 131343, SupervisionTime = 60
~~~
The user must then recover the cluster manually, either by rebooting the
whole cluster or by rebooting one of PL-3 / PL-4.
[#2918] addresses issues such as this, but for now we can aid recovery of the
cluster by rebooting one or both of the PLs in place of the assertion.
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at
https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets