- Description has changed:

Diff:

~~~~

--- old
+++ new
@@ -1,13 +1,13 @@
-After the active SC rebooted, the standby SC executed failover to active. The 
new active SC notified a PL left cluster but that PL was still in cluster. The 
reason is the connection between the standby SC and that PL was dropped in the 
past, but that PL still connected with the active SC. It led the standby SC 
considered that PL is down regardless the connection was established after 
that. The standby SC only removes a down PL when it receives a check point from 
the active SC. However, the active SC will not send that check point because it 
still connect with the PL. During failover, the standby SC will notify all 
recorded down nodes left cluster.
+After the active SC rebooted, the standby SC executed failover to active. The 
new active SC notified a PL left cluster but that PL was still in cluster. The 
reason is the connection between the standby SC and that PL was dropped in the 
past, but that PL still connected with the active SC. It led the standby SC 
considered that PL absented regardless the connection was established after 
that. The standby SC only change the PL  state when it receives a check point 
from the active SC. However, the active SC will not send that check point 
because it still connect with the PL. During failover, the standby SC will 
notify all recorded absent nodes left cluster.
 <pre>
-                                    down list:PL-3             down list:PL-3
-SC-1(Act)----SC-2(Stb)   SC-1(Act)----SC-2(Stb)            SC-1(Act)----SC-2(Stb)
-    \        /               \                          \        /     
-     \      /                 \                          \      /
-       PL-3                     PL-3                       PL-3
+                                     absent nodes:PL-3           absent nodes:PL-3
+SC-1(Act)----SC-2(Stb)    SC-1(Act)----SC-2(Stb)       SC-1(Act)----SC-2(Stb)
+    \        /                \                            \        /
+     \      /                  \                            \      /
+       PL-3                      PL-3                         PL-3
 
-                     down list:PL-3,SC-1
-          SC-1(Down)   SC-2(Stb)            SC-1(Stb)----SC-2(Atc)
+                     absent nodes:PL-3,SC-1
+          SC-1(Down)   SC-2(Stb)            SC-1(Stb)----SC-2(Act)
                        /                        \        /
                       /                          \      /
                  PL-3                              PL-3

~~~~

- **Component**: clm --> amf
- **Part**: - --> d



---

**[tickets:#3309] clm: the payload node unexpectedly left cluster right after failover**

**Status:** accepted
**Milestone:** 5.22.04
**Created:** Thu Feb 24, 2022 03:57 AM UTC by Hieu Hong Hoang
**Last Updated:** Thu Feb 24, 2022 03:57 AM UTC
**Owner:** Hieu Hong Hoang


After the active SC rebooted, the standby SC failed over to active. The new 
active SC then reported that a PL had left the cluster, although that PL was 
still in the cluster. The cause is that the connection between the standby SC 
and that PL had dropped earlier, while the PL stayed connected to the active 
SC. As a result, the standby SC considered the PL absent, even though the 
connection was re-established afterwards. The standby SC only changes the PL 
state when it receives a checkpoint from the active SC. However, the active SC 
never sends that checkpoint, because it is still connected to the PL. During 
failover, the standby SC reports every recorded absent node as having left the 
cluster.
<pre>
                                     absent nodes:PL-3           absent nodes:PL-3
SC-1(Act)----SC-2(Stb)    SC-1(Act)----SC-2(Stb)       SC-1(Act)----SC-2(Stb)
    \        /                \                            \        /
     \      /                  \                            \      /
       PL-3                      PL-3                         PL-3

                     absent nodes:PL-3,SC-1
          SC-1(Down)   SC-2(Stb)            SC-1(Stb)----SC-2(Act)
                       /                        \        /
                      /                          \      /
                 PL-3                              PL-3
</pre>
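
The bookkeeping described above can be illustrated with a small toy model. This is only a sketch of the reported behaviour, not OpenSAF code: the class `StandbySC` and all of its method names are hypothetical, and the checkpoint handler assumes a checkpoint simply carries a node name and a present/absent flag.

```python
class StandbySC:
    """Toy model of the standby SC's node bookkeeping (not OpenSAF code)."""

    def __init__(self):
        self.absent_nodes = set()

    def on_lost_contact(self, node):
        # A dropped connection marks the node as absent.
        self.absent_nodes.add(node)

    def on_established_contact(self, node):
        # Reported behaviour: reconnecting does NOT clear the absent mark.
        pass

    def on_checkpoint(self, node, present):
        # Assumption: only a checkpoint from the active SC changes the state.
        if present:
            self.absent_nodes.discard(node)
        else:
            self.absent_nodes.add(node)

    def failover(self):
        # On failover, every recorded absent node is reported as having left.
        return sorted(self.absent_nodes)


sc2 = StandbySC()
sc2.on_lost_contact("PL-3")         # SC-2 loses PL-3; SC-1 still sees it
sc2.on_established_contact("PL-3")  # link is back, absent list is unchanged
sc2.on_lost_contact("SC-1")         # the active SC reboots
left = sc2.failover()
print(left)
```

In this model PL-3 stays in the absent list because SC-1 never had a reason to send a checkpoint for it, so the failover reports it as having left even though it is reachable again.
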
Log analysis:

* SC-2 (standby SC) lost contact with PL-3
2022-02-23 09:03:24.114 SC-2 osafdtmd[320]: NO Lost contact with 'PL-3'

* SC-2 (standby SC) re-established contact with PL-3
2022-02-23 09:03:24.513 SC-2 osafdtmd[320]: NO Established contact with 'PL-3'

* SC-2 finished the failover: 
2022-02-23 09:03:25.582 SC-2 osafamfd[422]: NO FAILOVER StandBy --> Active DONE!

* SC-2 reported that PL-3 left the cluster:
2022-02-23 09:03:25.679 SC-2 osafamfd[422]: NO Node 'PL-3' left the cluster

* State of nodes:
safAmfNode=PL-3,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=DISABLED(2)
safAmfNode=PL-4,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=ENABLED(1)
safAmfNode=PL-5,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=ENABLED(1)
safAmfNode=SC-1,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=ENABLED(1)
safAmfNode=SC-2,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=ENABLED(1)
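
The pattern in the log analysis above (contact lost, contact re-established, then the same node reported as having left) can be searched for mechanically. The following is a minimal sketch, assuming the syslog line layout shown in this ticket; the function name `find_suspect_departures` and the regular expressions are illustrative and not part of any OpenSAF tool.

```python
import re

# Log lines copied from the analysis above.
LOG = """\
2022-02-23 09:03:24.114 SC-2 osafdtmd[320]: NO Lost contact with 'PL-3'
2022-02-23 09:03:24.513 SC-2 osafdtmd[320]: NO Established contact with 'PL-3'
2022-02-23 09:03:25.582 SC-2 osafamfd[422]: NO FAILOVER StandBy --> Active DONE!
2022-02-23 09:03:25.679 SC-2 osafamfd[422]: NO Node 'PL-3' left the cluster
"""

# timestamp, host, process[pid]:, severity, message
LINE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<host>\S+) \w+\[\d+\]: \w+ (?P<msg>.*)$"
)
LOST = re.compile(r"Lost contact with '(.+)'")
ESTABLISHED = re.compile(r"Established contact with '(.+)'")
LEFT = re.compile(r"Node '(.+)' left the cluster")


def find_suspect_departures(text):
    """Return (timestamp, host, node) for nodes reported as having left
    after the same host had already re-established contact with them."""
    lost = set()         # (host, node) pairs with a dropped connection
    reconnected = set()  # (host, node) pairs whose connection came back
    suspects = []
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        host, msg = m["host"], m["msg"]
        if (n := LOST.fullmatch(msg)):
            lost.add((host, n[1]))
            reconnected.discard((host, n[1]))
        elif (n := ESTABLISHED.fullmatch(msg)):
            if (host, n[1]) in lost:
                reconnected.add((host, n[1]))
        elif (n := LEFT.fullmatch(msg)):
            if (host, n[1]) in reconnected:
                suspects.append((m["ts"], host, n[1]))
    return suspects


suspects = find_suspect_departures(LOG)
print(suspects)
```

A hit from this scan only flags a candidate; the node state listing above is still needed to confirm that the node ended up DISABLED.
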

Steps to reproduce:
1. Drop the connection between the standby SC-2 and PL-3
2. Reconnect SC-2 with PL-3
3. Execute "immdump" on a node. (immd on the standby SC-2 will remove PL-3 
from the list of detached nodes)
4. Reboot the active SC-1
5. Execute "amf-state node" on a node


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
