- **status**: review --> fixed
- **Comment**:

commit f7e9ed4cee2d95490a3d5c05676dc6c512d08b9a (HEAD -> develop, origin/develop, ticket-3309)
Author: thang.d.nguyen <[email protected]>
Date:   Fri Mar 4 14:57:19 2022 +0700

    amf: reboot to recovery PL in split-brain [#3309]

    The connection between the standby SC and a PL was dropped while
    that PL stayed connected to the active SC. The standby SC then
    considered the PL absent, even though the connection was
    re-established afterwards. During failover, the standby SC
    notifies that all recorded absent nodes have left the cluster.
    As a result, the PL leaves the cluster from the AMF point of view
    but is still connected to the active SC.

    This scenario is a kind of split-brain use case, and amfd should
    order the PL to reboot to recover from the issue.
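
The decision described in the commit message can be illustrated with a minimal sketch. This is not the amfd implementation (which is C++ and not shown in this ticket); the function name `failover_actions` and the rule "recorded absent but still connected means split-brain" are assumptions made for illustration only.

```python
def failover_actions(recorded_absent, currently_connected):
    """Map each node recorded as absent to the action taken at failover.

    Hypothetical model: a node that is recorded as absent but is still
    reachable indicates an inconsistent (split-brain) view, so it is
    ordered to reboot instead of being silently reported as "left".
    """
    actions = {}
    for node in sorted(recorded_absent):
        if node in currently_connected:
            actions[node] = "reboot"  # split-brain: recover by rebooting the PL
        else:
            actions[node] = "left"    # genuinely gone: report it as left
    return actions


# State of SC-2 at failover in this ticket: PL-3 and SC-1 are recorded absent,
# but PL-3 is in fact connected again.
print(failover_actions({"PL-3", "SC-1"}, {"PL-3", "PL-4", "PL-5"}))
# {'PL-3': 'reboot', 'SC-1': 'left'}
```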




---

**[tickets:#3309] amf: the payload node unexpectedly left cluster right after failover**

**Status:** fixed
**Milestone:** 5.22.04
**Created:** Thu Feb 24, 2022 03:57 AM UTC by Hieu Hong Hoang
**Last Updated:** Mon Mar 07, 2022 04:27 AM UTC
**Owner:** Thang Duc Nguyen


After the active SC rebooted, the standby SC failed over to active. The new 
active SC then reported that a PL had left the cluster, although that PL was 
still in the cluster. The cause is that the connection between the standby SC 
and that PL had been dropped earlier, while the PL remained connected to the 
active SC. As a result, the standby SC recorded the PL as absent and kept that 
record even after the connection was re-established. The standby SC only 
changes the PL state when it receives a checkpoint from the active SC. However, 
the active SC does not send that checkpoint because, from its point of view, 
the PL never disconnected. During failover, the standby SC reports every node 
recorded as absent as having left the cluster.
<pre>
                                     absent nodes:PL-3           absent nodes:PL-3
SC-1(Act)----SC-2(Stb)    SC-1(Act)----SC-2(Stb)       SC-1(Act)----SC-2(Stb)
    \        /                \                            \        /
     \      /                  \                            \      /
       PL-3                      PL-3                         PL-3

                     absent nodes:PL-3,SC-1
          SC-1(Down)   SC-2(Stb)            SC-1(Stb)----SC-2(Act)
                       /                        \        /
                      /                          \      /
                 PL-3                              PL-3
</pre>
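
The sequence in the diagram can be replayed with a small model of the standby SC's bookkeeping. This is only a sketch of the behaviour described in this ticket: the class `StandbyView` and its method names are hypothetical and do not correspond to amfd code.

```python
class StandbyView:
    """Hypothetical model of what the standby SC remembers about other nodes."""

    def __init__(self):
        self.links = set()   # nodes the standby currently has contact with
        self.absent = set()  # nodes the standby has recorded as absent

    def lost_contact(self, node):
        self.links.discard(node)
        self.absent.add(node)

    def established_contact(self, node):
        # The link is back, but the absent record is NOT cleared here:
        # per the ticket, only a checkpoint from the active SC clears it.
        self.links.add(node)

    def checkpoint_from_active(self, node):
        self.absent.discard(node)

    def failover(self):
        # On failover every recorded absent node is reported as having left.
        return sorted(self.absent)


sc2 = StandbyView()
sc2.established_contact("PL-3")  # stage 1: everything connected
sc2.lost_contact("PL-3")         # stage 2: SC-2 loses PL-3
sc2.established_contact("PL-3")  # stage 3: link restored, record is stale
# SC-1 never lost PL-3, so it sends no checkpoint for it.
sc2.lost_contact("SC-1")         # stage 4: the active SC-1 goes down

print(sc2.failover())            # ['PL-3', 'SC-1']
print("PL-3" in sc2.links)       # True: PL-3 is reported as left while connected
```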
Log analysis:

* SC-2 (standby SC) lost contact with PL-3
2022-02-23 09:03:24.114 SC-2 osafdtmd[320]: NO Lost contact with 'PL-3'

* SC-2 (standby SC) re-established contact with PL-3
2022-02-23 09:03:24.513 SC-2 osafdtmd[320]: NO Established contact with 'PL-3'

* SC-2 finished the failover: 
2022-02-23 09:03:25.582 SC-2 osafamfd[422]: NO FAILOVER StandBy --> Active DONE!

* SC-2 reported that PL-3 left the cluster:
2022-02-23 09:03:25.679 SC-2 osafamfd[422]: NO Node 'PL-3' left the cluster

* State of nodes:
safAmfNode=PL-3,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=DISABLED(2)
safAmfNode=PL-4,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=ENABLED(1)
safAmfNode=PL-5,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=ENABLED(1)
safAmfNode=SC-1,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=ENABLED(1)
safAmfNode=SC-2,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=ENABLED(1)
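
The pattern in the log excerpt above (contact re-established, then the same node reported as left) can be searched for mechanically. The following is a sketch that assumes the syslog line layout seen in this ticket; the function name `suspicious_departures` and the regular expressions are illustrative assumptions, not part of OpenSAF.

```python
import re

# Layout assumed from the excerpt: "<date> <time> <host> <proc>[<pid>]: NO <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<host>\S+) \w+\[\d+\]: NO (?P<msg>.*)$"
)


def suspicious_departures(lines):
    """Return (timestamp, host, node) for each "left the cluster" message
    logged by a host whose latest contact event for that node was
    "Established contact"."""
    in_contact = {}  # (host, node) -> True if last event was "Established"
    found = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue
        host, msg = m["host"], m["msg"]
        if hit := re.fullmatch(r"Lost contact with '(.+)'", msg):
            in_contact[(host, hit[1])] = False
        elif hit := re.fullmatch(r"Established contact with '(.+)'", msg):
            in_contact[(host, hit[1])] = True
        elif hit := re.fullmatch(r"Node '(.+)' left the cluster", msg):
            if in_contact.get((host, hit[1])):
                found.append((m["ts"], host, hit[1]))
    return found


SAMPLE = """\
2022-02-23 09:03:24.114 SC-2 osafdtmd[320]: NO Lost contact with 'PL-3'
2022-02-23 09:03:24.513 SC-2 osafdtmd[320]: NO Established contact with 'PL-3'
2022-02-23 09:03:25.582 SC-2 osafamfd[422]: NO FAILOVER StandBy --> Active DONE!
2022-02-23 09:03:25.679 SC-2 osafamfd[422]: NO Node 'PL-3' left the cluster
""".splitlines()

print(suspicious_departures(SAMPLE))
# [('2022-02-23 09:03:25.679', 'SC-2', 'PL-3')]
```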

Steps to reproduce:
1. Drop the connection between the standby SC-2 and PL-3
2. Reconnect SC-2 with PL-3
3. Execute "immdump" on a node (immd on the standby SC-2 will remove PL-3 
from the list of detached nodes)
4. Reboot the active SC-1
5. Execute "amf-state node" on a node
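
To check the result of step 5 without reading the output by eye, a small parser can flag nodes that are UNLOCKED but DISABLED. This is a sketch that assumes the output layout shown under "State of nodes" in this ticket; the function names are hypothetical.

```python
def parse_node_states(text):
    """Parse "amf-state node"-style output into {node: {attribute: state}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("safAmfNode="):
            # DN line, e.g. "safAmfNode=PL-3,safAmfCluster=myAmfCluster"
            current = line.split(",", 1)[0].split("=", 1)[1]
            nodes[current] = {}
        elif "=" in line and current is not None:
            # Attribute line, e.g. "saAmfNodeOperState=DISABLED(2)"
            key, value = line.split("=", 1)
            nodes[current][key] = value.split("(", 1)[0]
    return nodes


def unlocked_but_disabled(nodes):
    """Nodes whose admin state is UNLOCKED while their oper state is DISABLED."""
    return sorted(
        name
        for name, state in nodes.items()
        if state.get("saAmfNodeAdminState") == "UNLOCKED"
        and state.get("saAmfNodeOperState") == "DISABLED"
    )


SAMPLE_STATE = """\
safAmfNode=PL-3,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=DISABLED(2)
safAmfNode=PL-4,safAmfCluster=myAmfCluster
        saAmfNodeAdminState=UNLOCKED(1)
        saAmfNodeOperState=ENABLED(1)
"""

print(unlocked_but_disabled(parse_node_states(SAMPLE_STATE)))
# ['PL-3']
```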

