- **status**: review --> fixed
- **Comment**:
commit 2499e077f6a6ebbb381d3ad199dd570dfb82c8a4 (HEAD -> develop,
origin/develop)
Author: thuan.tran <[email protected]>
Date: Thu Jun 18 12:13:15 2020 +0700
imm: define macro for values of canBeCoord [#2936]
commit 581483399a873c0a0596c31e41afd5b5957d1a12
Author: thuan.tran <[email protected]>
Date: Thu Jun 18 12:10:17 2020 +0700
imm: reboot nodes used to be different partition with coord [#2936]
- immnd send re-introduce refresh=3 with ex-immd (active) node id.
- immd set very high priority for re-introduce msg of local immnd
and choose coord if re-introduce refresh=3 from local immnd.
- immd reply re-intro to reboot if ex-immd is not same as ex-immd
of selected coord.
- immd use new INTRO_RSP_2 to checkpoint ex-immd to standby.
- immnd use MDS_RED_SUBSCRIBE for immd to know active/standby immd
and help detect headless in multi partition clusters rejoin.
- immnd discard FEVS from unknown immd or during re-introduce to
avoid immnd OUT OF ORDER restart and lost ex-immd info.
- Update README.SC_ABSENCE for this new feature.
- Allow to configure disable/enable this new feature.
- immd standby will reboot if see two actives immd to avoid sync
with wrong active.
commit c2234e0cd1db88890e481ad4a93a6472e31289da
Author: thuan.tran <[email protected]>
Date: Fri Apr 17 13:09:23 2020 +0700
amf: enhance to work in roaming SC and headless [#2936]
- amfd reset msg id counter for node that ignore amfnd down
event to avoid nodes reboot once more due to mismatch msg id after
reboot up from reboot order for sending node_up after sync window.
- amfd active order reboot its standby if it detect another
active amfd (multi partition cluster rejoin). Two actives will be
handled by RDE detect split-brain.
- amfd standby should reboot itself if see two active peers to
avoid standby do cold-sync or be updated with wrong active.
Two actives will be handled by RDE detect split-brain.
- amfd just become standby (out of sync) but see active down
should reboot itself.
---
**[tickets:#2936] imm: Select one from multiple headless partitioned cluster
to join into one cluster**
**Status:** fixed
**Milestone:** future
**Created:** Thu Oct 04, 2018 08:47 AM UTC by Minh Hon Chau
**Last Updated:** Fri Jul 10, 2020 05:49 AM UTC
**Owner:** Thuan Tran
When a network split separates the nodes of a cluster into multiple
partitions, each partition may have one SC or none. When the network merges
back, the two SCs self-fence and reboot, as is current OpenSAF behavior, which
leaves multiple partitions as headless clusters. Likewise, if the SC in each
partition shuts down before the network merges back, the partitions are also
left headless. Once an SC comes back, multiple headless clusters join into a
single cluster. These headless clusters will conflict in terms of IMM data and
AMF assignments.
To address this problem, this ticket introduces partition selection in IMM:
IMM is responsible for selecting the payloads of only one partition to stay
alive. The nodes of all other partitions are rebooted when all nodes join into
a single cluster.
To do that, each IMMND holds two additional pieces of cluster-wide
information, identical on every node of its cluster: the node id where its
active IMMD is located, and a unique id sent by that active IMMD. Together
these distinguish IMMNDs that used to be in the same partition as the coord
from IMMNDs that come from other partitions. In particular, when an SC comes up
from headless, one of these veteran IMMNDs is elected as the coord, and any
IMMND whose values differ from the coord's orders its local node to reboot when
it receives the intro rsp from the active IMMD.
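
The comparison described above can be sketched as follows. This is only an illustration of the rule, not the OpenSAF implementation: the `PartitionInfo` type, its field names, and the `must_reboot` helper are hypothetical, and node names are used in place of real node ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionInfo:
    """Hypothetical record of what a veteran IMMND remembers from before headless."""
    ex_immd_node_id: str  # node where its last active IMMD was located
    unique_id: int        # unique id sent by that active IMMD


def must_reboot(local: PartitionInfo, coord: PartitionInfo) -> bool:
    """A node reboots when either value differs from the elected coord's."""
    return local != coord


coord = PartitionInfo("SC1", 2222)

# Same ex-IMMD node and same unique id: same partition as the coord, node stays up.
print(must_reboot(PartitionInfo("SC1", 2222), coord))

# Same ex-IMMD node but a different unique id: a different partition, node reboots.
print(must_reboot(PartitionInfo("SC1", 1111), coord))
```

Comparing both values matters: the node id alone cannot tell apart two partitions that had the same SC as active at different times.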
For example:
Normal cluster: SC1, SC2, PL3, PL4, PL5, PL6, PL7, PL8
Split network first time:
P#1: PL3, PL4 (previously has SC1 as active SC, and unique id: 1111)
P#2: SC1, SC2, PL5, PL6, PL7, PL8
Split network second time:
P#1: PL3, PL4 (previously has SC1 as active SC, and unique id: 1111)
P#2: SC1, PL5, PL6 (has SC1 as active, and the unique id: 2222)
P#3: SC2, PL7, PL8 (has SC2 as active, and the unique id: 3333)
The network merges (both SCs reboot), or both SCs are shut down. SC1 then takes
the active role and elects the IMMND on PL5 as the coord. When the IMMNDs on
PL3, PL4, PL7, and PL8 request to sync data, they are rejected by the active
IMMD and these nodes are rebooted afterward.
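
The outcome of this example can be checked with a small sketch. The `veterans` table and the `nodes_to_reboot` helper are hypothetical names used only for illustration; the (active SC, unique id) pairs are taken from the partitions listed above.

```python
# Hypothetical snapshot of what each payload's IMMND remembers after the
# second split: (node of its last active IMMD, unique id from that IMMD).
veterans = {
    "PL3": ("SC1", 1111),
    "PL4": ("SC1", 1111),
    "PL5": ("SC1", 2222),
    "PL6": ("SC1", 2222),
    "PL7": ("SC2", 3333),
    "PL8": ("SC2", 3333),
}


def nodes_to_reboot(veterans, coord):
    """Return the nodes whose remembered pair differs from the coord's pair."""
    coord_info = veterans[coord]
    return sorted(node for node, info in veterans.items() if info != coord_info)


# SC1 becomes active and elects the IMMND on PL5 as the coord.
print(nodes_to_reboot(veterans, "PL5"))  # ['PL3', 'PL4', 'PL7', 'PL8']
```

PL6 survives because it shares both values with PL5, while PL3 and PL4 are rebooted even though they also remember SC1, since their unique id is 1111.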