- **status**: accepted --> fixed
- **Comment**:

commit 68fde36133a5fd47b667c6971c967a7cf8629b03
Author: Minh Chau <[email protected]>
Date:   Wed May 26 21:05:12 2021 +1000

    rde: Use broadcast for peer info message [#3263]

commit ca0cb78a03a2eb3cfa3519b4c5d9af0905f325a5
Author: Minh Chau <[email protected]>
Date:   Wed May 26 21:05:12 2021 +1000

    rde: Add timeout waiting for peer info [#3263]





---

**[tickets:#3263] rde: Cluster is unrecoverable after all nodes split-brain in roaming SC**

**Status:** fixed
**Milestone:** 5.21.06
**Created:** Fri May 14, 2021 04:56 AM UTC by Minh Hon Chau
**Last Updated:** Fri May 14, 2021 04:58 AM UTC
**Owner:** Minh Hon Chau


In a Roaming SC deployment, if a split-brain separates all nodes so that each 
partition has one SC, every SC becomes active. At rejoin, each SC should detect 
that it is a duplicate active of one of the other SCs, and ideally all of them 
reboot.
However, sometimes the last active SC is not detected as a duplicate because all 
the other SCs have already rebooted, so it finds no other active SC that 
duplicates it. As a result, the last SC is not rebooted even though it has not 
been healthy throughout the split, and it causes many errors when the other 
nodes try to rejoin after their reboot.
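
The race described above can be illustrated with a small toy model. This is 
only a sketch under stated assumptions, not the rde implementation: the `SC` 
class, the `rejoin` function, and the rule "reboot only if another SC is 
currently seen as active" are hypothetical simplifications of the behaviour 
described in this ticket, and SCs are processed one at a time to stand in for 
the order in which they detect duplicates.

```python
from dataclasses import dataclass


@dataclass
class SC:
    """Hypothetical stand-in for a system controller after the split."""
    name: str
    active: bool = True     # every SC became active in its own partition
    rebooted: bool = False


def rejoin(scs):
    """Return the names of SCs that were never rebooted.

    Assumed rule (a simplification): an SC reboots only if, at the moment
    it checks, at least one other SC is still active.
    """
    for sc in scs:
        sees_duplicate = any(other.active for other in scs if other is not sc)
        if sees_duplicate:
            # Duplicate active detected: this SC gives up its role and reboots.
            sc.active = False
            sc.rebooted = True
    return [sc.name for sc in scs if not sc.rebooted]


survivors = rejoin([SC("SC-1"), SC("SC-2"), SC("SC-3")])
print(survivors)
```

In this model the SC that checks last no longer sees any active peer, so it is 
the only one that skips the reboot, which matches the symptom reported here.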

