- **status**: accepted --> fixed
- **Comment**:
commit 68fde36133a5fd47b667c6971c967a7cf8629b03
Author: Minh Chau <[email protected]>
Date: Wed May 26 21:05:12 2021 +1000
rde: Use broadcast for peer info message [#3263]
commit ca0cb78a03a2eb3cfa3519b4c5d9af0905f325a5
Author: Minh Chau <[email protected]>
Date: Wed May 26 21:05:12 2021 +1000
rde: Add timeout waiting for peer info [#3263]
---
**[tickets:#3263] rde: Cluster is unrecoverable after all nodes split-brain in roaming SC**
**Status:** fixed
**Milestone:** 5.21.06
**Created:** Fri May 14, 2021 04:56 AM UTC by Minh Hon Chau
**Last Updated:** Fri May 14, 2021 04:58 AM UTC
**Owner:** Minh Hon Chau
In a Roaming SC deployment, if a split-brain separates all nodes so that each
partition has exactly one SC, every SC becomes active. On rejoin, each SC
should detect itself as a duplicate active of one of the other SCs and,
ideally, all of them should reboot.
However, sometimes the last active SC is not detected as a duplicate because
all the other SCs have already rebooted, so it finds no other active SC that
duplicates it. As a result, the last SC, which was not healthy throughout the
split, is not rebooted and causes many errors when the other nodes try to
rejoin after their reboot.
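
The race described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the rde implementation: the function name `simulate_rejoin` and the node names are hypothetical, and the model assumes each SC runs its duplicate-active check one after another and reboots immediately when it still sees another active SC.

```python
def simulate_rejoin(active_scs):
    """Toy model of duplicate-active detection after a split-brain heals.

    Each SC checks, in turn, whether any other SC is still active.
    If so, it reboots and leaves the active set.
    Returns (rebooted, survivors).
    """
    still_active = list(active_scs)
    rebooted = []
    for sc in active_scs:
        peers = [p for p in still_active if p != sc]
        if peers:
            # Another active SC is visible: treat self as duplicate and reboot.
            still_active.remove(sc)
            rebooted.append(sc)
        # Otherwise no active peer is left, so nothing triggers a reboot.
    return rebooted, still_active


rebooted, survivors = simulate_rejoin(["SC-1", "SC-2", "SC-3"])
print("rebooted:", rebooted)       # rebooted: ['SC-1', 'SC-2']
print("still active:", survivors)  # still active: ['SC-3']
```

In this model the last SC to run the check never sees a peer, which matches the symptom reported in the ticket: it stays active even though it went through the same split as the others.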