- **Comment**:
commit b0dd5b39172e01d644a84804abda2b1ded6e81cc
Author: Gary Lee <[email protected]>
Date: Mon May 27 17:04:36 2019 +1000
rded: improve self-fencing response time [#3039]
When connectivity to consensus service is lost, it is recorded
in a state variable. When all RDE peers are lost, the node will
now self-fence immediately.
---
**[tickets:#3039] rded: improve self fencing response time**
**Status:** review
**Milestone:** 5.19.06
**Created:** Tue May 14, 2019 01:53 AM UTC by Gary Lee
**Last Updated:** Mon May 27, 2019 12:12 AM UTC
**Owner:** Gary Lee
**Attachments:**
- [3039.diff](https://sourceforge.net/p/opensaf/tickets/3039/attachment/3039.diff) (4.1 kB; text/x-patch)
Currently, when connectivity to the consensus service is lost, the plugin
returns an error code and the main thread in rded receives a callback.
If relaxed mode is enabled, rded then checks whether the peer SC is up before
self-fencing.
The consensus service could be run over TCP, for example, while OpenSAF uses
TIPC. Because the two network layers have different timeouts, race conditions
can occur in which rded does not notice that the node has been separated from
the main cluster. The rest of the cluster ends up restarting due to "TIPC
flicker" detection in amfnd, before the separated node itself also has to
reboot due to loss of both the peer SC and consensus. A much better outcome is
for the separated active controller to self-fence promptly, so that the rest of
the cluster can remain operational.
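
The intended behaviour can be illustrated with a small model. This is a sketch only, not the rded code from 3039.diff: the names `FencingModel`, `Action`, `on_consensus_status`, `on_peer_up` and `on_peer_down` are hypothetical. It also assumes that self-fencing is triggered only when consensus connectivity is recorded as lost and no RDE peer remains, which is one reading of the commit comment. Relaxed mode and the controller role are deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    NONE = "none"
    SELF_FENCE = "self_fence"


@dataclass
class FencingModel:
    """Toy model of the decision logic; all names here are hypothetical."""

    # State variable recording the last known consensus connectivity.
    consensus_connected: bool = True
    # Identifiers of RDE peers currently considered up.
    peers_up: set = field(default_factory=set)

    def on_consensus_status(self, connected: bool) -> Action:
        # Record the connectivity state, then re-evaluate.
        self.consensus_connected = connected
        return self._evaluate()

    def on_peer_up(self, node_id: str) -> Action:
        self.peers_up.add(node_id)
        return Action.NONE

    def on_peer_down(self, node_id: str) -> Action:
        self.peers_up.discard(node_id)
        # Re-evaluate immediately instead of waiting for another callback.
        return self._evaluate()

    def _evaluate(self) -> Action:
        # Assumption: fence only when both consensus and all peers are lost.
        if not self.consensus_connected and not self.peers_up:
            return Action.SELF_FENCE
        return Action.NONE
```

In this model, losing consensus while a peer is still up returns `Action.NONE`. The later loss of the last peer returns `Action.SELF_FENCE` straight away, because the earlier consensus loss was already recorded. The same result is reached if the two events arrive in the opposite order.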