- Attachments have changed:
Diff:
~~~~
--- old
+++ new
@@ -0,0 +1 @@
+3039.diff (4.1 kB; text/x-patch)
~~~~
- **Comment**:
proposal
---
**[tickets:#3039] rded: improve self fencing response time**
**Status:** accepted
**Milestone:** 5.19.06
**Created:** Tue May 14, 2019 01:53 AM UTC by Gary Lee
**Last Updated:** Tue May 14, 2019 01:54 AM UTC
**Owner:** Gary Lee
**Attachments:**
- [3039.diff](https://sourceforge.net/p/opensaf/tickets/3039/attachment/3039.diff) (4.1 kB; text/x-patch)
Currently, when connectivity to the consensus service is lost, the plugin
returns an error code and the main thread in rded receives a callback. If
relaxed mode is enabled, rded then checks whether the peer SC is up before
deciding to self-fence.
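The current decision path can be sketched roughly as follows. This is a minimal illustration, assuming the only inputs are whether relaxed mode is enabled and whether the peer SC appears to be up; `FenceDecision` and `OnConsensusLost` are hypothetical names and do not come from the OpenSAF source.

```cpp
#include <cassert>

// Hypothetical names: these do not exist in OpenSAF; they only mirror the
// decision described in the ticket.
enum class FenceDecision { kSelfFence, kStayActive };

// Conceptually invoked on rded's main thread when the consensus plugin
// reports that connectivity to the consensus service has been lost.
FenceDecision OnConsensusLost(bool relaxed_mode, bool peer_sc_up) {
  // In relaxed mode, a reachable peer SC is taken as evidence that this
  // node is still part of the cluster, so it does not fence itself.
  if (relaxed_mode && peer_sc_up) {
    return FenceDecision::kStayActive;
  }
  // Otherwise, losing consensus leads to self-fencing.
  return FenceDecision::kSelfFence;
}
```

Note that the outcome depends entirely on how fresh the `peer_sc_up` view is at the moment of the check, which is where the race below comes in.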
The consensus service could, for example, be run over TCP, while OpenSAF uses
TIPC. Because the two network layers use different timeouts, race conditions
can occur in which rded does not notice that the node has been separated from
the main cluster. The rest of the cluster ends up restarting due to "TIPC
flicker" detection in amfnd, before the separated node itself also has to
reboot due to loss of both the peer SC and consensus. A much better outcome is
for the separated active controller to self-fence promptly, so that the rest of
the cluster can remain operational.
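The race can be illustrated with a simplified timing model. This sketch assumes that each layer detects the partition after a single fixed delay measured from the moment of separation, and that rded checks the peer SC exactly once when the consensus-loss callback arrives; `PartitionTimings`, `PeerStillLooksUp` and `SeparatedNodeSelfFencesPromptly` are hypothetical names, not OpenSAF code.

```cpp
#include <cassert>

// Hypothetical model: one detection delay per network layer, both measured
// from the moment the active controller is cut off from the cluster.
struct PartitionTimings {
  int consensus_timeout_ms;    // delay before the consensus (e.g. TCP) session reports loss
  int tipc_link_tolerance_ms;  // delay before TIPC reports the peer SC as down
};

// True if, when rded handles the consensus-loss callback, TIPC has not yet
// noticed the partition, so the peer SC still appears to be up.
bool PeerStillLooksUp(const PartitionTimings& t) {
  assert(t.consensus_timeout_ms >= 0 && t.tipc_link_tolerance_ms >= 0);
  return t.tipc_link_tolerance_ms > t.consensus_timeout_ms;
}

// With relaxed mode enabled, a stale "peer up" view keeps the separated
// active controller running instead of self-fencing promptly.
bool SeparatedNodeSelfFencesPromptly(const PartitionTimings& t,
                                     bool relaxed_mode) {
  return !(relaxed_mode && PeerStillLooksUp(t));
}
```

For instance, with illustrative (not default) values of 3000 ms for the consensus timeout and 10000 ms for the TIPC link tolerance, `SeparatedNodeSelfFencesPromptly({3000, 10000}, true)` returns `false`: the separated node keeps running while the rest of the cluster detects the loss on its own.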
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets