- Attachments has changed:

Diff:

~~~~

--- old
+++ new
@@ -0,0 +1 @@
+3039.diff (4.1 kB; text/x-patch)

~~~~

- **Comment**:

proposal



---

**[tickets:#3039] rded: improve self fencing response time**

**Status:** accepted
**Milestone:** 5.19.06
**Created:** Tue May 14, 2019 01:53 AM UTC by Gary Lee
**Last Updated:** Tue May 14, 2019 01:54 AM UTC
**Owner:** Gary Lee
**Attachments:**

- [3039.diff](https://sourceforge.net/p/opensaf/tickets/3039/attachment/3039.diff) (4.1 kB; text/x-patch)


Currently, when connectivity to the consensus service is lost, the plugin 
returns an error code and the main thread in rded receives a callback. 
If relaxed mode is enabled, rded then checks whether the peer SC is up before 
self-fencing.
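
As a rough illustration of the decision described above, here is a minimal sketch. All names in it (`FenceAction`, `ConsensusLossContext`, `OnConsensusConnectivityLost`) are hypothetical and are not taken from rded or from the attached patch. It also assumes that, in relaxed mode, a reachable peer SC means the node does not self-fence; the ticket text does not spell that out.

```cpp
// Hypothetical sketch only: these types and functions do not exist in rded.
enum class FenceAction { kSelfFence, kStayUp };

struct ConsensusLossContext {
  bool relaxed_mode_enabled;  // assumed configuration flag
  bool peer_sc_up;            // assumed result of the peer SC check
};

// Decide what to do once the main thread is told that connectivity to the
// consensus service has been lost.
// Assumption: in relaxed mode, a reachable peer SC lets the node stay up;
// in every other case the node self-fences.
inline FenceAction OnConsensusConnectivityLost(const ConsensusLossContext& ctx) {
  if (ctx.relaxed_mode_enabled && ctx.peer_sc_up) {
    return FenceAction::kStayUp;
  }
  return FenceAction::kSelfFence;
}
```

Under these assumptions, the time taken by the peer SC check sits directly on the path to self-fencing whenever relaxed mode is enabled.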

The consensus service could run over TCP, for example, while OpenSAF uses 
TIPC. We could end up in race conditions due to the different timeouts in the 
network layers, with rded failing to notice that the node has been separated 
from the main cluster. The rest of the cluster ends up restarting due to "TIPC 
flicker" detection in amfnd, before the separated node itself also has to 
reboot due to loss of both the peer SC and consensus. A much better outcome is 
for the separated active controller to self-fence promptly, so that the rest of 
the cluster can remain operational.


