- Description has changed:

Diff:

~~~~

--- old
+++ new
@@ -1,13 +1,13 @@
 Tickets [#64] and [#2795] added support to prevent multiple active controllers 
in a split network scenario. However, nodes residing in the smaller network 
partitions can remain running.  Meanwhile the active SC residing in the largest 
partition may failover assignments at the unreachable nodes to other reachable 
nodes,  causing conflicts when the partitions are merged.
 
-There are two parts needed for this; a CLM part and an AMF part:
+The original proposal involved two parts, a CLM part and an AMF part. CLM 
would not announce a node has left the cluster  until the fencing of the node 
has completed successfully. However, some users rely on timely CLM 
notifications to send out node related events and alarms. Thus the proposal has 
been changed to be done in AMF only.
 
-* CLM should not announce that a node has left the cluster until the fencing 
of the node has completed successfully. When using remote fencing, this means 
that the fencing API has reported that the fencing was completed. When remote 
fencing is disabled, we need to wait for at least IMMSV_SC_ABSENCE_ALLOWED 
seconds (the configuration in immd.conf) before considering the fencing to be 
completed. If MDS connectivity is re-established while waiting, CLM can send an 
MDS message to the node asking it to reboot itself. When CLM has received a 
reply to the reboot request (over MDS) and then later sees that the MDS 
connectivity is lost again, it can consider the fencing to be complete witout 
the need to wait the full IMMSV_SC_ABSENCE_ALLOWED time.
+AMF should not perform a node failover, until a node has been fenced.
 
-* AMF should use CLM (only) as source of information regarding which nodes are 
up or down. AMF should not use MDS link notifications or MDS service 
notifications for this purpose.
+When using remote fencing, this means that the fencing API has reported that 
the fencing was completed. When remote fencing is disabled, we need to wait for 
at least IMMSV_SC_ABSENCE_ALLOWED seconds (the configuration in immd.conf) 
before considering the fencing to be completed.
 
-Potentially waiting up to IMMSV_SC_ABSENCE_ALLOWED seconds will affect 
availability.
-This option must be configurable via IMM and take effect without a restart.
-It is up to the user to turn on, if node disturbances are planned or expected 
in the environment due to poor quality links between the nodes.
+If MDS connectivity is re-established while waiting, AMF can wait a few 
seconds for a node_up (with leds_set == false) message to indicate the node has 
been rebooted. Otherwise, AMF can send a message to the node asking it to 
reboot itself. When AMF sees that the MDS connectivity is lost again, it can 
consider the fencing to be complete witout the need to wait the full 
IMMSV_SC_ABSENCE_ALLOWED time.
+
+Potentially waiting up to IMMSV_SC_ABSENCE_ALLOWED seconds will affect 
availability. This option must be configurable via IMM and take effect without 
a restart. It is up to the user to turn on, if node disturbances are planned or 
expected in the environment due to poor quality links between the nodes.
 
 Additionally, we should allow the user to set this 'node failover' timer to a 
smaller value than IMMSV_SC_ABSENCE_ALLOWED, with the understanding that this 
introduces the risk of duplicate assignments.

~~~~




---

** [tickets:#2918] osaf: fence nodes that are separated from the main network 
partition**

**Status:** accepted
**Milestone:** 5.18.12
**Created:** Fri Aug 24, 2018 06:40 AM UTC by Gary Lee
**Last Updated:** Sat Sep 29, 2018 08:57 AM UTC
**Owner:** Gary Lee


Tickets [#64] and [#2795] added support for preventing multiple active 
controllers in a split network scenario. However, nodes residing in the smaller 
network partitions can remain running. Meanwhile, the active SC residing in the 
largest partition may fail over assignments from the unreachable nodes to other 
reachable nodes, causing conflicts when the partitions are merged.

The original proposal involved two parts, a CLM part and an AMF part. CLM would 
not announce that a node has left the cluster until the fencing of the node has 
completed successfully. However, some users rely on timely CLM notifications to 
send out node-related events and alarms. Thus the proposal has been changed to 
be implemented in AMF only.

AMF should not perform a node failover until the node has been fenced.

When using remote fencing, this means that the fencing API has reported that 
the fencing was completed. When remote fencing is disabled, we need to wait for 
at least IMMSV_SC_ABSENCE_ALLOWED seconds (the configuration in immd.conf) 
before considering the fencing to be completed.
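
The rule above can be sketched as a single predicate. This is only an 
illustration of the proposal, not OpenSAF code: the `FencingState` type and its 
field names are hypothetical, and the 900-second value in the usage note is an 
arbitrary example, not a default taken from immd.conf.

```python
from dataclasses import dataclass


@dataclass
class FencingState:
    """Hypothetical per-node bookkeeping; none of these names come from OpenSAF."""
    remote_fencing_enabled: bool
    remote_fence_reported_done: bool  # the fencing API has reported completion
    seconds_since_node_lost: float    # time since the node became unreachable
    sc_absence_allowed: float         # value of IMMSV_SC_ABSENCE_ALLOWED, in seconds


def fencing_complete(state: FencingState) -> bool:
    """Return True once node failover may proceed under the rule described above."""
    if state.remote_fencing_enabled:
        # With remote fencing, only the fencing API's report counts.
        return state.remote_fence_reported_done
    # Without remote fencing, wait at least IMMSV_SC_ABSENCE_ALLOWED seconds.
    return state.seconds_since_node_lost >= state.sc_absence_allowed
```

For example, `fencing_complete(FencingState(False, False, 899.0, 900.0))` is 
false, and it becomes true once the elapsed time reaches 900.0 seconds.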

If MDS connectivity is re-established while waiting, AMF can wait a few seconds 
for a node_up message (with leds_set == false) indicating that the node has been 
rebooted. Otherwise, AMF can send a message to the node asking it to reboot 
itself. When AMF sees that the MDS connectivity is lost again, it can consider 
the fencing to be complete without the need to wait the full 
IMMSV_SC_ABSENCE_ALLOWED time.
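
One possible reading of this reconnect handling is a small state machine. The 
state names, event strings and the `send_reboot_request` action below are 
hypothetical and do not come from the AMF code base. Two points are assumptions 
rather than statements from the ticket: a node_up with leds_set == false is 
treated as completing the fencing, and a link loss during the grace period 
(before any reboot request was sent) falls back to plain waiting. The 
IMMSV_SC_ABSENCE_ALLOWED timeout itself is not modelled here.

```python
from enum import Enum, auto


class Fence(Enum):
    WAITING = auto()           # node unreachable, fencing timer running
    GRACE = auto()             # MDS is back, waiting a few seconds for node_up
    REBOOT_REQUESTED = auto()  # AMF has asked the node to reboot itself
    FENCED = auto()            # node failover may proceed


def step(state, event, leds_set=False):
    """Return (next_state, action); action is None or a hypothetical action name."""
    if state is Fence.WAITING and event == "mds_up":
        return Fence.GRACE, None
    if state is Fence.GRACE:
        if event == "node_up" and not leds_set:
            # The node reports a fresh start, so it has been rebooted.
            return Fence.FENCED, None
        if event == "grace_expired":
            # No node_up arrived in time: ask the node to reboot itself.
            return Fence.REBOOT_REQUESTED, "send_reboot_request"
        if event == "mds_down":
            # Assumption: nothing proves a reboot yet, so keep waiting.
            return Fence.WAITING, None
    if state is Fence.REBOOT_REQUESTED and event == "mds_down":
        # Connectivity lost again after the reboot request: fencing is complete.
        return Fence.FENCED, None
    return state, None
```

Feeding the events `mds_up`, `grace_expired` and `mds_down` in that order moves 
a node from `WAITING` to `FENCED` without waiting for the full timeout.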

Potentially waiting up to IMMSV_SC_ABSENCE_ALLOWED seconds will affect 
availability. This option must be configurable via IMM and take effect without 
a restart. It is up to the user to turn it on if node disturbances are planned, 
or are expected because of poor-quality links between the nodes.

Additionally, we should allow the user to set this 'node failover' timer to a 
smaller value than IMMSV_SC_ABSENCE_ALLOWED, with the understanding that this 
introduces the risk of duplicate assignments.
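
As a rough sketch of how the wait could be derived when the option is turned 
on: the function and parameter names are hypothetical, no IMM attribute name is 
implied, and defaulting to IMMSV_SC_ABSENCE_ALLOWED when no timer is set is an 
assumption.

```python
from typing import Optional, Tuple


def effective_failover_wait(
    sc_absence_allowed: float,
    node_failover_timer: Optional[float] = None,
) -> Tuple[float, bool]:
    """Return (seconds to wait before node failover, duplicate-assignment risk).

    sc_absence_allowed stands for IMMSV_SC_ABSENCE_ALLOWED; node_failover_timer
    is the optional user-configured 'node failover' timer.
    """
    if node_failover_timer is None:
        # Assumption: with no explicit timer, wait the full absence window.
        return sc_absence_allowed, False
    if node_failover_timer < 0:
        raise ValueError("node failover timer must not be negative")
    # A shorter wait restores availability sooner, but the unreachable node may
    # still be running, which is the duplicate-assignment risk noted above.
    risky = node_failover_timer < sc_absence_allowed
    return node_failover_timer, risky
```

Because the option has to take effect without a restart, a caller would 
re-evaluate this each time a node becomes unreachable rather than caching the 
result at startup.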


