Other applications are probably also using CLM notifications. Perhaps we should
do this in AMF only.
---
**[tickets:#2918] osaf: fence nodes that are separated from the main network partition**
**Status:** unassigned
**Milestone:** 5.18.09
**Created:** Fri Aug 24, 2018 06:40 AM UTC by Gary Lee
**Last Updated:** Wed Sep 19, 2018 08:22 AM UTC
**Owner:** nobody
Tickets [#64] and [#2795] added support to prevent multiple active controllers
in a split network scenario. However, nodes residing in the smaller network
partitions can remain running. Meanwhile the active SC residing in the largest
partition may failover assignments at the unreachable nodes to other reachable
nodes, causing conflicts when the partitions are merged.
Two parts are needed for this, a CLM part and an AMF part:
* CLM should not announce that a node has left the cluster until the fencing of
the node has completed successfully. When using remote fencing, this means that
the fencing API has reported that the fencing was completed. When remote
fencing is disabled, we need to wait for at least IMMSV_SC_ABSENCE_ALLOWED
seconds (the configuration in immd.conf) before considering the fencing to be
completed. If MDS connectivity is re-established while waiting, CLM can send an
MDS message to the node asking it to reboot itself. When CLM has received a
reply to the reboot request (over MDS) and then later sees that the MDS
connectivity is lost again, it can consider the fencing to be complete without
the need to wait the full IMMSV_SC_ABSENCE_ALLOWED time.
* AMF should use only CLM as the source of information regarding which nodes are
up or down. AMF should not use MDS link notifications or MDS service
notifications for this purpose.
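
The CLM rule above can be sketched as a small decision function. This is only
an illustration of the proposal, not OpenSAF code: `FencingState`, its fields
and `fencing_complete` are hypothetical names, and the sketch assumes that the
timer fallback applies regardless of the current link state, which the ticket
does not specify.

```python
from dataclasses import dataclass


@dataclass
class FencingState:
    """Hypothetical per-node record kept by CLM once MDS connectivity is lost."""
    lost_at: float                     # time (seconds) when connectivity was lost
    remote_fencing_enabled: bool       # is remote fencing configured?
    remote_fencing_done: bool = False  # fencing API reported completion
    reboot_acked: bool = False         # node replied to the reboot request over MDS
    link_up: bool = False              # current MDS connectivity to the node


def fencing_complete(state: FencingState, now: float,
                     sc_absence_allowed: float) -> bool:
    """Return True when CLM may announce that the node has left the cluster."""
    if state.remote_fencing_enabled:
        # With remote fencing, only the fencing API result counts.
        return state.remote_fencing_done
    if state.reboot_acked and not state.link_up:
        # The node acknowledged the reboot request and then dropped off again,
        # so there is no need to wait for the full timeout.
        return True
    # Otherwise wait at least IMMSV_SC_ABSENCE_ALLOWED seconds.
    return now - state.lost_at >= sc_absence_allowed
```

CLM would evaluate something like this on every relevant event (fencing API
callback, MDS up/down, reboot reply, timer tick) and publish the node-left
notification only once it returns true.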
Potentially waiting up to IMMSV_SC_ABSENCE_ALLOWED seconds will affect
availability.
This behaviour must therefore be configurable via IMM and take effect without a
restart. It is up to the user to enable it if node disturbances are planned or
expected in the environment due to poor-quality links between the nodes.
Additionally, we should allow the user to set this 'node failover' timer to a
smaller value than IMMSV_SC_ABSENCE_ALLOWED, with the understanding that this
introduces the risk of duplicate assignments.
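
A minimal sketch of how the configured 'node failover' timer could be resolved
against IMMSV_SC_ABSENCE_ALLOWED. The function and parameter names are
hypothetical, and the assumption that an unset value falls back to
IMMSV_SC_ABSENCE_ALLOWED is mine rather than something the ticket states.

```python
from typing import Optional, Tuple


def effective_failover_timeout(sc_absence_allowed: float,
                               node_failover_timeout: Optional[float] = None
                               ) -> Tuple[float, bool]:
    """Return (timeout_in_seconds, risk_of_duplicate_assignments)."""
    if node_failover_timeout is None:
        # Assumed default: wait the full IMMSV_SC_ABSENCE_ALLOWED time.
        return sc_absence_allowed, False
    if node_failover_timeout < 0:
        raise ValueError("node failover timeout must not be negative")
    # A value below IMMSV_SC_ABSENCE_ALLOWED is permitted, but the node may
    # still be running when its assignments are failed over.
    risky = node_failover_timeout < sc_absence_allowed
    return node_failover_timeout, risky
```

The second element of the result could be used to log a warning when the user
chooses a value that introduces the risk of duplicate assignments.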