- **summary**: Enhanced cluster management --> Enhanced cluster management
using RAFT
- Description has changed:
Diff:
~~~~
--- old
+++ new
@@ -1,19 +1,54 @@
-The purpose of this ticket is to achieve the following enhancements to OpenSAF
cluster management/membership:
+The goal of this ticket is to address the following requirements.This ticket
should be read in conjunction with ticket #79 (spare SCs) and #1170 (multiple
standbys):
-- Perform high level node monitoring(heartbeating)
-- Enhanced split-brain avoidance techniques.
-RAFT is being considered for implementing the above cluster management
enhancements.
-.
-The scope of this ticket includes the following:
+Deployment of large OpenSAF clusters in the cloud presents with the following
challenges:
+- Multiple nodes failing/faulting simultaneously (either in a cattle class
deployment OR the host machine going down which inturn will pull down the guest
VM nodes)
+- Relying on 3rd party OR less reliable - hardware/network/hosts
+- Dynamically changing cluster membership due to scale-out and scale-in
operations
+- Multiple (or all) nodes can now become system controller nodes. This
increases the probability of split brain and cluster partitioning.
-(a) Implement RAFT and/or RAFT adaptation layer that provides interfaces for
-- adding/removing nodes to the cluster membership
-- querying leader
-- callbacks notifying about new leader
-- read/write interface
-- notification of nodes joining/leaving the cluster membership
-Note: Yet to be seen if a leader yield interface is necessary
+These requirements are being addressed in a phased manner.
+(1) As a first step, https://sourceforge.net/p/opensaf/tickets/79/ was
implemented in 5.0. (And the headless cluster feature)
-(b) an interface that alows invoking a fencing mechanism
+(2) As a second step, implement (this ticket in 5.1) -
+Enhanced OpenSAF cluster management such that there is always consensus (among
the cluster nodes) on the
+- current cluster members
+- the current active SC, leader election
+- the order of member nodes joining/leaving the cluster
-(c) an interface that allows invoking an arbitration mechanism
+
+(3) As a last step implement https://sourceforge.net/p/opensaf/tickets/1170/
in 5.2?)
+
+
+This ticket addresses bullet (2) above.
+
+Requirements:
+
+* As a part of this ticket RAFT (see https://raft.github.io/) shall be used as
the mechanism for
+(a) achieving consensus among a set of the cluster nodes (and the membership
changes)
+(b) quorum based leader election
+(c) split brain avoidance
+The following deployment scenarios shall be supported when using RAFT:
+- classic 2 SC OpenSAF cluster (or)
+- when all nodes are SCs (2N + the rest are all spares) (or)
+- 2N + spare SCs (2N + a smaller subset are spares) (or)
+- N-WAY (a active, the rest are all hot standbys) - 5.2
+Note: A mix of hot standbys and spares should also be possible.
+
+
+* RAFT shall be a added as a new OpenSAF service.
+
+* OpenSAF shall either implement RAFT or re-use existing RAFT implementations
like logcabin or etcd, etc.
+
+* A new topology service(TS) *may* be added which shall use the topology
information (from TIPC) and MDS (in case of TCP) to determine cluster membership
+
+* CLM is the single layer that interfaces with the underlying RAFT and TS
+
+* All interactions to RAFT and TS shall be via the normalised cluster services
adaptation interface called as OpenSAF cluster services library (CS). The CS
library thereby shall enable OpenSAF to work with different implementations of
RAFT.
+
+* CS and TS shall be added as libraries of OpenSAF CLM service.
+(In the code structure, these shall be part of ....services/saf/clm/libcs and
....services/saf/clm/libts.
+The name of the library shall be libOsafClusterServices.so)
+
+The CS library shall provide a normalized set of APIs (and callback
interfaces) such that OpenSAF can interact with different implementations of
RAFT.
+
+API and High level design details to follow:
~~~~
---
**[tickets:#439] Enhanced cluster management using RAFT**
**Status:** accepted
**Milestone:** 5.1.FC
**Labels:** #79 #1170
**Created:** Fri May 31, 2013 11:15 AM UTC by Mathi Naickan
**Last Updated:** Mon Apr 11, 2016 10:14 PM UTC
**Owner:** Mathi Naickan
The goal of this ticket is to address the requirements below. It should be read
in conjunction with tickets #79 (spare SCs) and #1170 (multiple standbys).
Deploying large OpenSAF clusters in the cloud presents the following
challenges:
- Multiple nodes failing or faulting simultaneously (either in a cattle-class
deployment, or when a host machine goes down and in turn pulls down its guest
VM nodes)
- Reliance on third-party or less reliable hardware, networks, and hosts
- Dynamically changing cluster membership due to scale-out and scale-in
operations
- Multiple (or all) nodes can now become system controller (SC) nodes, which
increases the probability of split brain and cluster partitioning.
These requirements are being addressed in phases:
(1) As a first step, https://sourceforge.net/p/opensaf/tickets/79/ was
implemented in 5.0, along with the headless cluster feature.
(2) As a second step (this ticket, in 5.1), enhance OpenSAF cluster management
so that there is always consensus among the cluster nodes on:
- the current cluster members
- the current active SC (leader election)
- the order in which member nodes join and leave the cluster
(3) As a last step, implement https://sourceforge.net/p/opensaf/tickets/1170/
(in 5.2?).
This ticket addresses bullet (2) above.
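
To make the consensus goal in step (2) concrete, here is a minimal sketch of
why agreeing on the *order* of joins and leaves is enough for every node to
derive the same membership view. It assumes membership changes are recorded as
entries in a single agreed log; the names `MembershipEntry`, `ClusterView`, and
`replay`, and the node names, are hypothetical and not part of OpenSAF.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass(frozen=True)
class MembershipEntry:
    """One agreed membership change (hypothetical structure)."""
    index: int   # position in the agreed log, starting at 1
    action: str  # "join" or "leave"
    node: str    # illustrative node name, e.g. "SC-1"


@dataclass
class ClusterView:
    """Membership as derived by one node from the agreed log."""
    members: Set[str] = field(default_factory=set)
    last_applied: int = 0

    def apply(self, entry: MembershipEntry) -> None:
        # Entries are applied strictly in log order; a gap means this node
        # has missed an entry and must not guess the membership.
        if entry.index != self.last_applied + 1:
            raise ValueError(f"expected index {self.last_applied + 1}, "
                             f"got {entry.index}")
        if entry.action == "join":
            self.members.add(entry.node)
        elif entry.action == "leave":
            self.members.discard(entry.node)
        else:
            raise ValueError(f"unknown action: {entry.action}")
        self.last_applied = entry.index


def replay(entries: Iterable[MembershipEntry]) -> ClusterView:
    """Build a view by applying every entry in order."""
    view = ClusterView()
    for entry in entries:
        view.apply(entry)
    return view
```

Any two nodes that replay the same log end up with identical `members` sets,
which is the property the consensus layer is expected to guarantee.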
Requirements:
* As part of this ticket, RAFT (see https://raft.github.io/) shall be used as
the mechanism for
(a) achieving consensus among a set of the cluster nodes (including on
membership changes)
(b) quorum-based leader election
(c) split-brain avoidance
The following deployment scenarios shall be supported when using RAFT:
- classic 2-SC OpenSAF cluster (or)
- all nodes are SCs (2N, with all remaining nodes as spares) (or)
- 2N + spare SCs (2N, with a smaller subset of nodes as spares) (or)
- N-WAY (one active, all the rest hot standbys) in 5.2
Note: A mix of hot standbys and spares should also be possible.
* RAFT shall be added as a new OpenSAF service.
* OpenSAF shall either implement RAFT or reuse an existing RAFT implementation
such as LogCabin or etcd.
* A new topology service (TS) *may* be added, which shall use the topology
information (from TIPC) and MDS (in case of TCP) to determine cluster
membership.
* CLM is the single layer that interfaces with the underlying RAFT and TS.
* All interactions with RAFT and TS shall go through a normalised cluster
services adaptation interface, the OpenSAF cluster services library (CS). The
CS library thereby enables OpenSAF to work with different implementations of
RAFT.
* CS and TS shall be added as libraries of the OpenSAF CLM service.
(In the code structure, these shall be part of ....services/saf/clm/libcs and
....services/saf/clm/libts.
The name of the library shall be libOsafClusterServices.so.)
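
The quorum-based election and split-brain avoidance named in (a) to (c) above
rest on simple majority arithmetic, sketched below. This is only an
illustration of the majority rule that Raft uses; the function names are
hypothetical, and it assumes every counted node is a voting member.

```python
def quorum_size(voting_members: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    if voting_members < 1:
        raise ValueError("a cluster needs at least one voting member")
    return voting_members // 2 + 1


def has_quorum(reachable: int, voting_members: int) -> bool:
    """True if a partition that can reach `reachable` voters may elect a leader."""
    return reachable >= quorum_size(voting_members)


def tolerated_failures(voting_members: int) -> int:
    """How many voters can fail while a majority still remains."""
    return voting_members - quorum_size(voting_members)
```

Because two disjoint partitions cannot both hold a strict majority, at most one
side of a network split can elect a leader. Note that with only two voters the
quorum is two, so under plain majority voting a classic 2-SC cluster split 1/1
leaves neither side with quorum; how that scenario is handled (for example by
an additional voter or an arbitration mechanism) is a design decision this
sketch does not address.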
The CS library shall provide a normalized set of APIs (and callback interfaces)
such that OpenSAF can interact with different implementations of RAFT.
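
Until the actual API is published, the shape of such a normalized interface can
be sketched as follows. The operations are borrowed from the ticket's earlier
description (adding and removing members, querying the leader, leader-change
callbacks); the read/write interface is left out for brevity. Every name here
is hypothetical, Python is used only for illustration, and the in-memory
backend is a stand-in that performs no consensus at all.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional

LeaderCallback = Callable[[Optional[str]], None]


class ClusterServices(ABC):
    """Hypothetical normalized interface that each RAFT backend would implement."""

    @abstractmethod
    def add_member(self, node: str) -> None: ...

    @abstractmethod
    def remove_member(self, node: str) -> None: ...

    @abstractmethod
    def members(self) -> List[str]: ...

    @abstractmethod
    def leader(self) -> Optional[str]: ...

    @abstractmethod
    def on_leader_change(self, callback: LeaderCallback) -> None: ...


class InMemoryClusterServices(ClusterServices):
    """Single-process stand-in backend: the longest-standing member is treated
    as leader. This is NOT a RAFT election, only a placeholder."""

    def __init__(self) -> None:
        self._members: List[str] = []
        self._leader: Optional[str] = None
        self._callbacks: List[LeaderCallback] = []

    def add_member(self, node: str) -> None:
        if node not in self._members:
            self._members.append(node)
        self._refresh_leader()

    def remove_member(self, node: str) -> None:
        if node in self._members:
            self._members.remove(node)
        self._refresh_leader()

    def members(self) -> List[str]:
        return list(self._members)

    def leader(self) -> Optional[str]:
        return self._leader

    def on_leader_change(self, callback: LeaderCallback) -> None:
        self._callbacks.append(callback)

    def _refresh_leader(self) -> None:
        new_leader = self._members[0] if self._members else None
        if new_leader != self._leader:
            self._leader = new_leader
            # Notify subscribers only when the leader actually changes.
            for callback in self._callbacks:
                callback(new_leader)
```

A caller such as CLM would depend only on `ClusterServices`, so swapping one
RAFT implementation for another would mean providing a different subclass
rather than changing the caller.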
API and high-level design details to follow.