- **summary**: Enhanced cluster management using RAFT consensus algorithm --> 
Enhanced cluster management using quorum



---

** [tickets:#439] Enhanced cluster management using quorum**

**Status:** accepted
**Milestone:** 5.1.FC
**Labels:** #79 #1170 
**Created:** Fri May 31, 2013 11:15 AM UTC by Mathi Naickan
**Last Updated:** Tue May 31, 2016 12:57 PM UTC
**Owner:** Mathi Naickan


The goal of this ticket is to address the following requirements. This ticket 
should be read in conjunction with ticket #79 (spare SCs) and #1170 (multiple 
standbys):

Deployment of large OpenSAF clusters in the cloud presents the following 
challenges:
- Multiple nodes failing/faulting simultaneously (either in a cattle-class 
deployment OR when the host machine goes down, which in turn pulls down the 
guest VM nodes)
- Reliance on third-party OR less reliable hardware/network/hosts
- Dynamically changing cluster membership due to scale-out and scale-in 
operations
- Multiple (or all) nodes can now become system controller nodes. This 
increases the probability of split brain and cluster partitioning.

These requirements are being addressed in a phased manner.
(1) As a first step, https://sourceforge.net/p/opensaf/tickets/79/ was 
implemented in 5.0 (along with the headless cluster feature).

(2) As a second step (this ticket, in 5.1), implement enhanced OpenSAF cluster 
management such that there is always consensus (among the cluster nodes) on 
the following:
- current cluster members
- the current active SC, leader election
- the order of member nodes joining/leaving the cluster
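
One way to picture "consensus on the order of joins and leaves" is an agreed, 
indexed log of membership events that every node replays. The sketch below is 
only an illustration under that assumption; the event tuple layout, the 
`replay` function and the node names are hypothetical and are not part of 
OpenSAF or of any RAFT implementation.

```python
from typing import Iterable, List, Tuple

# Hypothetical event shape: (log index, "join" | "leave", node name).
Event = Tuple[int, str, str]


def replay(events: Iterable[Event]) -> List[str]:
    """Apply join/leave events in log-index order; return members in join order."""
    members: List[str] = []
    for _, kind, node in sorted(events):
        if kind == "join" and node not in members:
            members.append(node)
        elif kind == "leave" and node in members:
            members.remove(node)
    return members


# Hypothetical agreed log for a small cluster.
log = [
    (1, "join", "SC-1"),
    (2, "join", "SC-2"),
    (3, "join", "PL-3"),
    (4, "leave", "SC-2"),
]

view_on_node_a = replay(log)
view_on_node_b = replay(reversed(log))  # same entries, different arrival order
```

Because both nodes apply the same entries in index order, they end up with the 
same membership view regardless of the order in which the entries arrived.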


(3) As a last step, implement https://sourceforge.net/p/opensaf/tickets/1170/ 
(in 5.2?)


This ticket addresses bullet (2) above.

Requirements:

* As a part of this ticket RAFT (see https://raft.github.io/) shall be used as 
the mechanism for 
(a) achieving consensus among a set of the cluster nodes (and the membership 
changes)
(b) quorum based leader election
(c) split brain avoidance
The following deployment scenarios shall be supported when using RAFT:
- classic 2 SC OpenSAF cluster (or)
- when all nodes are SCs (2N + the rest are all spares) (or)
- 2N + spare SCs (2N + a smaller subset are spares) (or)
- N-WAY (one active, the rest are all hot standbys) - 5.2
Note: A mix of hot standbys and spares should also be possible.
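
The quorum rule behind (b) and (c) can be sketched as follows: a partition may 
elect a leader only if it can reach a strict majority of the voting members. 
This is a minimal illustration, not OpenSAF code; the function names and the 
five-SC example cluster are assumptions made for the sketch.

```python
def quorum_size(voting_members: int) -> int:
    """Smallest strict majority of the voting members."""
    if voting_members < 1:
        raise ValueError("need at least one voting member")
    return voting_members // 2 + 1


def has_quorum(reachable: set, voters: set) -> bool:
    """True if the reachable nodes include a strict majority of the voters."""
    return len(reachable & voters) >= quorum_size(len(voters))


# Hypothetical five-SC cluster that has split into two partitions.
voters = {"SC-1", "SC-2", "SC-3", "SC-4", "SC-5"}
side_a = {"SC-1", "SC-2", "SC-3"}
side_b = {"SC-4", "SC-5"}

side_a_may_elect = has_quorum(side_a, voters)  # three of five voters reachable
side_b_may_elect = has_quorum(side_b, voters)  # two of five voters reachable
```

At most one side of a partition can hold a strict majority, so at most one 
side can elect a leader. Note that with only two voters the majority is two, 
so a two-voter configuration cannot elect a leader while either voter is 
unreachable.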


* RAFT shall be added as a new OpenSAF service. 

* OpenSAF shall either implement RAFT or re-use existing RAFT implementations 
like logcabin or etcd, etc.

* A new topology service (TS) *may* be added which shall use the topology 
information (from TIPC) and MDS (in case of TCP) to determine cluster 
membership.

* CLM is the single layer that interfaces with the underlying RAFT and TS

* All interactions with RAFT and TS shall be via the normalised cluster 
services adaptation interface, called the OpenSAF cluster services library 
(CS). The CS library shall thereby enable OpenSAF to work with different 
implementations of RAFT.

* CS and TS shall be added as libraries of OpenSAF CLM service. 
(In the code structure, these shall be part of ....services/saf/clm/libcs and 
....services/saf/clm/libts.
The name of the library shall be libOsafClusterServices.so)

* OpenSAF should work whether RAFT is enabled or disabled on the system and 
should be backward compatible with previous OpenSAF releases!

The CS library shall provide a normalized set of APIs (and callback interfaces) 
such that OpenSAF can interact with different implementations of RAFT. 
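
Since the API details are still to follow, the shape of such a normalised 
interface can only be guessed at. The sketch below shows the general adapter 
idea, with one abstract interface and interchangeable backends behind it. 
Every name in it (`ClusterServices`, `InMemoryBackend`, `on_leader_change`, 
`set_leader`) is hypothetical and is not the CS API; Python is used only to 
keep the illustration short.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional

LeaderCallback = Callable[[Optional[str]], None]


class ClusterServices(ABC):
    """Hypothetical normalised interface; NOT the actual CS API."""

    @abstractmethod
    def members(self) -> List[str]:
        """Return the currently agreed cluster members."""

    @abstractmethod
    def leader(self) -> Optional[str]:
        """Return the current leader, or None if none is elected."""

    @abstractmethod
    def on_leader_change(self, callback: LeaderCallback) -> None:
        """Register a callback invoked whenever the leader changes."""


class InMemoryBackend(ClusterServices):
    """Stand-in backend; a real one would wrap a RAFT implementation."""

    def __init__(self, members: List[str]) -> None:
        self._members = list(members)
        self._leader: Optional[str] = None
        self._callbacks: List[LeaderCallback] = []

    def members(self) -> List[str]:
        return list(self._members)

    def leader(self) -> Optional[str]:
        return self._leader

    def on_leader_change(self, callback: LeaderCallback) -> None:
        self._callbacks.append(callback)

    def set_leader(self, node: Optional[str]) -> None:
        """Illustration hook that simulates an election result."""
        if node is not None and node not in self._members:
            raise ValueError(f"{node} is not a cluster member")
        self._leader = node
        for callback in self._callbacks:
            callback(node)


# The caller (CLM, per the requirements) depends only on ClusterServices.
backend = InMemoryBackend(["SC-1", "SC-2", "SC-3"])
seen: List[Optional[str]] = []
backend.on_leader_change(seen.append)
backend.set_leader("SC-2")
```

The point of the design is that the caller is written against the abstract 
interface only, so swapping one RAFT implementation for another means adding a 
new backend class rather than changing the caller.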

API and high-level design details to follow.

