[tickets] [opensaf:tickets] #1132 opensaf shall support indefinite unavailability of both controllers (Hydra V1)

Anders Bjornerstedt Tue, 02 Dec 2014 05:55:39 -0800

- **summary**: opensaf shall support indefinite unavailability of both 
controllers --> opensaf shall support indefinite unavailability of both 
controllers (Hydra V1)
- **Comment**:


The point of this feature is of course not that OpenSAF should be in the
"headless" state indefinitely. The point is rather to survive long
enough for at least one of the currently two statically configured
"heads", i.e. SCs, to come back.

I propose that we not call this new feature "headless" but that we instead
take a term that evokes the ultimate goal: the regeneration of heads
(recovery of SCs from the state of no SCs).

One suitable term for such a feature would be "Hydra", a many-headed
serpent in Greek mythology that also regenerated heads when they were
chopped off. (http://en.wikipedia.org/wiki/Lernaean_Hydra)

Version 1 of the OpenSAF Hydra feature will not have any real-time
guarantee on how fast it regenerates even one head.

Version 2 of Hydra would focus on approaching real-time recovery of SCs.




---

**[tickets:#1132] opensaf shall support indefinite unavailability of both 
controllers (Hydra V1)**

**Status:** unassigned
**Milestone:** 4.6.FC
**Created:** Tue Sep 23, 2014 01:51 PM UTC by Hans Feldt
**Last Updated:** Fri Nov 28, 2014 04:51 PM UTC
**Owner:** nobody

The OpenSAF cluster shall survive both system controllers being indefinitely 
unavailable, i.e. down (no cluster reboot as today).

After the system controllers recover, IMM and AMF state shall be as it was before 
the controllers became unavailable.

Since AMF state cannot change while the controllers are down, AMF cannot react
to service availability events. Service availability (a statistical property)
will therefore be impacted in relation to how often this new feature is
exercised.

It is important that *everyone* understands this.
There is no magic being done here.

Use case: OpenSAF cloud deployment. In a cloud deployment, the risk of 
multiple simultaneous node failures is increased for a number of reasons:

* The hardware used to build cloud infrastructure may not be carrier-grade.
* The hypervisor is an extra layer which can also cause VM failures.
* Multiple VMs can be hosted on the same physical hardware. There is no 
standardized interface for querying if two nodes are located on the same 
physical machine.
* Live migration of VMs can cause disruptions.
* The "Pets vs cattle" thinking: There is an expectation that VMs can be 
treated as "cattle", i.e. that the loss of a few VMs shall not have a 
devastating effect on the whole cluster (which can consist of a hundred nodes).

To be refined a lot...

