osaf/services/saf/amf/README-HEADLESS |  123 ++++++++++++++++++++++++++++++++++
 1 files changed, 123 insertions(+), 0 deletions(-)


diff --git a/osaf/services/saf/amf/README-HEADLESS b/osaf/services/saf/amf/README-HEADLESS
new file mode 100644
--- /dev/null
+++ b/osaf/services/saf/amf/README-HEADLESS
@@ -0,0 +1,123 @@
+#
+#      -*- OpenSAF  -*-
+#
+# (C) Copyright 2016 The OpenSAF Foundation
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+# under the GNU Lesser General Public License Version 2.1, February 1999.
+# The complete license can be accessed from the following location:
+# http://opensource.org/licenses/lgpl-license.php
+# See the Copying file included with the OpenSAF distribution for full
+# licensing terms.
+#
+# Author(s): Ericsson AB
+#
+
+GENERAL
+-------
+
+This is a description of how the AMF service handles being headless (SC down)
+and recovery (SC up).
+
+CONFIGURATION
+-------------
+
+AMF reads the "scAbsenceAllowed" attribute to determine if headless mode is
+supported. A positive integer indicates the number of seconds AMF will tolerate
+being headless, and a zero value indicates the headless feature is disabled.
+
+Normally, the AMF Node Director (amfnd) restarts a node if there is no active
+AMF Director (amfd). If headless support is enabled, the Node Director will
+delay the restart for the duration specified in "scAbsenceAllowed". If an SC
+recovers during that period, the restart is aborted.
+
+IMPLEMENTATION DETAILS
+----------------------
+
+* Amfnd detects being headless:
+On receiving the NCSMDS_DOWN event, which indicates that the last active SC
+has gone, amfnd does not reboot the node and instead enters headless mode (if
+"scAbsenceAllowed" is configured).
+
+* Escalation and recovery during headless:
+Restart recovery of a faulty component is still performed in real time, as
+before. Only recovery that involves failover/switchover is delayed until amfd
+is up. If a component or SU failover happens, the component/SU is only marked
+as failed. The repair action is initiated when the SC comes back, subject to
+the configuration. A node failover or switchover results in a node restart.
+
+* Amfnd detects that the SC has come back from headless:
+NCSMDS_UP is the event by which amfnd detects the presence of an active amfd
+after being headless.
+
+* Recovery after being headless:
+Admin operations or recovery actions may be in progress when the cluster
+becomes headless. The normal sequence of those actions is left incomplete,
+which leaves the assignments and states of AMF entities inconsistent. New
+messages (state information messages) have been introduced to carry those
+assignments and states. All amfnd(s) send them to amfd, which collects the
+messages and recovers/adjusts the assignments and states that are left over
+from headless.
+
+State information messages also contain component and SU restart counts; the
+new counter values are written to IMM after headless.
+
+The operation in which amfnd(s) send state information messages and amfd
+processes them is known as a *sync* operation, and is referred to as such in
+the implementation.
+
+Example 1:
+Admin si-swap of a 2N SI: The cluster goes headless when SU1's Active
+assignment has just moved to Quiesced. Amfd will receive state information
+messages with one Quiesced (SU1) and one Standby (SU2) assignment. Amfd will
+send su si assignment messages to make SU2 Active and SU1 Standby.
+
+Example 2:
+SU failover on a 2N SU: While the cluster is headless, Active SU1 has a fault
+that escalates to an SU failover. SU1's assignment is removed, and SU1 is
+marked as failed with operState Disabled. Once the SC comes back, amfd will
+send an su si assignment to make SU2 (currently Standby) Active. Whether SU1
+is repaired depends on whether AutoRepair is configured.
+
+Example 3:
+SI dependency: While the cluster is headless, both SU1 and SU2 have faults, so
+all assignments of SI1 to those SUs are removed. After the state information
+messages are received, any SI that has SI1 as its sponsor SI will start
+assignment removal.
+
+* Other service interfaces:
+Only amfd uses the Log and Ntf APIs, so AMF functions that require Log/Ntf
+are limited during headless.
+The Clm functionality that amfnd uses (mostly the track callback) should work
+as before; amfnd only needs to reinitialize its Clm handle when an active SC
+comes back from headless.
+Imm admin operations are not supported during headless, since there is no
+active amfd. In general, AMF functions that require amfd involvement are not
+supported during headless. All of those functions will work as normal after
+the SC comes back from headless.
+
+LIMITATIONS
+-----------
+
+* Recovery actions are limited while headless.
+Failover/Switchover is delayed until SC recovery.
+
+* Delayed failover recovery is supported for the 2N Service Group.
+Only for the 2N Service Group does delayed failover recovery support most
+combinations of assignment states (Quiesced/Quiescing/Standby/Active) that
+are left over from headless.
+Example: a Standby assignment will transition directly to Active if required;
+a Quiesced/Quiescing assignment will be removed if the admin entity is LOCKED,
+or else transition to Standby; and so on.
+
+For other Service Group types, Standby assignments are first removed, and
+then reassigned as appropriate for the SG.
+
+* Tolerance timer of SI dependency.
+After recovery from headless, if an unassigned sponsor SI is detected, all its
+dependent SI(s) assignments are removed regardless of the tolerance duration.
+The time when the sponsor SI became unassigned is not recorded, so the new
+amfd cannot work out how much tolerance time the dependent SI(s) have left.
+
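
As an illustration of the CONFIGURATION section in the README above, the
following Python sketch models how a node director might choose between
restarting and staying headless. It is not OpenSAF code: the class and method
names (HeadlessTimer, on_sc_down, on_sc_up, should_restart) are hypothetical.
Only the semantics of "scAbsenceAllowed" (zero disables the feature, a positive
value is the tolerated number of seconds) are taken from the README.

```python
class HeadlessTimer:
    """Toy model of the restart delay governed by scAbsenceAllowed (hypothetical)."""

    def __init__(self, sc_absence_allowed):
        if sc_absence_allowed < 0:
            raise ValueError("scAbsenceAllowed must be zero or a positive integer")
        self.sc_absence_allowed = sc_absence_allowed  # seconds; 0 = feature disabled
        self.down_since = None  # time at which the last active SC was lost

    def on_sc_down(self, now):
        """Record when the node lost the last active SC."""
        if self.down_since is None:
            self.down_since = now

    def on_sc_up(self):
        """An SC recovered: abort any pending restart."""
        self.down_since = None

    def should_restart(self, now):
        """Return True if the node should restart at time `now` (in seconds)."""
        if self.down_since is None:
            return False  # an active SC is present
        if self.sc_absence_allowed == 0:
            return True  # headless disabled: restart as soon as the SC is lost
        # Assumption: the restart fires once the tolerated period has elapsed.
        return now - self.down_since >= self.sc_absence_allowed
```

With a tolerance of 900 seconds, an SC lost at t=100 does not trigger a restart
at t=500 but does at t=1000, unless on_sc_up() is called in between.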

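The 2N recovery rules described in the README (Example 1, Example 2 and the
LIMITATIONS section) can be sketched roughly as follows. This is a simplified
model and not the amfd implementation: the function name recover_2n, its
arguments and the use of None for a removed assignment are all hypothetical,
and real recovery has to handle more cases than are shown here.

```python
def recover_2n(left_over, locked=frozenset()):
    """Return the target HA state per SU; None means the assignment is removed.

    left_over maps an SU name to "Active", "Standby", "Quiesced" or "Quiescing",
    as reported in the state information messages after headless.
    locked is the set of SUs whose admin state is LOCKED.
    """
    target = dict(left_over)
    has_active = any(state == "Active" for state in left_over.values())

    # A Standby assignment transitions directly to Active if no Active survived.
    if not has_active:
        for su in sorted(left_over):
            if left_over[su] == "Standby" and su not in locked:
                target[su] = "Active"
                break

    # An interrupted Quiesced/Quiescing assignment is removed when the admin
    # entity is LOCKED, and otherwise transitions to Standby.
    for su, state in left_over.items():
        if state in ("Quiesced", "Quiescing"):
            target[su] = None if su in locked else "Standby"
    return target
```

For the interrupted si-swap of Example 1,
recover_2n({"SU1": "Quiesced", "SU2": "Standby"}) returns
{"SU1": "Standby", "SU2": "Active"}.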