 osaf/services/saf/amf/README-HEADLESS | 123 ++++++++++++++++++++++++++++++++++
 1 files changed, 123 insertions(+), 0 deletions(-)

diff --git a/osaf/services/saf/amf/README-HEADLESS b/osaf/services/saf/amf/README-HEADLESS
new file mode 100644
--- /dev/null
+++ b/osaf/services/saf/amf/README-HEADLESS
@@ -0,0 +1,123 @@
+#
+# -*- OpenSAF -*-
+#
+# (C) Copyright 2016 The OpenSAF Foundation
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+# under the GNU Lesser General Public License Version 2.1, February 1999.
+# The complete license can be accessed from the following location:
+# http://opensource.org/licenses/lgpl-license.php
+# See the Copying file included with the OpenSAF distribution for full
+# licensing terms.
+#
+# Author(s): Ericsson AB
+#
+
+GENERAL
+-------
+
+This is a description of how the AMF service handles being headless (SC down)
+and recovery (SC up).
+
+CONFIGURATION
+-------------
+
+AMF reads the "scAbsenceAllowed" attribute to determine if headless mode is
+supported. A positive integer indicates the number of seconds AMF will tolerate
+being headless, and a zero value indicates the headless feature is disabled.
+
+Normally, the AMF Node Director (amfnd) restarts a node if there is no active
+AMF Director (amfd). If headless support is enabled, the Node Director will
+delay the restart for the duration specified in "scAbsenceAllowed". If an SC
+recovers during the period, the restart is aborted.
+
+IMPLEMENTATION DETAILS
+----------------------
+
+* Amfnd detects being headless:
+Upon receiving the NCSMDS_DOWN event, which indicates that the last active SC
+has gone, amfnd does not reboot the node and enters headless mode (if
+"scAbsenceAllowed" is configured).
+
+* Escalation and Recovery during headless:
+Faults whose recovery is a restart are still recovered in real time as before;
+only recovery that involves failover/switchover is delayed until amfd is up.
+If a component or SU failover happens, the component/SU is only marked as
+failed. The repair action is initiated when the SC comes back, subject to the
+configuration. A node failover or switchover results in a node restart.
+
+* Amfnd detects that the SC has come back from headless:
+NCSMDS_UP is the event by which amfnd detects the presence of an active amfd
+after being headless.
+
+* Recovery after being headless:
+There could be admin ops or recovery actions in progress when the cluster
+enters headless. The normal sequence of those actions is left incomplete, which
+leaves the assignments and states of AMF entities inconsistent. New messages
+(state information messages) have been introduced to carry those assignments
+and states from all amfnd(s) to amfd. Amfd collects all these messages and
+recovers/adjusts the assignments and states that are left over from
+headless.
+
+State information messages also contain component and SU restart counts; these
+new counter values are updated in IMM after headless.
+
+The operation in which the amfnd(s) send state information messages and amfd
+processes them is known as a *sync* operation, and is referred to as such in
+the implementation.
+
+Example 1:
+Admin si-swap of a 2N SI: The cluster goes headless at the moment when SU1's
+Active assignment has moved to Quiesced. Amfd will receive state information
+messages with one Quiesced (SU1) and one Standby (SU2) assignment. Amfd will
+send su si assignment messages to assign SU2 as Active and SU1 as Standby.
+
+Example 2:
+SU failover on a 2N SU: While the cluster is headless, Active SU1 has a fault
+that escalates to an SU failover. SU1's assignment is removed, and SU1 is
+marked as failed with operState Disabled. Once the SC comes back, amfd will
+send an su si assignment to assign SU2 (currently Standby) as Active. SU1 will
+be repaired depending on whether AutoRepair is configured.
+
+Example 3:
+SI dependency: While the cluster is headless, both SU1 and SU2 have faults, so
+all assignments of SI1 to those SUs are removed. After receiving the state
+information messages, if any SI has SI1 as its sponsor SI, these dependent
+SI(s) will start assignment removal.
+
+* Other service interfaces:
+Only Amfd uses the Log and Ntf APIs, so the AMF functions that require Log/Ntf
+are limited during headless.
+The Clm functionality (mostly track callbacks) that Amfnd uses should work as
+before; the only thing Amfnd needs to do is reinitialize its Clm handle when
+the active SC comes back from headless.
+Imm admin ops are not supported during headless since there is no active Amfd.
+In general, AMF functions that require Amfd involvement are not supported
+during headless. All those functions work as normal after the SC comes back
+from headless.
+
+LIMITATIONS
+-----------
+
+* Recovery actions are limited while headless.
+Failover/Switchover is delayed until SC recovery.
+
+* Delayed failover recovery is supported for 2N Service Groups.
+Only for 2N Service Groups does delayed failover recovery support most
+combinations of assignment states (Quiesced/Quiescing/Standby/Active) that are
+left over from headless.
+EX: A Standby assignment will transition to Active directly if required;
+a Quiesced/Quiescing assignment will be removed if the admin entity is LOCKED,
+or will transition to Standby, etc.
+
+For other Service Group types, Standby assignments are first removed and then
+reassigned as appropriate for the SG.
+
+* Tolerance timer of SI dependency.
+After recovery from headless, if an unassigned sponsor SI is detected, the
+assignments of all its dependent SI(s) are removed regardless of the tolerance
+duration. The time at which the sponsor SI became unassigned is not recorded,
+so the new amfd cannot tell how long the dependent SI(s) can still tolerate it.
+
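
The restart-delay rule in the README's CONFIGURATION section can be modelled as a small state machine. The sketch below is illustrative Python, not OpenSAF code. Only "scAbsenceAllowed", NCSMDS_DOWN and NCSMDS_UP come from the README. The names `NodeState` and `HeadlessTracker`, the explicit `now` arguments, and the choice to treat the deadline as inclusive are assumptions made for the example.

```python
from enum import Enum


class NodeState(Enum):
    NORMAL = "normal"          # an active amfd is present
    HEADLESS = "headless"      # no active amfd; the node restart is delayed
    RESTARTING = "restarting"  # the node restart has been triggered


class HeadlessTracker:
    """Hypothetical model of how amfnd reacts to SC loss and SC recovery."""

    def __init__(self, sc_absence_allowed):
        # 0 disables headless mode; a positive value is a number of seconds.
        if sc_absence_allowed < 0:
            raise ValueError("scAbsenceAllowed must be zero or positive")
        self.sc_absence_allowed = sc_absence_allowed
        self.state = NodeState.NORMAL
        self.deadline = None

    def on_sc_down(self, now):
        """Last active SC is gone (the README's NCSMDS_DOWN event)."""
        if self.state is not NodeState.NORMAL:
            return
        if self.sc_absence_allowed == 0:
            # Headless disabled: restart the node as before.
            self.state = NodeState.RESTARTING
        else:
            # Headless enabled: delay the restart by scAbsenceAllowed seconds.
            self.state = NodeState.HEADLESS
            self.deadline = now + self.sc_absence_allowed

    def on_tick(self, now):
        """Periodic check: restart once the tolerated absence has elapsed."""
        if self.state is NodeState.HEADLESS and now >= self.deadline:
            self.state = NodeState.RESTARTING

    def on_sc_up(self, now):
        """An active SC is back (the README's NCSMDS_UP event)."""
        self.on_tick(now)  # an SC that returns after the deadline is too late
        if self.state is NodeState.HEADLESS:
            # SC recovered within the period: the pending restart is aborted.
            self.state = NodeState.NORMAL
            self.deadline = None
```

For instance, a tracker created with `HeadlessTracker(30)` enters `HEADLESS` on `on_sc_down(100.0)` and returns to `NORMAL` if `on_sc_up` arrives before time 130.0, whereas `HeadlessTracker(0)` goes straight to `RESTARTING`. Passing the time in explicitly keeps the model deterministic. A real node director would use its own timer service.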