osaf/services/saf/amf/README_HEADLESS |  172 ++++++++++++++++++++++++++++++++++
 1 files changed, 172 insertions(+), 0 deletions(-)


diff --git a/osaf/services/saf/amf/README_HEADLESS b/osaf/services/saf/amf/README_HEADLESS
new file mode 100644
--- /dev/null
+++ b/osaf/services/saf/amf/README_HEADLESS
@@ -0,0 +1,172 @@
+#
+#      -*- OpenSAF  -*-
+#
+# (C) Copyright 2016 The OpenSAF Foundation
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+# under the GNU Lesser General Public License Version 2.1, February 1999.
+# The complete license can be accessed from the following location:
+# http://opensource.org/licenses/lgpl-license.php
+# See the Copying file included with the OpenSAF distribution for full
+# licensing terms.
+#
+# Author(s): Ericsson AB
+#
+
+GENERAL
+-------
+
+This is a description of how the AMF service handles being headless (SC down)
+and recovery (SC up).
+
+CONFIGURATION
+-------------
+
+AMF reads the "scAbsenceAllowed" attribute to determine if headless mode is
+enabled. A positive integer indicates the number of seconds AMF will tolerate
+being headless, and a zero value indicates the headless feature is disabled.
+
+Normally, the AMF Node Director (amfnd) will restart a node if there is no active
+AMF Director (amfd). If headless support is enabled, the Node Director will
+delay the restart for the duration specified in "scAbsenceAllowed". If a SC
+recovers during the period, the restart is aborted.
+
+IMPLEMENTATION DETAILS
+----------------------
+
+* Amfnd detects being headless:
+Upon receiving an NCSMDS_DOWN event, which indicates that the last active SC has
+gone, amfnd does not reboot the node but enters headless mode (if scAbsenceAllowed
+is configured).
+
+* Escalation and Recovery during headless:
+Restarts will work as normal, but failover/switchover will be delayed until amfd
+is up. During the headless period, a component failover will be treated
+as a SU failover to simplify error handling.
+Node Failover or Switchover will result in Node Failfast.
+
+The repair action will be initiated when a SC returns if
+saAmfSGAutoRepair is enabled.
+
+* Amfnd detects SC return from headless:
+NCSMDS_UP is the event that amfnd uses to detect the presence of an active amfd
+after being headless.
+
+* Recovery after being headless:
+There could be admin operations or recovery actions in progress when the cluster enters
+the headless state. The normal sequence of these actions may be left incomplete and therefore
+leave the assignments and states of AMF entities inconsistent.
+New messages (state information messages) have been introduced to carry those assignments and
+states from every amfnd to amfd. Amfd collects all these
+messages and recovers/adjusts the assignments and states that are left over from
+headless.
+
+State information messages also contain component and SU restart counts. These
+new counter values will be updated to IMM after headless recovery.
+
+The operation where amfnd(s) send state information messages and amfd processes
+these messages is known as a *sync* operation.
+
+Example 1:
+Admin si-swap of a 2N SI: The cluster goes headless when SU1, which has the Active
+assignment, moves to Quiesced. Amfd will receive state information messages
+with one Quiesced (SU1) and one Standby (SU2) assignment. Amfd will send SU-SI
+assignment messages to assign SU2 to Active, and SU1 to Standby.
+
+Example 2:
+SU failover on a 2N SU: While headless, Active SU1 becomes faulty and
+escalates to a SU failover. SU1's assignment is removed, and SU1 is marked as failed with
+its operState set to Disabled. Once the SC comes back, amfd will send a SU-SI assignment
+to assign SU2 (being Standby) to Active. If AutoRepair is configured,
+SU1 will be repaired.
+
+Example 3:
+SI dependency: While headless, both SU1 and SU2 become faulty and
+all assignments of SI1 to those SUs are removed. After receiving state information
+messages, if any SIs have SI1 as their sponsor SI, assignment removal is started
+for these dependent SI(s).
+
+LIMITATIONS
+-----------
+
+* Recovery actions are limited while headless.
+
+Failover/Switchover is delayed until SC recovery. saAmfSUFailover setting
+will be ignored and will be treated as being set to 1.
+
+* Only 2N/NoRed/NwayAct Service Groups are supported
+
+For these SG types, delayed failover recovery will support most combinations
+of assignment states (Quiesced/Quiescing/Standby/Active) left over 
+from headless. 
+
+Example: A Standby assignment will transition to Active directly if required;
+a Quiesced/Quiescing assignment will be removed if the admin entity is LOCKED,
+or will otherwise transition to Standby.
+
+* SI dependency tolerance timer
+After recovery from headless, if an unassigned sponsor SI is detected, the assignments of all its
+dependent SI(s) are removed regardless of the tolerance duration. The time
+at which the sponsor SI became unassigned is not recorded, so the new amfd cannot
+determine how much of the tolerance time remains for the dependent SI(s).
+
+* Proxy and Proxied components are not yet supported
+
+* Alarms and notifications
+
+During the headless period, notifications will not be sent 
+as the Director in charge of sending notifications is not available.
+For example, if a component fails to instantiate while headless and its
+SU becomes disabled, a state change for the SU from ENABLED to DISABLED
+will not be sent.
+
+List of possible missed notifications
+=====================================
+SA_AMF_PRESENCE_STATE of a SU
+SA_AMF_OP_STATE of a SU 
+SA_AMF_HA_STATE of a SI 
+SA_AMF_ASSIGNMENT_STATE of a SI
+
+After the headless period, some redundant alarms and notifications
+may be sent from the Director. Initially the Director will think
+all PLs are down. But as sync info is received from PLs, alarms
+will be cleared or set, and finally reflect the current state of the cluster.
+For example, an alarm may initially be raised for an unassigned SI, but
+later cleared as the Director learns of the SI assignment on a PL that
+remained running.
+
+Redundant notifications
+=======================
+SA_AMF_PRESENCE_STATE of a SU may change from SA_AMF_PRESENCE_UNINSTANTIATED to <<current state>>
+SA_AMF_OP_STATE of a SU may change from SA_AMF_OPERATIONAL_DISABLED to <<current state>>
+SA_AMF_HA_STATE of a SI may change from "" to <<current state>>
+SA_AMF_ASSIGNMENT_STATE of a SI may change from SA_AMF_ASSIGNMENT_UNASSIGNED to <<current state>>
+
+Redundant alarms
+================
+An unassigned SI alarm may be raised and then cleared shortly afterwards
+
+Furthermore, some notifications may be slightly misleading.
+For example, if a SI becomes PARTIALLY_ASSIGNED from FULLY_ASSIGNED
+because a component develops a fault while headless, the SI change notification
+may describe the SI going from UNASSIGNED to PARTIALLY_ASSIGNED. This is
+because the Director initially does not know about the existence of the SIs assigned
+to PLs that remained running.
+
+Limited notifications
+=====================
+SA_AMF_ASSIGNMENT_STATE of a SI may be reported as changing from SA_AMF_ASSIGNMENT_UNASSIGNED to SA_AMF_ASSIGNMENT_PARTIALLY_ASSIGNED
+when the actual change is from SA_AMF_ASSIGNMENT_FULLY_ASSIGNED to SA_AMF_ASSIGNMENT_PARTIALLY_ASSIGNED
+
+* Some AMF API functions will be unavailable while headless
+
+saAmfProtectionGroupTrack() and saAmfProtectionGroupTrackStop() return SA_AIS_ERR_TRY_AGAIN during headless
+
+* One payload limitation
+
+If the cluster is configured with a single payload and without PBE, IMM will reload from XML the second time the cluster goes headless.
+This causes amfd to lose all objects that were created before headless, so the data becomes inconsistent between
+amfnd and amfd/IMM. To avoid this inconsistency, the payload needs a reboot.
+
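Note on the CONFIGURATION section of the README: the restart decision it describes can be modelled in a few lines. The sketch below is illustrative only and is not OpenSAF code. The class name, method names, and returned strings are hypothetical, and it assumes that the delayed restart fires exactly when the "scAbsenceAllowed" period has elapsed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeadlessPolicy:
    """Hypothetical model of the amfnd restart decision; not OpenSAF code."""

    sc_absence_allowed: int  # seconds; 0 means the headless feature is disabled
    deadline: Optional[float] = None  # time at which the delayed restart fires

    def on_director_down(self, now: float) -> str:
        # Feature disabled: restart the node, as in the normal case.
        if self.sc_absence_allowed == 0:
            return "restart"
        # Feature enabled: delay the restart by scAbsenceAllowed seconds.
        self.deadline = now + self.sc_absence_allowed
        return "headless"

    def on_director_up(self) -> str:
        # A SC recovered during the period: abort the pending restart.
        self.deadline = None
        return "sync"

    def on_tick(self, now: float) -> str:
        # No pending restart means the node is not headless.
        if self.deadline is None:
            return "normal"
        # Assumption: the restart happens once the deadline is reached.
        return "restart" if now >= self.deadline else "headless"
```

With `HeadlessPolicy(sc_absence_allowed=30)`, a director loss at t=100 keeps the node headless until t=130. Calling `on_director_up()` before then cancels the restart and leads into the sync operation.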

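Note on the API limitation in the README: an application that calls the protection group tracking functions while the cluster is headless will presumably want to retry when it gets the try-again return code. The sketch below shows one way to do that. It is an assumption-laden illustration: `call_with_retry` is a hypothetical helper, the two constants mirror the SaAisErrorT values from the SA Forum headers, and the real AMF call is replaced by any callable that returns a return code.

```python
import time
from typing import Callable

# Values mirroring SaAisErrorT in the SA Forum headers (assumed here).
SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6


def call_with_retry(
    call: Callable[[], int],
    attempts: int = 5,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical helper: repeat `call` while it reports TRY_AGAIN.

    Makes at most `attempts` calls and returns the last return code.
    """
    rc = call()
    for _ in range(attempts - 1):
        if rc != SA_AIS_ERR_TRY_AGAIN:
            break
        # Wait before retrying; the director may be back by then.
        sleep(delay)
        rc = call()
    return rc
```

In a real application, `call` would wrap the actual tracking request. Keeping the retry bounded matters because the headless period can last up to "scAbsenceAllowed" seconds.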