Ack.

Thanks
-Nagu

> -----Original Message-----
> From: Gary Lee [mailto:[email protected]]
> Sent: 05 April 2016 02:35
> To: Nagendra Kumar
> Cc: [email protected]
> Subject: [PATCH 1 of 1] amf: add README_HEADLESS [#1620]
> 
>  osaf/services/saf/amf/README_HEADLESS |  150 ++++++++++++++++++++++++++++++++++
>  1 files changed, 150 insertions(+), 0 deletions(-)
> 
> 
> diff --git a/osaf/services/saf/amf/README_HEADLESS b/osaf/services/saf/amf/README_HEADLESS
> new file mode 100644
> --- /dev/null
> +++ b/osaf/services/saf/amf/README_HEADLESS
> @@ -0,0 +1,150 @@
> +#
> +#      -*- OpenSAF  -*-
> +#
> +# (C) Copyright 2016 The OpenSAF Foundation
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
> +# under the GNU Lesser General Public License Version 2.1, February 1999.
> +# The complete license can be accessed from the following location:
> +# http://opensource.org/licenses/lgpl-license.php
> +# See the Copying file included with the OpenSAF distribution for full
> +# licensing terms.
> +#
> +# Author(s): Ericsson AB
> +#
> +
> +GENERAL
> +-------
> +
> +This is a description of how the AMF service handles being headless (SC down)
> +and recovery (SC up).
> +
> +CONFIGURATION
> +-------------
> +
> +AMF reads the "scAbsenceAllowed" attribute to determine if headless mode is
> +enabled. A positive integer indicates the number of seconds AMF will tolerate
> +being headless, and a zero value indicates the headless feature is disabled.
> +
> +Normally, the AMF Node Director (amfnd) will restart a node if there is no active
> +AMF Director (amfd). If headless support is enabled, the Node Director will
> +delay the restart for the duration specified in "scAbsenceAllowed". If an SC
> +recovers during that period, the restart is aborted.
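This is a minimal sketch of the restart-delay rule above. It assumes a hypothetical HeadlessTimer class that is not amfnd code. last_sc_down() stands for NCSMDS_DOWN of the last active amfd, sc_up() for NCSMDS_UP, and tick() for a periodic timer check.

```python
class HeadlessTimer:
    """Hypothetical model of the amfnd restart decision (not OpenSAF code).

    sc_absence_allowed mirrors "scAbsenceAllowed": 0 disables headless
    support, a positive value is the tolerated absence in seconds.
    """

    def __init__(self, sc_absence_allowed: int) -> None:
        if sc_absence_allowed < 0:
            raise ValueError("scAbsenceAllowed must not be negative")
        self.sc_absence_allowed = sc_absence_allowed
        self.headless_since = None  # None: an active amfd is present
        self.node_restarted = False

    def last_sc_down(self, now: float) -> None:
        """NCSMDS_DOWN for the last active amfd."""
        if self.sc_absence_allowed == 0:
            self.node_restarted = True   # headless disabled: restart at once
        else:
            self.headless_since = now    # headless enabled: delay the restart

    def sc_up(self) -> None:
        """NCSMDS_UP from an active amfd: abort the pending restart."""
        self.headless_since = None

    def tick(self, now: float) -> None:
        """Restart the node once the tolerated absence has elapsed."""
        if (self.headless_since is not None
                and now - self.headless_since >= self.sc_absence_allowed):
            self.node_restarted = True
```

Treating "exactly scAbsenceAllowed seconds" as expired (>=) is an assumption of the sketch.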
> +
> +IMPLEMENTATION DETAILS
> +----------------------
> +
> +* Amfnd detects being headless:
> +Upon receiving the NCSMDS_DOWN event, which indicates that the last active SC
> +has gone, amfnd will not reboot the node but will instead enter headless mode
> +(if scAbsenceAllowed is configured).
> +
> +* Escalation and Recovery during headless:
> +Restarts will work as normal, but failover or switchover will
> +result in Node Failfast.
> +
> +If saAmfSGAutoRepair is enabled, the repair action will be initiated when an
> +SC returns.
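The escalation rule above can be summarized as a mapping from the requested recovery to what the node actually does while headless. This is only a sketch with hypothetical labels, not the AIS recovery enum.

```python
from enum import Enum


class Recovery(Enum):
    """Hypothetical labels for escalation outcomes (not the AIS enum)."""
    COMPONENT_RESTART = "component restart"
    SU_RESTART = "SU restart"
    FAILOVER = "failover"
    SWITCHOVER = "switchover"


# Restarts keep working while headless; the others need an active amfd.
ALLOWED_WHILE_HEADLESS = {Recovery.COMPONENT_RESTART, Recovery.SU_RESTART}


def effective_recovery(requested: Recovery, headless: bool) -> str:
    """Return what the node actually does for a requested recovery."""
    if not headless or requested in ALLOWED_WHILE_HEADLESS:
        return requested.value
    return "node failfast"
```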
> +
> +* Amfnd detects an SC coming back after being headless:
> +NCSMDS_UP is the event that amfnd uses to detect the presence of an active amfd
> +after being headless.
> +
> +* New sync messages
> +
> +New messages (state information messages) have been introduced to carry the
> +assignments and states of all amfnd(s) to amfd.
> +
> +State information messages also contain component and SU restart counts. These
> +counter values are updated in IMM after headless recovery.
> +
> +The operation in which the amfnd(s) send state information messages and amfd
> +processes them is known as a *sync* operation.
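To make the sync operation concrete, here is a sketch of what amfd might do with the state information messages. StateInfoMsg, its fields, and sync() are hypothetical. The real message layout is defined in the amfnd/amfd sources and is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class StateInfoMsg:
    """Hypothetical state information message from one amfnd."""
    node: str
    assignments: dict[str, str] = field(default_factory=dict)  # SI -> HA state
    comp_restart_count: dict[str, int] = field(default_factory=dict)
    su_restart_count: dict[str, int] = field(default_factory=dict)


def sync(messages: list[StateInfoMsg]):
    """Rebuild amfd's view from every amfnd's message.

    Returns (assignments, counters): assignments maps each SI to the
    (node, HA state) pairs reported for it; counters holds the restart
    counts that would then be written to IMM.
    """
    assignments: dict[str, list[tuple[str, str]]] = {}
    counters: dict[str, int] = {}
    for msg in messages:
        for si, ha_state in msg.assignments.items():
            assignments.setdefault(si, []).append((msg.node, ha_state))
        counters.update(msg.comp_restart_count)
        counters.update(msg.su_restart_count)
    return assignments, counters
```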
> +
> +LIMITATIONS
> +-----------
> +
> +* Recovery actions are limited while headless.
> +
> +Failover/Switchover will result in node failfast.
> +
> +* No recovery support if a failover, switchover or node failfast occurs
> +during the headless state
> +
> +If a PL is rebooted during the headless state, SI assignments may be incorrect
> +after headless recovery.
> +
> +* No recovery support if an operation or recovery action is in progress
> +while entering the headless state
> +
> +If an admin operation or recovery action is in progress when the cluster enters
> +the headless state, the normal sequence of these actions may not complete,
> +leaving the assignments and states of AMF entities inconsistent.
> +
> +Recovery from this is currently *not supported*.
> +
> +* SI dependency tolerance timer
> +
> +After recovery from headless, if an unassigned sponsor SI is detected, the
> +assignments of all its dependent SI(s) are removed regardless of the tolerance
> +duration. The time at which the sponsor SI became unassigned is not recorded,
> +so the new amfd cannot determine how much tolerance time the dependent SI(s)
> +have left.
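This sketch shows the arithmetic that amfd cannot perform after headless recovery. The function name and parameters are hypothetical. sponsor_unassigned_at is the timestamp that, as noted above, is not recorded.

```python
from typing import Optional


def remaining_tolerance(tolerance_s: float, now: float,
                        sponsor_unassigned_at: Optional[float]) -> float:
    """Seconds a dependent SI may keep its assignment after its sponsor
    SI became unassigned.

    sponsor_unassigned_at is None when the moment of unassignment is
    unknown, as it is for a new amfd after headless recovery; returning
    0 then mirrors the behaviour described above (dependent assignments
    are removed at once).
    """
    if sponsor_unassigned_at is None:
        return 0.0
    elapsed = now - sponsor_unassigned_at
    return max(0.0, tolerance_s - elapsed)
```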
> +
> +* Proxy and Proxied components are not yet supported
> +
> +* Alarms and notifications
> +
> +During the headless period, notifications will not be sent because the
> +Director in charge of sending notifications is not available.
> +For example, if a component fails to instantiate while headless and its
> +SU becomes disabled, a state change notification for the SU from ENABLED to
> +DISABLED will not be sent.
> +
> +List of possible missed notifications
> +=====================================
> +SA_AMF_PRESENCE_STATE of an SU
> +SA_AMF_OP_STATE of an SU
> +SA_AMF_HA_STATE of an SI
> +SA_AMF_ASSIGNMENT_STATE of an SI
> +
> +After the headless period, some redundant alarms and notifications
> +may be sent from the Director. Initially the Director assumes that
> +all PLs are down, but as sync information is received from the PLs, alarms
> +are cleared or raised until they reflect the current state of the cluster.
> +For example, an alarm may initially be raised for an unassigned SI, but
> +later cleared as the Director learns of the SI assignment on a PL that
> +remained running.
> +
> +Redundant notifications
> +=======================
> +SA_AMF_PRESENCE_STATE of an SU may change from SA_AMF_PRESENCE_UNINSTANTIATED to <<current state>>
> +SA_AMF_OP_STATE of an SU may change from SA_AMF_OPERATIONAL_DISABLED to <<current state>>
> +SA_AMF_HA_STATE of an SI may change from "" to <<current state>>
> +SA_AMF_ASSIGNMENT_STATE of an SI may change from SA_AMF_ASSIGNMENT_UNASSIGNED to <<current state>>
> +
> +Redundant alarms
> +================
> +An unassigned SI alarm may be raised and then cleared shortly afterwards
> +
> +Furthermore, some notifications may be slightly misleading.
> +For example, if an SI goes from FULLY_ASSIGNED to PARTIALLY_ASSIGNED
> +because a component develops a fault while headless, the SI change notification
> +may describe the SI going from UNASSIGNED to PARTIALLY_ASSIGNED. This is
> +because the Director initially does not know about the SI assignments on
> +PLs that remained running.
> +
> +Limited notifications
> +=====================
> +SA_AMF_ASSIGNMENT_STATE of an SI may change from SA_AMF_ASSIGNMENT_UNASSIGNED to SA_AMF_ASSIGNMENT_PARTIALLY_ASSIGNED
> +when it should change from SA_AMF_ASSIGNMENT_FULLY_ASSIGNED to SA_AMF_ASSIGNMENT_PARTIALLY_ASSIGNED
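The misleading notification arises because the "old" value carried in a notification comes from amfd's own view, and that view starts out empty after headless. Below is a hypothetical model. DirectorSiView is not an OpenSAF class.

```python
from enum import Enum


class AssignmentState(Enum):
    UNASSIGNED = "SA_AMF_ASSIGNMENT_UNASSIGNED"
    FULLY_ASSIGNED = "SA_AMF_ASSIGNMENT_FULLY_ASSIGNED"
    PARTIALLY_ASSIGNED = "SA_AMF_ASSIGNMENT_PARTIALLY_ASSIGNED"


class DirectorSiView:
    """Hypothetical amfd-side view of one SI after headless recovery."""

    def __init__(self) -> None:
        # A new amfd has not seen any sync data yet, so it starts unassigned.
        self.state = AssignmentState.UNASSIGNED
        self.sent: list[tuple[AssignmentState, AssignmentState]] = []

    def set_state(self, new: AssignmentState) -> None:
        """Record a state change and 'send' a notification for it."""
        if new is not self.state:
            # The notification's old value is amfd's cached state, not the
            # state the SI really had before the cluster went headless.
            self.sent.append((self.state, new))
            self.state = new
```

Feeding the synced PARTIALLY_ASSIGNED state into a fresh view produces an UNASSIGNED to PARTIALLY_ASSIGNED notification, even though the SI was FULLY_ASSIGNED before the fault.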
> +
> +* Some AMF API functions will be unavailable while headless
> +
> +saAmfProtectionGroupTrack() and saAmfProtectionGroupTrackStop() return SA_AIS_ERR_TRY_AGAIN while headless.
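Clients can treat this like any other AIS TRY_AGAIN (SA_AIS_ERR_TRY_AGAIN) and retry. The sketch below is a hypothetical Python wrapper. `call` stands in for a binding of saAmfProtectionGroupTrack(). It is assumed to return the AIS error name as a string, which keeps the example self-contained.

```python
import time
from typing import Callable

# AIS error names used as plain strings to keep the sketch self-contained.
SA_AIS_OK = "SA_AIS_OK"
SA_AIS_ERR_TRY_AGAIN = "SA_AIS_ERR_TRY_AGAIN"


def call_with_retry(call: Callable[[], str], attempts: int = 10,
                    delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Invoke call() until it stops returning TRY_AGAIN or attempts run out.

    Returns the last result, so a caller can still see TRY_AGAIN if the
    cluster stayed headless for the whole retry window.
    """
    rc = call()
    for _ in range(attempts - 1):
        if rc != SA_AIS_ERR_TRY_AGAIN:
            break
        sleep(delay_s)
        rc = call()
    return rc
```

The attempt count and delay are arbitrary choices. The injectable `sleep` parameter exists only so the retry loop can be exercised without waiting.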
> +
> +* One payload limitation
> +
> +If the cluster is configured with one payload and without PBE, IMM will reload
> +from XML the second time the cluster goes headless. This causes amfd to lose all
> +objects that were created before going headless, and data inconsistency will
> +occur between amfnd and amfd/IMM on the SC. To avoid this inconsistency, the
> +payload will be rebooted.
> +

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel