Hi Gary,

1. Please mention which redundancy models are supported. I have tested 2N
only, so to be on the safe side I suggest mentioning 2N only.
2. It would be clearer to state that the case where the cluster goes headless
while an admin operation is in progress is not supported.
3. Similarly, please state that if a fault occurs and the recovery is anything
other than component restart, it is not supported (the node will reboot and
SUSI may be improper).
[You can remove this line once the node failover case is supported.]
4. The following line is incomplete:
        +SA_AMF_HA_STATE of a SI may change from "" to <<current state>>

Thanks
-Nagu

> -----Original Message-----
> From: Gary Lee [mailto:[email protected]]
> Sent: 04 April 2016 13:07
> To: [email protected]; Nagendra Kumar; praveen malviya;
> [email protected]; [email protected]
> Cc: [email protected]
> Subject: [PATCH 1 of 1] amf: add README_HEADLESS [#1620]
> 
>  osaf/services/saf/amf/README_HEADLESS |  150 ++++++++++++++++++++++++++++++++++
>  1 files changed, 150 insertions(+), 0 deletions(-)
> 
> 
> diff --git a/osaf/services/saf/amf/README_HEADLESS b/osaf/services/saf/amf/README_HEADLESS
> new file mode 100644
> --- /dev/null
> +++ b/osaf/services/saf/amf/README_HEADLESS
> @@ -0,0 +1,150 @@
> +#
> +#      -*- OpenSAF  -*-
> +#
> +# (C) Copyright 2016 The OpenSAF Foundation
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
> +# under the GNU Lesser General Public License Version 2.1, February 1999.
> +# The complete license can be accessed from the following location:
> +# http://opensource.org/licenses/lgpl-license.php
> +# See the Copying file included with the OpenSAF distribution for full
> +# licensing terms.
> +#
> +# Author(s): Ericsson AB
> +#
> +
> +GENERAL
> +-------
> +
> +This is a description of how the AMF service handles being headless (SC down)
> +and recovery (SC up).
> +
> +CONFIGURATION
> +-------------
> +
> +AMF reads the "scAbsenceAllowed" attribute to determine if headless mode is
> +enabled. A positive integer indicates the number of seconds AMF will tolerate
> +being headless, and a zero value indicates the headless feature is disabled.
> +
> +Normally, the AMF Node Director (amfnd) restarts a node if there is no active
> +AMF Director (amfd). If headless support is enabled, the Node Director
> +delays the restart for the duration specified in "scAbsenceAllowed". If an SC
> +recovers within that period, the restart is aborted.
> +
> +IMPLEMENTATION DETAILS
> +----------------------
> +
> +* Amfnd detects being headless:
> +Upon receiving an NCSMDS_DOWN event, which indicates that the last active SC
> +has gone, amfnd does not reboot the node but enters headless mode (if
> +scAbsenceAllowed is configured).
> +
> +* Escalation and Recovery during headless:
> +Restarts work as normal, but failover or switchover results in a
> +node failfast.
> +
> +If saAmfSGAutoRepair is enabled, the repair action is initiated when an
> +SC returns.
> +
> +* Amfnd detects that an SC has come back from headless:
> +NCSMDS_UP is the event amfnd uses to detect the presence of an active amfd
> +after being headless.
> +
> +* New sync messages
> +
> +New messages (state information messages) have been introduced to carry the
> +assignments and states from all amfnd(s) to amfd.
> +
> +State information messages also contain component and SU restart counts. These
> +new counter values are updated in IMM after headless recovery.
> +
> +The operation in which amfnd(s) send state information messages and amfd processes
> +these messages is known as a *sync* operation.
> +
> +LIMITATIONS
> +-----------
> +
> +* Recovery actions are limited while headless.
> +
> +Failover/switchover results in a node failfast.
> +
> +* Very limited recovery if a failover, switchover or node failfast occurs during headless state
> +
> +If a PL is rebooted during headless state, SI assignments may be improper after headless recovery.
> +
> +* Very limited recovery from operations or recovery actions in progress while entering headless state
> +
> +If an admin operation or recovery action is in progress when the cluster enters
> +headless state, the normal sequence of these actions may not complete, which can
> +leave the assignments and states of AMF entities inconsistent.
> +
> +Recovery from this is currently *not supported*.
> +
> +* SI dependency tolerance timer
> +
> +After recovery from headless, if an unassigned sponsor SI is detected, the assignments
> +of all its dependent SI(s) are removed regardless of the tolerance duration. The time
> +at which the sponsor SI became unassigned is not recorded, so the new amfd cannot
> +determine how much of the tolerance time remains for the dependent SI(s).
> +
> +* Proxy and Proxied components are not yet supported
> +
> +* Alarms and notifications
> +
> +During the headless period, notifications are not sent because the Director
> +in charge of sending them (amfd) is not available.
> +For example, if a component fails to instantiate while headless and its
> +SU becomes disabled, the state change notification for the SU from ENABLED
> +to DISABLED will not be sent.
> +
> +List of possible missed notifications
> +=====================================
> +SA_AMF_PRESENCE_STATE of a SU
> +SA_AMF_OP_STATE of a SU
> +SA_AMF_HA_STATE of a SI
> +SA_AMF_ASSIGNMENT_STATE of a SI
> +
> +After the headless period, some redundant alarms and notifications
> +may be sent by the Director. Initially the Director assumes that
> +all PLs are down, but as sync information is received from the PLs, alarms
> +are cleared or set until they reflect the current state of the cluster.
> +For example, an alarm may initially be raised for an unassigned SI, but
> +later cleared as the Director learns of the SI assignment on a PL that
> +remained running.
> +
> +Redundant notifications
> +=======================
> +SA_AMF_PRESENCE_STATE of a SU may change from SA_AMF_PRESENCE_UNINSTANTIATED to <<current state>>
> +SA_AMF_OP_STATE of a SU may change from SA_AMF_OPERATIONAL_DISABLED to <<current state>>
> +SA_AMF_HA_STATE of a SI may change from "" to <<current state>>
> +SA_AMF_ASSIGNMENT_STATE of a SI may change from SA_AMF_ASSIGNMENT_UNASSIGNED to <<current state>>
> +
> +Redundant alarms
> +================
> +An unassigned SI alarm may be raised and then cleared shortly afterwards
> +
> +Furthermore, some notifications may be slightly misleading.
> +For example, if a SI becomes PARTIALLY_ASSIGNED from FULLY_ASSIGNED
> +because a component develops a fault while headless, the SI change notification
> +may describe the SI going from UNASSIGNED to PARTIALLY_ASSIGNED. This is
> +because the Director initially does not know about the SI assignments on
> +PLs that remained running.
> +
> +Limited notifications
> +=====================
> +SA_AMF_ASSIGNMENT_STATE of a SI may change from SA_AMF_ASSIGNMENT_UNASSIGNED to SA_AMF_ASSIGNMENT_PARTIALLY_ASSIGNED
> +when it should be SA_AMF_ASSIGNMENT_FULLY_ASSIGNED to SA_AMF_ASSIGNMENT_PARTIALLY_ASSIGNED
> +
> +* Some AMF API functions will be unavailable while headless
> +
> +saAmfProtectionGroupTrack() and saAmfProtectionGroupTrackStop() return SA_AIS_ERR_TRY_AGAIN while headless.
> +
> +* One payload limitation
> +
> +If the cluster is configured with one payload and without PBE, IMM will reload
> +from XML the second time the cluster goes headless. This causes amfd to lose all objects
> +that were created before the headless period, and data inconsistency will occur between
> +amfnd and amfd/IMM on the SC. To avoid this inconsistency, the payload is rebooted.
> +

------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
