Ack. Thanks -Nagu
> -----Original Message-----
> From: Gary Lee [mailto:[email protected]]
> Sent: 05 April 2016 02:35
> To: Nagendra Kumar
> Cc: [email protected]
> Subject: [PATCH 1 of 1] amf: add README_HEADLESS [#1620]
>
>  osaf/services/saf/amf/README_HEADLESS | 150 ++++++++++++++++++++++++++++++++++
>  1 files changed, 150 insertions(+), 0 deletions(-)
>
>
> diff --git a/osaf/services/saf/amf/README_HEADLESS b/osaf/services/saf/amf/README_HEADLESS
> new file mode 100644
> --- /dev/null
> +++ b/osaf/services/saf/amf/README_HEADLESS
> @@ -0,0 +1,150 @@
> +#
> +# -*- OpenSAF -*-
> +#
> +# (C) Copyright 2016 The OpenSAF Foundation
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
> +# under the GNU Lesser General Public License Version 2.1, February 1999.
> +# The complete license can be accessed from the following location:
> +# http://opensource.org/licenses/lgpl-license.php
> +# See the Copying file included with the OpenSAF distribution for full
> +# licensing terms.
> +#
> +# Author(s): Ericsson AB
> +#
> +
> +GENERAL
> +-------
> +
> +This is a description of how the AMF service handles being headless (SC down)
> +and recovery (SC up).
> +
> +CONFIGURATION
> +-------------
> +
> +AMF reads the "scAbsenceAllowed" attribute to determine if headless mode is
> +enabled. A positive integer indicates the number of seconds AMF will tolerate
> +being headless, and a zero value indicates the headless feature is disabled.
> +
> +Normally, the AMF Node Director (amfnd) will restart a node if there is no active
> +AMF Director (amfd). If headless support is enabled, the Node Director will
> +delay the restart for the duration specified in "scAbsenceAllowed". If a SC
> +recovers during the period, the restart is aborted.
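As an illustration of the CONFIGURATION rules quoted above, here is a minimal C sketch of the restart decision amfnd makes when the last active amfd disappears. All names (`nd_action_t`, `on_last_amfd_down()`, `headless_restart_due()`) are hypothetical and not taken from the amfnd sources; the sketch assumes only the semantics stated in the README: 0 disables headless mode, a positive value is the tolerated absence in seconds, and an SC recovering within that period aborts the restart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical action type; not part of the amfnd sources. */
typedef enum { ND_REBOOT_NOW, ND_ENTER_HEADLESS } nd_action_t;

/* Decision taken on NCSMDS_DOWN for the last active amfd.
 * sc_absence_allowed: value of scAbsenceAllowed in seconds (0 = disabled). */
nd_action_t on_last_amfd_down(uint32_t sc_absence_allowed) {
  return sc_absence_allowed == 0 ? ND_REBOOT_NOW : ND_ENTER_HEADLESS;
}

/* Evaluated while headless. Returns true when the node should now be
 * restarted: the tolerated absence has run out and no SC has come back. */
bool headless_restart_due(uint32_t sc_absence_allowed,
                          uint64_t seconds_headless, bool sc_recovered) {
  if (sc_recovered) {
    return false; /* an SC returned in time: the restart is aborted */
  }
  return seconds_headless >= sc_absence_allowed;
}
```

A real implementation would more likely arm a one-shot timer for scAbsenceAllowed seconds and cancel it on NCSMDS_UP; the polled boolean form above is only meant to make the rule easy to read.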
> +
> +IMPLEMENTATION DETAILS
> +----------------------
> +
> +* Amfnd detects being headless:
> +Upon receiving the NCSMDS_DOWN event, which indicates that the last active SC
> +has gone, amfnd will not reboot the node but will instead enter headless mode
> +(if scAbsenceAllowed is configured).
> +
> +* Escalation and Recovery during headless:
> +Restarts will work as normal, but failover or switchover will
> +result in Node Failfast.
> +
> +The repair action will be initiated when a SC returns if
> +saAmfSGAutoRepair is enabled.
> +
> +* Amfnd detects SC comes back from headless:
> +NCSMDS_UP is the event that amfnd uses to detect the presence of an active amfd
> +after being headless.
> +
> +* New sync messages
> +
> +New messages (state information messages) have been introduced to carry the
> +assignments and states of all amfnd(s) to amfd.
> +
> +State information messages also contain component and SU restart counts. These
> +new counter values will be updated to IMM after headless recovery.
> +
> +The operation in which amfnd(s) send state information messages and amfd
> +processes these messages is known as a *sync* operation.
> +
> +LIMITATIONS
> +-----------
> +
> +* Recovery actions are limited while headless.
> +
> +Failover/Switchover will result in node failfast.
> +
> +* No recovery support if a failover, switchover or node failfast occurs during headless state
> +
> +If a PL is rebooted during the headless state, SI assignments may be incorrect
> +after headless recovery.
> +
> +* No recovery support if an operation or recovery action is in progress while entering headless state
> +
> +If an admin operation or recovery action is in progress when the cluster enters
> +headless state, the normal sequence of these actions could be left incomplete,
> +leaving the assignments and states of AMF entities inconsistent.
> +
> +Recovery from this is currently *not supported*.
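The escalation rule stated under IMPLEMENTATION DETAILS and LIMITATIONS can be sketched as a small decision function. The enum values and `resolve_recovery()` are hypothetical, and the real escalation logic in amfnd is considerably more involved; the sketch assumes only what the README states: restarts proceed as normal while headless, whereas failover or switchover results in a node failfast (presumably because those actions need amfd to place assignments elsewhere).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical recovery kinds; real code works with SaAmfRecommendedRecoveryT
 * and amfnd's own escalation state, which are not modelled here. */
typedef enum {
  RECOVERY_COMP_RESTART,
  RECOVERY_SU_RESTART,
  RECOVERY_COMP_FAILOVER,
  RECOVERY_SU_FAILOVER,
  RECOVERY_SWITCHOVER
} recovery_t;

typedef enum { PERFORM_RECOVERY, NODE_FAILFAST } recovery_outcome_t;

/* While headless, only local restarts are carried out as requested;
 * any failover or switchover escalates to a node failfast. */
recovery_outcome_t resolve_recovery(recovery_t requested, bool headless) {
  if (!headless) {
    return PERFORM_RECOVERY;
  }
  switch (requested) {
    case RECOVERY_COMP_RESTART:
    case RECOVERY_SU_RESTART:
      return PERFORM_RECOVERY;
    default:
      return NODE_FAILFAST;
  }
}
```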
> +
> +* SI dependency tolerance timer
> +
> +After recovery from headless, if an unassigned sponsor SI is detected, the
> +assignments of all its dependent SI(s) are removed regardless of the tolerance
> +duration. The time at which the sponsor SI became unassigned is not recorded, so
> +the new amfd cannot work out how much tolerance time the dependent SI(s) have left.
> +
> +* Proxy and Proxied components are not yet supported
> +
> +* Alarms and notifications
> +
> +During the headless period, notifications will not be sent
> +as the Director in charge of sending notifications is not available.
> +For example, if a component fails to instantiate while headless and its
> +SU becomes disabled, a state change for the SU from ENABLED to DISABLED
> +will not be sent.
> +
> +List of possible missed notifications
> +=====================================
> +SA_AMF_PRESENCE_STATE of a SU
> +SA_AMF_OP_STATE of a SU
> +SA_AMF_HA_STATE of a SI
> +SA_AMF_ASSIGNMENT_STATE of a SI
> +
> +After the headless period, some redundant alarms and notifications
> +may be sent from the Director. Initially the Director assumes that
> +all PLs are down, but as sync information is received from the PLs, alarms
> +will be cleared or set, and will finally reflect the current state of the cluster.
> +For example, an alarm may initially be raised for an unassigned SI, but
> +later cleared as the Director learns of the SI assignment on a PL that
> +remained running.
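The SI dependency tolerance limitation comes down to arithmetic on a timestamp that is lost across the headless period. In this hypothetical sketch (the struct and `remaining_tolerance()` are illustrative names, and the tolerance is assumed to be tracked in whole seconds), a known "unassigned since" time gives a remaining tolerance of `tolerance - (now - unassigned_since)`; without it the only safe answer is zero, which matches the README's statement that dependent SI assignments are removed regardless of the tolerance duration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a sponsor SI as seen by amfd. */
typedef struct {
  uint64_t tolerance_sec;      /* tolerance time configured for the dependency */
  bool unassigned_since_known; /* false after headless recovery (not recorded) */
  uint64_t unassigned_since;   /* time (s) at which the sponsor became unassigned */
} sponsor_si_t;

/* Seconds the dependent SI(s) may keep their assignments; 0 means remove now. */
uint64_t remaining_tolerance(const sponsor_si_t *s, uint64_t now) {
  assert(s != NULL);
  if (!s->unassigned_since_known) {
    return 0; /* remaining time cannot be computed: remove at once */
  }
  uint64_t elapsed = now > s->unassigned_since ? now - s->unassigned_since : 0;
  return elapsed >= s->tolerance_sec ? 0 : s->tolerance_sec - elapsed;
}
```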
> +
> +Redundant notifications
> +=======================
> +SA_AMF_PRESENCE_STATE of a SU may change from SA_AMF_PRESENCE_UNINSTANTIATED to <<current state>>
> +SA_AMF_OP_STATE of a SU may change from SA_AMF_OPERATIONAL_DISABLED to <<current state>>
> +SA_AMF_HA_STATE of a SI may change from "" to <<current state>>
> +SA_AMF_ASSIGNMENT_STATE of a SI may change from SA_AMF_ASSIGNMENT_UNASSIGNED to <<current state>>
> +
> +Redundant alarms
> +================
> +An unassigned SI alarm may be raised and then cleared shortly afterwards.
> +
> +Furthermore, some notifications may be slightly misleading.
> +For example, if a SI becomes PARTIALLY_ASSIGNED from FULLY_ASSIGNED
> +because a component develops a fault while headless, the SI change notification
> +may describe the SI going from UNASSIGNED to PARTIALLY_ASSIGNED. This is
> +because the Director initially does not know about the existence of the SIs
> +assigned to PLs that remained running.
> +
> +Limited notifications
> +=====================
> +SA_AMF_ASSIGNMENT_STATE of a SI may be reported as changing from SA_AMF_ASSIGNMENT_UNASSIGNED
> +to SA_AMF_ASSIGNMENT_PARTIALLY_ASSIGNED when the actual change was from
> +SA_AMF_ASSIGNMENT_FULLY_ASSIGNED to SA_AMF_ASSIGNMENT_PARTIALLY_ASSIGNED.
> +
> +* Some AMF API functions will be unavailable while headless
> +
> +saAmfProtectionGroupTrack() and saAmfProtectionGroupTrackStop() return SA_AIS_ERR_TRY_AGAIN while headless.
> +
> +* One payload limitation
> +
> +If the cluster is configured with one payload without PBE, IMM will reload
> +from XML the second time the cluster goes headless. This causes amfd to lose all
> +objects that were created before headless, and data inconsistency will occur
> +between amfnd and amfd/IMM on the SC. To avoid this inconsistency, the payload
> +will be rebooted.
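Since saAmfProtectionGroupTrack() and saAmfProtectionGroupTrackStop() report TRY_AGAIN while headless, a client would typically retry with a bounded backoff. This is a sketch under stated assumptions: `ais_err_t`, `call_with_retry()` and the stub tracker are hypothetical stand-ins so that it compiles without the OpenSAF headers; in real code the callback would wrap the actual AMF call and compare against SA_AIS_ERR_TRY_AGAIN from saAis.h.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for SaAisErrorT (use saAis.h in real code). */
typedef enum { ERR_OK, ERR_TRY_AGAIN, ERR_OTHER } ais_err_t;

/* Hypothetical callback type wrapping e.g. saAmfProtectionGroupTrack(). */
typedef ais_err_t (*ais_call_fn)(void *ctx);

#define MAX_DELAY_MS 1000L

/* Call fn until it returns something other than ERR_TRY_AGAIN or until
 * max_attempts calls have been made. The delay between attempts starts at
 * initial_delay_ms and doubles, capped at MAX_DELAY_MS. Returns the last result. */
ais_err_t call_with_retry(ais_call_fn fn, void *ctx, int max_attempts,
                          long initial_delay_ms) {
  assert(fn != NULL && max_attempts > 0 && initial_delay_ms >= 0);
  long delay_ms = initial_delay_ms > MAX_DELAY_MS ? MAX_DELAY_MS : initial_delay_ms;
  ais_err_t rc = ERR_TRY_AGAIN;
  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    rc = fn(ctx);
    if (rc != ERR_TRY_AGAIN || attempt == max_attempts) {
      break;
    }
    struct timespec ts = {.tv_sec = delay_ms / 1000,
                          .tv_nsec = (delay_ms % 1000) * 1000000L};
    nanosleep(&ts, NULL);
    delay_ms = delay_ms * 2 > MAX_DELAY_MS ? MAX_DELAY_MS : delay_ms * 2;
  }
  return rc;
}

/* Hypothetical stub: returns ERR_TRY_AGAIN for the first `headless_calls`
 * calls, simulating a headless period, then ERR_OK. */
typedef struct {
  int calls;
  int headless_calls;
} stub_ctx_t;

static ais_err_t stub_track(void *ctx) {
  stub_ctx_t *c = ctx;
  c->calls++;
  return c->calls <= c->headless_calls ? ERR_TRY_AGAIN : ERR_OK;
}
```

The attempt cap and the 1 s ceiling on the delay are arbitrary choices here; because a headless period can last up to scAbsenceAllowed seconds, a real client might instead keep retrying for that long.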
> +
