Hi Praveen,

Please find my comments inline, marked with [Minh].

Thanks,
Minh

On 11/02/16 17:10, praveen malviya wrote:
> Hi Minh,
>
> I have marked some doubts with [Praveen]. Could you please answer
> them? It will be helpful in understanding the changes in AMFD and AMFND.
>
> Thanks,
> Praveen
>
> On 20-Jan-16 9:03 AM, Minh Hon Chau wrote:
>>   osaf/services/saf/amf/README-HEADLESS | 123 
>> ++++++++++++++++++++++++++++++++++
>>   1 files changed, 123 insertions(+), 0 deletions(-)
>>
>>
>> diff --git a/osaf/services/saf/amf/README-HEADLESS 
>> b/osaf/services/saf/amf/README-HEADLESS
>> new file mode 100644
>> --- /dev/null
>> +++ b/osaf/services/saf/amf/README-HEADLESS
>> @@ -0,0 +1,123 @@
>> +#
>> +#      -*- OpenSAF  -*-
>> +#
>> +# (C) Copyright 2016 The OpenSAF Foundation
>> +#
>> +# This program is distributed in the hope that it will be useful, but
>> +# WITHOUT ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY
>> +# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are 
>> licensed
>> +# under the GNU Lesser General Public License Version 2.1, February 
>> 1999.
>> +# The complete license can be accessed from the following location:
>> +# http://opensource.org/licenses/lgpl-license.php
>> +# See the Copying file included with the OpenSAF distribution for full
>> +# licensing terms.
>> +#
>> +# Author(s): Ericsson AB
>> +#
>> +
>> +GENERAL
>> +-------
>> +
>> +This is a description of how the AMF service handles being headless 
>> (SC down)
>> +and recovery (SC up).
>> +
>> +CONFIGURATION
>> +-------------
>> +
>> +AMF reads the "scAbsenceAllowed" attribute to determine if headless 
>> mode is
>> +supported. A positive integer indicates the number of seconds AMF 
>> will tolerate
>> +being headless, and a zero value indicates the headless feature is 
>> disabled.
>> +
>> +Normally, the AMF Node Director (amfnd) restarts a node if there is 
>> no active
>> +AMF Director (amfd). If headless support is enabled, the Node 
>> Director will
>> +delay the restart for the duration specified in "scAbsenceAllowed".
>> +If an SC recovers during that period, the restart is aborted.
>> +
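A minimal sketch of the delayed restart described above. HeadlessTimer and its method names are hypothetical, not amfnd code; the caller is assumed to supply the current time in seconds.

```python
from typing import Optional


class HeadlessTimer:
    """Hypothetical model of how amfnd could delay a node restart."""

    def __init__(self, sc_absence_allowed: int) -> None:
        # 0 disables headless mode; a positive value is the tolerated
        # number of seconds without an SC.
        self.sc_absence_allowed = sc_absence_allowed
        self.deadline: Optional[float] = None

    def on_sc_down(self, now: float) -> bool:
        """Return True if the node must restart immediately."""
        if self.sc_absence_allowed <= 0:
            return True
        # Headless mode: remember when the tolerated absence runs out.
        self.deadline = now + self.sc_absence_allowed
        return False

    def on_sc_up(self) -> None:
        """An SC recovered in time, so the pending restart is aborted."""
        self.deadline = None

    def must_restart(self, now: float) -> bool:
        """Return True once the tolerated absence has expired."""
        return self.deadline is not None and now >= self.deadline
```

With scAbsenceAllowed set to 900, on_sc_down() at t=10 returns False and must_restart() becomes True from t=910 onwards, unless on_sc_up() is called first.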
> [Praveen] What happens to the controller amfnds? If there is a problem
> with only the directors on the controller, how does the controller amfnd
> react to it? Is there any difference between the handling of AMFND on
> controllers and on payloads?
> [Minh] The use case is to take down the whole SC, not only amfd. The
> behaviour of amfnd on a controller is similar to before. We use
> leds_set in node_up to indicate whether an amfnd is newly started or a
> veteran. Based on leds_set, amfd handles the node_up message from each
> amfnd differently.
>> +IMPLEMENTATION DETAILS
>> +----------------------
>> +
>> +* Amfnd detects being headless:
>> +Upon receiving the NCSMDS_DOWN event, which indicates that the last active
>> +SC has gone, amfnd does not reboot the node but enters headless mode (if
>> +"scAbsenceAllowed" is configured).
>> +
>> +* Escalation and Recovery during headless:
>> +Restart recovery of a faulty entity is still performed in real time as
>> +before, while any recovery that involves failover/switchover is delayed
>> +until amfd is up.
> [Praveen] For comp restart recovery, failover and switchover are not
> applicable, so which type of switchover or failover is meant here?
[Minh] All switchover/failover on comp/su is delayed until the SC comes back.
> If Component or
>> +SU failover happens, the component/SU will only be marked as failed. The
>> +repair action will be initiated when the SC comes back, but it also
>> +depends on the desired configuration. A Node Failover or Switchover will
>> +result in a node restart.
>> +
>> +* Amfnd detects SC comes back from headless:
>> +NCSMDS_UP is the event by which amfnd detects the presence of an active
>> +amfd after being headless.
>> +
>> +* Recovery after being headless:
>> +There could be admin ops or recovery actions in progress when the cluster
>> +enters headless. The normal sequence of those actions is left incomplete,
>> +which leaves the assignments and states of AMF entities inconsistent. New
>> +messages (state information messages) have been introduced to carry those
>> +assignments and states from all amfnd(s) to amfd. Amfd collects all these
>> +messages and recovers/adjusts the assignments and states which are left
>> +over from headless.
>> +
> [Praveen] If AMFND deletes the assignments in cases like su-failover,
> node-failover or node-switchover, how will AMFD adjust the assignments,
> since it will not get them from AMFND?
[Minh] In this case AMFD doesn't have to adjust the assignments since
there are none. Assignments could be removed due to a fault during
headless, and AMFD will try to repair if auto-repair is configured.
>> +State information messages also contain component and SU restart counts;
>> +these new counter values will be updated in IMM after headless.
>> +
>> +The operation in which amfnd(s) send state information messages and amfd
>> +processes them is known as a *sync* operation, and is referred to as such
>> +in the implementation.
>> +
>> +Example 1:
>> +Admin si-swap of a 2N SI: The cluster goes headless at the moment SU1's
>> +Active assignment has moved to Quiesced. Amfd will receive state
>> +information messages with one Quiesced (SU1) and one Standby (SU2)
>> +assignment. Amfd will send su si assignment messages to assign SU2 to
>> +Active and SU1 to Standby.
>> +
>> +Example 2:
>> +SU failover on 2N SU: While the cluster is headless, Active SU1 has a
>> +fault that escalates to an SU failover. SU1's assignment is removed, and
>> +SU1 is marked as failed with operState Disabled. Once the SC comes back,
>> +amfd will send an su si assignment to assign SU2 (being Standby) to
>> +Active. Depending on whether AutoRepair is configured, SU1 will be
>> +repaired.
>> +
>> +Example 3:
>> +SI dependency: While the cluster is headless, both SU1 and SU2 have
>> +faults such that all assignments of SI1 to those SUs are removed. After
>> +amfd receives the state information messages, any SI(s) that have SI1 as
>> +their sponsor SI will start assignment removal.
>> +
>> +* Other services interfaces:
>> +Only Amfd uses the Log and Ntf APIs, and those functions in AMF which
>> +require Log/Ntf are limited during headless.
>> +The Clm functionality (mostly track cb) that Amfnd uses should work as
>> +before; the only thing Amfnd needs to do is reinitialize the Clm handle
>> +when the active SC comes back from headless.
> [Praveen] Since the CLM handle becomes invalid, AMFND will not get CLM
> track callbacks for controllers. How will amfnd send the node_up message
> to AMFD? Is it the same as before, based on MDS callbacks and after CLM
> reinitialization?
> [Minh] It relies on the MDS callback as before, but AMFND reuses the CLM
> *info* from before headless to fill in the node_up message.
>> +Imm Admin ops are not supported during headless since there is no active
>> +Amfd. In general, AMF functions that require Amfd involvement are not
>> +supported during headless. All those functions will work as normal after
>> +the SC comes back from headless.
>> +
>> +LIMITATIONS
>> +-----------
>> +
>> +* Recovery actions are limited while headless.
>> +Failover/Switchover is delayed until SC recovery.
>> +
>> +* Delayed failover recovery is supported for 2N Service Group
>> +Only for 2N Service Group, delayed failover recovery supports most
>> +combinations of assignment states (Quiesced/Quiescing/Standby/Active)
>> +which are left over from headless.
>> +E.g.: a Standby assignment will transition to Active directly if required;
>> +a Quiesced/Quiescing assignment will be removed if the admin entity is
>> +LOCKED, or will transition to Standby, etc.
>> +
>> +For other Service Group types, Standby assignments are first 
>> removed, and
>> +reassigned as appropriate for the SG.
>> +
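A rough sketch of the 2N rules quoted above, under stated assumptions: recover_2n, the state strings and the locked_sus parameter are all hypothetical and only model the cases the README names (Standby promoted if no Active is left, Quiesced/Quiescing removed when locked, otherwise moved to Standby). It is not the amfd implementation.

```python
ACTIVE, STANDBY = "ACTIVE", "STANDBY"
QUIESCED, QUIESCING = "QUIESCED", "QUIESCING"


def recover_2n(assignments, locked_sus=frozenset()):
    """Map each SU to its HA state after sync; None means 'removed'.

    assignments: dict of SU name -> HA state left over from headless.
    locked_sus:  SUs whose admin entity is assumed to be LOCKED.
    """
    # Is there still an Active assignment among the leftovers?
    has_active = any(state == ACTIVE for state in assignments.values())
    result = {}
    for su, state in assignments.items():
        if state in (QUIESCED, QUIESCING):
            # Removed if locked, otherwise falls back to Standby.
            result[su] = None if su in locked_sus else STANDBY
        elif state == STANDBY and not has_active:
            # No Active left: promote this Standby directly.
            result[su] = ACTIVE
            has_active = True
        else:
            result[su] = state
    return result
```

For Example 1 in the README, recover_2n({"SU1": "QUIESCED", "SU2": "STANDBY"}) gives SU1 Standby and SU2 Active.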
>> +* Tolerance timer of SI dependency.
>> +After recovery from headless, if an unassigned sponsor SI is detected, the
>> +assignments of all its dependent SI(s) are removed regardless of the
>> +tolerance duration. The time at which the sponsor SI became unassigned is
>> +not recorded, so the new amfd cannot determine how much tolerance time the
>> +dependent SI(s) have left.
>> +
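The SI dependency limitation above can be sketched as follows. The function and its inputs are hypothetical; it looks only at direct sponsors and, as the README states, ignores the tolerance duration.

```python
def dependents_to_unassign(assigned_sis, sponsors_of):
    """Return the dependent SIs whose assignments should be removed.

    assigned_sis: set of SI names that still have assignments after sync.
    sponsors_of:  dict of dependent SI name -> list of its sponsor SIs.
    """
    to_remove = []
    for dependent, sponsors in sponsors_of.items():
        if dependent not in assigned_sis:
            continue  # nothing left to remove for this SI
        # Any unassigned sponsor triggers removal, with no tolerance
        # timer, because the time of unassignment was not recorded.
        if any(sponsor not in assigned_sis for sponsor in sponsors):
            to_remove.append(dependent)
    return sorted(to_remove)
```

If SI1 lost all its assignments during headless and SI2 has SI1 as sponsor, dependents_to_unassign({"SI2", "SI3"}, {"SI2": ["SI1"], "SI3": []}) returns ["SI2"].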
>>
>

